Canonical Schema
Every extracted Unit is normalized into a 40+ field canonical schema. Fields are grouped into five categories.
Core Unit Fields (9)
| Field | Type | Description |
|---|---|---|
| property_name | string | Property/community name |
| property_address | string \| null | Property street address |
| unit_id | string | Unit identifier (required) |
| floor_plan | string \| null | Floor plan name/code |
| sqft | integer \| null | Unit square footage |
| beds | integer \| null | Bedroom count |
| baths | float \| null | Bathroom count (supports half-baths) |
| unit_status | string | Normalized status (see below) |
| tenant_name | string \| null | Current tenant name |
Status Normalization
Raw status values are normalized to one of 7 canonical values:
| Canonical | Common Raw Values |
|---|---|
| Occupied | OCC, CURRENT, O, RENEWED |
| Vacant | VAC, VACANT, V, AVAILABLE, READY |
| Notice | NTV, NOTICE, N, NTC, ON NOTICE |
| Model | MODEL, MDL, M |
| Down | DOWN, DWN, D, OFFLINE, MAKE READY |
| Applicant | APP, APPLICANT, APPLIED, PENDING |
| Unknown | Anything unrecognized |
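The mapping in the table can be sketched as a simple lookup. This is a minimal illustration, not the product's actual code: the names `STATUS_ALIASES` and `normalize_status` are hypothetical, and the case-insensitive matching, whitespace collapsing, and acceptance of the canonical names themselves (for example `OCCUPIED`) are assumptions.

```python
# Canonical status -> common raw values, copied from the table above.
STATUS_ALIASES = {
    "Occupied": ["OCC", "CURRENT", "O", "RENEWED"],
    "Vacant": ["VAC", "VACANT", "V", "AVAILABLE", "READY"],
    "Notice": ["NTV", "NOTICE", "N", "NTC", "ON NOTICE"],
    "Model": ["MODEL", "MDL", "M"],
    "Down": ["DOWN", "DWN", "D", "OFFLINE", "MAKE READY"],
    "Applicant": ["APP", "APPLICANT", "APPLIED", "PENDING"],
}

# Inverted lookup: raw value -> canonical status.
RAW_TO_CANONICAL = {}
for canonical, raw_values in STATUS_ALIASES.items():
    # Assumption: a canonical name appearing in the source maps to itself.
    RAW_TO_CANONICAL[canonical.upper()] = canonical
    for raw in raw_values:
        RAW_TO_CANONICAL[raw] = canonical


def normalize_status(raw):
    """Return one of the 7 canonical statuses; unrecognized input is 'Unknown'."""
    if raw is None:
        return "Unknown"
    # Assumption: matching ignores case and collapses runs of whitespace.
    key = " ".join(str(raw).upper().split())
    return RAW_TO_CANONICAL.get(key, "Unknown")
```

For example, `normalize_status("ntv")` gives `"Notice"` and `normalize_status("Evicted")` gives `"Unknown"`.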
Lease Fields (6)
| Field | Type | Description |
|---|---|---|
| move_in | YYYY-MM-DD \| null | Move-in date |
| move_out | YYYY-MM-DD \| null | Move-out date |
| lease_start | YYYY-MM-DD \| null | Current lease start date |
| lease_end | YYYY-MM-DD \| null | Current lease end date |
| lease_term_months | integer \| null | Lease term in months |
| is_mtm | boolean \| null | Month-to-month flag |
All dates are normalized to ISO YYYY-MM-DD format. The parser handles:
- MM/DD/YYYY and M/D/YYYY
- YYYY-MM-DD (passthrough)
- DD-Mon-YY (e.g., 15-Jan-24)
- Excel serial numbers (days since 1900-01-01)
- Various separator characters (/, -, .)
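The formats listed above could be handled along these lines. This is a sketch under stated assumptions, not the parser itself: `normalize_date`, `DATE_FORMATS`, and `EXCEL_EPOCH` are hypothetical names, serials are assumed to follow Excel's 1900 date system (for which 1899-12-30 is the customary epoch for modern dates, because Excel counts a nonexistent 1900-02-29), and returning null for unparseable input is also an assumption.

```python
import math
import re
from datetime import date, datetime, timedelta

# Assumption: Excel 1900 date system; this epoch is correct for serials above 60.
EXCEL_EPOCH = date(1899, 12, 30)

# Tried in order once "/" and "." separators are unified to "-".
# Note: %b matches month abbreviations in the current locale (English assumed).
DATE_FORMATS = ("%Y-%m-%d", "%m-%d-%Y", "%d-%b-%y")


def _from_serial(days):
    """Convert an Excel serial to ISO format, or None if it is out of range."""
    try:
        return (EXCEL_EPOCH + timedelta(days=days)).isoformat()
    except OverflowError:
        return None


def normalize_date(value):
    """Return an ISO YYYY-MM-DD string, or None if the value cannot be parsed."""
    if value is None or isinstance(value, bool):
        return None
    if isinstance(value, (int, float)):
        return _from_serial(int(value)) if math.isfinite(value) else None
    text = str(value).strip()
    if re.fullmatch(r"[0-9]+", text):  # a serial number that arrived as text
        return _from_serial(int(text))
    unified = re.sub(r"[/.]", "-", text)
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(unified, fmt).date().isoformat()
        except ValueError:
            continue
    return None
```

Trying the formats in a fixed order keeps the behavior predictable: a four-digit leading year can only match the passthrough format, and a month abbreviation can only match DD-Mon-YY.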
Rent (1)
| Field | Type | Description |
|---|---|---|
| market_rent | number \| null | Market/asking rent amount |
Charge Categories (20)
All charge fields are number | null. Currency symbols and formatting are stripped. Parenthesized negatives like (1,250.00) are converted to -1250.
| Field | Description |
|---|---|
| charge_base_rent | Monthly base rent |
| charge_pet | Pet rent/fee |
| charge_parking | Parking charges |
| charge_storage | Storage unit fees |
| charge_utilities | Utility charges (RUBS, flat fee) |
| charge_trash | Trash/valet trash |
| charge_water_sewer | Water and sewer charges |
| charge_cable_internet | Cable/internet fees |
| charge_insurance | Renter’s insurance |
| charge_amenity | Amenity/club fees |
| charge_concession | Concessions (usually negative) |
| charge_employee_discount | Employee discount |
| charge_mtm_premium | Month-to-month premium |
| charge_furniture | Furnished unit fee |
| charge_garage | Garage charges |
| charge_laundry | Laundry fees |
| charge_late_fee | Late payment fees |
| charge_other | Catch-all for unclassified charges |
| charge_other_2 | Second catch-all |
| charge_other_3 | Third catch-all |
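The amount cleanup described for charge fields (stripping currency symbols and formatting, converting parenthesized negatives) might look like the following. The function name `parse_amount` is hypothetical, only the dollar sign and comma are treated as formatting here, and returning null for blank or non-numeric text is an assumption.

```python
import re

# A plain decimal number with an optional sign, e.g. "1250", "-50", "1250.00".
_NUMBER = re.compile(r"[-+]?(\d+(\.\d*)?|\.\d+)")


def parse_amount(value):
    """Convert a raw charge value to a float, or None if it is not numeric."""
    if value is None or isinstance(value, bool):
        return None
    if isinstance(value, (int, float)):
        return float(value)
    text = str(value).strip()
    # Accounting style: "(1,250.00)" means negative 1250.
    negative = text.startswith("(") and text.endswith(")")
    if negative:
        text = text[1:-1]
    # Assumption: "$", "," and whitespace are the only formatting to strip.
    cleaned = re.sub(r"[$,\s]", "", text)
    if not _NUMBER.fullmatch(cleaned):
        return None
    amount = float(cleaned)
    return -abs(amount) if negative else amount
```

Validating against a strict number pattern before calling `float` keeps strings such as `N/A` or `nan` from slipping through as values.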
Balances & Totals (4)
| Field | Type | Description |
|---|---|---|
| total_charges | number \| null | Sum of all charge fields |
| balance_current | number \| null | Current balance owed |
| balance_prepaid | number \| null | Prepaid amount |
| deposit_amount | number \| null | Security deposit amount |
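Since total_charges is described as the sum of all charge fields, a consistency check could be sketched as below. The names `CHARGE_FIELDS` and `sum_charges` are hypothetical, and two behaviors are assumptions the documentation does not spell out: null charges are skipped, and the result is null when every charge field is null.

```python
# The 20 charge fields, in schema order.
CHARGE_FIELDS = [
    "charge_base_rent", "charge_pet", "charge_parking", "charge_storage",
    "charge_utilities", "charge_trash", "charge_water_sewer",
    "charge_cable_internet", "charge_insurance", "charge_amenity",
    "charge_concession", "charge_employee_discount", "charge_mtm_premium",
    "charge_furniture", "charge_garage", "charge_laundry", "charge_late_fee",
    "charge_other", "charge_other_2", "charge_other_3",
]


def sum_charges(row):
    """Sum the non-null charge fields of a row (a dict keyed by field name)."""
    present = [row[name] for name in CHARGE_FIELDS if row.get(name) is not None]
    if not present:
        return None  # assumption: no charges at all yields null, not 0
    return round(sum(present), 2)
```

A row with base rent 1500, pet rent 35, and a concession of -100 sums to 1435.0, which can then be compared against the row's total_charges value.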
Output File Format
The Clean Output contains two Sheets:
- Rent Roll — one row per Unit, columns in the order listed above
- Validation Report — one row per validation finding (errors and warnings)
All extraction is literal — no inference, no imputation. If a field isn’t present in the source data, it’s null in the output.
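The column order and the null-for-missing rule can be illustrated with a short sketch. `CANONICAL_COLUMNS` and `to_output_row` are hypothetical names, and representing a Unit as a dict is an assumption made for the example.

```python
CORE_FIELDS = [
    "property_name", "property_address", "unit_id", "floor_plan",
    "sqft", "beds", "baths", "unit_status", "tenant_name",
]
LEASE_FIELDS = [
    "move_in", "move_out", "lease_start", "lease_end",
    "lease_term_months", "is_mtm",
]
RENT_FIELDS = ["market_rent"]
CHARGE_FIELDS = ["charge_" + suffix for suffix in (
    "base_rent", "pet", "parking", "storage", "utilities", "trash",
    "water_sewer", "cable_internet", "insurance", "amenity", "concession",
    "employee_discount", "mtm_premium", "furniture", "garage", "laundry",
    "late_fee", "other", "other_2", "other_3",
)]
BALANCE_FIELDS = [
    "total_charges", "balance_current", "balance_prepaid", "deposit_amount",
]

# Rent Roll columns in the order the schema lists them.
CANONICAL_COLUMNS = (
    CORE_FIELDS + LEASE_FIELDS + RENT_FIELDS + CHARGE_FIELDS + BALANCE_FIELDS
)


def to_output_row(unit):
    """Build one Rent Roll row; fields absent from the source stay None."""
    return [unit.get(column) for column in CANONICAL_COLUMNS]
```

Calling `to_output_row({"unit_id": "A101", "sqft": 850})` yields a 40-element row in which every field other than unit_id and sqft is `None`; nothing is inferred or filled in.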