Canonical Schema

Every extracted Unit is normalized into a canonical schema of 40 fields, grouped into five categories.

Core Unit Fields (9)

Field             Type             Description
property_name     string           Property/community name
property_address  string | null    Property street address
unit_id           string           Unit identifier (required)
floor_plan        string | null    Floor plan name/code
sqft              integer | null   Unit square footage
beds              integer | null   Bedroom count
baths             float | null     Bathroom count (supports half-baths)
unit_status       string           Normalized status (see below)
tenant_name       string | null    Current tenant name

Status Normalization

Raw status values are normalized to one of 7 canonical values:

Canonical   Common Raw Values
Occupied    OCC, CURRENT, O, RENEWED
Vacant      VAC, VACANT, V, AVAILABLE, READY
Notice      NTV, NOTICE, N, NTC, ON NOTICE
Model       MODEL, MDL, M
Down        DOWN, DWN, D, OFFLINE, MAKE READY
Applicant   APP, APPLICANT, APPLIED, PENDING
Unknown     Anything unrecognized

Lease Fields (6)

Field              Type                Description
move_in            YYYY-MM-DD | null   Move-in date
move_out           YYYY-MM-DD | null   Move-out date
lease_start        YYYY-MM-DD | null   Current lease start date
lease_end          YYYY-MM-DD | null   Current lease end date
lease_term_months  integer | null      Lease term in months
is_mtm             boolean | null      Month-to-month flag

All dates are normalized to ISO YYYY-MM-DD format. The parser handles:

  • MM/DD/YYYY and M/D/YYYY
  • YYYY-MM-DD (passthrough)
  • DD-Mon-YY (e.g., 15-Jan-24)
  • Excel serial numbers (1900 date system, in which serial 1 is 1900-01-01)
  • Various separator characters (/, -, .)
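
A rough sketch of how the listed formats could be normalized is shown here. The function name `normalize_date` is hypothetical, and several behaviors are assumptions rather than documented facts: unparseable input returns `None`, all-digit strings are treated as Excel serials, and month-first order is assumed for numeric dates.

```python
import re
from datetime import date, datetime, timedelta

# Excel's 1900 date system counts a nonexistent 1900-02-29, so for serials
# above 60 the effective day zero is 1899-12-30. Earlier serials are not
# handled correctly by this sketch.
_EXCEL_DAY_ZERO = date(1899, 12, 30)

# Tried in order after separators are unified to "-".
# %b matches English month abbreviations only under an English/C locale.
_FORMATS = ("%Y-%m-%d", "%m-%d-%Y", "%d-%b-%y")


def _from_serial(serial):
    try:
        return (_EXCEL_DAY_ZERO + timedelta(days=int(serial))).isoformat()
    except OverflowError:
        return None  # number too large to be a plausible date


def normalize_date(value):
    """Return an ISO YYYY-MM-DD string, or None if the value cannot be parsed."""
    if value is None:
        return None
    if isinstance(value, (int, float)) and not isinstance(value, bool):
        return _from_serial(value)
    text = str(value).strip()
    if not text:
        return None
    if text.isdigit():  # assumption: bare digits are an Excel serial
        return _from_serial(text)
    unified = re.sub(r"[/.]", "-", text)  # accept /, -, and . as separators
    for fmt in _FORMATS:
        try:
            return datetime.strptime(unified, fmt).date().isoformat()
        except ValueError:
            continue
    return None
```

Note that a two-digit year such as `24` is resolved by `strptime`'s own pivot rule, which may not match what a given source system intends.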

Rent (1)

Field        Type            Description
market_rent  number | null   Market/asking rent amount

Charge Categories (20)

All charge fields are number | null. Currency symbols and formatting are stripped. Parenthesized negatives like (1,250.00) are converted to -1250.

Field                     Description
charge_base_rent          Monthly base rent
charge_pet                Pet rent/fee
charge_parking            Parking charges
charge_storage            Storage unit fees
charge_utilities          Utility charges (RUBS, flat fee)
charge_trash              Trash/valet trash
charge_water_sewer        Water and sewer charges
charge_cable_internet     Cable/internet fees
charge_insurance          Renter’s insurance
charge_amenity            Amenity/club fees
charge_concession         Concessions (usually negative)
charge_employee_discount  Employee discount
charge_mtm_premium        Month-to-month premium
charge_furniture          Furnished unit fee
charge_garage             Garage charges
charge_laundry            Laundry fees
charge_late_fee           Late payment fees
charge_other              Catch-all for unclassified charges
charge_other_2            Second catch-all
charge_other_3            Third catch-all
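
The amount cleanup described for charge fields (stripping currency formatting and converting parenthesized negatives) might look roughly like the following. `parse_amount` is a hypothetical name, and the exact set of stripped symbols and the choice to return `None` for unparseable text are assumptions.

```python
import re


def parse_amount(raw):
    """Convert a raw charge cell to a float, or None if it holds no number."""
    if raw is None:
        return None
    if isinstance(raw, (int, float)) and not isinstance(raw, bool):
        return float(raw)
    text = str(raw).strip()
    if not text:
        return None
    # Accounting-style negative: "(1,250.00)" means -1250.
    negative = text.startswith("(") and text.endswith(")")
    if negative:
        text = text[1:-1]
    # Assumption: only these symbols, thousands commas, and spaces are stripped.
    cleaned = re.sub(r"[$€£,\s]", "", text)
    try:
        value = float(cleaned)
    except ValueError:
        return None  # e.g. "N/A"; kept as null rather than guessed
    return -value if negative else value
```

With this sketch, `parse_amount("(1,250.00)")` gives `-1250.0` and `parse_amount("$35")` gives `35.0`.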

Balances & Totals (4)

Field            Type            Description
total_charges    number | null   Sum of all charge fields
balance_current  number | null   Current balance owed
balance_prepaid  number | null   Prepaid amount
deposit_amount   number | null   Security deposit amount
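
Because total_charges is described as the sum of all charge fields, one way to cross-check a row is to add up its non-null `charge_*` values and compare. This is only a sketch: the helper names, the tolerance, and the decision to skip nulls and to return `None` when nothing can be compared are all assumptions, not documented behavior.

```python
import math


def sum_charges(row):
    """Sum the non-null charge_* values in a row; None if there are none."""
    values = [
        value
        for field, value in row.items()
        if field.startswith("charge_") and value is not None
    ]
    return math.fsum(values) if values else None


def total_matches_charges(row, tolerance=0.01):
    """True/False when both sides exist, None when there is nothing to compare."""
    total = row.get("total_charges")
    computed = sum_charges(row)
    if total is None or computed is None:
        return None
    return abs(total - computed) <= tolerance
```

A concession entered as a negative amount lowers the computed sum, which is consistent with the note that concessions are usually negative.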

Output File Format

The Clean Output contains two Sheets:

  1. Rent Roll — one row per Unit, columns in the order listed above
  2. Validation Report — one row per validation finding (errors and warnings)

All extraction is literal — no inference, no imputation. If a field isn’t present in the source data, it’s null in the output.
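
To make the null rule concrete, here is a small sketch that lays out a row in schema order and fills any field missing from the extracted data with `None`. The field names and their order come from this page; `CANONICAL_FIELDS` and `to_canonical_row` are hypothetical names, and handling of a missing required unit_id is left out.

```python
# The 40 canonical fields, in the order they are listed on this page.
CANONICAL_FIELDS = (
    # Core unit fields (9)
    "property_name", "property_address", "unit_id", "floor_plan", "sqft",
    "beds", "baths", "unit_status", "tenant_name",
    # Lease fields (6)
    "move_in", "move_out", "lease_start", "lease_end",
    "lease_term_months", "is_mtm",
    # Rent (1)
    "market_rent",
    # Charge categories (20)
    "charge_base_rent", "charge_pet", "charge_parking", "charge_storage",
    "charge_utilities", "charge_trash", "charge_water_sewer",
    "charge_cable_internet", "charge_insurance", "charge_amenity",
    "charge_concession", "charge_employee_discount", "charge_mtm_premium",
    "charge_furniture", "charge_garage", "charge_laundry", "charge_late_fee",
    "charge_other", "charge_other_2", "charge_other_3",
    # Balances and totals (4)
    "total_charges", "balance_current", "balance_prepaid", "deposit_amount",
)


def to_canonical_row(extracted):
    """Order fields per the schema; anything absent from the source stays None."""
    return {field: extracted.get(field) for field in CANONICAL_FIELDS}
```

Nothing is derived here: a row extracted with only unit_id and sqft comes back with the other 38 fields set to `None`.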
