Validation Results
Pipeline validation across 16 properties and 8 vendor formats.
At a Glance
| Metric | Value |
|---|---|
| Properties tested | 16 |
| Vendor formats | 8 distinct PMS exports |
| Total Units processed | 3,732 |
| Extraction errors | 0 |
| Spot check confidence | 100/100 on every file |
Zero extraction errors across 3,732 Units spanning 8 different property management system formats. Every file scored 100/100 on the independent spot check audit.
Test Matrix
Each file was processed through the full pipeline: Upload, Scan, Structure Analysis (Claude Opus), Confirmation, Extraction (Claude Sonnet, parallel chunks), Spot Check (Claude Sonnet audit), and Validation.
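The Extraction step runs in parallel chunks whose overlap regions keep a Unit from being split at a chunk boundary. Units that appear in two overlapping chunks are then deduplicated on a composite key (property + unit ID). The following is a minimal sketch of that windowing and deduplication logic. It assumes rows are indexed from 0 and extracted Units arrive as dicts with `property_name` and `unit_id`. The helper names and the `chunk_size`/`overlap` parameters are hypothetical, and the AI extraction call itself is not shown.

```python
from typing import Iterable


def make_chunks(n_rows: int, chunk_size: int, overlap: int) -> list[range]:
    """Split row indices 0..n_rows-1 into windows; consecutive windows share `overlap` rows."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    chunks = []
    start = 0
    while start < n_rows:
        end = min(start + chunk_size, n_rows)
        chunks.append(range(start, end))
        if end == n_rows:
            break
        # Step back by `overlap` so rows at the boundary land in both windows.
        start = end - overlap
    return chunks


def dedupe_units(records: Iterable[dict]) -> list[dict]:
    """Keep the first record seen for each (property_name, unit_id) composite key."""
    seen: set[tuple[str, str]] = set()
    unique = []
    for rec in records:
        key = (rec["property_name"], rec["unit_id"])
        if key not in seen:
            seen.add(key)
            unique.append(rec)
    return unique
```

Keeping the first copy is the simplest policy. A real pipeline might instead prefer the copy from the chunk that contains all of that Unit's rows.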
Horizontal Formats
Standard one-row-per-Unit layouts where every field is directly visible in the source.
| Property | PMS Format | Units | Errors | Spot Check |
|---|---|---|---|---|
| Tuscany Park | Yardi Voyager | 392 | 0 | 100/100 |
| Stonecrest Mill | Yardi Voyager | 280 | 0 | 100/100 |
| Doria | Yardi Voyager | 160 | 0 | 100/100 |
| Stevens Creek | Simple horizontal | 256 | 0 | 100/100 |
| Magnolia Villas | Simple horizontal | 144 | 0 | 100/100 |
| Poinsett View | AppFolio | 111 | 0 | 100/100 |
| RUN One Line | RealPage One Line (50+ cols) | 64 | 0 | 100/100 |
Vertical / Multi-Row Formats
Layouts where each Unit spans multiple rows, with each charge (charge code and amount) on its own sub-row. The pipeline aggregates charges across sub-rows automatically.
| Property | PMS Format | Units | Errors | Spot Check |
|---|---|---|---|---|
| Camelback | RealPage/OneSite | 334 | 0 | 100/100 |
| 3000 at Med Center | RealPage/Entrata | 324 | 0 | 100/100 |
| Grayson | RentManager | 322 | 0 | 100/100 |
| The Hudson | Yardi Lease Charges | 320 | 0 | 100/100 |
| Allure at Edinburgh | Yardi Lease Charges | 280 | 0 | 100/100 |
| Tides ES | RealPage/Entrata | 168 | 0 | 100/100 |
| Woodland Mews | Yardi Lease Charges | 233 | 0 | 100/100 |
| Elm Creek | Yardi Lease Charges | 168 | 0 | 100/100 |
| Adams Square | RealPage/Entrata | 106 | 0 | 100/100 |
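The sub-row aggregation described above can be sketched as a forward fill of the Unit ID followed by a per-Unit sum of charge amounts. This is an illustration rather than the pipeline's code. It assumes each source row is a dict with `unit_id`, `charge_code`, and `amount` keys, and that only a Unit's first sub-row carries its ID, as when the ID sits in a merged cell. The charge codes in the example are made up.

```python
from collections import defaultdict


def aggregate_subrows(rows: list[dict]) -> dict[str, dict[str, float]]:
    """Collapse vertical charge sub-rows into one {charge_code: amount} map per Unit."""
    units: dict[str, defaultdict] = {}
    current_unit = None
    for row in rows:
        # A non-empty unit_id starts a new Unit; blank IDs continue the current one.
        if row.get("unit_id"):
            current_unit = row["unit_id"]
            units.setdefault(current_unit, defaultdict(float))
        if current_unit is None:
            continue  # rows before the first Unit, e.g. report headers
        code = row.get("charge_code")
        if code:
            # Repeated codes for the same Unit are summed.
            units[current_unit][code] += float(row.get("amount") or 0)
    return {unit: dict(charges) for unit, charges in units.items()}


rows = [
    {"unit_id": "101", "charge_code": "RENT", "amount": 1200},
    {"unit_id": "", "charge_code": "PET", "amount": 25},
    {"unit_id": None, "charge_code": "PARK", "amount": 50},
    {"unit_id": "102", "charge_code": "RENT", "amount": 1100},
    {"unit_id": "", "charge_code": "RENT", "amount": 15.5},
]
units = aggregate_subrows(rows)
```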
Totals
| Metric | Value |
|---|---|
| Total Units | 3,732 |
| Total errors | 0 |
| Total warnings | 15 (all informational; see Warning Details below) |
| Spot check | 100/100 on all 16 files |
Download Test Files
For each property, download the original rent roll and the standardized 40-field output. Each output includes a “Rent Roll” sheet (extracted data) and a “Validation Report” sheet (spot check results, warnings, and extraction metadata).
Warning Details
All 15 warnings are informational flags for human review — none indicate data integrity failures or extraction errors. Every warning falls into one of three categories:
Row Count Mismatch (10 warnings)
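On vertical-format files, each Unit spans multiple source rows, and the pipeline correctly aggregates those sub-rows into one Unit record. RUN One Line also appears here even though it is a horizontal file, because its RealPage One Line export includes co-resident rows. The warning simply notes the difference between the source row count and the extracted Unit count.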
| Property | Source Rows | Extracted Units | Avg Rows/Unit |
|---|---|---|---|
| Grayson | 2,229 | 322 | 6.9 |
| The Hudson | 1,930 | 320 | 6.0 |
| 3000 at Med Center | 1,923 | 324 | 5.9 |
| Camelback | 1,687 | 334 | 5.1 |
| RUN One Line | 1,577 | 64 | 24.6 |
| Allure at Edinburgh | 1,253 | 280 | 4.5 |
| Woodland Mews | 1,390 | 233 | 6.0 |
| Elm Creek | 650 | 168 | 3.9 |
| Tides ES | 580 | 168 | 3.5 |
| Adams Square | 345 | 106 | 3.3 |
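The Avg Rows/Unit column is the source row count divided by the extracted Unit count (Grayson: 2,229 / 322 ≈ 6.9). The following is a minimal sketch of how such an informational warning could be produced. The function name and message wording are hypothetical.

```python
from typing import Optional


def row_count_warning(property_name: str, source_rows: int, extracted_units: int) -> Optional[str]:
    """Return an informational message when source rows and extracted Units differ."""
    if source_rows == extracted_units:
        return None  # one row per Unit: nothing to report
    if extracted_units == 0:
        return f"{property_name}: no Units extracted from {source_rows:,} source rows"
    ratio = source_rows / extracted_units
    return (
        f"{property_name}: {source_rows:,} source rows -> "
        f"{extracted_units:,} Units ({ratio:.1f} rows/Unit)"
    )
```

For example, `row_count_warning("Grayson", 2229, 322)` yields `"Grayson: 2,229 source rows -> 322 Units (6.9 rows/Unit)"`.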
Charge Sum Rounding (3 warnings)
When individual charges are aggregated into total_charges, three Units show a difference of a few dollars between the summed charges and total_charges. The individual charge values are correct.
| Property | Unit | Difference |
|---|---|---|
| 3000 at Med Center | 304 | $5.00 |
| 3000 at Med Center | 406 | $5.25 |
| Allure at Edinburgh | 1219 | $7.00 |
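A charge-sum integrity check of this kind can be sketched with exact decimal arithmetic, so that binary floating-point noise cannot create a difference on its own. The one-cent tolerance, the message format, and the charge amounts in the example are assumptions for illustration. They are not values taken from the pipeline or the source files.

```python
from decimal import Decimal
from typing import Optional


def check_charge_sum(
    unit_id: str,
    charges: dict[str, str],
    total_charges: str,
    tolerance: Decimal = Decimal("0.01"),
) -> Optional[str]:
    """Flag a Unit whose total_charges differs from the sum of its individual charges."""
    # Parse amounts from strings so values such as "0.10" stay exact.
    computed = sum((Decimal(v) for v in charges.values()), Decimal("0"))
    diff = Decimal(total_charges) - computed
    if abs(diff) <= tolerance:
        return None
    return f"Unit {unit_id}: total_charges differs from summed charges by ${abs(diff):.2f}"
```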
Occupied with Zero Base Rent (2 warnings)
These reflect real conditions in the source file, not extraction errors. An occupied Unit with zero base rent is common for employee units, units in transition, or units with non-standard lease structures.
| Property | Unit | Reason |
|---|---|---|
| Stonecrest Mill | 36-3604 | Employee or transition unit |
| Camelback | 188 | Employee or transition unit |
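The occupied-Unit rent sanity check behind these two warnings reduces to a simple filter. The following is a minimal sketch. It assumes Unit records are dicts using the canonical `unit_id`, `unit_status`, and `base_rent` fields, and it treats a missing base_rent as zero (also an assumption). The check only flags Units for human review and leaves the extracted values unchanged.

```python
def flag_occupied_zero_rent(units: list[dict]) -> list[str]:
    """Return unit_ids of Occupied Units whose base_rent is zero or missing."""
    flagged = []
    for unit in units:
        rent = unit.get("base_rent") or 0  # assumption: missing counts as zero
        if unit.get("unit_status") == "Occupied" and rent == 0:
            flagged.append(unit["unit_id"])
    return flagged
```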
How the Pipeline Validates Quality
Five overlapping checks run on every file:
1. Structure Analysis — AI reads the spreadsheet layout guided by a data dictionary with 100+ known header aliases and charge code abbreviations across Yardi, RealPage, RentManager, and AppFolio systems.
2. Extraction — Parallel chunked AI extraction with overlap regions to prevent Unit boundary splits. Deduplication uses composite keys (property + unit ID).
3. Spot Check — Independent AI audit of 8 sampled Units compares raw source cells against extracted values. Samples are biased toward edge cases (first rows, last rows, vacant Units, random fill). Scoring starts at 100; each value mismatch deducts 15 points, each formatting difference deducts 5. Multi-row Units are fully supported — the auditor sees all source rows for each Unit, not just the first.
4. Validation — Programmatic checks: missing unit IDs, duplicates, charge sum integrity, occupied-Unit rent sanity.
5. Comparison View — Interactive UI shows source → canonical field mapping for every Unit, with transformation badges.
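The spot-check scoring rule in step 3 can be sketched deterministically, together with its edge-case sampling. The rule starts each file at 100 and deducts 15 per value mismatch and 5 per formatting difference. In the pipeline itself, the comparison is an AI audit. Here a simple string and number normalization stands in for it. The following are assumptions, not documented behavior:

* the sampling quotas (two first rows, two last rows, up to two vacant Units, then random fill)
* the normalization rules
* flooring the score at 0

```python
import random
from typing import Optional


def pick_sample(units: list[dict], k: int = 8, seed: Optional[int] = None) -> list[int]:
    """Return indices of up to k Units: first rows, last rows, vacant Units, then random fill."""
    rng = random.Random(seed)
    chosen: list[int] = []

    def add(candidates: list[int], quota: int) -> None:
        for i in candidates:
            if quota == 0 or len(chosen) >= k:
                return
            if i not in chosen:
                chosen.append(i)
                quota -= 1

    n = len(units)
    add(list(range(min(2, n))), 2)  # first rows
    add(list(range(n - 1, max(n - 3, -1), -1)), 2)  # last rows
    add([i for i, u in enumerate(units) if u.get("unit_status") == "Vacant"], 2)
    rest = [i for i in range(n) if i not in chosen]
    rng.shuffle(rest)
    add(rest, k)  # random fill for the remaining slots
    return chosen


def classify(source: str, extracted: str) -> str:
    """Label a source/extracted value pair as 'match', 'format', or 'mismatch'."""
    if source.strip() == extracted.strip():
        return "match"

    def norm(s: str) -> str:
        return s.replace("$", "").replace(",", "").strip()

    try:
        same = float(norm(source)) == float(norm(extracted))
    except ValueError:
        same = norm(source).lower() == norm(extracted).lower()
    return "format" if same else "mismatch"


def score_spot_check(pairs: list[tuple[str, str]]) -> int:
    """Start at 100; deduct 15 per value mismatch and 5 per formatting difference."""
    score = 100
    for source, extracted in pairs:
        kind = classify(source, extracted)
        if kind == "mismatch":
            score -= 15
        elif kind == "format":
            score -= 5
    return max(score, 0)
```

For instance, `"$1,200.00"` versus `"1200"` counts as a formatting difference, while `"1200"` versus `"1250"` counts as a value mismatch.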
Vendor Format Coverage
The test suite covers 8 property management system export formats commonly encountered in multifamily acquisitions:
| Format | Layout | Characteristics | Properties |
|---|---|---|---|
| Yardi Voyager | Horizontal | One row per Unit, aggregate charge columns | 3 |
| Yardi Lease Charges | Vertical | Charge code + amount sub-rows per Unit, merged cells | 4 |
| RealPage/Entrata | Vertical | 50–75 columns, multi-sheet with Report Parameters | 3 |
| RealPage/OneSite | Vertical | Resident IDs, status inference, charge sub-rows | 1 |
| RealPage One Line | Horizontal | 50+ individual charge columns, co-resident rows | 1 |
| RentManager | Vertical | 11K+ merge regions, full-text charge descriptions | 1 |
| AppFolio | Horizontal | Combined BD/BA field, minimal charge detail | 1 |
| Simple Horizontal | Horizontal | Standard one-row-per-Unit, basic columns | 2 |
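Much of the variation across these formats is in header naming, which Structure Analysis handles with a data dictionary of known header aliases. The following is a minimal sketch of alias lookup after whitespace and case normalization. The alias entries below are hypothetical examples, not the actual dictionary. Unmapped headers are returned separately so that their values can be kept in raw_extra.

```python
import re

# Hypothetical alias entries; keys are normalized (lowercase, single spaces).
HEADER_ALIASES = {
    "unit": "unit_id",
    "unit #": "unit_id",
    "sq ft": "sqft",
    "sqft": "sqft",
    "resident": "tenant_name",
    "tenant": "tenant_name",
    "market rent": "market_rent",
    "lease end": "lease_end",
    "lease expiration": "lease_end",
    "move in": "move_in",
}


def normalize_header(header: str) -> str:
    """Lowercase and collapse runs of whitespace (including newlines) to one space."""
    return re.sub(r"\s+", " ", header).strip().lower()


def map_headers(headers: list[str]) -> tuple[dict[str, str], list[str]]:
    """Split source headers into {header: canonical_field} and a list of unmapped headers."""
    mapping: dict[str, str] = {}
    unmapped: list[str] = []
    for header in headers:
        field = HEADER_ALIASES.get(normalize_header(header))
        if field is None:
            unmapped.append(header)
        else:
            mapping[header] = field
    return mapping, unmapped
```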
Canonical Schema
Every file is normalized to the same 40-field schema regardless of source format. See the full Canonical Schema reference.
| Category | Fields |
|---|---|
| Identity | property_name, property_address, unit_id, floor_plan, sqft, beds, baths |
| Status | unit_status (Occupied, Vacant, Notice, Model, Down, Applicant) |
| Tenant | tenant_name |
| Lease | move_in, move_out, lease_start, lease_end, lease_term_months, is_mtm |
| Rent | market_rent |
| Charges (20) | base_rent, pet, parking, storage, utilities, trash, cable/internet, pest, amenity, W/D, package locker, insurance, deposit waiver, concession, MTM fee, employee discount, subsidy, CAM, admin, other |
| Financial | total_charges, deposit_required, deposit_on_hand, balance |
| Raw | raw_extra (preserves unmapped vendor data as JSON) |
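As a sketch of what a normalized record might look like, the dataclass below covers only a handful of the canonical fields. It does the following:

* validates unit_status against the six listed values
* splits a combined BD/BA value of the kind found in AppFolio exports
* serializes unmapped vendor columns into raw_extra as JSON

The BD/BA string format (for example "2/1.5"), the helper names, and the example "HAP Amount" column are assumptions.

```python
import json
from dataclasses import dataclass
from typing import Optional

UNIT_STATUSES = {"Occupied", "Vacant", "Notice", "Model", "Down", "Applicant"}


@dataclass
class UnitRecord:
    """Illustrative subset of the canonical schema; most fields are omitted."""
    property_name: str
    unit_id: str
    unit_status: str
    beds: Optional[int] = None
    baths: Optional[float] = None
    base_rent: Optional[float] = None
    raw_extra: str = "{}"  # unmapped vendor data as a JSON string

    def __post_init__(self) -> None:
        if self.unit_status not in UNIT_STATUSES:
            raise ValueError(f"unknown unit_status: {self.unit_status!r}")


def split_bd_ba(value: str) -> tuple[int, float]:
    """Split a combined value such as '2/1.5' into (beds, baths); the format is assumed."""
    beds, baths = value.replace(" ", "").split("/")
    return int(beds), float(baths)


def to_record(values: dict, extras: dict) -> UnitRecord:
    """Build a UnitRecord, storing unmapped source columns in raw_extra."""
    return UnitRecord(**values, raw_extra=json.dumps(extras, sort_keys=True))
```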