
Validation Results

Pipeline validation across 16 properties and 8 vendor formats.


At a Glance

| Metric | Result |
| --- | --- |
| Properties tested | 16 |
| Vendor formats | 8 distinct PMS exports |
| Total Units processed | 3,732 |
| Extraction errors | 0 |
| Spot check confidence | 100/100 on every file |

Zero extraction errors across 3,732 Units spanning 8 different property management system formats. Every file scored 100/100 on the independent spot check audit.


Test Matrix

Each file was processed through the full pipeline: Upload, Scan, Structure Analysis (Claude Opus), Confirmation, Extraction (Claude Sonnet, parallel chunks), Spot Check (Claude Sonnet audit), and Validation.
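
The parallel-chunk Extraction stage relies on overlap regions and composite-key deduplication, both described under the quality checks later on this page. The following is a minimal sketch of that idea, not the pipeline's actual code: the function names, the chunk size, and the overlap width are all hypothetical, and the record layout is assumed to be a plain dict.

```python
from typing import Iterator


def overlapping_chunks(rows: list, size: int, overlap: int) -> Iterator[list]:
    """Yield windows of `size` rows; consecutive windows share `overlap` rows.

    Sharing rows between windows means a Unit that straddles a chunk
    boundary appears whole in at least one window (assuming it spans no
    more than `overlap` + 1 rows).
    """
    if not 0 <= overlap < size:
        raise ValueError("overlap must satisfy 0 <= overlap < size")
    step = size - overlap
    for start in range(0, len(rows), step):
        yield rows[start:start + size]
        if start + size >= len(rows):
            break  # the last window already reached the end of the file


def dedupe(records: list[dict]) -> list[dict]:
    """Keep the first record seen for each (property_name, unit_id) key."""
    seen: dict[tuple, dict] = {}
    for rec in records:
        key = (rec["property_name"], rec["unit_id"])
        seen.setdefault(key, rec)
    return list(seen.values())
```

Because overlapping windows can each emit the same Unit, the deduplication pass is what keeps the final Unit count honest; keying on property as well as unit ID avoids collapsing identically numbered Units from different properties.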

Horizontal Formats

Standard one-row-per-Unit layouts where every field is directly visible in the source.

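| Property | PMS Format | Units | Errors | Spot Check |
| --- | --- | --- | --- | --- |
| Tuscany Park | Yardi Voyager | 392 | 0 | 100/100 |
| Stonecrest Mill | Yardi Voyager | 280 | 0 | 100/100 |
| Doria | Yardi Voyager | 160 | 0 | 100/100 |
| Stevens Creek | Simple horizontal | 256 | 0 | 100/100 |
| Magnolia Villas | Simple horizontal | 144 | 0 | 100/100 |
| Poinsett View | AppFolio | 111 | 0 | 100/100 |
| RUN One Line | RealPage (50 cols) | 64 | 0 | 100/100 |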

Vertical / Multi-Row Formats

Layouts where each Unit spans multiple rows (charge code + amount on separate sub-rows). The pipeline aggregates charges across sub-rows automatically.

| Property | PMS Format | Units | Errors | Spot Check |
| --- | --- | --- | --- | --- |
| Camelback | RealPage/OneSite | 334 | 0 | 100/100 |
| 3000 at Med Center | RealPage/Entrata | 324 | 0 | 100/100 |
| Grayson | RentManager | 322 | 0 | 100/100 |
| The Hudson | Yardi (vertical) | 320 | 0 | 100/100 |
| Allure at Edinburgh | Yardi (vertical) | 280 | 0 | 100/100 |
| Tides ES | RealPage/Entrata | 168 | 0 | 100/100 |
| Woodland Mews | Yardi (vertical) | 233 | 0 | 100/100 |
| Elm Creek | Yardi (vertical) | 168 | 0 | 100/100 |
| Adams Square | RealPage/Entrata | 106 | 0 | 100/100 |
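
To make the sub-row aggregation concrete, here is a minimal sketch under a simplifying assumption: the first sub-row of each Unit carries the unit ID, and continuation rows leave it blank. The column names (`unit_id`, `charge_code`, `amount`) and the function name are hypothetical; real vendor exports vary and the pipeline's own logic may differ.

```python
from collections import defaultdict
from decimal import Decimal


def aggregate_units(rows: list[dict]) -> list[dict]:
    """Fold charge sub-rows into one record per Unit.

    Assumes a non-empty `unit_id` marks the start of a Unit and that
    following rows with an empty `unit_id` belong to the same Unit.
    """
    units: dict[str, dict] = {}
    current = None
    for row in rows:
        unit_id = row.get("unit_id")
        if unit_id:
            current = units.setdefault(
                unit_id, {"unit_id": unit_id, "charges": defaultdict(Decimal)}
            )
        if current is None:
            continue  # header or preamble rows before the first Unit
        code, amount = row.get("charge_code"), row.get("amount")
        if code and amount is not None:
            # Decimal(str(...)) avoids binary floating-point drift in sums.
            current["charges"][code] += Decimal(str(amount))
    return list(units.values())
```

A usage example with made-up rows:

```python
rows = [
    {"unit_id": "101", "charge_code": "rent", "amount": "1200.00"},
    {"unit_id": "", "charge_code": "pet", "amount": "35.00"},
    {"unit_id": "102", "charge_code": "rent", "amount": "1350.00"},
]
units = aggregate_units(rows)  # two Unit records from three source rows
```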

Totals

| Metric | Result |
| --- | --- |
| Total Units | 3,732 |
| Total errors | 0 |
| Total warnings | 15 (all informational; see below) |
| Spot check | 100/100 on all 16 files |

Download Test Files

For each property, download the original rent roll and the standardized 40-field output. Each output includes a “Rent Roll” sheet (extracted data) and a “Validation Report” sheet (spot check results, warnings, and extraction metadata).


Warning Details

All 15 warnings are informational flags for human review — none indicate data integrity failures or extraction errors. Every warning falls into one of three categories:

Row Count Mismatch (10 warnings)

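On multi-row files, each Unit spans several source rows, so the source row count exceeds the number of extracted Units. The pipeline aggregates the sub-rows into one Unit record, and the warning records the gap between the two counts. Nine of the ten affected files are vertical formats; the tenth, RUN One Line, is a horizontal RealPage export that includes co-resident rows.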

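| Property | Source Rows | Extracted Units | Avg Rows/Unit |
| --- | --- | --- | --- |
| Grayson | 2,229 | 322 | 6.9 |
| The Hudson | 1,930 | 320 | 6.0 |
| 3000 at Med Center | 1,923 | 324 | 5.9 |
| Camelback | 1,687 | 334 | 5.1 |
| RUN One Line | 1,577 | 64 | 24.6 |
| Allure at Edinburgh | 1,253 | 280 | 4.5 |
| Woodland Mews | 1,390 | 233 | 6.0 |
| Elm Creek | 650 | 168 | 3.9 |
| Tides ES | 580 | 168 | 3.5 |
| Adams Square | 345 | 106 | 3.3 |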
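
The Avg Rows/Unit column is simply source rows divided by extracted Units. A small sketch of how such a warning could be produced is shown here; the function name and message wording are hypothetical, not the pipeline's actual output.

```python
from typing import Optional


def row_count_warning(source_rows: int, extracted_units: int) -> Optional[str]:
    """Return an informational message when the two counts differ."""
    if extracted_units <= 0:
        raise ValueError("extracted_units must be positive")
    if source_rows == extracted_units:
        return None  # one row per Unit, nothing to flag
    ratio = source_rows / extracted_units
    return (
        f"Row count mismatch: {source_rows} source rows, "
        f"{extracted_units} Units ({ratio:.1f} rows/Unit)"
    )
```

For Grayson, `row_count_warning(2229, 322)` reports 6.9 rows per Unit, matching the table.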

Charge Sum Rounding (3 warnings)

When aggregating individual charges into total_charges, sub-penny amounts occasionally produce rounding differences of a few dollars. The individual charge values are correct.

| Property | Unit | Difference |
| --- | --- | --- |
| 3000 at Med Center | 304 | $5.00 |
| 3000 at Med Center | 406 | $5.25 |
| Allure at Edinburgh | 1219 | $7.00 |
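
A charge-sum integrity check of this kind can be sketched as follows. This is an illustration only: the function name, the one-cent tolerance, and the message text are assumptions, and the amounts in the usage example are made up rather than taken from the test files.

```python
from decimal import Decimal
from typing import Optional


def check_charge_sum(
    charges: dict[str, str],
    total_charges: str,
    tolerance: Decimal = Decimal("0.01"),  # assumed threshold
) -> Optional[str]:
    """Compare the sum of individual charges with the reported total.

    Returns None when they agree within `tolerance`, otherwise a warning
    string stating the absolute difference.
    """
    summed = sum((Decimal(v) for v in charges.values()), Decimal("0"))
    diff = abs(summed - Decimal(total_charges))
    if diff <= tolerance:
        return None
    return f"Charge sum differs from total_charges by ${diff:.2f}"
```

Usage with hypothetical values:

```python
check_charge_sum({"base_rent": "1200.00", "pet": "35.00"}, "1240.00")
# flags a $5.00 difference for human review
```

Using `Decimal` rather than floats keeps the comparison exact, so any reported difference reflects the source data and not floating-point noise.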

Occupied with Zero Base Rent (2 warnings)

These are real data conditions in the source file — not extraction errors. Common for employee units, units in transition, or units with non-standard lease structures.

| Property | Unit | Reason |
| --- | --- | --- |
| Stonecrest Mill | 36-3604 | Employee or transition unit |
| Camelback | 188 | Employee or transition unit |

How the Pipeline Validates Quality

Five overlapping checks run on every file:

1. Structure Analysis — AI reads the spreadsheet layout guided by a data dictionary with 100+ known header aliases and charge code abbreviations across Yardi, RealPage, RentManager, and AppFolio systems.

2. Extraction — Parallel chunked AI extraction with overlap regions to prevent Unit boundary splits. Deduplication uses composite keys (property + unit ID).

3. Spot Check — Independent AI audit of 8 sampled Units compares raw source cells against extracted values. Samples are biased toward edge cases (first rows, last rows, vacant Units, random fill). Scoring starts at 100; each value mismatch deducts 15 points, each formatting difference deducts 5. Multi-row Units are fully supported — the auditor sees all source rows for each Unit, not just the first.

4. Validation — Programmatic checks: missing unit IDs, duplicates, charge sum integrity, occupied-Unit rent sanity.

5. Comparison View — Interactive UI shows source → canonical field mapping for every Unit, with transformation badges.
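
The Spot Check scoring rule in step 3 is simple arithmetic, and the sampling bias can be sketched alongside it. In the sketch below, the penalties (15 and 5) and the sample size (8) come from the description above; everything else is an assumption, including the function names, the floor at zero, and how many first, last, and vacant Units are taken before the random fill.

```python
import random

VALUE_MISMATCH_PENALTY = 15  # per value mismatch, from step 3
FORMATTING_PENALTY = 5       # per formatting difference, from step 3


def spot_check_score(value_mismatches: int, formatting_differences: int) -> int:
    """Start at 100 and deduct per finding; floored at 0 (assumption)."""
    score = (
        100
        - VALUE_MISMATCH_PENALTY * value_mismatches
        - FORMATTING_PENALTY * formatting_differences
    )
    return max(score, 0)


def pick_sample(units: list[dict], k: int = 8, seed=None) -> list[dict]:
    """Pick up to k Units: first, last, one vacant, then random fill."""
    rng = random.Random(seed)
    picked: list[dict] = []

    def add(unit: dict) -> None:
        if len(picked) < k and unit not in picked:
            picked.append(unit)

    if units:
        add(units[0])
        add(units[-1])
    for unit in units:
        if unit.get("unit_status") == "Vacant":
            add(unit)
            break
    remaining = [u for u in units if u not in picked]
    rng.shuffle(remaining)
    for unit in remaining:
        add(unit)
    return picked
```

Under this rule, one value mismatch plus two formatting differences gives 100 - 15 - 10 = 75, so a 100/100 result means the auditor found no differences of either kind in the sampled Units.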


Vendor Format Coverage

The test suite covers 8 property management system export formats commonly encountered in multifamily acquisitions:

| Format | Layout | Characteristics | Properties |
| --- | --- | --- | --- |
| Yardi Voyager | Horizontal | One row per Unit, aggregate charge columns | 3 |
| Yardi Lease Charges | Vertical | Charge code + amount sub-rows per Unit, merged cells | 4 |
| RealPage/Entrata | Vertical | 50–75 columns, multi-sheet with Report Parameters | 3 |
| RealPage/OneSite | Vertical | Resident IDs, status inference, charge sub-rows | 1 |
| RealPage One Line | Horizontal | 50+ individual charge columns, co-resident rows | 1 |
| RentManager | Vertical | 11K+ merge regions, full-text charge descriptions | 1 |
| AppFolio | Horizontal | Combined BD/BA field, minimal charge detail | 1 |
| Simple Horizontal | Horizontal | Standard one-row-per-Unit, basic columns | 2 |

Canonical Schema

Every file is normalized to the same 40-field schema regardless of source format. See the full Canonical Schema reference.

| Category | Fields |
| --- | --- |
| Identity | property_name, property_address, unit_id, floor_plan, sqft, beds, baths |
| Status | unit_status (Occupied, Vacant, Notice, Model, Down, Applicant) |
| Tenant | tenant_name |
| Lease | move_in, move_out, lease_start, lease_end, lease_term_months, is_mtm |
| Rent | market_rent |
| Charges (20) | base_rent, pet, parking, storage, utilities, trash, cable/internet, pest, amenity, W/D, package locker, insurance, deposit waiver, concession, MTM fee, employee discount, subsidy, CAM, admin, other |
| Financial | total_charges, deposit_required, deposit_on_hand, balance |
| Raw | raw_extra (preserves unmapped vendor data as JSON) |
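
As an illustration of how a source row might land in this schema, the sketch below maps known headers to canonical fields and routes everything else into `raw_extra`. It covers only a handful of the canonical fields, and the `UnitRecord` class, the `normalize` function, and the alias table are hypothetical stand-ins rather than the pipeline's actual data dictionary.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Optional

# Hypothetical alias table: lower-cased source header -> canonical field.
ALIASES = {
    "unit": "unit_id",
    "status": "unit_status",
    "market rent": "market_rent",
}


@dataclass
class UnitRecord:
    property_name: str
    unit_id: Optional[str] = None
    unit_status: Optional[str] = None
    market_rent: Optional[str] = None  # kept as the source string here
    raw_extra: dict = field(default_factory=dict)


def normalize(row: dict, property_name: str, aliases: dict[str, str]) -> UnitRecord:
    """Map recognized headers to canonical fields; keep the rest in raw_extra."""
    mapped, extra = {}, {}
    for header, value in row.items():
        canonical = aliases.get(header.strip().lower())
        if canonical:
            mapped[canonical] = value
        else:
            extra[header] = value  # unmapped vendor data is preserved
    return UnitRecord(property_name=property_name, raw_extra=extra, **mapped)


record = normalize(
    {"Unit": "101", "Status": "Occupied", "Market Rent": "1500", "Parking Spot": "B12"},
    "Example Property",
    ALIASES,
)
print(json.dumps(asdict(record), indent=2))
```

Keeping unmapped columns in `raw_extra` means nothing from the vendor export is silently dropped, even when a header has no canonical counterpart.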