
Validation & Spot Check

After extraction, every Unit passes through validation checks, and a sample of Units is independently audited by an LLM (the spot check).

Validation Pipeline

validateRows(rows, expectedRowCount?) runs four categories of checks:

1. Missing Unit IDs

Row {N} (unit=?): missing unit_id → ERROR

Every Unit must have a unit_id. Missing IDs are errors.

2. Duplicate Unit IDs

Row {N} (unit={id}): duplicate unit_id '{id}' → WARNING

Duplicate unit_id values are flagged as warnings. The first occurrence is kept as-is; each later row that repeats the ID is flagged.

3. Charge Sum Reconciliation

Row {N} (unit={id}): total_charges ({total}) != sum ({computed}) → WARNING

If a Unit has a total_charges field, it is compared against the sum of all individual charge fields (charge_base_rent, charge_pet, charge_parking, etc.). A difference of more than $1 in either direction is a warning.

4. Zero-Rent Occupied Units

Row {N} (unit={id}): Occupied unit with zero base rent → WARNING

Units with status Occupied but a missing or zero charge_base_rent are flagged as warnings.
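The four checks can be sketched as a single pass over the rows. This is a minimal sketch, not the actual validateRows: it assumes rows are plain objects, treats every numeric field whose name starts with charge_ as an individual charge, reads occupancy from a status field, numbers rows from 1, and omits expectedRowCount because its use is not described here. The name validateRowsSketch is hypothetical.

```typescript
// Hypothetical sketch of the four validation checks.
// Assumptions: rows are plain objects, individual charges are numeric fields
// named "charge_*", occupancy is in "status", and "Row {N}" is 1-based.
type Row = Record<string, unknown>;

interface ValidationResult {
  errors: string[];
  warnings: string[];
  unitErrors: Set<number>;   // row indices with at least one error
  unitWarnings: Set<number>; // row indices with at least one warning
}

function validateRowsSketch(rows: Row[]): ValidationResult {
  const result: ValidationResult = {
    errors: [],
    warnings: [],
    unitErrors: new Set<number>(),
    unitWarnings: new Set<number>(),
  };
  const seenIds = new Set<string>();

  rows.forEach((row, i) => {
    const n = i + 1;
    const rawId = row["unit_id"];
    const id =
      rawId === undefined || rawId === null || String(rawId).trim() === ""
        ? undefined
        : String(rawId).trim();

    // 1. Missing unit_id -> error
    if (id === undefined) {
      result.errors.push(`Row ${n} (unit=?): missing unit_id`);
      result.unitErrors.add(i);
    } else if (seenIds.has(id)) {
      // 2. Duplicate unit_id -> warning (the first occurrence is not flagged)
      result.warnings.push(`Row ${n} (unit=${id}): duplicate unit_id '${id}'`);
      result.unitWarnings.add(i);
    } else {
      seenIds.add(id);
    }
    const label = id ?? "?";

    // 3. Charge sum reconciliation with a $1 tolerance
    const total = row["total_charges"];
    if (typeof total === "number") {
      let computed = 0;
      for (const key of Object.keys(row)) {
        const value = row[key];
        if (key.startsWith("charge_") && typeof value === "number") computed += value;
      }
      if (Math.abs(total - computed) > 1) {
        result.warnings.push(`Row ${n} (unit=${label}): total_charges (${total}) != sum (${computed})`);
        result.unitWarnings.add(i);
      }
    }

    // 4. Occupied Unit with missing or zero base rent -> warning
    const baseRent = row["charge_base_rent"];
    if (row["status"] === "Occupied" && (typeof baseRent !== "number" || baseRent === 0)) {
      result.warnings.push(`Row ${n} (unit=${label}): Occupied unit with zero base rent`);
      result.unitWarnings.add(i);
    }
  });
  return result;
}
```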

UnitBreakdown — Per-Unit Quality Tracking

Validation tracks errors and warnings per Unit in two Sets, unitErrors and unitWarnings, each keyed by row index:

interface UnitBreakdown {
  successful: number; // No errors, no warnings
  flagged: number;    // Warnings only
  failed: number;     // At least one error
  total: number;
}

At the end of validation, each Unit is classified:

  • Successful: appeared in neither unitErrors nor unitWarnings
  • Flagged: appeared in unitWarnings but not unitErrors
  • Failed: appeared in unitErrors
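Given the two Sets, the classification reduces to one loop in which errors take precedence over warnings. A sketch, assuming total is simply the number of rows validated; classifyUnits is a hypothetical name.

```typescript
interface UnitBreakdown {
  successful: number; // No errors, no warnings
  flagged: number;    // Warnings only
  failed: number;     // At least one error
  total: number;
}

// Hypothetical helper: classify each row index using the two Sets.
function classifyUnits(
  rowCount: number,
  unitErrors: Set<number>,
  unitWarnings: Set<number>,
): UnitBreakdown {
  // Assumption: total is the number of rows that were validated.
  const breakdown: UnitBreakdown = { successful: 0, flagged: 0, failed: 0, total: rowCount };
  for (let i = 0; i < rowCount; i++) {
    if (unitErrors.has(i)) breakdown.failed++;          // any error means failed
    else if (unitWarnings.has(i)) breakdown.flagged++;  // warnings only
    else breakdown.successful++;                        // in neither Set
  }
  return breakdown;
}
```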

This breakdown is stored in the job record and rendered as the Accuracy Funnel in the UI.

Spot-Check System

When It Runs

The spot check runs after every extraction. It audits a sample of extracted Units against the source spreadsheet, providing an independent verification layer.

Sampling Strategy

pickSampleUnits(rows, sampleSize=8) uses biased selection:

  1. 1 from the first 5 rows: these often contain edge cases such as headers bleeding into the data
  2. 1 from the last 5 rows: these often contain summary rows bleeding into the data
  3. 1 Vacant Unit: vacant rows carry less data and are more likely to have parsing issues
  4. Random fill: the remaining slots are filled randomly until the target sample size is reached
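A sketch of the biased selection, assuming a vacant Unit is recognized by status === "Vacant" and that the helper returns row indices rather than Units. pickSampleIndices and its injectable rng parameter are hypothetical; rng only exists so the random choices can be reproduced in tests. If the sheet has no more rows than the sample size, every row is returned.

```typescript
type Row = Record<string, unknown>;

// Hypothetical sketch of biased sampling; returns sorted row indices.
function pickSampleIndices(rows: Row[], sampleSize = 8, rng: () => number = Math.random): number[] {
  if (rows.length <= sampleSize) return rows.map((_, i) => i);

  const chosen = new Set<number>();
  // Pick one not-yet-chosen index from the candidates, if any remain.
  const pickFrom = (candidates: number[]): void => {
    const pool = candidates.filter((i) => !chosen.has(i));
    if (pool.length > 0) chosen.add(pool[Math.floor(rng() * pool.length)]);
  };

  const all = rows.map((_, i) => i);
  pickFrom(all.slice(0, 5));                                   // 1 from the first 5 rows
  pickFrom(all.slice(-5));                                     // 1 from the last 5 rows
  pickFrom(all.filter((i) => rows[i]["status"] === "Vacant")); // 1 Vacant Unit, if any
  while (chosen.size < sampleSize) pickFrom(all);              // random fill
  return Array.from(chosen).sort((a, b) => a - b);
}
```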

Audit Process

For each sampled Unit:

  1. Find the source row(s) in the original grid. Multi-row and vertical layouts are supported: the search walks subsequent rows until it reaches a different unit ID or a blank separator row.
  2. Build a comparison payload containing the raw cell text and the extracted JSON for each Unit
  3. Send all comparisons to Claude Sonnet in a single call; the model independently audits each field
  4. Flag any field-level discrepancies
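Steps 1 and 2 might look like the following, assuming the grid is an array of rows of cell strings, unit IDs sit in a known column, and continuation rows of a multi-row Unit leave that column empty. findSourceRows, buildComparison, and the payload field names are hypothetical.

```typescript
// Hypothetical step 1: collect a Unit's source rows from the raw grid.
function findSourceRows(grid: string[][], unitId: string, idCol: number): string[][] {
  const start = grid.findIndex((row) => (row[idCol] ?? "").trim() === unitId);
  if (start === -1) return [];
  const block = [grid[start]];
  for (let r = start + 1; r < grid.length; r++) {
    const row = grid[r];
    const isBlank = row.every((cell) => cell.trim() === "");
    const id = (row[idCol] ?? "").trim();
    if (isBlank || (id !== "" && id !== unitId)) break; // blank separator or next Unit
    block.push(row); // continuation row (empty ID cell) belongs to this Unit
  }
  return block;
}

// Hypothetical step 2: pair raw cell text with the extracted JSON.
interface Comparison {
  unit_id: string;
  raw_text: string;       // cells tab-separated, rows newline-separated
  extracted_json: string;
}

function buildComparison(unitId: string, sourceRows: string[][], extracted: unknown): Comparison {
  return {
    unit_id: unitId,
    raw_text: sourceRows.map((row) => row.join("\t")).join("\n"),
    extracted_json: JSON.stringify(extracted),
  };
}
```

Walking forward from the first matching row, rather than matching every row by ID, is what lets continuation rows (for example, extra charge lines with no unit ID) attach to the right Unit.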

Confidence Scoring

score = 100
for each discrepancy:
    if severity === "error":   score -= 15
    if severity === "warning": score -= 5
score = max(0, score)

The confidence score (0–100) is stored in spot_check_confidence and displayed as a badge in the results.

LLM failure fallback: If the spot-check LLM call fails, the confidence defaults to 50 (not 100), and a discrepancy entry is recorded indicating the audit was inconclusive. This prevents a failed audit from being misinterpreted as a perfect score.
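The scoring rule and the failure fallback can be sketched together. The SpotCheckDiscrepancy interface is the one given under Discrepancy Format; runSpotCheck, its audit callback, and the shape of the inconclusive entry (unit_id and field set to "*", severity "warning") are assumptions made for illustration.

```typescript
interface SpotCheckDiscrepancy {
  unit_id: string;
  field: string;
  extracted: unknown;
  expected: unknown;
  severity: "error" | "warning";
  explanation: string;
}

// -15 per error, -5 per warning, clamped at 0.
function confidenceScore(discrepancies: SpotCheckDiscrepancy[]): number {
  let score = 100;
  for (const d of discrepancies) {
    if (d.severity === "error") score -= 15;
    else if (d.severity === "warning") score -= 5;
  }
  return Math.max(0, score);
}

// Hypothetical wrapper: a failed audit yields 50, never 100.
async function runSpotCheck(
  audit: () => Promise<SpotCheckDiscrepancy[]>,
): Promise<{ confidence: number; discrepancies: SpotCheckDiscrepancy[] }> {
  try {
    const discrepancies = await audit();
    return { confidence: confidenceScore(discrepancies), discrepancies };
  } catch (err) {
    // Assumed shape for the "inconclusive" entry.
    return {
      confidence: 50,
      discrepancies: [{
        unit_id: "*",
        field: "*",
        extracted: null,
        expected: null,
        severity: "warning",
        explanation: `Spot check inconclusive: ${String(err)}`,
      }],
    };
  }
}
```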

Discrepancy Format

interface SpotCheckDiscrepancy {
  unit_id: string;
  field: string;
  extracted: unknown; // Original extraction value
  expected: unknown;  // What the LLM audit says it should be
  severity: "error" | "warning";
  explanation: string;
}

Output Writer

writeOutputBuffer(rows, validation?) generates an XLSX workbook with up to two sheets:

Sheet 1: Rent Roll

All extracted Units with the canonical columns. Column headers match the schema field names exactly.

Sheet 2: Validation Report

Included only when validation data is provided. The sheet has three columns:

| Column | Content |
| --- | --- |
| Category | Summary, Spot Check, Spot Check Discrepancy, Error, or Warning |
| Item | Description (e.g., “Unit Count”, the error/warning message, or discrepancy detail) |
| Value | The value, score, or severity explanation |

The sheet lists summary stats (unit count, extraction method, error/warning counts), then the spot check confidence and discrepancies, then the individual errors and warnings. At most 50 warnings are listed.
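A sketch of how the Validation Report rows could be assembled before writing, assuming a hypothetical ValidationSummary input. Only the "Unit Count" label comes from this page; the other Item labels and the empty Value cell on error and warning rows are guesses. The result is an array of arrays, the shape SheetJS's utils.aoa_to_sheet accepts, though nothing here says the writer uses SheetJS.

```typescript
// Hypothetical input shape for the report.
interface ValidationSummary {
  unitCount: number;
  extractionMethod: string;
  errors: string[];
  warnings: string[];
  spotCheckConfidence?: number;
  spotCheckDiscrepancies?: { unit_id: string; field: string; severity: "error" | "warning"; explanation: string }[];
}

const MAX_WARNINGS = 50; // cap on listed warnings, per this page

function buildReportRows(v: ValidationSummary): [string, string, string][] {
  const rows: [string, string, string][] = [["Category", "Item", "Value"]];
  // Summary stats (counts reflect all warnings, even those not listed)
  rows.push(["Summary", "Unit Count", String(v.unitCount)]);
  rows.push(["Summary", "Extraction Method", v.extractionMethod]);
  rows.push(["Summary", "Errors", String(v.errors.length)]);
  rows.push(["Summary", "Warnings", String(v.warnings.length)]);
  // Spot check confidence and discrepancies
  if (v.spotCheckConfidence !== undefined) {
    rows.push(["Spot Check", "Confidence", String(v.spotCheckConfidence)]);
  }
  for (const d of v.spotCheckDiscrepancies ?? []) {
    rows.push(["Spot Check Discrepancy", `${d.unit_id}.${d.field}`, `${d.severity}: ${d.explanation}`]);
  }
  // Individual errors, then at most MAX_WARNINGS warnings
  for (const e of v.errors) rows.push(["Error", e, ""]);
  for (const w of v.warnings.slice(0, MAX_WARNINGS)) rows.push(["Warning", w, ""]);
  return rows;
}
```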
