Sheet Scanning

The scanner is the first processing phase after upload. It reads every Sheet in the Excel file and breaks each one into classified sections (Tables).

Why Scanning Is Needed

Rent rolls are messy. A single Sheet might contain:

A header block with property name and report date
A unit data Table with tenant info and charges
A summary Table with totals
A charge code breakdown Table
A floor plan summary

The scanner identifies these distinct regions so the structure analysis phase knows where to look for actual unit data.

The CellGrid Model

Every Excel Sheet is parsed into a CellGrid — a sparse cell map using "row,col" string keys (1-indexed):


interface CellGrid {
  sheetName: string;
  cells: Map<string, string>;  // Key: "row,col", Value: cell text
  rowCount: number;
  colCount: number;
}

Cell type handling during parsing:

Dates: Converted to YYYY-MM-DD ISO format
Numbers: Integers stored as-is, decimals to 2 decimal places
Booleans: Lowercase string ("true" / "false")
Empty strings: Skipped entirely (sparse storage)

Section Classification

Each Sheet is classified into a SheetMap containing SheetSection entries:


interface SheetSection {
  startRow: number;        // 1-indexed inclusive
  endRow: number;          // 1-indexed inclusive
  classification:
    | "unit_data"          // Individual unit records
    | "summary_totals"     // Aggregate totals/subtotals
    | "charge_code_summary"// Charge code breakdowns
    | "floor_plan_summary" // Floor plan aggregates
    | "header_metadata"    // Property name, report date, headers
    | "other";             // Unclassified
  hasVerticalCharges: boolean;
  reason: string;          // Human-readable explanation
}
 
interface SheetMap {
  sheetName: string;
  sections: SheetSection[];
  primaryDataSection: SheetSection | null;  // Largest unit_data section
}

How It Works

readExcelBuffer(buffer, filename) — parses the XLSX into a dictionary of CellGrid objects, keyed by Sheet name
scanAllSheets(sheets, onProgress?, signal?) — scans all Sheets in parallel via Promise.all
For each Sheet, scanSheet(grid) sends the Sheet text to Claude Sonnet with a classification prompt
The LLM returns a JSON array of section descriptors
The primaryDataSection is automatically selected as the largest unit_data section by row span

Grid Text Serialization

Three functions convert grid regions to text for LLM prompts:

Function	Behavior
`gridToText(grid, maxRows?)`	Full grid dump. Format: `R{n} \| A: value \| B: value \| ...`
`gridRangeToText(grid, startRow, endRow)`	Specific row range only
`buildSheetText(grid)`	Smart sampling: full dump for sheets ≤200 rows; first 30 + every 5th middle row + last 30 for larger sheets (capped at 30 columns)

Progress Updates

16%: Starting scan
19%: Scan complete, reports section count (e.g., “Found 5 sections across 3 sheets”)