Sheet Scanning
The scanner is the first processing phase after upload. It reads every Sheet in the Excel file and breaks each one into classified sections (Tables).
Why Scanning Is Needed
Rent rolls are messy. A single Sheet might contain:
- A header block with property name and report date
- A unit data Table with tenant info and charges
- A summary Table with totals
- A charge code breakdown Table
- A floor plan summary
The scanner identifies these distinct regions so the structure analysis phase knows where to look for actual unit data.
The CellGrid Model
Every Excel Sheet is parsed into a CellGrid — a sparse cell map using "row,col" string keys (1-indexed):
interface CellGrid {
  sheetName: string;
  cells: Map<string, string>; // Key: "row,col", Value: cell text
  rowCount: number;
  colCount: number;
}
Cell type handling during parsing:
- Dates: Converted to YYYY-MM-DD ISO format
- Numbers: Integers stored as-is, decimals to 2 decimal places
- Booleans: Lowercase string ("true"/"false")
- Empty strings: Skipped entirely (sparse storage)
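The cell handling rules above can be illustrated with a minimal sketch. The helper names normalizeCell and buildCellGrid, the RawCell type, and the dense 2D input array are assumptions made for illustration, not the parser's actual API. The sketch also assumes that "2 decimal places" means toFixed(2) formatting and that dates are read in UTC.

```typescript
interface CellGrid {
  sheetName: string;
  cells: Map<string, string>; // Key: "row,col", Value: cell text
  rowCount: number;
  colCount: number;
}

// Hypothetical input type: the raw values a spreadsheet reader might hand back.
type RawCell = string | number | boolean | Date | null | undefined;

// Hypothetical helper: converts one raw value to its stored text, or null to skip it.
function normalizeCell(value: RawCell): string | null {
  if (value === null || value === undefined) return null;
  if (value instanceof Date) return value.toISOString().slice(0, 10); // YYYY-MM-DD (UTC)
  if (typeof value === "number") {
    // Integers as-is; decimals formatted to 2 places (assumed to mean toFixed(2)).
    return Number.isInteger(value) ? String(value) : value.toFixed(2);
  }
  if (typeof value === "boolean") return value ? "true" : "false";
  return value === "" ? null : value; // empty strings are not stored
}

// Hypothetical helper: builds a sparse grid from a dense 2D array of raw values.
function buildCellGrid(sheetName: string, rows: RawCell[][]): CellGrid {
  const cells = new Map<string, string>();
  let colCount = 0;
  rows.forEach((row, r) => {
    colCount = Math.max(colCount, row.length);
    row.forEach((value, c) => {
      const text = normalizeCell(value);
      if (text !== null) cells.set(`${r + 1},${c + 1}`, text); // 1-indexed keys
    });
  });
  return { sheetName, cells, rowCount: rows.length, colCount };
}

const grid = buildCellGrid("Rent Roll", [
  ["Unit", "Rent", "Occupied", "Move-in"],
  ["101", 1250.5, true, new Date(Date.UTC(2024, 0, 15))],
  ["102", 1300, false, ""],
]);
```

Because empty values are skipped, a lookup such as grid.cells.get("3,4") returns undefined rather than an empty string, so callers should treat a missing key as a blank cell.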
Section Classification
Each Sheet is classified into a SheetMap containing SheetSection entries:
interface SheetSection {
  startRow: number; // 1-indexed inclusive
  endRow: number; // 1-indexed inclusive
  classification:
    | "unit_data" // Individual unit records
    | "summary_totals" // Aggregate totals/subtotals
    | "charge_code_summary" // Charge code breakdowns
    | "floor_plan_summary" // Floor plan aggregates
    | "header_metadata" // Property name, report date, headers
    | "other"; // Unclassified
  hasVerticalCharges: boolean;
  reason: string; // Human-readable explanation
}
interface SheetMap {
  sheetName: string;
  sections: SheetSection[];
  primaryDataSection: SheetSection | null; // Largest unit_data section
}
How It Works
- readExcelBuffer(buffer, filename): parses the XLSX into a dictionary of CellGrid objects, keyed by Sheet name
- scanAllSheets(sheets, onProgress?, signal?): scans all Sheets in parallel via Promise.all
- For each Sheet, scanSheet(grid) sends the Sheet text to Claude Sonnet with a classification prompt
- The LLM returns a JSON array of section descriptors
- The primaryDataSection is automatically selected as the largest unit_data section by row span
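The last two steps can be sketched as follows. This is an illustration under stated assumptions, not the scanner's actual code: parseSections and pickPrimaryDataSection are hypothetical names, the validation rules (dropping malformed entries, defaulting missing fields) are guesses at reasonable behavior, and ties between equally large unit_data sections are resolved in favor of the first one.

```typescript
type Classification =
  | "unit_data" | "summary_totals" | "charge_code_summary"
  | "floor_plan_summary" | "header_metadata" | "other";

interface SheetSection {
  startRow: number; // 1-indexed inclusive
  endRow: number; // 1-indexed inclusive
  classification: Classification;
  hasVerticalCharges: boolean;
  reason: string;
}

const CLASSIFICATIONS = new Set<string>([
  "unit_data", "summary_totals", "charge_code_summary",
  "floor_plan_summary", "header_metadata", "other",
]);

// Hypothetical helper: parses the model's JSON reply and keeps only well-formed sections.
function parseSections(json: string): SheetSection[] {
  const parsed: unknown = JSON.parse(json);
  if (!Array.isArray(parsed)) throw new Error("Expected a JSON array of sections");
  const sections: SheetSection[] = [];
  for (const item of parsed) {
    if (typeof item !== "object" || item === null) continue;
    const { startRow, endRow, classification, hasVerticalCharges, reason } =
      item as Record<string, unknown>;
    if (typeof startRow !== "number" || typeof endRow !== "number") continue;
    if (startRow < 1 || endRow < startRow) continue; // assumed rule: drop invalid ranges
    if (typeof classification !== "string" || !CLASSIFICATIONS.has(classification)) continue;
    sections.push({
      startRow,
      endRow,
      classification: classification as Classification,
      hasVerticalCharges: hasVerticalCharges === true, // assumed default: false
      reason: typeof reason === "string" ? reason : "",
    });
  }
  return sections;
}

// Picks the unit_data section with the largest row span; the first one wins a tie (assumption).
function pickPrimaryDataSection(sections: SheetSection[]): SheetSection | null {
  let best: SheetSection | null = null;
  for (const s of sections) {
    if (s.classification !== "unit_data") continue;
    const span = s.endRow - s.startRow + 1;
    if (best === null || span > best.endRow - best.startRow + 1) best = s;
  }
  return best;
}

// Example reply, invented for illustration.
const reply = JSON.stringify([
  { startRow: 1, endRow: 4, classification: "header_metadata", hasVerticalCharges: false, reason: "Title block" },
  { startRow: 6, endRow: 120, classification: "unit_data", hasVerticalCharges: true, reason: "One row per unit" },
  { startRow: 122, endRow: 125, classification: "summary_totals", hasVerticalCharges: false, reason: "Totals" },
  { startRow: 130, endRow: 140, classification: "unit_data", hasVerticalCharges: false, reason: "Second unit block" },
  { startRow: 9, endRow: 3, classification: "other", reason: "Inverted range, dropped" },
]);
const sections = parseSections(reply);
const primary = pickPrimaryDataSection(sections);
```

Validating the reply before use matters because model output is not guaranteed to match the SheetSection shape. A Sheet with no unit_data section yields a null primaryDataSection.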
Grid Text Serialization
Three functions convert grid regions to text for LLM prompts:
| Function | Behavior |
|---|---|
| gridToText(grid, maxRows?) | Full grid dump. Format: R{n} \| A: value \| B: value \| ... |
| gridRangeToText(grid, startRow, endRow) | Specific row range only |
| buildSheetText(grid) | Smart sampling: full dump for sheets ≤200 rows; first 30 + every 5th middle row + last 30 for larger sheets (capped at 30 columns) |
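A rough sketch of the row format and the sampling rule follows. The function names (columnLetter, rowToText, sampleRows, buildSheetTextSketch) are hypothetical, and several details are assumptions the table does not settle: which middle rows count as "every 5th", whether rows with no cells are omitted, and whether the 30-column cap also applies to small sheets.

```typescript
interface CellGrid {
  sheetName: string;
  cells: Map<string, string>; // Key: "row,col", Value: cell text
  rowCount: number;
  colCount: number;
}

// Converts a 1-indexed column number to a spreadsheet letter (1 -> A, 27 -> AA).
function columnLetter(col: number): string {
  let letters = "";
  for (let n = col; n > 0; n = Math.floor((n - 1) / 26)) {
    letters = String.fromCharCode(65 + ((n - 1) % 26)) + letters;
  }
  return letters;
}

// Formats one row as "R{n} | A: value | B: value"; returns null for an empty row (assumption).
function rowToText(grid: CellGrid, row: number, maxCols: number): string | null {
  const parts: string[] = [];
  const lastCol = Math.min(grid.colCount, maxCols);
  for (let col = 1; col <= lastCol; col++) {
    const value = grid.cells.get(`${row},${col}`);
    if (value !== undefined) parts.push(`${columnLetter(col)}: ${value}`);
  }
  return parts.length > 0 ? `R${row} | ${parts.join(" | ")}` : null;
}

// Chooses which rows to include: every row for small sheets, otherwise
// the first and last `edge` rows plus every `step`-th row in between.
function sampleRows(rowCount: number, fullLimit = 200, edge = 30, step = 5): number[] {
  const rows: number[] = [];
  for (let row = 1; row <= rowCount; row++) {
    const inEdge = row <= edge || row > rowCount - edge;
    const sampledMiddle = (row - edge) % step === 0; // assumed phase of the sampling
    if (rowCount <= fullLimit || inEdge || sampledMiddle) rows.push(row);
  }
  return rows;
}

// Sketch of the sampling serializer; the 30-column cap is applied to all sheets here (assumption).
function buildSheetTextSketch(grid: CellGrid): string {
  return sampleRows(grid.rowCount)
    .map((row) => rowToText(grid, row, 30))
    .filter((line): line is string => line !== null)
    .join("\n");
}

const sample: CellGrid = {
  sheetName: "Rent Roll",
  cells: new Map([["1,1", "Unit"], ["1,2", "Rent"], ["2,1", "101"], ["2,2", "1250.50"]]),
  rowCount: 3,
  colCount: 2,
};
const sampleText = buildSheetTextSketch(sample);
```

Under these assumptions a 300-row sheet is reduced to 108 rows (30 + 48 + 30), which keeps the prompt small while still showing the model the top, bottom, and a regular slice of the middle.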
Progress Updates
- 16%: Starting scan
- 19%: Scan complete, reports section count (e.g., “Found 5 sections across 3 sheets”)
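One way these two updates could be emitted around a parallel scan is sketched below. The callback shape (percent, message), the injected scanSheet parameter, the simplified SheetMapLike type, and the names scanAllSheetsSketch and formatScanSummary are all assumptions for illustration. The real scanAllSheets signature lists only sheets, onProgress?, and signal?.

```typescript
interface CellGrid {
  sheetName: string;
  cells: Map<string, string>;
  rowCount: number;
  colCount: number;
}

// Simplified stand-in for SheetMap: only what the summary message needs.
interface SheetMapLike {
  sheetName: string;
  sections: unknown[];
}

type ProgressFn = (percent: number, message: string) => void; // assumed callback shape
type ScanFn = (grid: CellGrid) => Promise<SheetMapLike>; // stand-in for scanSheet

// Builds the completion message, e.g. "Found 5 sections across 3 sheets".
function formatScanSummary(maps: SheetMapLike[]): string {
  const sectionCount = maps.reduce((sum, m) => sum + m.sections.length, 0);
  return `Found ${sectionCount} sections across ${maps.length} sheets`;
}

// Sketch: report 16% at the start, scan every sheet in parallel, report 19% at the end.
async function scanAllSheetsSketch(
  sheets: Record<string, CellGrid>,
  scanSheet: ScanFn,
  onProgress?: ProgressFn,
  signal?: AbortSignal,
): Promise<SheetMapLike[]> {
  onProgress?.(16, "Starting scan");
  const maps = await Promise.all(
    Object.values(sheets).map((grid) => {
      if (signal?.aborted) throw new Error("Scan aborted"); // assumed cancellation behavior
      return scanSheet(grid);
    }),
  );
  onProgress?.(19, formatScanSummary(maps));
  return maps;
}

// Usage with a stub classifier so the sketch runs without calling a model.
const emptyGrid = (sheetName: string): CellGrid => ({
  sheetName, cells: new Map(), rowCount: 0, colCount: 0,
});
const progressLog: string[] = [];
const done = scanAllSheetsSketch(
  { Units: emptyGrid("Units"), Summary: emptyGrid("Summary") },
  async (grid) => ({ sheetName: grid.sheetName, sections: ["a", "b"] }),
  (percent, message) => progressLog.push(`${percent}%: ${message}`),
);
```

Since Promise.all rejects as soon as any one scan fails, a caller who wants partial results would need Promise.allSettled instead.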