Structure Analysis
After scanning identifies the Tables within each Sheet, the structure analysis phase performs a deep read to determine exactly how data is organized: which columns map to which fields, where the data starts and ends, and how charges are laid out.
The Analysis Call
The analysis call uses Claude Opus with extended thinking enabled (10,000 thinking tokens, 32,000 max output tokens). It is the most expensive LLM call in the pipeline, but handling the wide variety of rent roll formats requires this depth of reasoning.
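As a rough illustration of those settings, the request could be assembled as below. This is a minimal sketch, not the pipeline's actual code: `buildStructureRequest` and `StructureRequest` are hypothetical names, the field names follow the Anthropic Messages API, the model ID is left to the caller, and `stream: true` is an assumption based on the streaming milestones described later.

```typescript
// Hypothetical request shape; field names follow the Anthropic Messages API.
interface StructureRequest {
  model: string;
  max_tokens: number;
  thinking: { type: "enabled"; budget_tokens: number };
  stream: boolean;
  messages: Array<{ role: "user"; content: string }>;
}

function buildStructureRequest(model: string, prompt: string): StructureRequest {
  const request: StructureRequest = {
    model, // caller supplies the Claude Opus model ID
    max_tokens: 32_000, // max output tokens
    thinking: { type: "enabled", budget_tokens: 10_000 }, // extended thinking budget
    stream: true, // assumption: streaming drives the progress milestones
    messages: [{ role: "user", content: prompt }],
  };
  // The thinking budget has to fit inside the output token limit.
  if (request.thinking.budget_tokens >= request.max_tokens) {
    throw new Error("budget_tokens must be smaller than max_tokens");
  }
  return request;
}
```

The function signature follows.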
analyzeStructure(
  sheets: Record<string, CellGrid>,
  sheetMaps: Record<string, SheetMap>,
  onProgress?: (pct: number, msg: string) => Promise<void>,
  signal?: AbortSignal
): Promise<Record<string, unknown>>
Prompt Construction
The prompt is built by buildStructurePrompt() and includes:
- Section classifications from the pre-scan (Tables identified per Sheet)
- Header/metadata rows in full
- First 30 + last 10 rows of the primary data section
- Preview rows from summary sections
- Fallback to gridToText(grid, 50) if no SheetMap exists
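The "first 30 + last 10 rows" sampling of the primary data section can be sketched as follows. This is an illustration under assumptions, not the code inside buildStructurePrompt(): `sampleRows` and its return shape are hypothetical, and returning short sections in full is an assumed behavior.

```typescript
interface RowSample<T> {
  head: T[];       // leading rows shown in the prompt
  tail: T[];       // trailing rows shown in the prompt
  omitted: number; // rows left out between head and tail
}

// Keep the first `head` and last `tail` rows of a data section.
// Sections no longer than head + tail are returned whole (assumed behavior).
function sampleRows<T>(rows: T[], head = 30, tail = 10): RowSample<T> {
  if (rows.length <= head + tail) {
    return { head: rows.slice(), tail: [], omitted: 0 };
  }
  return {
    head: rows.slice(0, head),
    tail: rows.slice(rows.length - tail),
    omitted: rows.length - head - tail,
  };
}

// Example: a 100-row section keeps rows 1-30 and 91-100.
const sample = sampleRows(Array.from({ length: 100 }, (_, i) => i + 1));
```

Showing both ends of the section lets the model see where the data begins and where it ends without the prompt carrying every row.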
Output Structure
{
data_sheet: string; // Sheet name with main rent roll
property_name: string; // Property/community name
report_date: string | null; // YYYY-MM-DD
header_row: number; // Column headers row number
data_start_row: number; // First Unit data row
data_end_row: number; // Last Unit data row
charge_orientation: "horizontal" | "vertical";
column_mapping: Record<string, string>; // Column letter → canonical field
charge_handling: string; // How charges are structured
rows_to_skip_pattern: string; // Non-data rows in data region
multi_row_per_unit: boolean; // Units span multiple rows?
multi_row_explanation: string; // How multi-row Units relate
notes: string; // Additional observations
}
Column Mapping
The column mapping is the critical output. It maps Excel column letters to canonical schema field names:
{
"A": "unit_id",
"B": "floor_plan",
"C": "sqft",
"D": "tenant_name",
"E": "unit_status",
"F": "lease_start",
"G": "lease_end",
"H": "market_rent",
"I": "charge_base_rent"
}
Charge Orientation
Two charge layouts are supported:
- Horizontal — one row per Unit, charge types are separate columns (most common)
- Vertical — each Unit spans multiple rows, with charge type in one column and amount in another
The orientation determines which extraction mode is used downstream.
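To make the two layouts concrete, the sketch below normalizes both into the same per-Unit record using a column mapping. It is a simplified illustration, not the pipeline's extraction code: `extractHorizontal`, `extractVertical`, and `UnitRecord` are hypothetical names, and the vertical case assumes two canonical fields, `charge_code` and `charge_amount`, that the source does not define. It also assumes continuation rows leave the unit_id cell blank.

```typescript
type Cell = string | number | null;
type Row = Record<string, Cell>;             // keyed by column letter, e.g. { A: "101" }
type ColumnMapping = Record<string, string>; // column letter -> canonical field

interface UnitRecord {
  fields: Record<string, Cell>;
  charges: Record<string, number>;
}

const CHARGE_PREFIX = "charge_";
const isBlank = (v: Cell | undefined): boolean => v === null || v === undefined || v === "";

// Horizontal: one row per Unit; every "charge_*" column becomes one charge.
function extractHorizontal(rows: Row[], mapping: ColumnMapping): UnitRecord[] {
  return rows.map((row) => {
    const unit: UnitRecord = { fields: {}, charges: {} };
    for (const [col, field] of Object.entries(mapping)) {
      const value = row[col] ?? null;
      if (!field.startsWith(CHARGE_PREFIX)) {
        unit.fields[field] = value;
      } else if (!isBlank(value)) {
        unit.charges[field.slice(CHARGE_PREFIX.length)] = Number(value);
      }
    }
    return unit;
  });
}

// Vertical: a row with a unit_id opens a new Unit; every row may add one charge.
// "charge_code" and "charge_amount" are hypothetical field names for this sketch.
function extractVertical(rows: Row[], mapping: ColumnMapping): UnitRecord[] {
  const colOf = (field: string) => Object.keys(mapping).find((c) => mapping[c] === field);
  const idCol = colOf("unit_id");
  const codeCol = colOf("charge_code");
  const amountCol = colOf("charge_amount");
  if (!idCol || !codeCol || !amountCol) {
    throw new Error("mapping lacks unit_id, charge_code, or charge_amount");
  }
  const units: UnitRecord[] = [];
  for (const row of rows) {
    if (!isBlank(row[idCol])) {
      const fields: Record<string, Cell> = {};
      for (const [col, field] of Object.entries(mapping)) {
        if (!field.startsWith(CHARGE_PREFIX)) fields[field] = row[col] ?? null;
      }
      units.push({ fields, charges: {} });
    }
    const current = units[units.length - 1];
    if (current && !isBlank(row[codeCol]) && !isBlank(row[amountCol])) {
      current.charges[String(row[codeCol])] = Number(row[amountCol]);
    }
  }
  return units;
}
```

In this sketch, both functions return the same shape, so code after extraction would not need to know which layout the source sheet used.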
Human-in-the-Loop Review
After analysis completes, the pipeline pauses at awaiting_review. The user can:
- Inspect the detected column mapping
- Override column assignments
- Adjust row boundaries (data_start_row, data_end_row)
- Change charge orientation
- Toggle multi_row_per_unit
Overrides are submitted via POST /api/structure-confirm and merged into the structure result. Only 6 override keys are whitelisted: column_mapping, data_start_row, data_end_row, header_row, charge_orientation, multi_row_per_unit.
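The whitelist merge might look like the sketch below. `mergeOverrides` is a hypothetical name, and the handler behind /api/structure-confirm may differ; in particular, this sketch replaces column_mapping wholesale rather than merging it entry by entry, which is an assumption.

```typescript
// The six override keys named above; anything else in the payload is ignored.
const OVERRIDE_KEYS = [
  "column_mapping",
  "data_start_row",
  "data_end_row",
  "header_row",
  "charge_orientation",
  "multi_row_per_unit",
] as const;

// Returns a new structure object; neither input is mutated.
function mergeOverrides(
  structure: Record<string, unknown>,
  overrides: Record<string, unknown>
): Record<string, unknown> {
  const merged: Record<string, unknown> = { ...structure };
  for (const key of OVERRIDE_KEYS) {
    if (Object.prototype.hasOwnProperty.call(overrides, key)) {
      merged[key] = overrides[key];
    }
  }
  return merged;
}

// Example: data_start_row is applied, the non-whitelisted "notes" key is dropped.
const confirmed = mergeOverrides(
  { data_start_row: 5, data_end_row: 120, notes: "from analysis" },
  { data_start_row: 6, notes: "user edit" }
);
```

Filtering by an explicit key list keeps a review payload from overwriting fields such as property_name or notes that the user is not meant to edit.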
Streaming Progress Milestones
| Progress | Message |
|---|---|
| 20% | Starting analysis |
| 22% | Thinking block started (reading spreadsheet) |
| 25% | Thinking > 500 chars (looking at headers) |
| 28% | Thinking > 2000 chars (identifying pattern) |
| 30% | Thinking > 4000 chars (almost figured out) |
| 32% | Text block started (building map) |
| 35% | Text > 200 chars (locking in layout) |
| 38% | Complete |
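One way to derive these milestones from a streaming response is to track how many characters each block has produced so far. The sketch below is an assumption about how the thresholds could be applied, not the pipeline's implementation; `milestoneFor` is a hypothetical name, and the parenthetical text from the table is used as the progress message.

```typescript
type Phase = "thinking" | "text";

interface Step {
  over: number; // milestone applies once the block has more than this many chars
  pct: number;
  msg: string;
}

// Thresholds copied from the table above; -1 means "block started".
const STEPS: Record<Phase, Step[]> = {
  thinking: [
    { over: -1, pct: 22, msg: "reading spreadsheet" },
    { over: 500, pct: 25, msg: "looking at headers" },
    { over: 2000, pct: 28, msg: "identifying pattern" },
    { over: 4000, pct: 30, msg: "almost figured out" },
  ],
  text: [
    { over: -1, pct: 32, msg: "building map" },
    { over: 200, pct: 35, msg: "locking in layout" },
  ],
};

// Returns the highest milestone reached for the given block and character count.
function milestoneFor(phase: Phase, chars: number): { pct: number; msg: string } {
  let pct = 20;
  let msg = "Starting analysis";
  for (const step of STEPS[phase]) {
    if (chars > step.over) {
      pct = step.pct;
      msg = step.msg;
    }
  }
  return { pct, msg };
}
```

A caller would pass the result to onProgress only when pct increases, so each milestone is reported once.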