Structure Analysis

Once the pre-scan has identified the Tables within each Sheet, the structure analysis phase performs a deep read to determine exactly how the data is organized: which columns map to which fields, where the data starts and ends, and how charges are laid out.

The Analysis Call

This phase uses Claude Opus with extended thinking enabled (10,000 thinking tokens, 32,000 max output tokens). It is the most expensive LLM call in the pipeline, but the deep reasoning is needed to handle the wide variety of rent roll formats.
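The two limits above correspond to the `max_tokens` and `thinking` parameters of the Anthropic Messages API. The sketch below shows one way to assemble such a request body; `buildStructureRequest` and the model ID are placeholders, not the pipeline's actual code. The API requires the thinking budget to be at least 1,024 tokens and smaller than `max_tokens`, which the helper checks.

```typescript
// Hypothetical request builder for the structure-analysis call.
// The model ID is a placeholder; the token limits come from the text above.
interface StructureRequest {
  model: string;
  max_tokens: number;
  thinking: { type: "enabled"; budget_tokens: number };
  messages: { role: "user"; content: string }[];
}

interface RequestOptions {
  model?: string;
  budgetTokens?: number;
  maxTokens?: number;
}

function buildStructureRequest(prompt: string, opts: RequestOptions = {}): StructureRequest {
  const budget = opts.budgetTokens ?? 10_000;
  const maxTokens = opts.maxTokens ?? 32_000;
  // Thinking tokens count toward max_tokens, so the budget must fit inside it;
  // the API also enforces a 1,024-token minimum budget.
  if (budget < 1024 || budget >= maxTokens) {
    throw new Error(`invalid thinking budget ${budget} for max_tokens ${maxTokens}`);
  }
  return {
    model: opts.model ?? "claude-opus-placeholder",
    max_tokens: maxTokens,
    thinking: { type: "enabled", budget_tokens: budget },
    messages: [{ role: "user", content: prompt }],
  };
}
```

The resulting object could be passed to a streaming Messages API call, whose thinking and text deltas are what drive the progress milestones listed at the end of this page.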


analyzeStructure(
  sheets: Record<string, CellGrid>,
  sheetMaps: Record<string, SheetMap>,
  onProgress?: (pct: number, msg: string) => Promise<void>,
  signal?: AbortSignal
): Promise<Record<string, unknown>>

Prompt Construction

The prompt is built by buildStructurePrompt() and includes:

Section classifications from the pre-scan (Tables identified per Sheet)
Header/metadata rows in full
First 30 + last 10 rows of the primary data section
Preview rows from summary sections
A fallback to gridToText(grid, 50) when no SheetMap exists for the Sheet
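The head-and-tail sampling of the primary data section can be sketched as a helper that picks 1-based row numbers. `previewRows` is a hypothetical name, and sending every row of a short section (rather than letting the head and tail overlap) is an assumption about how buildStructurePrompt() behaves.

```typescript
// Hypothetical helper: choose which 1-based rows of a data section go into
// the prompt (first `head` and last `tail` rows, never duplicated).
function previewRows(start: number, end: number, head = 30, tail = 10): number[] {
  if (end < start) return []; // empty or inverted section
  const total = end - start + 1;
  if (total <= head + tail) {
    // Short section: every row fits, so send all of them.
    return Array.from({ length: total }, (_, i) => start + i);
  }
  const first = Array.from({ length: head }, (_, i) => start + i);
  const last = Array.from({ length: tail }, (_, i) => end - tail + 1 + i);
  return [...first, ...last];
}
```

For a section spanning rows 5 to 104, this yields rows 5 to 34 followed by rows 95 to 104.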

Output Structure


{
  data_sheet: string;             // Sheet name with main rent roll
  property_name: string;          // Property/community name
  report_date: string | null;     // YYYY-MM-DD
  header_row: number;             // Column headers row number
  data_start_row: number;         // First Unit data row
  data_end_row: number;           // Last Unit data row
  charge_orientation: "horizontal" | "vertical";
  column_mapping: Record<string, string>;  // Column letter → canonical field
  charge_handling: string;        // How charges are structured
  rows_to_skip_pattern: string;   // Non-data rows in data region
  multi_row_per_unit: boolean;    // Units span multiple rows?
  multi_row_explanation: string;  // How multi-row Units relate
  notes: string;                  // Additional observations
}
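Because analyzeStructure() is typed as returning `Promise<Record<string, unknown>>`, a caller that wants the typed shape above has to check it at runtime. A minimal sketch of such a guard follows; `StructureResult` and `parseStructure` are hypothetical names, and the row-ordering check (header above the data, start not after end) is an assumed invariant rather than a documented one.

```typescript
type ChargeOrientation = "horizontal" | "vertical";

// Hypothetical typed view of the analysis result described above.
interface StructureResult {
  data_sheet: string;
  property_name: string;
  report_date: string | null;
  header_row: number;
  data_start_row: number;
  data_end_row: number;
  charge_orientation: ChargeOrientation;
  column_mapping: Record<string, string>;
  charge_handling: string;
  rows_to_skip_pattern: string;
  multi_row_per_unit: boolean;
  multi_row_explanation: string;
  notes: string;
}

function parseStructure(raw: Record<string, unknown>): StructureResult {
  const str = (key: string): string => {
    const v = raw[key];
    if (typeof v !== "string") throw new Error(`${key} must be a string`);
    return v;
  };
  const row = (key: string): number => {
    const v = raw[key];
    if (typeof v !== "number" || !Number.isInteger(v) || v < 1) {
      throw new Error(`${key} must be a positive integer row number`);
    }
    return v;
  };

  // report_date is either null or a YYYY-MM-DD string.
  const rd = raw["report_date"];
  let reportDate: string | null;
  if (rd === null) reportDate = null;
  else if (typeof rd === "string" && /^\d{4}-\d{2}-\d{2}$/.test(rd)) reportDate = rd;
  else throw new Error("report_date must be null or YYYY-MM-DD");

  const o = raw["charge_orientation"];
  if (o !== "horizontal" && o !== "vertical") {
    throw new Error("charge_orientation must be horizontal or vertical");
  }
  const orientation = o as ChargeOrientation;

  // Column letters (A, B, ..., AA) map to canonical field names.
  const mapping = raw["column_mapping"];
  if (typeof mapping !== "object" || mapping === null || Array.isArray(mapping)) {
    throw new Error("column_mapping must be an object");
  }
  const columnMapping: Record<string, string> = {};
  for (const col of Object.keys(mapping)) {
    const field = (mapping as Record<string, unknown>)[col];
    if (!/^[A-Z]+$/.test(col) || typeof field !== "string") {
      throw new Error(`invalid column_mapping entry for ${col}`);
    }
    columnMapping[col] = field;
  }

  const multi = raw["multi_row_per_unit"];
  if (typeof multi !== "boolean") throw new Error("multi_row_per_unit must be a boolean");

  const result: StructureResult = {
    data_sheet: str("data_sheet"),
    property_name: str("property_name"),
    report_date: reportDate,
    header_row: row("header_row"),
    data_start_row: row("data_start_row"),
    data_end_row: row("data_end_row"),
    charge_orientation: orientation,
    column_mapping: columnMapping,
    charge_handling: str("charge_handling"),
    rows_to_skip_pattern: str("rows_to_skip_pattern"),
    multi_row_per_unit: multi,
    multi_row_explanation: str("multi_row_explanation"),
    notes: str("notes"),
  };

  // Assumed invariant: headers sit above the data, and the data range is non-empty.
  if (result.header_row >= result.data_start_row || result.data_start_row > result.data_end_row) {
    throw new Error("row boundaries are out of order");
  }
  return result;
}
```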

Column Mapping

This is the critical output: it maps Excel column letters to canonical schema field names, for example:


{
  "A": "unit_id",
  "B": "floor_plan",
  "C": "sqft",
  "D": "tenant_name",
  "E": "unit_status",
  "F": "lease_start",
  "G": "lease_end",
  "H": "market_rent",
  "I": "charge_base_rent"
}
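Applying the mapping means translating each column letter into a position within a row. The sketch below assumes a row is an array of cell values starting at column A, which may not match the actual CellGrid type; `columnIndex` and `mapRow` are hypothetical helpers.

```typescript
// Convert an Excel column letter ("A", "Z", "AA") to a 0-based index.
function columnIndex(letters: string): number {
  if (letters.length === 0) throw new Error("empty column letter");
  let n = 0;
  for (const ch of letters.toUpperCase()) {
    const digit = ch.charCodeAt(0) - 64; // "A" -> 1 ... "Z" -> 26
    if (digit < 1 || digit > 26) throw new Error(`invalid column letter: ${letters}`);
    n = n * 26 + digit; // bijective base-26: "AA" -> 27
  }
  return n - 1;
}

// Hypothetical: build a record keyed by canonical field names from one row,
// assuming the row is an array of cell values starting at column A.
function mapRow(row: readonly unknown[], mapping: Record<string, string>): Record<string, unknown> {
  const out: Record<string, unknown> = {};
  for (const col of Object.keys(mapping)) {
    // Missing trailing cells become null rather than undefined.
    out[mapping[col]] = row[columnIndex(col)] ?? null;
  }
  return out;
}
```

Excel columns have no zero digit ("AA" follows "Z"), which is why the conversion uses bijective base-26 rather than ordinary positional base-26.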

Charge Orientation

Two charge layouts are supported:

Horizontal — one row per Unit, charge types are separate columns (most common)
Vertical — each Unit spans multiple rows, with charge type in one column and amount in another

The orientation determines which extraction mode is used downstream.
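For the vertical layout, extraction has to fold several charge rows back into one record per Unit. A minimal sketch, assuming the Unit identifier appears only on a Unit's first row and that repeated charge codes should be summed; `ChargeLine` and `pivotVerticalCharges` are hypothetical and are not the pipeline's extraction code.

```typescript
// Hypothetical shape of one row in a vertical (one charge per row) layout.
interface ChargeLine {
  unitId: string | null; // assumed null on a Unit's continuation rows
  chargeCode: string;
  amount: number;
}

// Fold vertical charge rows into one charge record per Unit.
function pivotVerticalCharges(lines: readonly ChargeLine[]): Map<string, Record<string, number>> {
  const units = new Map<string, Record<string, number>>();
  let current: string | null = null;
  for (const line of lines) {
    if (line.unitId !== null) current = line.unitId; // a new Unit starts here
    if (current === null) throw new Error("charge row appears before any Unit row");
    const charges: Record<string, number> = units.get(current) ?? {};
    // Repeated codes (e.g. two parking charges) are summed, not overwritten.
    charges[line.chargeCode] = (charges[line.chargeCode] ?? 0) + line.amount;
    units.set(current, charges);
  }
  return units;
}
```

The output has the same shape a horizontal layout produces directly (one set of charges per Unit), so downstream code can treat both orientations alike after this step.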

Human-in-the-Loop Review

After analysis completes, the pipeline pauses in the awaiting_review state, where the user can:

Inspect the detected column mapping
Override column assignments
Adjust row boundaries (data_start_row, data_end_row)
Change charge orientation
Toggle multi_row_per_unit

Overrides are submitted via POST /api/structure-confirm and merged into the structure result. Only six keys are whitelisted as overrides: column_mapping, data_start_row, data_end_row, header_row, charge_orientation, and multi_row_per_unit.
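A whitelist merge of this kind can be sketched as follows. `mergeOverrides` is a hypothetical name; the sketch assumes non-whitelisted keys are silently dropped rather than rejected, and that a column_mapping override replaces the detected mapping wholesale instead of being merged column by column.

```typescript
// The six keys the text lists as overridable.
const OVERRIDE_KEYS = [
  "column_mapping",
  "data_start_row",
  "data_end_row",
  "header_row",
  "charge_orientation",
  "multi_row_per_unit",
] as const;

// Hypothetical merge: whitelisted keys replace the detected values wholesale;
// any other key in the request is ignored.
function mergeOverrides(
  structure: Record<string, unknown>,
  overrides: Record<string, unknown>,
): Record<string, unknown> {
  const merged: Record<string, unknown> = { ...structure }; // leave the input untouched
  for (const key of OVERRIDE_KEYS) {
    if (Object.prototype.hasOwnProperty.call(overrides, key) && overrides[key] !== undefined) {
      merged[key] = overrides[key];
    }
  }
  return merged;
}
```

A stricter variant would reject unknown keys with an error response instead of dropping them; the text does not say which behavior the endpoint uses.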

Streaming Progress Milestones

Progress	Message
20%	Starting analysis
22%	Thinking block started (reading spreadsheet)
25%	Thinking > 500 chars (looking at headers)
28%	Thinking > 2000 chars (identifying pattern)
30%	Thinking > 4000 chars (almost figured out)
32%	Text block started (building map)
35%	Text > 200 chars (locking in layout)
38%	Complete
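One way to derive these milestones from the stream is a pure function of the current block type and the characters received so far, plus a reporter that forwards only increases to onProgress, so repeated thinking deltas do not re-send the same milestone. `StreamState`, `progressFor`, and `makeReporter` are hypothetical; the thresholds come from the table and the messages paraphrase it.

```typescript
interface StreamState {
  phase: "start" | "thinking" | "text" | "done";
  thinkingChars: number; // characters of thinking received so far
  textChars: number; // characters of text output received so far
}

// Map the stream's current state to the milestone table above.
function progressFor(s: StreamState): { pct: number; msg: string } {
  switch (s.phase) {
    case "start":
      return { pct: 20, msg: "Starting analysis" };
    case "thinking":
      if (s.thinkingChars > 4000) return { pct: 30, msg: "Almost figured out" };
      if (s.thinkingChars > 2000) return { pct: 28, msg: "Identifying pattern" };
      if (s.thinkingChars > 500) return { pct: 25, msg: "Looking at headers" };
      return { pct: 22, msg: "Reading spreadsheet" };
    case "text":
      return s.textChars > 200
        ? { pct: 35, msg: "Locking in layout" }
        : { pct: 32, msg: "Building map" };
    case "done":
      return { pct: 38, msg: "Complete" };
    default:
      throw new Error("unknown stream phase");
  }
}

// Wrap an onProgress callback so only increasing percentages are forwarded.
function makeReporter(onProgress: (pct: number, msg: string) => Promise<void>) {
  let last = -1;
  return async (s: StreamState): Promise<void> => {
    const { pct, msg } = progressFor(s);
    if (pct <= last) return;
    last = pct; // updated before awaiting, so overlapping calls also dedupe
    await onProgress(pct, msg);
  };
}
```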