
Structure Analysis

After scanning identifies the Tables within each Sheet, the structure analysis phase performs a deep read to determine exactly how data is organized: which columns map to which fields, where the data starts and ends, and how charges are laid out.

The Analysis Call

The analysis call uses Claude Opus with extended thinking enabled (a 10,000-token thinking budget and a 32,000-token output limit). It is the most expensive LLM call in the pipeline, but the wide variety of rent roll formats requires that depth of reasoning.

analyzeStructure(
  sheets: Record<string, CellGrid>,
  sheetMaps: Record<string, SheetMap>,
  onProgress?: (pct: number, msg: string) => Promise<void>,
  signal?: AbortSignal
): Promise<Record<string, unknown>>
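As a rough illustration of the settings described above, the request fields might be assembled as follows. This is a minimal sketch modeled on the Anthropic Messages API `thinking` and `max_tokens` parameters. The helper name `buildAnalysisRequest` and the `AnalysisRequestParams` type are hypothetical, and the model ID is passed in because this page does not state it.

```typescript
// Relevant request fields, modeled on the Anthropic Messages API.
interface AnalysisRequestParams {
  model: string;
  max_tokens: number;
  thinking: { type: "enabled"; budget_tokens: number };
  messages: { role: "user"; content: string }[];
}

// Hypothetical helper (not part of the documented pipeline API).
function buildAnalysisRequest(prompt: string, model: string): AnalysisRequestParams {
  return {
    model, // exact Opus model ID is an assumption left to the caller
    max_tokens: 32000, // output limit stated above
    thinking: { type: "enabled", budget_tokens: 10000 }, // thinking budget stated above
    messages: [{ role: "user", content: prompt }],
  };
}
```

The thinking budget has to stay below `max_tokens`, which the 10,000 / 32,000 split satisfies.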

Prompt Construction

The prompt is built by buildStructurePrompt() and includes:

  • Section classifications from the pre-scan (Tables identified per Sheet)
  • Header/metadata rows in full
  • First 30 + last 10 rows of the primary data section
  • Preview rows from summary sections
  • Fallback to gridToText(grid, 50) if no SheetMap exists
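The "first 30 + last 10 rows" sampling can be sketched as below. This is an illustrative helper, not the actual body of buildStructurePrompt(); the name `sampleRows` and the `omitted` count are assumptions added for the example.

```typescript
// Hypothetical helper: keep the first `head` and last `tail` rows of a section.
// Short sections are returned whole so no row is duplicated.
function sampleRows<T>(
  rows: T[],
  head: number = 30,
  tail: number = 10
): { rows: T[]; omitted: number } {
  if (rows.length <= head + tail) {
    return { rows: rows.slice(), omitted: 0 };
  }
  return {
    rows: rows.slice(0, head).concat(rows.slice(rows.length - tail)),
    omitted: rows.length - head - tail, // rows left out of the middle
  };
}
```

Sampling both ends keeps the prompt small while still showing the model where the data region ends, which is where totals and summary rows tend to appear.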

Output Structure

{
  data_sheet: string;                      // Sheet name with main rent roll
  property_name: string;                   // Property/community name
  report_date: string | null;              // YYYY-MM-DD
  header_row: number;                      // Column headers row number
  data_start_row: number;                  // First Unit data row
  data_end_row: number;                    // Last Unit data row
  charge_orientation: "horizontal" | "vertical";
  column_mapping: Record<string, string>;  // Column letter → canonical field
  charge_handling: string;                 // How charges are structured
  rows_to_skip_pattern: string;            // Non-data rows in data region
  multi_row_per_unit: boolean;             // Units span multiple rows?
  multi_row_explanation: string;           // How multi-row Units relate
  notes: string;                           // Additional observations
}

Column Mapping

The column mapping is the most critical output. It maps Excel column letters to canonical schema field names:

{
  "A": "unit_id",
  "B": "floor_plan",
  "C": "sqft",
  "D": "tenant_name",
  "E": "unit_status",
  "F": "lease_start",
  "G": "lease_end",
  "H": "market_rent",
  "I": "charge_base_rent"
}
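To show how such a mapping could be applied, here is a minimal sketch that turns one spreadsheet row into a record keyed by canonical field name. It assumes a row is a zero-indexed array of cell values; the real CellGrid shape is not described on this page, and `columnLetterToIndex` and `mapRow` are hypothetical names.

```typescript
type Cell = string | number | null;

// Convert an Excel column letter to a zero-based index: A → 0, Z → 25, AA → 26.
function columnLetterToIndex(letter: string): number {
  const upper = letter.toUpperCase();
  let index = 0;
  for (let i = 0; i < upper.length; i++) {
    index = index * 26 + (upper.charCodeAt(i) - 64); // "A" has char code 65
  }
  return index - 1;
}

// Hypothetical helper: apply a column_mapping to a single row of cells.
function mapRow(row: Cell[], columnMapping: Record<string, string>): Record<string, Cell> {
  const record: Record<string, Cell> = {};
  for (const letter of Object.keys(columnMapping)) {
    const field = columnMapping[letter];
    const value = row[columnLetterToIndex(letter)];
    record[field] = value === undefined ? null : value; // missing cells become null
  }
  return record;
}
```

For example, `mapRow(["101", "1BR", 750], { A: "unit_id", C: "sqft" })` yields a record with `unit_id` set to `"101"` and `sqft` set to `750`.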

Charge Orientation

Two charge layouts are supported:

  • Horizontal — one row per Unit, charge types are separate columns (most common)
  • Vertical — each Unit spans multiple rows, with charge type in one column and amount in another

The orientation determines which extraction mode is used downstream.
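To make the vertical layout concrete, the sketch below groups charge rows under the Unit they belong to. It is not the pipeline's extraction code. It assumes that continuation rows leave the Unit ID blank, which is one common convention but is not stated on this page, and the type and function names are hypothetical.

```typescript
// One row of a vertical layout: charge type in one column, amount in another.
interface VerticalRow {
  unitId: string | null; // null on continuation rows (assumed convention)
  chargeType: string;
  amount: number;
}

interface UnitCharges {
  unitId: string;
  charges: Record<string, number>;
}

// Hypothetical helper: fold multi-row Units into one record per Unit.
function groupVerticalCharges(rows: VerticalRow[]): UnitCharges[] {
  const units: UnitCharges[] = [];
  let current: UnitCharges | null = null;
  for (const row of rows) {
    if (row.unitId !== null) {
      // A non-blank Unit ID starts a new Unit.
      current = { unitId: row.unitId, charges: {} };
      units.push(current);
    }
    if (current === null) continue; // charge row before any Unit: skip it
    const previous = current.charges[row.chargeType];
    current.charges[row.chargeType] = (previous === undefined ? 0 : previous) + row.amount;
  }
  return units;
}
```

A horizontal layout needs no grouping step, since each row already carries every charge for its Unit in separate columns.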

Human-in-the-Loop Review

After analysis completes, the pipeline pauses at awaiting_review. The user can:

  • Inspect the detected column mapping
  • Override column assignments
  • Adjust row boundaries (data_start_row, data_end_row)
  • Change charge orientation
  • Toggle multi_row_per_unit

Overrides are submitted via POST /api/structure-confirm and merged into the structure result. Only six override keys are whitelisted: column_mapping, data_start_row, data_end_row, header_row, charge_orientation, and multi_row_per_unit.
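The whitelist merge could look roughly like the sketch below. The function name `applyOverrides` is hypothetical, and the shallow merge (an overridden column_mapping replaces the detected one wholesale) is an assumption, since this page does not specify merge depth.

```typescript
// The six whitelisted override keys listed above.
const OVERRIDE_KEYS = [
  "column_mapping",
  "data_start_row",
  "data_end_row",
  "header_row",
  "charge_orientation",
  "multi_row_per_unit",
] as const;

// Hypothetical helper: copy only whitelisted keys onto the structure result.
function applyOverrides(
  structure: Record<string, unknown>,
  overrides: Record<string, unknown>
): Record<string, unknown> {
  const merged: Record<string, unknown> = { ...structure };
  for (const key of OVERRIDE_KEYS) {
    if (Object.prototype.hasOwnProperty.call(overrides, key)) {
      merged[key] = overrides[key]; // shallow replace (assumed behavior)
    }
  }
  return merged; // non-whitelisted keys in `overrides` are ignored
}
```

Filtering on the server side means a client cannot overwrite fields such as property_name or notes through the confirm endpoint.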

Streaming Progress Milestones

Progress  Message
20%       Starting analysis
22%       Thinking block started (reading spreadsheet)
25%       Thinking > 500 chars (looking at headers)
28%       Thinking > 2000 chars (identifying pattern)
30%       Thinking > 4000 chars (almost figured out)
32%       Text block started (building map)
35%       Text > 200 chars (locking in layout)
38%       Complete
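The milestones above amount to a threshold lookup on the streamed character count. A minimal sketch follows, assuming the caller tracks the current stream phase and the number of characters received in it; `Phase` and `progressFor` are hypothetical names, not the pipeline's own.

```typescript
type Phase = "start" | "thinking" | "text" | "complete";

// Hypothetical helper: map stream phase and character count to a progress percentage.
function progressFor(phase: Phase, chars: number = 0): number {
  switch (phase) {
    case "start":
      return 20;
    case "thinking":
      if (chars > 4000) return 30;
      if (chars > 2000) return 28;
      if (chars > 500) return 25;
      return 22; // thinking block has started
    case "text":
      return chars > 200 ? 35 : 32;
    case "complete":
      return 38;
  }
}
```

The thresholds are strict, so exactly 500 thinking characters still reports 22%.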