RollFormat
RollFormat converts messy, non-standardized rent roll Excel files into a clean, canonical format. Drop in any rent roll XLSX and get back a standardized spreadsheet with the same 40+ canonical fields every time, regardless of the source.
How It Works
The pipeline follows a consistent data flow:
File → Sheets → Tables → Units → Clean Output
- Sheet — a tab in the Excel file. Most rent rolls have 1–3.
- Table — a contiguous block of rows within a Sheet. One Sheet often has several: unit data, summaries, metadata.
- Unit — a single apartment or space extracted from a Table. Can be one row or multiple rows depending on layout.
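The hierarchy above can be pictured as nested types. The following is a minimal sketch, not RollFormat's actual implementation: the type names, field names, and the blank-row heuristic used to find contiguous blocks are all assumptions made for illustration.

```typescript
// Hypothetical shapes for the File → Sheets → Tables hierarchy.
type Cell = string | number | null;
type Row = Cell[];

interface Sheet {
  name: string;
  rows: Row[];
}

interface Table {
  startRow: number; // 0-based index of the block's first row within the Sheet
  rows: Row[];      // contiguous, non-blank rows
}

// A row counts as blank when every cell is null or whitespace-only.
function isBlankRow(row: Row): boolean {
  return row.every((cell) => cell === null || String(cell).trim() === "");
}

// Assumed heuristic: blank rows separate one Table from the next.
function splitIntoTables(sheet: Sheet): Table[] {
  const tables: Table[] = [];
  let current: Table | null = null;
  for (let i = 0; i < sheet.rows.length; i++) {
    const row = sheet.rows[i];
    if (isBlankRow(row)) {
      current = null; // a blank row closes the current block
      continue;
    }
    if (current === null) {
      current = { startRow: i, rows: [] };
      tables.push(current);
    }
    current.rows.push(row);
  }
  return tables;
}
```

With this heuristic, a Sheet that holds unit rows, a blank line, and then a totals row yields two Tables, which a later step could classify as unit data and summary.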
Pipeline Stages
- Upload — User uploads an `.xlsx` or `.xls` file (max 10 MB)
- Scan — Every Sheet is scanned in parallel to find Tables and classify them
- Analyze — Deep structure analysis: column mapping, row boundaries, charge layout detection
- Review — User reviews and optionally overrides the detected structure
- Extract — Units are extracted via LLM (Claude Sonnet, parallel chunks)
- Validate — Per-Unit quality checks: missing IDs, duplicates, charge mismatches
- Output — Clean Output XLSX with 40+ canonical fields per Unit, plus a validation report sheet
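To make the Validate stage concrete, here is a rough sketch of the three per-Unit checks it names. The `Unit` shape, the issue labels, and the reading of "charge mismatch" as "itemized charges do not add up to the stated total" are assumptions for illustration; the real canonical schema has 40+ fields and the actual checks may differ.

```typescript
// Hypothetical, heavily reduced Unit shape.
interface Charge {
  code: string;
  amount: number;
}

interface Unit {
  unitId: string | null;
  totalCharges: number | null; // total stated in the source, if present
  charges: Charge[];
}

type IssueKind = "missing_id" | "duplicate_id" | "charge_mismatch";

interface Issue {
  index: number; // position of the Unit in the input array
  kind: IssueKind;
}

function validateUnits(units: Unit[], tolerance = 0.01): Issue[] {
  const issues: Issue[] = [];
  const seenIds = new Set<string>();

  units.forEach((unit, index) => {
    const id = (unit.unitId ?? "").trim();
    if (id === "") {
      issues.push({ index, kind: "missing_id" });
    } else if (seenIds.has(id)) {
      issues.push({ index, kind: "duplicate_id" });
    } else {
      seenIds.add(id);
    }

    // Only compare when the source actually stated a total.
    if (unit.totalCharges !== null) {
      const sum = unit.charges.reduce((acc, c) => acc + c.amount, 0);
      if (Math.abs(sum - unit.totalCharges) > tolerance) {
        issues.push({ index, kind: "charge_mismatch" });
      }
    }
  });

  return issues;
}
```

Returning a flat list of issues, rather than throwing on the first problem, fits a pipeline that writes a validation report sheet alongside the clean output.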
Tech Stack
| Layer | Technology |
|---|---|
| Frontend | Next.js 16 (App Router), React 19, Mantine 8 |
| Backend | Next.js API Routes (serverless) |
| Database | Supabase (PostgreSQL + Realtime) |
| Storage | Supabase Storage |
| AI | Anthropic Claude (Opus for structure, Sonnet for extraction) |
| Parsing | XLSX.js |
| Hosting | Vercel |
Quick Links
- Architecture — System design, module inventory, data flow
- Pipeline — Detailed breakdown of each processing phase
- Canonical Schema — All 40+ output fields documented
- API Reference — Every endpoint with request/response formats
- Database — Schema, migrations, realtime setup
- Frontend — Hooks, components, demo mode