POST /api/extract
The heavy lifter. Downloads the uploaded File, parses it, extracts every Unit row using the confirmed structure, validates the output, and writes a standardized 2-Sheet XLSX Output File to storage.
Request
```json
{
  "job_id": "uuid-string"
}
```

Headers: `Authorization: Bearer <token>`
Max duration: 300 seconds (5 minutes — Vercel serverless timeout)
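On Vercel, a Next.js App Router route can declare this ceiling with the `maxDuration` segment config export. A minimal sketch, assuming this route lives in the App Router (the file path is hypothetical):

```typescript
// app/api/extract/route.ts (hypothetical path)
// Vercel reads this export to raise the function's timeout to 300 seconds.
export const maxDuration = 300; // seconds
```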
Prerequisites
The job must have a structure_result (i.e., /api/analyze must have completed). If the structure has not been analyzed, the endpoint returns a 400 error prompting you to run /api/analyze first.
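The prerequisite check reduces to a guard on the job row. A sketch, assuming a `structure_result` column on the job (the interface name and helper are illustrative, not from the codebase):

```typescript
// Only the field this guard inspects; the real job row has more columns.
interface JobRow {
  structure_result: unknown;
}

// Returns the documented 400 error message when the prerequisite is
// unmet, or null when extraction may proceed.
function checkExtractPrerequisite(job: JobRow): string | null {
  if (!job.structure_result) {
    return "Run /api/analyze first";
  }
  return null;
}
```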
Processing Steps
1. Authenticate via `checkAuth()` and fetch the full job row.
2. Record start — set `extract_started_at`, status to `extracting`, progress to 45%.
3. Download the File from Supabase Storage using the job's `storage_path`.
4. Parse Excel into `CellGrid` objects (one per Sheet).
5. Extract data — data is chunked and extracted in parallel via Claude Sonnet (LLM extraction). Each chunk is up to 60 rows, with up to 8 chunks processed concurrently.
6. Spot-check — a sample of extracted Units is audited against the source spreadsheet via an independent LLM call. Produces a `confidence` score (0–100) and a list of `discrepancies`.
7. Validate — run `validateRows()` across all extracted rows, checking for missing required fields, out-of-range values, and row-count expectations derived from `data_start_row`/`data_end_row`. Any chunk extraction failures from step 5 are surfaced as warnings in this validation report rather than being silently dropped.
8. Write output — generate a 2-Sheet XLSX (`Rent Roll` + `Validation`) via `writeOutputBuffer()`.
9. Upload output to Supabase Storage at `outputs/{job_id}/output.xlsx`.
10. Mark complete — compute total duration, store all results, set status to `complete`.
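The chunked, concurrency-limited extraction in step 5 can be sketched as two small helpers. This is illustrative only; the helper names and worker signature are assumptions, while the 60-row and 8-way limits come from the doc:

```typescript
const CHUNK_SIZE = 60;     // max rows per LLM extraction call
const MAX_CONCURRENCY = 8; // chunks in flight at once

// Splits the source rows into consecutive slices of at most `size` rows.
function chunkRows<T>(rows: T[], size: number = CHUNK_SIZE): T[][] {
  const chunks: T[][] = [];
  for (let i = 0; i < rows.length; i += size) {
    chunks.push(rows.slice(i, i + size));
  }
  return chunks;
}

// Runs `worker` over `items` with at most `limit` promises in flight,
// preserving result order by index.
async function mapWithLimit<T, R>(
  items: T[],
  limit: number,
  worker: (item: T, index: number) => Promise<R>,
): Promise<R[]> {
  const results: R[] = new Array(items.length);
  let next = 0;
  async function run(): Promise<void> {
    while (next < items.length) {
      const i = next++;
      results[i] = await worker(items[i], i);
    }
  }
  await Promise.all(
    Array.from({ length: Math.min(limit, items.length) }, () => run()),
  );
  return results;
}
```

With these, a 312-unit sheet becomes six chunks (five of 60 rows, one of 12), and at most eight extraction calls run concurrently.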
Progress Updates
| % | Stage Message |
|---|---|
| 45 | Loading the File… |
| 50–90 | Extraction progress (updated per chunk) |
| 91 | Spot-checking a sample of Units… |
| 92 | Checking N Units look right… |
| 95 | Building your clean File… |
| 100 | All done! |
During step 5, the extraction callback fires incremental progress updates between 50% and 90%. If the extraction message contains a section/chunk count (e.g., “Processing 4 sections”), the `total_chunks` field is also written to the job row.
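The per-chunk updates can be mapped onto the 50–90% band with a simple linear scale. A sketch under the assumption that the callback reports completed and total chunk counts (the function name is illustrative):

```typescript
// Maps completed-chunk progress onto the documented 50–90% band:
// 50% before any chunk finishes, 90% once every chunk has landed.
function extractionProgressPct(chunksDone: number, totalChunks: number): number {
  if (totalChunks <= 0) return 50;
  const fraction = Math.min(chunksDone / totalChunks, 1);
  return Math.round(50 + fraction * 40);
}
```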
Job Fields Updated
| Field | Type | Description |
|---|---|---|
| `status` | string | `extracting` during processing, `complete` on success, `failed` on error |
| `progress_pct` | number | 45 through 100 |
| `stage_message` | string | Human-readable progress label |
| `extract_started_at` | timestamp | When extraction began |
| `extract_finished_at` | timestamp | When extraction completed |
| `total_duration_ms` | number | Sum of analyze duration + extract duration (excludes the review/confirmation pause between the two steps) |
| `unit_count` | number | Total extracted Units |
| `error_count` | number | Validation errors |
| `warning_count` | number | Validation warnings |
| `errors` | string[] | List of validation error messages |
| `warnings` | string[] | List of validation warning messages |
| `output_storage_path` | string | Storage path to the Output File |
| `total_chunks` | number | Number of extraction sections (if reported) |
Enriched structure_result Fields
At completion, the following metadata fields are merged into the job’s structure_result JSON object (not stored as top-level job columns):
| Key | Type | Description |
|---|---|---|
| `_extraction_method` | string | Always `"llm"` |
| `_spot_check_confidence` | number \| null | 0–100 confidence score from spot-check audit |
| `_spot_check_discrepancies` | object[] | Discrepancies found during spot-check |
| `_unit_breakdown` | object | Per-unit-type counts from validation |
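The merge itself is a shallow spread: the enrichment keys are added alongside the analyzed structure rather than replacing it. A sketch, where the underscore-prefixed keys come from the table above and everything else (types, parameter shapes) is assumed:

```typescript
// Hypothetical shape of the spot-check output used by the merge.
interface SpotCheckResult {
  confidence: number | null;
  discrepancies: object[];
}

// Returns a new structure_result with the enrichment metadata merged in;
// the original analyzed-structure keys are preserved untouched.
function enrichStructureResult(
  structureResult: Record<string, unknown>,
  spotCheck: SpotCheckResult,
  unitBreakdown: Record<string, number>,
): Record<string, unknown> {
  return {
    ...structureResult,
    _extraction_method: "llm",
    _spot_check_confidence: spotCheck.confidence,
    _spot_check_discrepancies: spotCheck.discrepancies,
    _unit_breakdown: unitBreakdown,
  };
}
```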
Response
Success (200):
```json
{
  "job_id": "uuid-string",
  "unit_count": 312,
  "error_count": 0,
  "warning_count": 2,
  "extraction_method": "llm"
}
```

Error Sanitization
Raw error messages are sanitized before returning to the client:
| Condition | User-facing Message |
|---|---|
| Credit balance issues | AI service temporarily unavailable. Please try again later. |
| Rate limits | Too many requests. Please wait a moment and try again. |
| Overloaded | AI service is busy. Please try again in a minute. |
| Other | Something went wrong. Please try again. |
On failure, the job status is set to failed with the sanitized message.
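The mapping in the table reduces to a substring check over the raw message. A sketch; the exact substrings matched against the provider errors are assumptions, while the four user-facing messages are taken verbatim from the table:

```typescript
// Maps a raw provider/runtime error message to one of the four
// documented user-facing strings. Matching is case-insensitive.
function sanitizeExtractError(raw: string): string {
  const msg = raw.toLowerCase();
  if (msg.includes("credit balance")) {
    return "AI service temporarily unavailable. Please try again later.";
  }
  if (msg.includes("rate limit")) {
    return "Too many requests. Please wait a moment and try again.";
  }
  if (msg.includes("overloaded")) {
    return "AI service is busy. Please try again in a minute.";
  }
  return "Something went wrong. Please try again.";
}
```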
Error (400):

```json
{
  "error": "Run /api/analyze first"
}
```

Error (500):

```json
{
  "error": "Something went wrong. Please try again."
}
```

Abort Handling

If the client aborts the request (e.g., the user navigates away), the route catches the `AbortError` and returns a 499 response:

```json
{
  "error": "Cancelled"
}
```

Unlike /api/analyze, the abort handler here does not update the job status — the job remains in the `extracting` state. A subsequent call to /api/extract will re-run extraction from scratch.
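The abort branch can be isolated into a small classifier that deliberately performs no job-status write, matching the behaviour above. A sketch with illustrative names:

```typescript
// If `err` is a client abort, returns the documented 499 response;
// otherwise returns null so the normal (sanitizing) error path runs.
// Intentionally does NOT touch the job row: the job stays `extracting`.
function toAbortResponse(
  err: unknown,
): { status: number; body: { error: string } } | null {
  if (err instanceof Error && err.name === "AbortError") {
    return { status: 499, body: { error: "Cancelled" } };
  }
  return null;
}
```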
Spot-Check Details
After extraction, a random sample of extracted Units is audited against the source spreadsheet via an independent LLM call. The result includes:
- `spot_check_confidence` — a score from 0 to 100 indicating how well the extraction matches an independent LLM re-reading of the same source cells. Scores below 50 automatically generate a warning. On LLM failure, confidence defaults to 50 (not 100), with a discrepancy entry indicating the audit was inconclusive.
- `spot_check_discrepancies` — an array of objects describing each field where the extraction and the independent audit disagree.
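The failure-default and warning-threshold rules can be sketched as a small normalizer. Assumptions are labeled in comments; only the neutral-50 default, the inconclusive discrepancy, and the below-50 warning rule come from the text above:

```typescript
// Hypothetical shapes; the real discrepancy objects are not specified here.
interface Discrepancy {
  field: string;
  note: string;
}

interface SpotCheck {
  confidence: number;
  discrepancies: Discrepancy[];
  warning: string | null;
}

// `llmConfidence` is null when the audit LLM call itself failed.
function normalizeSpotCheck(
  llmConfidence: number | null,
  discrepancies: Discrepancy[],
): SpotCheck {
  // Neutral default, not 100: an inconclusive audit is not a pass.
  const confidence = llmConfidence === null ? 50 : llmConfidence;
  const allDiscrepancies =
    llmConfidence === null
      ? [...discrepancies, { field: "*", note: "Spot-check audit was inconclusive" }]
      : discrepancies;
  const warning =
    confidence < 50 ? `Spot-check confidence is low (${confidence}/100)` : null;
  return { confidence, discrepancies: allDiscrepancies, warning };
}
```

Note that the neutral default of 50 sits exactly at the warning threshold, so an inconclusive audit alone does not raise a low-confidence warning.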