# Hooks

The frontend uses three custom hooks to manage the ETL pipeline lifecycle, real-time progress tracking, and a client-side demo simulation.

## useEtlPipeline

**File:** `hooks/useEtlPipeline.ts`

Orchestrates the full ETL pipeline from the client side. Manages file upload, structure analysis, user review, and data extraction as a linear state machine with cancellation support.
### State Machine

```
idle --> uploading --> analyzing --> awaiting_review --> extracting --> complete
             |             |                |                 |
             +-------------+----------------+-----------------+--> failed
             |             |                |                 |
             +-------------+----------------+-----------------+--> cancelled
```

| State | Description |
|---|---|
| `idle` | No pipeline running. Initial state. |
| `uploading` | File is being sent to `/api/upload` via `FormData`. |
| `analyzing` | Structure analysis in progress via `/api/analyze`. |
| `awaiting_review` | Pipeline paused. Structure result is available for user review. The user must call `confirmStructure()` to continue. |
| `extracting` | Data extraction in progress via `/api/extract`. |
| `complete` | Pipeline finished successfully. |
| `failed` | An error occurred at any stage. |
| `cancelled` | User cancelled via the `cancel()` method. |
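The states and legal transitions above can be sketched as a small lookup table. This is an illustration inferred from the diagram (including the direct `analyzing` to `extracting` skip described under Key Behaviors), not code from the hook itself:

```typescript
type PipelineState =
  | "idle" | "uploading" | "analyzing" | "awaiting_review"
  | "extracting" | "complete" | "failed" | "cancelled";

// Forward transitions from the diagram; every in-flight state can also
// end in "failed" or "cancelled". Terminal states have no successors.
const NEXT: Record<PipelineState, PipelineState[]> = {
  idle: ["uploading"],
  uploading: ["analyzing", "failed", "cancelled"],
  analyzing: ["awaiting_review", "extracting", "failed", "cancelled"],
  awaiting_review: ["extracting", "failed", "cancelled"],
  extracting: ["complete", "failed", "cancelled"],
  complete: [],
  failed: [],
  cancelled: [],
};

function canTransition(from: PipelineState, to: PipelineState): boolean {
  return NEXT[from].includes(to);
}
```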
### Return Type: `PipelineResult`

```typescript
type FailedStep = "upload" | "analyze" | "extract" | "confirm" | null;

interface PipelineResult {
  state: PipelineState;
  jobId: string | null;
  error: string | null;
  failedStep: FailedStep;
  structureResult: StructureResult | null;
  start: (file: File) => Promise<void>;
  confirmStructure: (overrides?: Record<string, unknown>) => Promise<void>;
  cancel: () => void;
  reset: () => void;
}
```

### StructureResult
```typescript
interface StructureResult {
  column_mapping?: Record<string, string>;
  column_headers?: Record<string, string>;
  header_row?: number;
  data_start_row?: number;
  data_end_row?: number;
  charge_orientation?: string;
  multi_row_per_unit?: boolean;
  property_name?: string;
  report_date?: string;
  notes?: string;
}
```

### API Calls
| Method | Endpoint | When | Body |
|---|---|---|---|
| `start()` | `POST /api/upload` | `uploading` state | `FormData` with the file |
| `start()` | `POST /api/analyze` | `analyzing` state | `{ job_id }` |
| `confirmStructure()` | `POST /api/structure-confirm` | After review | `{ job_id, structure_overrides? }` |
| `confirmStructure()` / `start()` | `POST /api/extract` | `extracting` state | `{ job_id }` |
All requests include auth headers from getAuthHeaders() and support AbortController cancellation.
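A request built this way might look like the following sketch. The option shape mirrors `fetch`'s init object, and `getAuthHeaders` is the function named above; everything else here is an illustrative assumption, not the hook's actual code:

```typescript
// Illustrative request assembly: JSON bodies are serialized, FormData
// uploads are passed through, and every call carries the auth headers
// plus an AbortSignal for cancellation.
interface RequestOptions {
  method: "POST";
  signal: AbortSignal;
  headers: Record<string, string>;
  body: FormData | string;
}

function buildRequest(
  body: FormData | Record<string, unknown>,
  signal: AbortSignal,
  getAuthHeaders: () => Record<string, string> = () => ({}),
): RequestOptions {
  const isForm = body instanceof FormData;
  return {
    method: "POST",
    signal,
    headers: {
      ...getAuthHeaders(),
      // For FormData the runtime sets the multipart boundary itself,
      // so no Content-Type header is added in that case.
      ...(isForm ? {} : { "Content-Type": "application/json" }),
    },
    body: body instanceof FormData ? body : JSON.stringify(body),
  };
}
```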
### Key Behaviors

- **Review checkpoint:** After analysis, if `structure_result` exists in the database, the pipeline pauses at `awaiting_review`. If no structure result is found, it skips directly to extraction.
- **Cancellation:** Each phase (analyze, extract) gets its own `AbortController`, which is cleared at phase boundaries. The `cancel()` method aborts the current controller, nulls the ref, and updates the Supabase job row to `status: 'failed'` with `stage_message: 'Cancelled by user'`.
- **Error extraction:** Handles non-JSON error responses from the server (e.g., Vercel 504 timeouts return HTML). Falls back to generic messages for 504 and 502 status codes.
- **Reset:** Aborts any in-flight request, clears all state, and returns to `idle`.
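The error-extraction behavior can be sketched as a standalone helper. This is a simplified version under the assumption that JSON error bodies carry an `error` string; the fallback messages are invented for illustration:

```typescript
// Try to read a JSON error body; fall back to generic messages when the
// server returns non-JSON (e.g., a Vercel 504 gateway-timeout HTML page).
async function extractErrorMessage(res: Response): Promise<string> {
  const text = await res.text();
  try {
    const data = JSON.parse(text);
    if (typeof data?.error === "string") return data.error;
  } catch {
    // Non-JSON body: fall through to the status-code fallbacks below.
  }
  if (res.status === 504) return "The server timed out. Please try again.";
  if (res.status === 502) return "The server is unavailable. Please try again.";
  return `Request failed with status ${res.status}`;
}
```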
## useJobProgress

**File:** `hooks/useJobProgress.ts`

Subscribes to real-time updates for a specific ETL job via Supabase Realtime. Maintains a rolling list of milestone messages for the timeline UI.
### Usage

```typescript
const progress = useJobProgress(jobId);
// progress.status, progress.progress_pct, progress.milestones, etc.
```

### JobProgress Interface
```typescript
interface JobProgress {
  status: string;
  progress_pct: number;
  stage_message: string | null;
  property_name: string | null;
  report_date: string | null;
  unit_count: number | null;
  error_count: number;
  warning_count: number;
  errors: string[];
  warnings: string[];
  output_storage_path: string | null;
  original_filename: string | null;
  milestones: Milestone[];
  total_rows: number | null;
  total_chunks: number | null;
  file_size_bytes: number | null;
  extraction_method: string | null;
  spot_check_confidence: number | null;
  spot_check_discrepancies: SpotCheckDiscrepancy[];
  unit_breakdown: UnitBreakdown | null;
  expected_data_rows: number | null;
}
```

### Milestone
```typescript
interface Milestone {
  message: string;
  pct: number;
  timestamp: number; // Date.now() when received
}
```

Milestones are accumulated client-side in a ref. Each new `stage_message` from the database becomes a milestone entry. Duplicate consecutive messages are deduplicated.
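The accumulation rule can be sketched as a pure function. This is a simplified standalone version, not the hook's actual ref-based code:

```typescript
interface Milestone {
  message: string;
  pct: number;
  timestamp: number;
}

// Append a milestone only when the incoming stage_message differs from
// the most recently recorded one (deduplicating consecutive repeats).
function appendMilestone(
  milestones: Milestone[],
  message: string | null,
  pct: number,
): Milestone[] {
  if (!message) return milestones;
  const last = milestones[milestones.length - 1];
  if (last && last.message === message) return milestones;
  return [...milestones, { message, pct, timestamp: Date.now() }];
}
```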
### SpotCheckDiscrepancy

```typescript
interface SpotCheckDiscrepancy {
  unit_id: string;
  field: string;
  extracted: unknown;
  expected: unknown;
  severity: "error" | "warning";
  explanation: string;
}
```

### UnitBreakdown
```typescript
interface UnitBreakdown {
  successful: number;
  flagged: number;
  failed: number;
  total: number;
}
```

### Supabase Realtime Subscription
On mount (when `jobId` is non-null), the hook:

- **Fetches initial state** — queries `etl_jobs` for all tracked columns using `supabase.from('etl_jobs').select(...)`.
- **Subscribes to changes** — opens a Supabase Realtime channel named `etl_job_{jobId}` listening for `postgres_changes` UPDATE events on the `etl_jobs` table, filtered by `id=eq.{jobId}`.
- **Applies updates** — each payload is mapped into the `JobProgress` shape and a new milestone is appended if the `stage_message` changed.
- **Cleans up** — on unmount or `jobId` change, removes the channel via `supabase.removeChannel(channel)`.
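The subscribe/cleanup pair might look like the following sketch. The real hook uses the supabase-js client; the narrow interfaces here only mirror the calls named above so the example stands alone:

```typescript
// Minimal structural types standing in for the supabase-js channel API.
interface RealtimeChannel {
  on(
    type: "postgres_changes",
    filter: { event: string; schema: string; table: string; filter: string },
    cb: (payload: { new: Record<string, unknown> }) => void,
  ): RealtimeChannel;
  subscribe(): RealtimeChannel;
}
interface SupabaseLike {
  channel(name: string): RealtimeChannel;
  removeChannel(channel: RealtimeChannel): void;
}

// Open a filtered UPDATE subscription for one job and return a cleanup
// function (assumed shape; the schema name is an illustrative guess).
function subscribeToJob(
  supabase: SupabaseLike,
  jobId: string,
  onUpdate: (row: Record<string, unknown>) => void,
): () => void {
  const channel = supabase
    .channel(`etl_job_${jobId}`)
    .on(
      "postgres_changes",
      { event: "UPDATE", schema: "public", table: "etl_jobs", filter: `id=eq.${jobId}` },
      (payload) => onUpdate(payload.new),
    )
    .subscribe();
  return () => supabase.removeChannel(channel);
}
```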
### Default State

When no `jobId` is provided (or on reset), the hook returns `DEFAULT_PROGRESS`:
```typescript
const DEFAULT_PROGRESS: JobProgress = {
  status: "pending",
  progress_pct: 0,
  stage_message: null,
  property_name: null,
  report_date: null,
  unit_count: null,
  error_count: 0,
  warning_count: 0,
  errors: [],
  warnings: [],
  output_storage_path: null,
  original_filename: null,
  milestones: [],
  total_rows: null,
  total_chunks: null,
  file_size_bytes: null,
  extraction_method: null,
  spot_check_confidence: null,
  spot_check_discrepancies: [],
  unit_breakdown: null,
  expected_data_rows: null,
};
```

## useDemoMode
**File:** `hooks/useDemoMode.ts`
Client-side simulation of the full ETL pipeline. Makes zero API calls — runs entirely from hardcoded step data and timers. Used on the public landing page to demonstrate the pipeline experience.
See the Demo Mode page for full documentation.
### Return Type

```typescript
{
  active: boolean;       // Whether the demo is currently running
  state: string;         // Current simulated pipeline state
  progress: JobProgress; // Full JobProgress object (same shape as real pipeline)
  start: () => void;     // Begin the demo simulation
  reset: () => void;     // Stop and reset to idle
}
```
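A timer-driven simulation of this kind can be sketched as follows. The step data and timings here are invented for illustration; the real hook ships its own hardcoded steps:

```typescript
interface DemoStep {
  state: string;
  pct: number;
  message: string;
  delayMs: number;
}

// Pure helper: cumulative fire time for each step, relative to start().
function scheduleTimes(steps: DemoStep[]): number[] {
  let elapsed = 0;
  return steps.map((s) => (elapsed += s.delayMs));
}

// Schedule every step on its cumulative delay and return a reset function
// that cancels any timers still pending.
function runDemo(steps: DemoStep[], onStep: (s: DemoStep) => void): () => void {
  const times = scheduleTimes(steps);
  const timers = steps.map((s, i) => setTimeout(() => onStep(s), times[i]));
  return () => timers.forEach(clearTimeout);
}
```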