Technical Architecture

Monorepo layout, data flow, database schema, and the pipeline estimation model that powers metro-level metrics.

Monorepo Structure

RealHouse is organized as a monorepo with separate packages for data ingestion and the web dashboard, plus shared database migrations and documentation.

realhouse/
├── packages/
│   ├── ingest/                  # Python 3.12+
│   │   ├── pyproject.toml
│   │   ├── src/
│   │   │   ├── connectors/      # One module per data source
│   │   │   │   ├── bps.py       # Census Building Permits Survey
│   │   │   │   ├── nrc_nrs.py   # New Residential Construction/Sales
│   │   │   │   ├── fhfa_hpi.py  # House Price Index
│   │   │   │   ├── fred.py      # FRED API series
│   │   │   │   ├── bls.py       # BLS employment/unemployment
│   │   │   │   ├── census_pop.py # Census PEP population
│   │   │   │   └── hmda.py      # HMDA mortgage originations
│   │   │   ├── models/
│   │   │   │   └── permit_lag.py # Pipeline estimation model
│   │   │   ├── db.py            # Supabase/Postgres connection
│   │   │   └── cli.py           # Click CLI entry point
│   │   └── tests/
│   └── dashboard/               # Next.js 15 App Router
│       ├── package.json
│       ├── src/
│       │   ├── app/
│       │   │   ├── page.tsx             # Landing / market selector
│       │   │   └── market/[cbsa]/
│       │   │       ├── page.tsx         # Market overview (KPI tiles)
│       │   │       ├── permits/page.tsx # Permits detail page
│       │   │       ├── employment/page.tsx  # BLS employment charts
│       │   │       ├── demographics/page.tsx # Census population charts
│       │   │       └── prices/page.tsx  # HPI & mortgage rate charts
│       │   ├── components/
│       │   │   ├── charts/      # Recharts wrappers
│       │   │   └── kpi-card.tsx
│       │   └── lib/
│       │       ├── supabase.ts  # Client setup
│       │       └── queries.ts   # Data fetching
│       └── tailwind.config.ts
├── supabase/
│   ├── migrations/              # 9 SQL migrations
│   └── seed.sql                 # Houston CBSA + geography seed
├── docs/
│   ├── research/                # Primary research documents
│   └── plans/                   # Design docs & implementation plans
├── explainer/                   # This site
└── agent/                       # Agent coordination files

packages/ingest

Python CLI for fetching, parsing, and loading federal data. Uses Click for commands, httpx for HTTP, and psycopg for Postgres. Each connector is a module in src/connectors/ — one file per federal data source (BPS, NRC/NRS, FHFA HPI, FRED, BLS, Census PEP, HMDA).

packages/dashboard

Next.js 15 App Router with Tailwind CSS + shadcn/ui components + Recharts for charts. Queries Supabase directly from server components via the queries.ts module.

supabase/

Database schema defined as SQL migrations + seed data for the Houston CBSA geography (CBSA 26420, including all 9 counties). The 9 migrations create dimension tables, federal fact tables (including BLS employment, Census population, HMDA originations, and FRED series), and the derived mart.

Data Flow

Data moves through four layers: federal source files are parsed by Python connectors, loaded into Supabase tables, transformed by the pipeline model into a mart table, and finally queried by the Next.js dashboard.

BPS Files
HPI CSV
NRC/NRS XLSX
FRED / BLS / PEP / HMDA APIs
Python Connectors
bps.py, fhfa_hpi.py, nrc_nrs.py, fred.py, bls.py, census_pop.py, hmda.py
Supabase Tables
fact_bps_permits, fact_fhfa_hpi, fact_nrc_pipeline, fact_nrs_inventory, fact_bls_employment, fact_census_pop, fact_hmda_originations, fact_fred_series
Pipeline Model
permit_lag.py
Mart Table
mart_pipeline_quarterly
Next.js Dashboard
queries.ts → KPI Cards + Charts

Database Schema

The warehouse schema follows a dimensional model: geography dimensions, federal fact tables populated by connectors, and a derived mart table produced by the pipeline model.

Geography Dimension

dim_cbsa
  • cbsa_code TEXT PK
  • cbsa_name TEXT
  • delineation_year INT
  • is_metro BOOLEAN
bridge_cbsa_county
  • cbsa_code TEXT PK, FK
  • county_fips CHAR(5) PK
  • county_name TEXT
  • delineation_year INT

Federal Fact Tables

fact_bps_permits
  • cbsa_code TEXT PK, FK
  • survey_month DATE PK
  • units_1u INT
  • units_2u INT
  • units_3_4u INT
  • units_5plus INT
  • units_total INT
  • valuation_k_total NUMERIC
  • is_imputed BOOLEAN PK
  • loaded_at TIMESTAMPTZ
fact_fhfa_hpi
  • cbsa_code TEXT PK
  • ref_period DATE PK
  • hpi_value NUMERIC
  • hpi_pct_change_1yr NUMERIC
  • index_type TEXT PK
  • loaded_at TIMESTAMPTZ
fact_nrc_pipeline
  • ref_month DATE PK
  • geo_level TEXT PK
  • geo_code TEXT PK
  • metric TEXT PK
  • units_saar NUMERIC
  • units_nsa NUMERIC
  • source_table TEXT
  • loaded_at TIMESTAMPTZ
fact_nrs_inventory
  • ref_month DATE PK
  • geo_level TEXT PK
  • geo_code TEXT PK
  • metric TEXT PK
  • value NUMERIC
  • source_table TEXT
  • loaded_at TIMESTAMPTZ
fact_bls_employment
  • cbsa_code TEXT PK, FK
  • ref_month DATE PK
  • series_id TEXT PK
  • value NUMERIC
  • loaded_at TIMESTAMPTZ
fact_census_pop
  • cbsa_code TEXT PK, FK
  • vintage_year INT PK
  • population BIGINT
  • loaded_at TIMESTAMPTZ
fact_hmda_originations
  • msa_md_code TEXT PK
  • activity_year INT PK
  • loan_count BIGINT
  • loan_amount_k NUMERIC
  • loaded_at TIMESTAMPTZ
fact_fred_series
  • series_id TEXT PK
  • obs_date DATE PK
  • value NUMERIC
  • loaded_at TIMESTAMPTZ

Derived Mart

mart_pipeline_quarterly
  • cbsa_code TEXT PK, FK
  • quarter DATE PK
  • permits_qtr INT
  • starts_qtr_modeled NUMERIC
  • completions_qtr_modeled NUMERIC
  • uc_stock_modeled NUMERIC
  • finished_vacant_modeled NUMERIC
  • closings_financed_proxy BIGINT
  • model_version TEXT PK
  • loaded_at TIMESTAMPTZ
Future / Planned Tables

These tables are designed but not yet implemented. They will be added as new data sources come online.

Local Data Tables

fact_hcad_parcels Future

Harris County Appraisal District parcel data for Houston subdivision-level analysis.

fact_houston_permits_agg Future

Aggregated Houston permit counts from City of Houston open data portal.

TPIA (Event-Sourced) Tables

fact_permit_record Future

Individual permit records from TPIA responses — application, issuance, final/CO dates.

fact_inspection_event Future

Individual inspection events linked to permits — foundation, framing, final inspections.

Permit-to-Pipeline Estimation Model

NRC publishes starts, under construction, and completions only at national and Census region level — not at the CBSA level. To estimate metro-level pipeline metrics, we use a distributed lag convolution model calibrated to NRC totals.

1. Starts from Permits

St = Σk wk · Pt−k

Starts in month t are estimated by convolving permits (P) with a lag distribution (w). The typical permit-to-start lag for single-family construction is 1–3 months.

2. Completions from Starts

Ct = Σj vj · St−j

Completions are estimated by convolving starts (S) with a start-to-completion distribution (v). The typical start-to-completion duration for single-family is 6–9 months.

3. Under Construction (Stock-Flow Identity)

UCt = UCt−1 + StCt

Under-construction stock follows a stock-flow identity. Each month's stock equals the previous month's stock plus new starts minus completions.

4. Finished Vacant Inventory

FVt = FVt−1 + Ctclosingst

Finished vacant inventory equals the previous stock plus completions minus closings. Closings are proxied from HMDA purchase loan originations, which are now wired into the model as of Phase 2 (previously a placeholder).

5. Calibration to NRC Totals

The lag distributions w and v are adjusted so that national and regional aggregates of metro-level estimates match NRC published totals. For the Houston CBSA, Census South region factors are applied.

This calibration step ensures the model is internally consistent: summing all metro estimates within a region reproduces the NRC-reported regional totals.

Known Limitations

Key Technical Decisions

Important implementation details discovered during development.