Monorepo layout, data flow, database schema, and the pipeline estimation model that powers metro-level metrics.
RealHouse is organized as a monorepo with separate packages for data ingestion and the web dashboard, plus shared database migrations and documentation.
realhouse/
├── packages/
│   ├── ingest/                          # Python 3.12+
│   │   ├── pyproject.toml
│   │   ├── src/
│   │   │   ├── connectors/              # One module per data source
│   │   │   │   ├── bps.py               # Census Building Permits Survey
│   │   │   │   ├── nrc_nrs.py           # New Residential Construction/Sales
│   │   │   │   ├── fhfa_hpi.py          # House Price Index
│   │   │   │   ├── fred.py              # FRED API series
│   │   │   │   ├── bls.py               # BLS employment/unemployment
│   │   │   │   ├── census_pop.py        # Census PEP population
│   │   │   │   └── hmda.py              # HMDA mortgage originations
│   │   │   ├── models/
│   │   │   │   └── permit_lag.py        # Pipeline estimation model
│   │   │   ├── db.py                    # Supabase/Postgres connection
│   │   │   └── cli.py                   # Click CLI entry point
│   │   └── tests/
│   └── dashboard/                       # Next.js 15 App Router
│       ├── package.json
│       ├── src/
│       │   ├── app/
│       │   │   ├── page.tsx             # Landing / market selector
│       │   │   └── market/[cbsa]/
│       │   │       ├── page.tsx                 # Market overview (KPI tiles)
│       │   │       ├── permits/page.tsx         # Permits detail page
│       │   │       ├── employment/page.tsx      # BLS employment charts
│       │   │       ├── demographics/page.tsx    # Census population charts
│       │   │       └── prices/page.tsx          # HPI & mortgage rate charts
│       │   ├── components/
│       │   │   ├── charts/              # Recharts wrappers
│       │   │   └── kpi-card.tsx
│       │   └── lib/
│       │       ├── supabase.ts          # Client setup
│       │       └── queries.ts           # Data fetching
│       └── tailwind.config.ts
├── supabase/
│   ├── migrations/                      # 9 SQL migrations
│   └── seed.sql                         # Houston CBSA + geography seed
├── docs/
│   ├── research/                        # Primary research documents
│   └── plans/                           # Design docs & implementation plans
├── explainer/                           # This site
└── agent/                               # Agent coordination files
The ingest package is a Python CLI for fetching, parsing, and loading federal data. It uses Click for commands, httpx for HTTP, and psycopg for Postgres. Each connector is a module in src/connectors/, one file per federal data source (BPS, NRC/NRS, FHFA HPI, FRED, BLS, Census PEP, HMDA).
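As a rough illustration of the one-module-per-source layout, the sketch below models a connector as a pair of fetch and parse callables held in a registry keyed by source name. The names `Connector`, `REGISTRY`, and `run`, the CSV columns, and the sample values are all hypothetical. The real connectors use httpx and psycopg; in-memory stand-ins replace them here so the example runs on the standard library alone.

```python
import csv
import io
from dataclasses import dataclass
from typing import Callable

Row = dict[str, str]


@dataclass(frozen=True)
class Connector:
    """Hypothetical shape of one data-source module (not the project's actual API)."""
    name: str
    fetch: Callable[[], str]            # would be an httpx request in the real package
    parse: Callable[[str], list[Row]]


def fetch_bps_sample() -> str:
    # Stand-in for a download; columns and values are made up for illustration.
    return "cbsa,month,permits\n26420,2024-01,4100\n26420,2024-02,4350\n"


def parse_csv(raw: str) -> list[Row]:
    return list(csv.DictReader(io.StringIO(raw)))


# One registry entry per source mirrors "one file per source" in src/connectors/.
REGISTRY: dict[str, Connector] = {
    "bps": Connector("bps", fetch_bps_sample, parse_csv),
}


def run(source: str) -> list[Row]:
    """Fetch and parse one source; loading into Postgres is omitted here."""
    connector = REGISTRY[source]
    return connector.parse(connector.fetch())


rows = run("bps")
print(rows[0])
```

Keeping fetch and parse separate means the parsing step can be tested against a saved file without touching the network.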
The dashboard is a Next.js 15 App Router application built with Tailwind CSS and shadcn/ui components, with Recharts for charts. It queries Supabase directly from server components via the queries.ts module.
The supabase/ directory holds the database schema, defined as SQL migrations, plus seed data for the Houston CBSA geography (CBSA 26420, including all 9 counties). The 9 migrations create dimension tables, federal fact tables (including BLS employment, Census population, HMDA originations, and FRED series), and the derived mart.
Data moves through four layers: federal source files are parsed by Python connectors, loaded into Supabase tables, transformed by the pipeline model into a mart table, and finally queried by the Next.js dashboard.
The warehouse schema follows a dimensional model: geography dimensions, federal fact tables populated by connectors, and a derived mart table produced by the pipeline model.
These tables are designed but not yet implemented. They will be added as new data sources come online.
Harris County Appraisal District parcel data for Houston subdivision-level analysis.
Aggregated Houston permit counts from City of Houston open data portal.
Individual permit records from TPIA responses — application, issuance, final/CO dates.
Individual inspection events linked to permits — foundation, framing, final inspections.
NRC publishes starts, under-construction counts, and completions only at the national and Census region levels, not at the CBSA level. To estimate metro-level pipeline metrics, we use a distributed lag convolution model calibrated to NRC totals.
Starts in month t are estimated by convolving permits (P) with a lag distribution (w). The typical permit-to-start lag for single-family construction is 1–3 months.
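The convolution can be read as S_t = sum over k of w_k * P_(t-k), where w_k is the share of permits that turn into starts k months after issuance. The sketch below is a minimal plain-Python version of that sum. The function name `convolve_lag`, the permit counts, and the weights are illustrative assumptions, not the calibrated values used by the model.

```python
def convolve_lag(series: list[float], weights: list[float]) -> list[float]:
    """Distributed lag: out[t] = sum_k weights[k] * series[t - k].

    Months before the start of the series are treated as zero, so the first
    len(weights) - 1 outputs are underestimates.
    """
    out = []
    for t in range(len(series)):
        total = 0.0
        for k, w_k in enumerate(weights):
            if t - k >= 0:
                total += w_k * series[t - k]
        out.append(total)
    return out


# Illustrative numbers only: monthly permits and a lag distribution that
# places every start 1 to 3 months after the permit.
permits = [100.0, 120.0, 110.0, 130.0, 90.0]
w = [0.0, 0.5, 0.3, 0.2]  # w[k] = share of permits that start k months later

starts = convolve_lag(permits, w)
print([round(s, 1) for s in starts])  # [0.0, 50.0, 90.0, 111.0, 122.0]
```

The early months come out low because the permits that would feed them fall before the start of the series. In practice the estimate is only meaningful once a full lag window of history is available.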
Completions are estimated by convolving starts (S) with a start-to-completion distribution (v). The typical start-to-completion duration for single-family is 6–9 months.
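Written out with the symbols from the text, the completions step presumably takes the same distributed-lag form. The symbol C for completions, the index j, and the maximum lag J are notation introduced here, not taken from the source:

```latex
C_t = \sum_{j=0}^{J} v_j \, S_{t-j}
```

Here v_j is the share of starts that complete j months after they begin. With a typical duration of 6 to 9 months, most of the weight would be expected to sit on v_6 through v_9.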
Under-construction stock follows a stock-flow identity. Each month's stock equals the previous month's stock plus new starts minus completions.
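The identity is UC_t = UC_(t-1) + S_t - C_t. A minimal sketch of applying it over several months is shown below. The function name `under_construction`, the initial stock, and the monthly figures are made up for illustration; how the real model seeds the initial stock is not stated in the source.

```python
from itertools import accumulate


def under_construction(
    initial: float, starts: list[float], completions: list[float]
) -> list[float]:
    """Apply UC[t] = UC[t-1] + S[t] - C[t] month by month, starting from `initial`."""
    if len(starts) != len(completions):
        raise ValueError("starts and completions must cover the same months")
    net = [s - c for s, c in zip(starts, completions)]
    # accumulate() yields the initial value first; drop it to keep one value per month.
    return list(accumulate(net, initial=initial))[1:]


# Illustrative numbers only.
stock = under_construction(
    1000.0, starts=[90.0, 111.0, 122.0], completions=[80.0, 100.0, 130.0]
)
print(stock)  # [1010.0, 1021.0, 1013.0]
```

Because each month builds on the previous one, any error in the initial stock carries forward unchanged through the whole series.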
Finished vacant inventory equals the previous month's stock plus completions minus closings. Closings are proxied from HMDA purchase loan originations, which have been wired into the model since Phase 2 (previously a placeholder).
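In symbols, with FV for finished vacant inventory, C for completions, and X for closings (labels chosen here for illustration, since the source does not name them):

```latex
FV_t = FV_{t-1} + C_t - X_t
```

As a made-up example, a stock of 500 homes with 130 completions and 120 proxied closings in the month gives 500 + 130 - 120 = 510.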
The lag distributions w and v are adjusted so that national and regional aggregates of metro-level estimates match NRC published totals. For the Houston CBSA, Census South region factors are applied.
This calibration step ensures the model is internally consistent: summing all metro estimates within a region reproduces the NRC-reported regional totals.
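One simple way to picture the calibration is proportional scaling: take the ratio of the NRC regional total to the sum of uncalibrated metro estimates, then multiply the lag weights by that ratio. Because the convolution is linear in the weights, scaling w by a factor scales every estimated start by the same factor. The sketch below assumes this single-factor approach. The source does not specify how the adjustment is actually computed, and the metro labels and all numbers are placeholders.

```python
def calibration_factor(
    metro_estimates: dict[str, float], nrc_regional_total: float
) -> float:
    """Ratio that makes the metro estimates sum to the published regional total."""
    uncalibrated_total = sum(metro_estimates.values())
    if uncalibrated_total <= 0:
        raise ValueError("metro estimates must sum to a positive number")
    return nrc_regional_total / uncalibrated_total


# Made-up uncalibrated starts for one month in a hypothetical region.
estimates = {"metro_a": 3000.0, "metro_b": 2000.0, "metro_c": 1000.0}
nrc_total = 6600.0  # placeholder for the NRC-published regional figure

factor = calibration_factor(estimates, nrc_total)
calibrated = {name: value * factor for name, value in estimates.items()}

# Scaling the lag weights by the same factor is equivalent, because the
# convolution is linear in the weights.
w = [0.0, 0.5, 0.3, 0.2]
w_calibrated = [w_k * factor for w_k in w]

print(round(factor, 3))
print(round(sum(calibrated.values()), 1))
```

After scaling, the weights no longer sum to one. The excess or shortfall can be read as the region-wide ratio of starts to permits implied by the NRC data.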
Important implementation details discovered during development.
is_imputed flag. Imputed rows include Census imputation for non-responding places; reported rows include only places that actually responded.
sql.Identifier() for table and column names. This approach was adopted after code review.