The complete catalog of federal and local data sources powering RealHouse metro-level housing analytics.
RealHouse uses a two-tier data architecture. The federal backbone provides uniform, automatable coverage across all U.S. metros. The local systems layer provides higher-fidelity construction lifecycle data but must be built per-jurisdiction.
Federal sources give uniform national coverage; local sources add lifecycle granularity per metro.
Not every Houston-style housing metric can be reproduced in every metro. We classify each metric by how replicable it is using public data:
Class A metrics: Permits issued, house prices, employment, population, mortgage originations.
These metrics are available uniformly across all CBSAs because they come entirely from federal statistical programs with consistent geographic coverage. No local data required.
Class B metrics: Starts, under construction, completions, closings.
These metrics require permit inspection events and certificate-of-occupancy records from local permitting portals. Quality varies significantly by jurisdiction: some cities have robust open-data portals (Socrata, ArcGIS), while others are web-only with no bulk API.
Class C metrics: Finished vacant inventory, VDL (vacant developed lots), future lots, MLS resale stats, community-level data.
These metrics require field research, proprietary data feeds, or sophisticated modeling. No federal equivalent exists. Scaling these beyond a single metro demands significant per-jurisdiction investment.
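
As a rough illustration of the three tiers described above (labelled Class A, B, and C in the KPI table), the classification can be held in a small lookup. This is a sketch only: the enum, the metric keys, and the helper name are hypothetical and are not taken from the RealHouse codebase.

```python
from enum import Enum


class Replicability(Enum):
    """Replicability tiers; the descriptions paraphrase the text above."""
    A = "federal statistical programs only"
    B = "local permitting records required"
    C = "field research, proprietary feeds, or modeling"


# Hypothetical metric keys, grouped exactly as in the lists above.
METRIC_CLASS = {
    "permits_issued": Replicability.A,
    "house_prices": Replicability.A,
    "employment": Replicability.A,
    "population": Replicability.A,
    "mortgage_originations": Replicability.A,
    "starts": Replicability.B,
    "under_construction": Replicability.B,
    "completions": Replicability.B,
    "closings": Replicability.B,
    "finished_vacant": Replicability.C,
    "vdl": Replicability.C,
    "future_lots": Replicability.C,
    "mls_resale": Replicability.C,
    "community_level": Replicability.C,
}


def metrics_available(tiers):
    """Return the metric keys whose tier is in `tiers`, sorted by name."""
    return sorted(m for m, c in METRIC_CLASS.items() if c in tiers)
```

For a metro with no local connector yet, `metrics_available({Replicability.A})` would list only the federally sourced metrics.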
These seven federal data sources form the backbone of RealHouse analytics. They provide uniform, automatable coverage and are the foundation for every metro profile.
Monthly permit authorizations (units, buildings, valuation) by CBSA, county, and place.
https://www2.census.gov/econ/bps/ (bulk directory). CBSA files follow the pattern cbsaYYMMc.txt.
packages/ingest/src/connectors/bps.py: position-based parser for the real Census fixed-width format.
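
The file-name pattern above can be sketched as a small helper. The function name is hypothetical, and the sketch builds only the file name: the subdirectory layout under the bulk root is not described here, so it is left out rather than guessed.

```python
from datetime import date

# Bulk root quoted in the text above.
BPS_ROOT = "https://www2.census.gov/econ/bps/"


def bps_cbsa_filename(period: date) -> str:
    """Build the monthly CBSA file name following the cbsaYYMMc.txt pattern.

    YY is the two-digit year and MM the two-digit month of `period`.
    """
    return f"cbsa{period:%y%m}c.txt"
```

For example, `bps_cbsa_filename(date(2024, 3, 1))` yields `cbsa2403c.txt`.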
National/regional starts, under construction, completions (NRC) and new-home sales/inventory (NRS).
https://www.census.gov/construction/nrc/ and https://www.census.gov/construction/nrs/
Repeat-sales house price index by metro, state, and national level.
https://www.fhfa.gov/hpi/download/monthly/hpi_master.csv
packages/ingest/src/connectors/fhfa_hpi.py
Programmatic access to many housing and macro series, including BPS-derived permit series.
https://fred.stlouisfed.org/docs/api/fred/series_observations.html
packages/ingest/src/connectors/fred.py
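
A minimal sketch of how a request to the FRED `series/observations` endpoint might be assembled. The helper name is hypothetical, the `PERMIT` series ID in the usage note is only an illustrative choice, and nothing here is taken from `fred.py` itself. The sketch builds the URL and makes no network call.

```python
from typing import Optional
from urllib.parse import urlencode

FRED_OBS_URL = "https://api.stlouisfed.org/fred/series/observations"


def fred_observations_url(series_id: str, api_key: str,
                          start: Optional[str] = None) -> str:
    """Return a series/observations URL that asks for JSON output.

    `start` is an optional YYYY-MM-DD lower bound on observation dates.
    """
    params = {"series_id": series_id, "api_key": api_key, "file_type": "json"}
    if start:
        params["observation_start"] = start
    return f"{FRED_OBS_URL}?{urlencode(params)}"
```

Calling `fred_observations_url("PERMIT", api_key, "2020-01-01")` produces a URL that any HTTP client can fetch. The API key should come from configuration rather than source code.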
Metro employment levels, unemployment rates, and job growth by sector.
packages/ingest/src/connectors/bls.py
Metro population totals and components of change (births/deaths, domestic migration, international migration).
packages/ingest/src/connectors/census_pop.py
Mortgage origination data — purchase loans as a proxy for financed closings.
https://ffiec.cfpb.gov/v2/data-browser-api/view/aggregations
packages/ingest/src/connectors/hmda.py: uses the Data Browser API for aggregated queries.
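
To make the "purchase loans as a proxy for financed closings" idea concrete, the sketch below assembles an aggregation query for originated home-purchase loans in one metro. It is an assumption-laden illustration, not the logic in `hmda.py`: the helper name is hypothetical, and the query parameter names (`msamds`, `years`, `actions_taken`, `loan_purposes`) should be checked against the Data Browser API documentation before use.

```python
from urllib.parse import urlencode

HMDA_AGG_URL = "https://ffiec.cfpb.gov/v2/data-browser-api/view/aggregations"


def hmda_purchase_originations_url(msa_code: str, year: int) -> str:
    """Build an aggregation query for originated home-purchase loans."""
    params = {
        "msamds": msa_code,   # MSA/MD code for the metro (assumed parameter name)
        "years": year,
        "actions_taken": 1,   # HMDA action taken code 1 = loan originated
        "loan_purposes": 1,   # HMDA loan purpose code 1 = home purchase
    }
    return f"{HMDA_AGG_URL}?{urlencode(params)}"
```

For Houston this might be called as `hmda_purchase_originations_url("26420", 2022)`, assuming 26420 is the metro code in use.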
Local data provides the real construction lifecycle visibility: permit inspection events reveal starts, certificates of occupancy mark completions, and deed transfers capture closings. But this data is deeply fragmented: every city, county, and jurisdiction runs its own permitting system with its own portal, formats, and access policies. There is no federal equivalent for this granularity.
City and county permitting portals for the Houston metro — the first local data target.
ArcGIS REST endpoint + quarterly GIS downloads with rich parcel-level data.
Reusable connector patterns for scaling local data ingestion to other metros.
Socrata: $limit / $offset paging. Structured query language (SoQL).
ArcGIS REST: resultOffset / resultRecordCount paging. Supports spatial queries.
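
The two paging styles above differ only in parameter names, so a single loop can serve both. The following is a minimal sketch under that assumption. All function names are hypothetical, and `fetch_page` stands in for whatever HTTP call a real connector would make.

```python
from typing import Callable, Dict, Iterator, List


def socrata_params(offset: int, page_size: int) -> Dict[str, int]:
    """Paging parameters in the Socrata style."""
    return {"$limit": page_size, "$offset": offset}


def arcgis_params(offset: int, page_size: int) -> Dict[str, int]:
    """Paging parameters in the ArcGIS REST style."""
    return {"resultOffset": offset, "resultRecordCount": page_size}


def paginate(fetch_page: Callable[[Dict[str, int]], List[dict]],
             make_params: Callable[[int, int], Dict[str, int]],
             page_size: int = 1000) -> Iterator[dict]:
    """Yield rows page by page until a short (or empty) page comes back."""
    offset = 0
    while True:
        rows = fetch_page(make_params(offset, page_size))
        yield from rows
        if len(rows) < page_size:
            break  # fewer rows than requested: this was the last page
        offset += page_size
```

Keeping the HTTP call behind `fetch_page` lets the loop be exercised with an in-memory fake. When the row count is an exact multiple of the page size, the loop issues one extra request that returns an empty page.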
This table maps every Houston-style housing KPI to its best public data source, replicability class, and implementation difficulty.
| Houston-Style KPI | Best Public Source(s) | Replicability | Difficulty | Notes |
|---|---|---|---|---|
| Permits issued | Census BPS | Class A | 1-2 | Backbone metric. CBSA/county/place. |
| Quarterly starts | BPS + NRC calibration (modeled) | Class A/B | 2-4 | Federal-only: permit-lag model. With local data: first inspection = start. |
| Under construction | Modeled from starts pipeline | Class A/B | 3-4 | Stock-flow identity: UC = UCprev + starts - completions |
| Completions | Modeled from starts pipeline | Class A/B | 3-4 | Federal-only: lagged starts. With local: CO/final inspection. |
| Finished vacant | Completions minus closings | Class B/C | 4-5 | Needs both completion AND sales signals. Most difficult. |
| Closings (financed) | HMDA purchase originations | Class A | 3 | Mortgage-only proxy. Annual cadence. |
| Closings (total) | Recorder deed transfers | Class C | 4-5 | Paywalled, per-county, normalization hard. |
| VDL / future lots | Plats + engineering permits | Class C | 5 | No federal equivalent. Requires local plat/subdivision data. |
| MLS resale stats | HAR (proprietary) | Class C | 5 | Not government data. No public API. |
| House prices | FHFA HPI | Class A | 1 | Repeat-sales index, good metro coverage. |
| Employment | BLS API | Class A | 2 | Monthly, good metro coverage. |
| Population | Census PEP | Class A | 2 | Annual, metro/county. |
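
The stock-flow identity in the "Under construction" row (UC = UCprev + starts - completions) can be rolled forward period by period. The sketch below is illustrative only: the function name and the sample numbers are made up, and the starts and completions series would in practice come from the permit-lag model or from local inspection and certificate-of-occupancy records.

```python
from typing import List, Sequence


def under_construction_series(uc_initial: float,
                              starts: Sequence[float],
                              completions: Sequence[float]) -> List[float]:
    """Apply UC_t = UC_{t-1} + starts_t - completions_t for each period."""
    if len(starts) != len(completions):
        raise ValueError("starts and completions must cover the same periods")
    uc = uc_initial
    series = []
    for s, c in zip(starts, completions):
        uc = uc + s - c  # units added by starts, removed by completions
        series.append(uc)
    return series


# Made-up quarterly figures, for illustration only.
example = under_construction_series(100, [30, 40, 20], [25, 35, 45])
```

Because each period depends on the previous stock, errors in the modeled starts or completions accumulate over time, which may be one reason the table rates this KPI at difficulty 3-4.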
For the complete analysis of data sources, automation strategies, and modeling approaches, see our research documents.