The complete catalog of federal and local data sources powering RealHouse metro-level housing analytics.
RealHouse uses a two-tier data architecture. The federal backbone provides uniform, automatable coverage across all U.S. metros. The local systems layer provides higher-fidelity construction lifecycle data but must be built per-jurisdiction.
Federal sources give uniform national coverage; local sources add lifecycle granularity per metro.
Not every Houston-style housing metric can be reproduced in every metro. We classify each metric by how replicable it is using public data:
Class A metrics: Permits issued, house prices, employment, population, mortgage originations.
These metrics are available uniformly across all CBSAs because they come entirely from federal statistical programs with consistent geographic coverage. No local data required.
Class B metrics: Starts, under construction, completions, closings.
These require permit inspection events and certificate-of-occupancy records from local permitting portals. Quality varies significantly by jurisdiction: some cities have robust open-data portals (Socrata, ArcGIS), while others are web-only with no bulk API.
Class C metrics: Finished vacant inventory, VDL (vacant developed lots), future lots, MLS resale stats, community-level data.
These metrics require field research, proprietary data feeds, or sophisticated modeling. No federal equivalent exists. Scaling these beyond a single metro demands significant per-jurisdiction investment.
These seven federal data sources form the backbone of RealHouse analytics. They provide uniform, automatable coverage and are the foundation for every metro profile.
Monthly permit authorizations (units, buildings, valuation) by CBSA, county, and place.
https://www2.census.gov/econ/bps/ (bulk directory). CBSA files follow the pattern cbsaYYMMc.txt.
packages/ingest/src/connectors/bps.py: position-based parser for real Census fixed-width format.
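The file-naming pattern above can be sketched as a small helper. This is a minimal illustration rather than the connector's actual code: the function names are hypothetical, and `subdir` is a placeholder because the folder under the bulk directory that holds the CBSA files is not stated here.

```python
from urllib.parse import urljoin

# Bulk directory named above.
BPS_BASE_URL = "https://www2.census.gov/econ/bps/"


def bps_cbsa_filename(year: int, month: int) -> str:
    """Build a monthly CBSA file name following the cbsaYYMMc.txt pattern."""
    if not 1 <= month <= 12:
        raise ValueError(f"month must be 1-12, got {month}")
    # Two-digit year, two-digit month, trailing "c" as in the pattern.
    return f"cbsa{year % 100:02d}{month:02d}c.txt"


def bps_cbsa_url(year: int, month: int, subdir: str = "") -> str:
    """Join the bulk directory, an optional subfolder, and the file name.

    `subdir` is an assumption: pass the folder (with a trailing slash)
    once you have confirmed where CBSA files live in the bulk directory.
    """
    return urljoin(BPS_BASE_URL, f"{subdir}{bps_cbsa_filename(year, month)}")
```

For example, `bps_cbsa_filename(2024, 1)` yields `cbsa2401c.txt`.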
National/regional starts, under construction, completions (NRC) and new-home sales/inventory (NRS).
https://www.census.gov/construction/nrc/ and https://www.census.gov/construction/nrs/
Repeat-sales house price index by metro, state, and national level.
https://www.fhfa.gov/hpi/download/monthly/hpi_master.csv
packages/ingest/src/connectors/fhfa_hpi.py
Programmatic access to many housing and macro series, including BPS-derived permit series.
https://fred.stlouisfed.org/docs/api/fred/series_observations.html
packages/ingest/src/connectors/fred.py
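A request to the `series/observations` endpoint documented above might be assembled and parsed roughly as follows. This is a sketch, not the contents of `fred.py`: the helper names are hypothetical, the series ID in the usage note is only an example, and the API key is supplied by the caller.

```python
from typing import List, Optional, Tuple
from urllib.parse import urlencode

FRED_OBSERVATIONS_URL = "https://api.stlouisfed.org/fred/series/observations"


def fred_observations_url(
    series_id: str, api_key: str, observation_start: Optional[str] = None
) -> str:
    """Build the query URL for one series, asking for JSON output."""
    params = {"series_id": series_id, "api_key": api_key, "file_type": "json"}
    if observation_start:
        params["observation_start"] = observation_start  # YYYY-MM-DD
    return f"{FRED_OBSERVATIONS_URL}?{urlencode(params)}"


def parse_observations(payload: dict) -> List[Tuple[str, float]]:
    """Convert the JSON payload to (date, value) pairs.

    FRED returns values as strings and marks missing observations
    with ".", so those rows are skipped here.
    """
    rows = []
    for obs in payload.get("observations", []):
        if obs["value"] == ".":
            continue
        rows.append((obs["date"], float(obs["value"])))
    return rows
```

A caller would fetch `fred_observations_url("HOUST", key)` with any HTTP client and pass the decoded JSON to `parse_observations`.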
Metro employment levels, unemployment rates, and job growth by sector.
packages/ingest/src/connectors/bls.py
Metro population totals and components of change (births/deaths, domestic migration, international migration).
packages/ingest/src/connectors/census_pop.py
Mortgage origination data — purchase loans as a proxy for financed closings.
https://ffiec.cfpb.gov/v2/data-browser-api/view/aggregations
packages/ingest/src/connectors/hmda.py: uses the Data Browser API for aggregated queries.
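As an illustration of the purchase-origination proxy, the aggregation query could be built along these lines. This is a sketch under assumptions, not the code in `hmda.py`: the parameter names (`msamds`, `years`, `actions_taken`, `loan_purposes`) and the codes used (action 1 for originated loans, purpose 1 for home purchase) should be verified against the current Data Browser API documentation, and the function name is hypothetical.

```python
from urllib.parse import urlencode

HMDA_AGGREGATIONS_URL = (
    "https://ffiec.cfpb.gov/v2/data-browser-api/view/aggregations"
)


def hmda_purchase_originations_url(cbsa_code: str, year: int) -> str:
    """Build an aggregation query for originated home-purchase loans.

    Assumed parameter names and codes (check the API docs before use):
      msamds        - metro area code, e.g. 26420 for Houston
      actions_taken - 1 = loan originated
      loan_purposes - 1 = home purchase
    """
    params = {
        "msamds": cbsa_code,
        "years": str(year),
        "actions_taken": "1",
        "loan_purposes": "1",
    }
    return f"{HMDA_AGGREGATIONS_URL}?{urlencode(params)}"
```

Because HMDA is published annually, a connector would typically call this once per metro per year.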
Local data is where the real construction lifecycle visibility comes from — permit inspection events reveal starts, certificates of occupancy mark completions, and deed transfers capture closings. But this data is deeply fragmented: every city, county, and jurisdiction runs its own permitting system with its own portal, formats, and access policies. There is no federal equivalent for this granularity.
Houston (CBSA 26420) is our first local-data metro. Here's a complete inventory of what we've tapped, what's documented and waiting, and what remains out of reach.
Monthly single-family and multifamily permit counts from the City of Houston open-data portal, back to 2004.
packages/ingest/src/connectors/houston/permits_agg.py: HTTP download + openpyxl parser. 252 rows loaded into Supabase.
Residential parcel inventory via ArcGIS REST — property values, lot sizes, ownership, and state classification for 500K+ parcels.
gis.hctx.net.
state_class (letter codes A1, B1, etc.), NOT land_use (which has numeric codes like 1001).
packages/ingest/src/connectors/houston/hcad.py: ArcGIS REST connector with paginated fetch and UTC-aware date conversion. Bulk load of 500K+ parcels pending (connector works, but remote insert throughput is the bottleneck).
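The UTC-aware date conversion mentioned above matters because ArcGIS REST services return date fields as epoch milliseconds. A minimal sketch of such a conversion follows; the function name is hypothetical and this is not necessarily how `hcad.py` does it.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def arcgis_ms_to_utc(value: Optional[int]) -> Optional[datetime]:
    """Convert an ArcGIS epoch-milliseconds value to a UTC-aware datetime.

    Null date fields come back as None and are passed through.
    Adding a timedelta to the epoch (rather than calling fromtimestamp)
    also handles pre-1970 values on every platform.
    """
    if value is None:
        return None
    return _EPOCH + timedelta(milliseconds=value)
```

Keeping the result timezone-aware avoids silently shifting dates when rows are later written to a database that stores timestamps with time zone.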
Individual permit records with full lifecycle dates — from application through inspections to certificate of occupancy. This is the critical missing source for computing actual starts, completions, and under-construction stock.
fact_permit_record (individual permits) and fact_inspection_event (inspection milestones). Schema already defined in migration 012.
packages/ingest/src/connectors/houston/tpia.py: stub connector with field mapping documentation, ready for data arrival.
Excel reports from the Planning & Development Department showing subdivision approvals, lot counts, and entitlement pipeline activity.
Web-based permit search at houstonpermittingcenter.org — individual permit records with status, but no public API.
Web portal for unincorporated Harris County permits — JavaScript-gated login, no documented API.
Real property deed transfers — the most accurate closings signal, but account-gated and paywalled.
Resale listings, days-on-market, absorption rates, and transaction prices. Requires HAR membership — not government data.
Builder floorplan catalogs with base prices, community/submarket boundary definitions, and vacant developed lot (VDL) status tracking. This data powers Houston CBAS's community-level new-home reporting but has no public equivalent.
Platform-level connectors for scaling local data ingestion to other metros.
Socrata: $limit / $offset paging. Structured query language (SoQL).
ArcGIS REST: resultOffset / resultRecordCount paging. Supports spatial queries. Already proven with the HCAD connector.
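Both paging schemes share the same offset-and-limit shape, so a platform-level connector could reuse one loop for both. The sketch below assumes a caller-supplied `fetch_page` function (hypothetical) that takes the paging parameters and returns a list of rows; it is an illustration of the idea, not the existing connector code.

```python
from typing import Callable, Dict, Iterator, List


def paginate(
    fetch_page: Callable[[Dict[str, int]], List[dict]],
    offset_param: str,
    limit_param: str,
    page_size: int = 1000,
) -> Iterator[dict]:
    """Yield rows page by page until a short or empty page is returned.

    offset_param / limit_param are "$offset" / "$limit" for Socrata and
    "resultOffset" / "resultRecordCount" for ArcGIS REST.
    """
    offset = 0
    while True:
        rows = fetch_page({offset_param: offset, limit_param: page_size})
        if not rows:
            return
        yield from rows
        if len(rows) < page_size:
            return  # short page: assume this was the last one
        offset += len(rows)
```

Stopping on a short page is a simplification. ArcGIS responses also carry an `exceededTransferLimit` flag that a production connector may prefer to check, and Socrata paging is only stable when the query specifies an ordering.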
This table maps every Houston-style housing KPI to its best public data source, replicability class, and implementation difficulty.
| Houston-Style KPI | Best Public Source(s) | Replicability | Difficulty | Notes |
|---|---|---|---|---|
| Permits issued | Census BPS | Class A | 1-2 | Backbone metric. CBSA/county/place. |
| Quarterly starts | BPS + NRC calibration (modeled) | Class A/B | 2-4 | Federal-only: permit-lag model. With local data: first inspection = start. |
| Under construction | Modeled from starts pipeline | Class A/B | 3-4 | Stock-flow identity: UC = UCprev + starts - completions |
| Completions | Modeled from starts pipeline | Class A/B | 3-4 | Federal-only: lagged starts. With local: CO/final inspection. |
| Finished vacant | Completions minus closings | Class B/C | 4-5 | Needs both completion AND sales signals. Most difficult. |
| Closings (financed) | HMDA purchase originations | Class A | 3 | Mortgage-only proxy. Annual cadence. |
| Closings (total) | Recorder deed transfers | Class C | 4-5 | Paywalled, per-county, normalization hard. |
| VDL / future lots | Plats + engineering permits | Class C | 5 | No federal equivalent. Requires local plat/subdivision data. |
| MLS resale stats | HAR (proprietary) | Class C | 5 | Not government data. No public API. |
| House prices | FHFA HPI | Class A | 1 | Repeat-sales index, good metro coverage. |
| Employment | BLS API | Class A | 2 | Monthly, good metro coverage. |
| Population | Census PEP | Class A | 2 | Annual, metro/county. |
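
The stock-flow identity in the "Under construction" row (UC = UCprev + starts - completions) can be rolled forward period by period. The following is a minimal sketch with made-up numbers; the function name and the opening stock are illustrative assumptions, not RealHouse model code.

```python
from typing import List, Sequence


def roll_under_construction(
    initial_uc: int, starts: Sequence[int], completions: Sequence[int]
) -> List[int]:
    """Apply UC = UC_prev + starts - completions for each period."""
    if len(starts) != len(completions):
        raise ValueError("starts and completions must cover the same periods")
    uc = initial_uc
    series = []
    for period_starts, period_completions in zip(starts, completions):
        uc = uc + period_starts - period_completions
        series.append(uc)
    return series


# Illustrative quarterly values only (not real data).
print(roll_under_construction(100, [30, 40, 20], [25, 35, 45]))
# prints [105, 110, 85]
```

A negative value in the output would signal that the modeled completions are outrunning the modeled starts, which is a useful sanity check on the lag assumptions.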
For the complete analysis of data sources, automation strategies, and modeling approaches, see our research documents.