Data Sources

The complete catalog of federal and local data sources powering RealHouse metro-level housing analytics.

Two-Tier Data Architecture

RealHouse uses a two-tier data architecture. The federal backbone provides uniform, automatable coverage across all U.S. metros. The local systems layer provides higher-fidelity construction lifecycle data but must be built per-jurisdiction.

  • Federal Backbone: Census BPS, NRC/NRS, FHFA HPI, FRED, BLS, PEP, HMDA
  • Local Systems Layer: permit portals, assessor data, recorder deeds, MLS feeds

Federal sources give uniform national coverage; local sources add lifecycle granularity per metro.

Replicability Classes

Not every Houston-style housing metric can be reproduced in every metro. We classify each metric by how replicable it is using public data:

Class A — Fully Reproducible from Federal Sources

Metrics: Permits issued, house prices, employment, population, mortgage originations.

These metrics are available uniformly across all CBSAs because they come entirely from federal statistical programs with consistent geographic coverage. No local data is required.

Class B — Reproducible with Local Open Data

Metrics: Starts, under construction, completions, closings.

Requires permit inspection events and certificate-of-occupancy records from local permitting portals. Quality varies significantly by jurisdiction — some cities have robust open-data portals (Socrata, ArcGIS), while others are web-only with no bulk API.

Class C — Likely Proprietary / Requires Modeling

Metrics: Finished vacant inventory, VDL (vacant developed lots), future lots, MLS resale stats, community-level data.

These metrics require field research, proprietary data feeds, or sophisticated modeling. No federal equivalent exists. Scaling these beyond a single metro demands significant per-jurisdiction investment.
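One way to carry this classification into code is a lookup from metric to class. The sketch below assumes hypothetical metric keys and a hypothetical `metrics_available` helper (neither is documented as part of RealHouse); it returns the metrics a metro can support depending on whether it has usable local open data.

```python
from enum import Enum


class Replicability(Enum):
    """Replicability classes, as defined above."""
    A = "fully reproducible from federal sources"
    B = "reproducible with local open data"
    C = "likely proprietary / requires modeling"


# Hypothetical metric keys; the class assignments follow the lists above.
METRIC_CLASS = {
    "permits_issued": Replicability.A,
    "house_prices": Replicability.A,
    "employment": Replicability.A,
    "population": Replicability.A,
    "mortgage_originations": Replicability.A,
    "starts": Replicability.B,
    "under_construction": Replicability.B,
    "completions": Replicability.B,
    "closings": Replicability.B,
    "finished_vacant_inventory": Replicability.C,
    "vacant_developed_lots": Replicability.C,
    "future_lots": Replicability.C,
    "mls_resale_stats": Replicability.C,
    "community_level_data": Replicability.C,
}


def metrics_available(has_local_open_data: bool) -> list[str]:
    """Metrics a metro can support from public data alone.

    Class A is always available, Class B only when the metro has usable
    local permit/CO open data, and Class C is never assumed available.
    """
    allowed = {Replicability.A}
    if has_local_open_data:
        allowed.add(Replicability.B)
    return sorted(m for m, cls in METRIC_CLASS.items() if cls in allowed)
```

A metro with only federal coverage gets the five Class A metrics; adding a usable permit portal unlocks the four Class B metrics.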

Federal Backbone Sources

These seven federal data sources form the backbone of RealHouse analytics. They provide uniform, automatable coverage and are the foundation for every metro profile.

Census Building Permits Survey (BPS) Done

Monthly permit authorizations (units, buildings, valuation) by CBSA, county, and place.

  • Provides: Monthly permit authorizations — unit counts, building counts, and construction valuation — broken out by CBSA, county, and place (permit-issuing office).
  • Cadence: Monthly revised release (17th workday of each month).
  • Geography: CBSA, county, place (permit office).
  • Format: Fixed-position ASCII text files + Excel spreadsheets.
  • Endpoints: https://www2.census.gov/econ/bps/ (bulk directory). CBSA files follow the pattern cbsaYYMMc.txt.
  • Key constraint: The only guaranteed federal CBSA-level production indicator. This is the single most important data source for RealHouse.
  • Automation difficulty: 1-2 / 5
  • Implementation: packages/ingest/src/connectors/bps.py, a position-based parser for the actual Census fixed-width layout.
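Two details above drive most of the automation: the file-name pattern and the release timing. This sketch builds the monthly CBSA file name from the documented `cbsaYYMMc.txt` pattern, estimates a release date as the n-th workday, and illustrates position-based slicing. It is not the code in `bps.py`, and all function names are hypothetical. `nth_workday` counts Monday through Friday only and ignores federal holidays, and the offsets passed to `slice_fixed_width` must come from the Census record layout, not from this example.

```python
import datetime as dt


def bps_cbsa_filename(year: int, month: int) -> str:
    """Monthly CBSA file name following the cbsaYYMMc.txt pattern."""
    if not 1 <= month <= 12:
        raise ValueError("month must be between 1 and 12")
    return f"cbsa{year % 100:02d}{month:02d}c.txt"


def nth_workday(year: int, month: int, n: int) -> dt.date:
    """Return the n-th Monday-to-Friday date of a month.

    Federal holidays are NOT skipped, so an actual release can land later
    than the date returned here.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    day = dt.date(year, month, 1)
    count = 0
    while day.month == month:
        if day.weekday() < 5:  # Monday=0 ... Friday=4
            count += 1
            if count == n:
                return day
        day += dt.timedelta(days=1)
    raise ValueError(f"{year}-{month:02d} has fewer than {n} workdays")


def slice_fixed_width(line: str, colspecs: list[tuple[str, int, int]]) -> dict[str, str]:
    """Cut one fixed-width record into named, whitespace-stripped fields.

    colspecs holds (name, start, end) with zero-based, end-exclusive offsets;
    the real offsets must come from the Census record layout.
    """
    return {name: line[start:end].strip() for name, start, end in colspecs}
```

For example, `bps_cbsa_filename(2024, 1)` returns `cbsa2401c.txt`, and `nth_workday(2024, 1, 17)` returns 2024-01-23 before any holiday adjustment.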

Census NRC/NRS (Survey of Construction) Done

National/regional starts, under construction, completions (NRC) and new-home sales/inventory (NRS).

  • Provides: National and regional starts, units under construction, and completions (NRC). New residential sales and for-sale inventory (NRS).
  • Cadence: Monthly press release (12th workday) + quarterly supplements with additional detail.
  • Geography: National + Census regions only (NOT CBSA-level).
  • Format: XLSX tables.
  • Endpoints: https://www.census.gov/construction/nrc/ and https://www.census.gov/construction/nrs/
  • Key constraint: Cannot get metro-level starts, under-construction, or completions from this source. Used for calibration of metro-level models only.
  • Automation difficulty: 1 / 5
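Because NRC stops at the regional level, one simple way to use it for calibration is to apply a region's starts-to-permits ratio to each metro's BPS permits. This is a sketch of that idea under strong assumptions: every metro converts permits to starts at the regional rate, and the timing lag between permit and start is ignored. It is not the RealHouse model, and the function name is hypothetical.

```python
def calibrate_metro_starts(
    metro_permits: dict[str, float],
    region_permits: float,
    region_starts: float,
) -> dict[str, float]:
    """Estimate metro starts from metro permits and a regional ratio.

    metro_permits maps CBSA code -> permitted units (from BPS);
    region_permits and region_starts are regional totals for the same period.
    """
    if region_permits <= 0:
        raise ValueError("region_permits must be positive")
    ratio = region_starts / region_permits  # regional starts per permitted unit
    return {cbsa: units * ratio for cbsa, units in metro_permits.items()}
```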

FHFA House Price Index Done

Repeat-sales house price index by metro, state, and national level.

  • Provides: Repeat-sales house price index covering metro, state, and national geographies.
  • Cadence: Monthly file updates; quarterly MSA-level releases.
  • Geography: National, state, MSA/city.
  • Format: Single master CSV file.
  • Endpoint: https://www.fhfa.gov/hpi/download/monthly/hpi_master.csv
  • Key note: At the MSA level, FHFA HPI is quarterly. In quarterly rows the period field holds the quarter number (1-4), which is mapped to the first month of that quarter: Q1 to month 1, Q2 to month 4, Q3 to month 7, Q4 to month 10.
  • Automation difficulty: 1 / 5
  • Implementation: packages/ingest/src/connectors/fhfa_hpi.py
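The quarter-to-month mapping can be applied while reading `hpi_master.csv`. A minimal sketch, assuming the file exposes columns named `hpi_flavor`, `frequency`, `level`, `place_id`, `yr`, `period`, and `index_nsa` (check these against the actual header) and that MSA rows carry the CBSA code as `place_id`. It is not the code in `fhfa_hpi.py`; the helper names are hypothetical.

```python
import csv
import datetime as dt
import io

# Quarter number -> first month of that quarter.
QUARTER_START_MONTH = {1: 1, 2: 4, 3: 7, 4: 10}


def period_to_date(frequency: str, yr: int, period: int) -> dt.date:
    """Convert (yr, period) to the first day of the month it represents."""
    if frequency == "quarterly":
        return dt.date(yr, QUARTER_START_MONTH[period], 1)
    if frequency == "monthly":
        return dt.date(yr, period, 1)
    raise ValueError(f"unexpected frequency: {frequency!r}")


def msa_series(
    csv_text: str, place_id: str, flavor: str = "all-transactions"
) -> list[tuple[dt.date, float]]:
    """Extract one MSA's (date, index_nsa) series, sorted by date."""
    series = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        if row["level"] != "MSA" or row["place_id"] != place_id:
            continue
        if row["hpi_flavor"] != flavor:  # assumed filter to avoid mixing index flavors
            continue
        when = period_to_date(row["frequency"], int(row["yr"]), int(row["period"]))
        series.append((when, float(row["index_nsa"])))
    return sorted(series)
```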

FRED API Done

Programmatic access to many housing and macro series, including BPS-derived permit series.

  • Provides: Programmatic access to thousands of economic time series, including BPS-derived permit series, interest rates, and housing-related indicators.
  • Cadence: Varies by underlying series.
  • Geography: Varies — national, some regional, some metro-level series.
  • Format: JSON/CSV via REST API.
  • Endpoint: https://fred.stlouisfed.org/docs/api/fred/series_observations.html
  • Key note: Requires an API key. Useful as a convenience layer for cross-checking BPS numbers and pulling macro context.
  • Automation difficulty: 1-2 / 5
  • Implementation: packages/ingest/src/connectors/fred.py
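A minimal FRED client needs only the series-observations endpoint. This sketch builds the request URL and parses the JSON response, treating FRED's "." placeholder as a missing value. The helper names are hypothetical and this is not the code in `fred.py`; `PERMIT` (national new private housing units authorized by permit) appears only as an example series ID.

```python
import json
from typing import Optional
from urllib.parse import urlencode
from urllib.request import urlopen

FRED_OBSERVATIONS_URL = "https://api.stlouisfed.org/fred/series/observations"


def observations_url(series_id: str, api_key: str, start: Optional[str] = None) -> str:
    """Build a series/observations request URL returning JSON."""
    params = {"series_id": series_id, "api_key": api_key, "file_type": "json"}
    if start:
        params["observation_start"] = start  # YYYY-MM-DD
    return f"{FRED_OBSERVATIONS_URL}?{urlencode(params)}"


def parse_observations(payload: str) -> list[tuple[str, Optional[float]]]:
    """Return (date, value) pairs; FRED marks missing values with '.'."""
    data = json.loads(payload)
    return [
        (obs["date"], None if obs["value"] == "." else float(obs["value"]))
        for obs in data["observations"]
    ]


def fetch_observations(series_id: str, api_key: str, start: Optional[str] = None):
    """Network call; kept separate so parsing can be tested offline."""
    with urlopen(observations_url(series_id, api_key, start), timeout=30) as resp:
        return parse_observations(resp.read().decode("utf-8"))
```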

Bureau of Labor Statistics Done

Metro employment levels, unemployment rates, and job growth by sector.

  • Provides: Metro-area employment levels, unemployment rates, and job growth broken out by industry sector.
  • Cadence: Monthly.
  • Geography: National + metro areas.
  • Format: JSON via public API v2.
  • Key note: Optional API key increases rate limits. Requires series ID management — each metro + data type combination has a unique series identifier.
  • Automation difficulty: 2 / 5
  • Implementation: packages/ingest/src/connectors/bls.py
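Series ID management can be reduced to a builder function. The sketch below assumes the BLS State and Metro Area employment (CES, prefix `SM`) ID layout: prefix, seasonal code, 2-digit state FIPS, 5-digit area code, 8-digit industry code, and 2-digit data type. Verify the layout and area codes against BLS documentation before relying on it. It covers employment only; metro unemployment rates come from LAUS, which uses a different ID format. Helper names are hypothetical and this is not the code in `bls.py`.

```python
import json
from typing import Optional

BLS_API_URL = "https://api.bls.gov/publicAPI/v2/timeseries/data/"


def sm_series_id(
    state_fips: str,
    area_code: str,
    industry: str = "00000000",  # assumed: total nonfarm
    data_type: str = "01",  # assumed: all employees, in thousands
    seasonal: str = "U",  # assumed: U = not seasonally adjusted, S = adjusted
) -> str:
    """Build a CES State and Metro Area series ID (layout assumed above)."""
    series_id = f"SM{seasonal}{state_fips}{area_code}{industry}{data_type}"
    if len(series_id) != 20:
        raise ValueError(f"unexpected series ID length: {series_id!r}")
    return series_id


def request_body(
    series_ids: list[str], start_year: int, end_year: int, api_key: Optional[str] = None
) -> str:
    """JSON body for a POST to BLS_API_URL; the key is optional."""
    body = {"seriesid": series_ids, "startyear": str(start_year), "endyear": str(end_year)}
    if api_key:
        body["registrationkey"] = api_key
    return json.dumps(body)
```

`sm_series_id("48", "26420")` returns `SMU48264200000000001`; whether that is the correct not-seasonally-adjusted total nonfarm series for Houston depends on the layout and area-code assumptions above.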

Census Population Estimates Program Done

Metro population totals and components of change (births/deaths, domestic migration, international migration).

  • Provides: Metro/micro area population totals and components of change — births, deaths, domestic migration, and international migration.
  • Cadence: Annual.
  • Geography: Metro and micropolitan areas.
  • Format: Downloadable CSV files.
  • Key note: Vintage-aware — population estimates change with each annual release as the Census Bureau incorporates new data and revised methodologies.
  • Automation difficulty: 2 / 5
  • Implementation: packages/ingest/src/connectors/census_pop.py
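Being vintage-aware mostly means never overwriting an older release. A sketch, assuming a hypothetical `PopEstimate` record that stores every vintage side by side and a hypothetical helper that picks the newest vintage for each metro and reference year:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PopEstimate:
    vintage: int  # release year of the estimates series
    cbsa: str
    year: int  # reference year the estimate describes
    population: int


def latest_by_vintage(records: list[PopEstimate]) -> dict[tuple[str, int], int]:
    """For each (cbsa, year), keep the population from the newest vintage."""
    best = {}
    for rec in records:
        key = (rec.cbsa, rec.year)
        if key not in best or rec.vintage > best[key].vintage:
            best[key] = rec
    return {key: rec.population for key, rec in best.items()}
```

Keeping all vintages in storage also makes it possible to reproduce a metro profile exactly as it looked under an earlier release.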

HMDA (Home Mortgage Disclosure Act) Done

Mortgage origination data — purchase loans as a proxy for financed closings.

  • Provides: Mortgage origination data including purchase loans, which serve as a proxy for financed closings (new and resale).
  • Cadence: Annual bulk release.
  • Geography: MSA/MD (5-digit codes), county, census tract.
  • Format: CSV (very large files); also queryable via the Data Browser API.
  • Endpoint: https://ffiec.cfpb.gov/v2/data-browser-api/view/aggregations
  • Key constraint: Only covers mortgage-financed purchases — misses all-cash transactions. MSA/MD codes may diverge from CBSA codes and must be mapped through county FIPS to align with the RealHouse geography dimension.
  • Automation difficulty: 3 / 5
  • Implementation: packages/ingest/src/connectors/hmda.py — uses the Data Browser API for aggregated queries.
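The county-FIPS mapping step can be sketched as a roll-up through a county-to-CBSA crosswalk, for example one built from the Census CBSA delineation files. This assumes HMDA counts have already been aggregated by county FIPS; the function name is hypothetical. For metros split into Metropolitan Divisions, where HMDA reports division codes, rolling counties up this way recovers the parent CBSA total.

```python
from collections import defaultdict


def roll_up_to_cbsa(
    county_counts: dict[str, int], county_to_cbsa: dict[str, str]
) -> tuple[dict[str, int], dict[str, int]]:
    """Sum county-level HMDA counts into CBSAs via a county FIPS crosswalk.

    Counties missing from the crosswalk (for example, non-metro counties)
    are returned separately instead of being silently dropped.
    """
    totals = defaultdict(int)
    unmapped = {}
    for fips, count in county_counts.items():
        cbsa = county_to_cbsa.get(fips)
        if cbsa is None:
            unmapped[fips] = count
        else:
            totals[cbsa] += count
    return dict(totals), unmapped
```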

Local Systems Layer

Local data is where the real construction lifecycle visibility comes from — permit inspection events reveal starts, certificates of occupancy mark completions, and deed transfers capture closings. But this data is deeply fragmented: every city, county, and jurisdiction runs its own permitting system with its own portal, formats, and access policies. There is no federal equivalent for this granularity.

Houston Permit Portals Researched

City and county permitting portals for the Houston metro — the first local data target.

  • City of Houston: Online portal at houstonpermittingcenter.org — web-only interface, no bulk API or data export available.
  • Harris County ePermits: Web portal for unincorporated Harris County — also web-only, no bulk API.
  • Strategy: We filed a TPIA (Texas Public Information Act) request for bulk historical permit data and are building the federal backbone pipeline while awaiting the response.
  • Data value: Inspection timestamps reveal actual starts; CO/final inspection dates mark completions. This turns permit records into a full construction lifecycle timeline.
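The lifecycle logic can be sketched as follows, assuming permit records arrive as inspection events with hypothetical `kind` labels (real portals use their own inspection type codes). Following the rule in the metric-to-source mapping, the first inspection of any kind marks the start and the first CO or final inspection marks the completion; permits between those two dates are counted as under construction. All names here are hypothetical.

```python
import datetime as dt
from dataclasses import dataclass
from typing import Optional

# Hypothetical event labels that count as completion.
COMPLETION_KINDS = {"final", "co"}


@dataclass(frozen=True)
class InspectionEvent:
    permit_id: str
    kind: str
    date: dt.date


def lifecycle(events: list[InspectionEvent]) -> dict[str, dict[str, Optional[dt.date]]]:
    """Per permit: start = earliest inspection, completion = earliest CO/final."""
    out = {}
    for ev in sorted(events, key=lambda e: e.date):
        rec = out.setdefault(ev.permit_id, {"start": None, "completion": None})
        if rec["start"] is None:
            rec["start"] = ev.date
        if ev.kind in COMPLETION_KINDS and rec["completion"] is None:
            rec["completion"] = ev.date
    return out


def count_under_construction(records: dict, as_of: dt.date) -> int:
    """Permits started on or before as_of and not yet completed at as_of."""
    return sum(
        1
        for r in records.values()
        if r["start"] is not None
        and r["start"] <= as_of
        and (r["completion"] is None or r["completion"] > as_of)
    )
```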

Harris County Appraisal District (HCAD) Researched

ArcGIS REST endpoint + quarterly GIS downloads with rich parcel-level data.

  • Access: ArcGIS REST API + quarterly GIS data downloads.
  • Coverage: 61-field parcel dataset including market values, land use codes, improvement details, and new-owner dates.
  • Use case: New-owner dates can serve as a proxy for closings. Land-use codes help identify new-construction parcels vs. existing homes.
  • Timeline: Planned for Phase 3 of the RealHouse build-out.
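A rough sketch of the closings proxy, assuming parcel records have been flattened to hypothetical fields `land_use`, `year_built` (taken from the improvement details), and `new_owner_date`, with a placeholder code set standing in for real HCAD land-use codes. It counts new-owner dates per quarter, restricted to recently built residential parcels as a crude new-construction filter.

```python
import datetime as dt
from collections import Counter
from typing import Optional, TypedDict


class Parcel(TypedDict):
    # Hypothetical field names; map them from the real HCAD columns.
    land_use: str
    year_built: Optional[int]
    new_owner_date: Optional[dt.date]


# Placeholder, not a real HCAD code list.
RESIDENTIAL_CODES = {"SFR"}


def quarter_label(d: dt.date) -> str:
    """Format a date as e.g. '2023Q2'."""
    return f"{d.year}Q{(d.month - 1) // 3 + 1}"


def closings_by_quarter(parcels: list[Parcel], min_year_built: int) -> dict[str, int]:
    """Count new-owner dates per quarter for recently built residential parcels."""
    counts = Counter()
    for p in parcels:
        if p["land_use"] not in RESIDENTIAL_CODES:
            continue
        if p["year_built"] is None or p["year_built"] < min_year_built:
            continue
        if p["new_owner_date"] is not None:
            counts[quarter_label(p["new_owner_date"])] += 1
    return dict(counts)
```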

Connector Platforms Future

Reusable connector patterns for scaling local data ingestion to other metros.

  • Socrata (SODA API): Common platform for city open-data portals. Uses $limit / $offset paging; queries are written in SoQL (Socrata Query Language).
  • ArcGIS FeatureServer: Common for county GIS data. Uses resultOffset / resultRecordCount paging. Supports spatial queries.
  • Accela / Tyler / OpenGov: Permitting workflow systems used by many jurisdictions across the country. Access typically requires credentials or formal data-sharing agreements.
  • Strategy: Build reusable connector modules for each platform type, then instantiate per-metro configurations. One Socrata connector serves dozens of cities.
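The reusable-connector idea comes down to separating the paging loop from each platform's parameter names. A sketch, assuming a caller-supplied `fetch` function that performs the HTTP request and returns a list of records; `PagingStyle` and `paginate` are hypothetical names, while `$limit`/`$offset` and `resultOffset`/`resultRecordCount` are the platforms' own paging parameters.

```python
from dataclasses import dataclass
from typing import Any, Callable, Iterator, Optional


@dataclass(frozen=True)
class PagingStyle:
    offset_param: str
    limit_param: str


SOCRATA = PagingStyle(offset_param="$offset", limit_param="$limit")
ARCGIS = PagingStyle(offset_param="resultOffset", limit_param="resultRecordCount")


def paginate(
    fetch: Callable[[dict[str, Any]], list[Any]],
    style: PagingStyle,
    page_size: int = 1000,
    base_params: Optional[dict[str, Any]] = None,
) -> Iterator[Any]:
    """Yield records page by page until a short page signals the end.

    For Socrata, pass a stable ordering such as {"$order": ":id"} in
    base_params. For ArcGIS, keep page_size at or below the layer's
    maxRecordCount, or a server-capped page will look like the last page.
    """
    offset = 0
    while True:
        params = dict(base_params or {})
        params[style.offset_param] = offset
        params[style.limit_param] = page_size
        page = fetch(params)
        yield from page
        if len(page) < page_size:
            return
        offset += page_size
```

A per-metro configuration then only needs an endpoint, a paging style, and any filters; the same loop serves every Socrata city.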

Metric-to-Source Mapping

This table maps every Houston-style housing KPI to its best public data source, replicability class, and implementation difficulty.

| Houston-Style KPI | Best Public Source(s) | Replicability | Difficulty (1-5) | Notes |
|---|---|---|---|---|
| Permits issued | Census BPS | Class A | 1-2 | Backbone metric. CBSA/county/place. |
| Quarterly starts | BPS + NRC calibration (modeled) | Class A/B | 2-4 | Federal-only: permit-lag model. With local data: first inspection = start. |
| Under construction | Modeled from starts pipeline | Class A/B | 3-4 | Stock-flow identity: UC(t) = UC(t-1) + starts(t) - completions(t). |
| Completions | Modeled from starts pipeline | Class A/B | 3-4 | Federal-only: lagged starts. With local data: CO/final inspection. |
| Finished vacant | Completions minus closings | Class B/C | 4-5 | Needs both completion AND sales signals. Most difficult. |
| Closings (financed) | HMDA purchase originations | Class A | 3 | Mortgage-only proxy. Annual cadence. |
| Closings (total) | Recorder deed transfers | Class C | 4-5 | Paywalled, per-county; normalization is hard. |
| VDL / future lots | Plats + engineering permits | Class C | 5 | No federal equivalent. Requires local plat/subdivision data. |
| MLS resale stats | HAR (Houston Association of Realtors), proprietary | Class C | 5 | Not government data. No public API. |
| House prices | FHFA HPI | Class A | 1 | Repeat-sales index, good metro coverage. |
| Employment | BLS API | Class A | 2 | Monthly, good metro coverage. |
| Population | Census PEP | Class A | 2 | Annual, metro/county. |
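The modeled rows in the table chain together: permits feed starts through a permit-lag model, starts feed completions through a second lag, and the stock-flow identity turns both into an under-construction series. A sketch using simple distributed lags, where the lag weights are hypothetical placeholders rather than estimated values and the function names are hypothetical:

```python
def distributed_lag(series: list[float], weights: list[float]) -> list[float]:
    """out[t] = sum over k of weights[k] * series[t - k], skipping lags before t = 0."""
    return [
        sum(w * series[t - k] for k, w in enumerate(weights) if t - k >= 0)
        for t in range(len(series))
    ]


def uc_stock(uc0: float, starts: list[float], completions: list[float]) -> list[float]:
    """Stock-flow identity: UC(t) = UC(t-1) + starts(t) - completions(t)."""
    if len(starts) != len(completions):
        raise ValueError("starts and completions must have equal length")
    stock, prev = [], uc0
    for s, c in zip(starts, completions):
        prev = prev + s - c
        stock.append(prev)
    return stock


# Hypothetical quarterly lag weights, for illustration only; in practice they
# would be estimated, e.g. against NRC national/regional data.
PERMIT_TO_START = [0.7, 0.25]
START_TO_COMPLETION = [0.0, 0.3, 0.5, 0.2]
```

Chained together: `starts = distributed_lag(permits, PERMIT_TO_START)`, then `completions = distributed_lag(starts, START_TO_COMPLETION)`, then `uc_stock(uc0, starts, completions)`. With local inspection data, observed starts and completions simply replace the lagged estimates.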

Research Documents

For the complete analysis of data sources, automation strategies, and modeling approaches, see our research documents.
