Data Sources

The complete catalog of federal and local data sources powering RealHouse metro-level housing analytics.

Two-Tier Data Architecture

RealHouse uses a two-tier data architecture. The federal backbone provides uniform, automatable coverage across all U.S. metros. The local systems layer provides higher-fidelity construction lifecycle data but must be built per-jurisdiction.

  • Federal Backbone: Census BPS, NRC/NRS, FHFA HPI, FRED, BLS, PEP, HMDA
  • Local Systems Layer: Permit portals, assessor data, recorder deeds, MLS feeds

Federal sources give uniform national coverage; local sources add lifecycle granularity per metro.

Replicability Classes

Not every Houston-style housing metric can be reproduced in every metro. We classify each metric by how replicable it is using public data:

Class A — Fully Reproducible from Federal Sources

Metrics: Permits issued, house prices, employment, population, mortgage originations.

These metrics are available uniformly across all CBSAs because they come entirely from federal statistical programs with consistent geographic coverage. No local data required.

Class B — Reproducible with Local Open Data

Metrics: Starts, under construction, completions, closings.

Requires permit inspection events and certificate-of-occupancy records from local permitting portals. Quality varies significantly by jurisdiction — some cities have robust open-data portals (Socrata, ArcGIS), while others are web-only with no bulk API.

Class C — Likely Proprietary / Requires Modeling

Metrics: Finished vacant inventory, VDL (vacant developed lots), future lots, MLS resale stats, community-level data.

These metrics require field research, proprietary data feeds, or sophisticated modeling. No federal equivalent exists. Scaling these beyond a single metro demands significant per-jurisdiction investment.

Federal Backbone Sources

These seven federal data sources form the backbone of RealHouse analytics. They provide uniform, automatable coverage and are the foundation for every metro profile.

Census Building Permits Survey (BPS) [Done]

Monthly permit authorizations (units, buildings, valuation) by CBSA, county, and place.

Full source details
  • Provides: Monthly permit authorizations — unit counts, building counts, and construction valuation — broken out by CBSA, county, and place (permit-issuing office).
  • Cadence: Monthly revised release (17th workday of each month).
  • Geography: CBSA, county, place (permit office).
  • Format: Fixed-position ASCII text files + Excel spreadsheets.
  • Endpoints: https://www2.census.gov/econ/bps/ (bulk directory). CBSA files follow the pattern cbsaYYMMc.txt.
  • Key constraint: The only guaranteed federal CBSA-level production indicator. This is the single most important data source for RealHouse.
  • Automation difficulty: 1-2 / 5
  • Implementation: packages/ingest/src/connectors/bps.py — position-based parser for real Census fixed-width format.
  • View sample data table →
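
As a minimal sketch of the file-naming pattern above, the helper below builds a CBSA file name and URL for a given month. The function names and the `subdir` argument are hypothetical: the source gives only the bulk directory and the `cbsaYYMMc.txt` pattern, so the subdirectory must be supplied by the caller after checking the directory listing.

```python
from datetime import date

# Bulk directory named in the source details above.
BPS_BASE_URL = "https://www2.census.gov/econ/bps/"


def bps_cbsa_filename(period: date) -> str:
    """Return the CBSA file name for a month, following cbsaYYMMc.txt."""
    # %y = two-digit year, %m = zero-padded month
    return f"cbsa{period:%y%m}c.txt"


def bps_cbsa_url(period: date, subdir: str) -> str:
    """Join the base directory, a caller-supplied subdirectory, and the file name.

    `subdir` is an assumption: the source does not say where CBSA files
    live under the bulk directory.
    """
    return f"{BPS_BASE_URL}{subdir.strip('/')}/{bps_cbsa_filename(period)}"
```

For example, `bps_cbsa_filename(date(2024, 3, 1))` yields `cbsa2403c.txt`.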

Census NRC/NRS (Survey of Construction) [Done]

National/regional starts, under construction, completions (NRC) and new-home sales/inventory (NRS).

Full source details
  • Provides: National and regional starts, units under construction, and completions (NRC). New residential sales and for-sale inventory (NRS).
  • Cadence: Monthly press release (12th workday) + quarterly supplements with additional detail.
  • Geography: National + Census regions only (NOT CBSA-level).
  • Format: XLSX tables.
  • Endpoints: https://www.census.gov/construction/nrc/ and https://www.census.gov/construction/nrs/
  • Key constraint: Cannot get metro-level starts, under-construction, or completions from this source. Used for calibration of metro-level models only.
  • Automation difficulty: 1 / 5
  • View sample data table →

FHFA House Price Index [Done]

Repeat-sales house price index by metro, state, and national level.

Full source details
  • Provides: Repeat-sales house price index covering metro, state, and national geographies.
  • Cadence: Monthly file updates; quarterly MSA-level releases.
  • Geography: National, state, MSA/city.
  • Format: Single master CSV file.
  • Endpoint: https://www.fhfa.gov/hpi/download/monthly/hpi_master.csv
  • Key note: FHFA HPI is quarterly at the MSA level. For quarterly rows, the period field holds the quarter number (1-4), which we map to the first month of that quarter (Q1 → 1, Q2 → 4, Q3 → 7, Q4 → 10).
  • Automation difficulty: 1 / 5
  • Implementation: packages/ingest/src/connectors/fhfa_hpi.py
  • View sample data table →
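
The quarter-to-month mapping in the key note can be sketched as follows. This is an illustrative helper, not the code in fhfa_hpi.py; the function names are hypothetical.

```python
from datetime import date


def quarter_to_first_month(quarter: int) -> int:
    """Map a quarter number (1-4) to the first month of that quarter."""
    if quarter not in (1, 2, 3, 4):
        raise ValueError(f"quarter must be 1-4, got {quarter}")
    return 3 * (quarter - 1) + 1


def quarterly_period_start(year: int, quarter: int) -> date:
    """Return the first day of the quarter as a date, usable as a period key."""
    return date(year, quarter_to_first_month(quarter), 1)
```

Keying quarterly rows by the first day of the quarter lets them share a date column with monthly series.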

FRED API [Done]

Programmatic access to many housing and macro series, including BPS-derived permit series.

Full source details
  • Provides: Programmatic access to thousands of economic time series, including BPS-derived permit series, interest rates, and housing-related indicators.
  • Cadence: Varies by underlying series.
  • Geography: Varies — national, some regional, some metro-level series.
  • Format: JSON/CSV via REST API.
  • Endpoint: https://api.stlouisfed.org/fred/series/observations (documented at https://fred.stlouisfed.org/docs/api/fred/series_observations.html)
  • Key note: Requires an API key. Useful as a convenience layer for cross-checking BPS numbers and pulling macro context.
  • Automation difficulty: 1-2 / 5
  • Implementation: packages/ingest/src/connectors/fred.py
  • View sample data table →
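
A minimal sketch of building a series-observations request, assuming the standard `series_id`, `api_key`, `file_type`, and `observation_start` query parameters described in the FRED documentation linked above. The helper name is hypothetical, and `HOUST` (national housing starts) is used only as an example series ID.

```python
from typing import Optional
from urllib.parse import urlencode

FRED_OBSERVATIONS_URL = "https://api.stlouisfed.org/fred/series/observations"


def fred_observations_url(
    series_id: str,
    api_key: str,
    observation_start: Optional[str] = None,
) -> str:
    """Build the request URL for one series, asking for JSON output."""
    params = {"series_id": series_id, "api_key": api_key, "file_type": "json"}
    if observation_start:
        # Expected as YYYY-MM-DD
        params["observation_start"] = observation_start
    return f"{FRED_OBSERVATIONS_URL}?{urlencode(params)}"
```

Building the URL separately from the HTTP call keeps the API key handling and parameter logic easy to test without network access.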

Bureau of Labor Statistics [Done]

Metro employment levels, unemployment rates, and job growth by sector.

Full source details
  • Provides: Metro-area employment levels, unemployment rates, and job growth broken out by industry sector.
  • Cadence: Monthly.
  • Geography: National + metro areas.
  • Format: JSON via public API v2.
  • Key note: Optional API key increases rate limits. Requires series ID management — each metro + data type combination has a unique series identifier.
  • Automation difficulty: 2 / 5
  • Implementation: packages/ingest/src/connectors/bls.py
  • View sample data table →
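
The series ID management mentioned above could be handled with a small registry keyed by metro and data type. The sketch below builds the JSON body for a public API v2 request. The registry, the function name, and the measure label are hypothetical, and the single series ID shown is an illustrative value in the Local Area Unemployment Statistics format that should be verified against BLS documentation before use.

```python
import json
from typing import Iterable, Optional, Tuple

BLS_API_URL = "https://api.bls.gov/publicAPI/v2/timeseries/data/"

# (CBSA code, measure) -> BLS series ID.
# Illustrative entry only: verify the ID against BLS series ID formats.
SERIES_REGISTRY = {
    ("26420", "unemployment_rate"): "LAUMT482642000000003",
}


def build_bls_payload(
    keys: Iterable[Tuple[str, str]],
    start_year: int,
    end_year: int,
    api_key: Optional[str] = None,
) -> str:
    """Return the JSON request body to POST to BLS_API_URL."""
    payload = {
        "seriesid": [SERIES_REGISTRY[key] for key in keys],
        "startyear": str(start_year),
        "endyear": str(end_year),
    }
    if api_key:
        # Optional key; raises rate limits when present
        payload["registrationkey"] = api_key
    return json.dumps(payload)
```

A lookup that raises `KeyError` for an unregistered metro and measure pair makes missing series IDs fail loudly instead of silently returning no data.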

Census Population Estimates Program [Done]

Metro population totals and components of change (births/deaths, domestic migration, international migration).

Full source details
  • Provides: Metro/micro area population totals and components of change — births, deaths, domestic migration, and international migration.
  • Cadence: Annual.
  • Geography: Metro and micropolitan areas.
  • Format: Downloadable CSV files.
  • Key note: Vintage-aware — population estimates change with each annual release as the Census Bureau incorporates new data and revised methodologies.
  • Automation difficulty: 2 / 5
  • Implementation: packages/ingest/src/connectors/census_pop.py
  • View sample data table →
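
One way to stay vintage-aware is to store every release and select the latest vintage per metro and year at query time. The sketch below assumes a hypothetical row shape (`PopEstimate`); it is not the schema used by census_pop.py.

```python
from typing import Dict, Iterable, NamedTuple, Tuple


class PopEstimate(NamedTuple):
    cbsa: str        # CBSA code
    year: int        # reference year of the estimate
    vintage: int     # release year that published this estimate
    population: int


def latest_vintage(rows: Iterable[PopEstimate]) -> Dict[Tuple[str, int], PopEstimate]:
    """Keep only the most recent vintage for each (cbsa, year) pair."""
    best: Dict[Tuple[str, int], PopEstimate] = {}
    for row in rows:
        key = (row.cbsa, row.year)
        if key not in best or row.vintage > best[key].vintage:
            best[key] = row
    return best
```

Keeping older vintages in storage, rather than overwriting them, makes it possible to reproduce past reports exactly as they were published.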

HMDA (Home Mortgage Disclosure Act) [Done]

Mortgage origination data — purchase loans as a proxy for financed closings.

Full source details
  • Provides: Mortgage origination data including purchase loans, which serve as a proxy for financed closings (new and resale).
  • Cadence: Annual bulk release.
  • Geography: MSA/MD (5-digit codes), county, census tract.
  • Format: CSV (very large files); also queryable via the Data Browser API.
  • Endpoint: https://ffiec.cfpb.gov/v2/data-browser-api/view/aggregations
  • Key constraint: Only covers mortgage-financed purchases — misses all-cash transactions. MSA/MD codes may diverge from CBSA codes and must be mapped through county FIPS to align with the RealHouse geography dimension.
  • Automation difficulty: 3 / 5
  • Implementation: packages/ingest/src/connectors/hmda.py — uses the Data Browser API for aggregated queries.
  • View sample data table →
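
The county-FIPS mapping described in the key constraint can be sketched as below. The function name and both input dictionaries are hypothetical; the small crosswalk in the usage note is hand-written for illustration and should be replaced with official delineation data.

```python
from typing import Dict, List


def msamd_to_cbsa(
    msamd_counties: Dict[str, List[str]],
    county_to_cbsa: Dict[str, str],
) -> Dict[str, str]:
    """Map each MSA/MD code to the single CBSA that contains its counties."""
    mapping: Dict[str, str] = {}
    for msamd, counties in msamd_counties.items():
        # Collect every CBSA reached through this MSA/MD's counties
        cbsas = {county_to_cbsa[c] for c in counties if c in county_to_cbsa}
        if len(cbsas) != 1:
            raise ValueError(f"MSA/MD {msamd} maps to {len(cbsas)} CBSAs")
        mapping[msamd] = cbsas.pop()
    return mapping
```

For example, with `{"19124": ["48113", "48085"], "26420": ["48201"]}` as the county lists and a crosswalk that assigns counties 48113 and 48085 to CBSA 19100 and county 48201 to CBSA 26420, the result maps the metropolitan division 19124 to its parent CBSA 19100 and leaves 26420 unchanged.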

Local Systems Layer

Local data is where the real construction lifecycle visibility comes from — permit inspection events reveal starts, certificates of occupancy mark completions, and deed transfers capture closings. But this data is deeply fragmented: every city, county, and jurisdiction runs its own permitting system with its own portal, formats, and access policies. There is no federal equivalent for this granularity.

Houston (CBSA 26420) is our first local-data metro. Here's a complete inventory of what we've tapped, what's documented and waiting, and what remains out of reach.

Implemented & Active

Houston Aggregated Residential Permits [Done]

Monthly single-family and multifamily permit counts from the City of Houston open-data portal, back to 2004.

Full source details
  • Source: City of Houston Open Data (data.houstontx.gov) — direct XLSX download.
  • Provides: Monthly aggregate counts of single-family, multifamily, and total residential permits. ~250 rows from 2004 onward.
  • Cadence: Monthly.
  • Format: Excel (XLSX) with columns: Year (int), Month (int), Single Family, Multi-Family, Residential.
  • Value: Provides a local cross-check against federal BPS permit counts. City-level aggregates often diverge slightly from Census survey totals due to different counting methodologies.
  • Automation difficulty: 1 / 5
  • Implementation: packages/ingest/src/connectors/houston/permits_agg.py — HTTP download + openpyxl parser. 252 rows loaded into Supabase.
  • View sample data table →
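
The cross-check against BPS could be expressed as a relative difference per month. This sketch assumes both inputs are dictionaries keyed by (year, month) and cover the same geography (for example, BPS place-level counts for the City of Houston); the function name is hypothetical.

```python
from typing import Dict, Tuple

Month = Tuple[int, int]  # (year, month)


def permit_divergence(
    local: Dict[Month, int],
    bps: Dict[Month, int],
) -> Dict[Month, float]:
    """Return (local - bps) / bps for every month present in both series."""
    result: Dict[Month, float] = {}
    for month in sorted(local.keys() & bps.keys()):
        if bps[month]:  # skip months where the BPS count is zero
            result[month] = (local[month] - bps[month]) / bps[month]
    return result
```

Months that appear in only one source are dropped rather than treated as zero, so gaps in either feed do not show up as large divergences.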

Harris County Appraisal District (HCAD) [Done]

Residential parcel inventory via ArcGIS REST — property values, lot sizes, ownership, and state classification for 500K+ parcels.

Full source details
  • Source: HCAD ArcGIS REST MapServer at gis.hctx.net.
  • Provides: 30+ fields per parcel including HCAD number, state class (A1/A2 = single-family, B1-B4 = multifamily), land/building/total market value, acreage, tax year, new-owner date, and site address.
  • Cadence: Quarterly GIS updates.
  • Format: ArcGIS REST JSON with pagination (resultOffset/resultRecordCount).
  • Value: New-owner dates serve as a closings proxy. State class codes identify new-construction parcels. Market values enable price-tier analysis independent of MLS.
  • Key note: Residential filtering uses state_class (letter codes A1, B1, etc.), NOT land_use (which has numeric codes like 1001).
  • Automation difficulty: 2 / 5
  • Implementation: packages/ingest/src/connectors/houston/hcad.py — ArcGIS REST connector with paginated fetch and UTC-aware date conversion. Bulk load of 500K+ parcels pending (connector works, but remote insert throughput is the bottleneck).
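
A minimal sketch of the paginated fetch, the state_class filter, and the UTC-aware date conversion described above. It is not the code in hcad.py. The `fetch_page` callable is a hypothetical stand-in for the HTTP request to the layer's query endpoint, and the sketch assumes the service returns `exceededTransferLimit` when more rows remain and encodes dates as epoch milliseconds, as ArcGIS REST services commonly do.

```python
from datetime import datetime, timezone
from typing import Callable, Iterator, Optional

# Filter on state_class letter codes, not the numeric land_use codes.
RESIDENTIAL_WHERE = "state_class IN ('A1','A2','B1','B2','B3','B4')"


def fetch_all(fetch_page: Callable[[dict], dict], page_size: int = 1000) -> Iterator[dict]:
    """Yield parcel attribute dicts, paging with resultOffset/resultRecordCount."""
    offset = 0
    while True:
        params = {
            "where": RESIDENTIAL_WHERE,
            "outFields": "*",
            "f": "json",
            "resultOffset": offset,
            "resultRecordCount": page_size,
        }
        page = fetch_page(params)
        features = page.get("features", [])
        for feature in features:
            yield feature["attributes"]
        # Stop on an empty page or when the server reports no more rows
        if not features or not page.get("exceededTransferLimit"):
            break
        offset += len(features)


def esri_date_to_utc(ms: Optional[int]) -> Optional[datetime]:
    """Convert epoch milliseconds to a timezone-aware UTC datetime."""
    if ms is None:
        return None
    return datetime.fromtimestamp(ms / 1000, tz=timezone.utc)
```

Advancing the offset by the number of rows actually returned, rather than by the requested page size, keeps paging correct when the server caps pages below the requested size.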

Documented & Not Yet Tapped

TPIA Permit Lifecycle Data [Awaiting Data]

Individual permit records with full lifecycle dates — from application through inspections to certificate of occupancy. This is the critical missing source for computing actual starts, completions, and under-construction stock.

Full source details
  • Source: Bulk data requested via TPIA (Texas Public Information Act) from City of Houston Permitting Center and Harris County.
  • Provides: Individual permit records with lifecycle dates: applied, issued, foundation inspection, frame inspection, final inspection, certificate of occupancy. Inspection events with types, results, and dates.
  • Format: CSV or XLSX (format TBD when data arrives).
  • Why it matters: Without this data, our starts, completions, and under-construction (UC) numbers are modeled estimates built from national permit-to-start lag distributions calibrated to Census NRC. With it, we'd have actual foundation inspection dates (starts), CO dates (completions), and open-permit counts (UC stock) at the individual-permit level.
  • Target tables: fact_permit_record (individual permits) and fact_inspection_event (inspection milestones). Schema already defined in migration 012.
  • Automation difficulty: 4 / 5
  • Implementation: packages/ingest/src/connectors/houston/tpia.py — stub connector with field mapping documentation, ready for data arrival.
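
Once lifecycle records arrive, the metrics above could be derived roughly as sketched here. The `PermitRecord` shape is hypothetical and is not the fact_permit_record schema. The sketch also assumes that "under construction" means a permit with a foundation inspection on or before the as-of date and no certificate of occupancy yet, which is one possible reading of open-permit counts.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date
from typing import Iterable, Optional, Tuple


@dataclass
class PermitRecord:
    permit_id: str
    issued: date
    foundation_inspection: Optional[date] = None      # treated as the start
    certificate_of_occupancy: Optional[date] = None   # treated as completion


def month_key(d: date) -> Tuple[int, int]:
    return (d.year, d.month)


def lifecycle_counts(permits: Iterable[PermitRecord], as_of: date):
    """Return (starts by month, completions by month, UC stock as of a date)."""
    starts: Counter = Counter()
    completions: Counter = Counter()
    under_construction = 0
    for p in permits:
        started = p.foundation_inspection is not None and p.foundation_inspection <= as_of
        completed = (
            p.certificate_of_occupancy is not None
            and p.certificate_of_occupancy <= as_of
        )
        if started:
            starts[month_key(p.foundation_inspection)] += 1
        if completed:
            completions[month_key(p.certificate_of_occupancy)] += 1
        if started and not completed:
            under_construction += 1
    return starts, completions, under_construction
```

Permits that were issued but never reached a foundation inspection are excluded from all three counts, which separates the permit pipeline from actual construction activity.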

Houston Planning Plat & Activity Reports [Researched]

Excel reports from the Planning & Development Department showing subdivision approvals, lot counts, and entitlement pipeline activity.

Full source details
  • Source: City of Houston Planning & Development Department — publicly posted Excel reports.
  • Provides: Upcoming planning commission actions, subdivision/replat approvals, lot counts, and entitlement pipeline signals.
  • Format: XLSX reports.
  • Value: Leading indicator for future permit activity — subdivisions must be platted before building permits can be issued.
  • Blocker: Report URLs must be manually discovered; report structure may change between releases.
  • Automation difficulty: 2-3 / 5

City of Houston Permit Portal [Researched]

Web-based permit search at houstonpermittingcenter.org — individual permit records with status, but no public API.

Full source details
  • Source: houstonpermittingcenter.org — web interface for permit search, application, payment, and inspection scheduling.
  • Provides: Individual permit records with current status, application dates, and inspection request tracking.
  • Blocker: No public bulk API or data export. Would require web scraping or formal API access negotiation with the city.
  • Automation difficulty: 4 / 5

Harris County ePermits Portal [Researched]

Web portal for unincorporated Harris County permits — JavaScript-gated login, no documented API.

Full source details
  • Source: Harris County ePermits web portal.
  • Provides: County-level residential and commercial permits for unincorporated areas.
  • Blocker: Login-gated JavaScript application with no documented public API. Bulk export not available.
  • Automation difficulty: 4 / 5

Harris County Clerk: Deed Recordings [Researched]

Real property deed transfers — the most accurate closings signal, but account-gated and paywalled.

Full source details
  • Source: Harris County Clerk's Office real property recording database.
  • Provides: Deed transfers including buyer, seller, property address, recording date, and transaction type. Would be the most accurate new-construction closings signal — better than HMDA (which only covers mortgage-financed purchases) or HCAD new-owner dates (which lag).
  • Blocker: Portal requires an account with login credentials. Document access is paid per page. No bulk download or API. Highest automation friction of all Houston sources.
  • Automation difficulty: 5 / 5

Proprietary / Not Public

Houston Association of Realtors (HAR) MLS [Proprietary]

Resale listings, days-on-market, absorption rates, and transaction prices. Requires HAR membership — not government data.

CBAS Internal Data (Floorplans, Communities, VDL) [Proprietary]

Builder floorplan catalogs with base prices, community/submarket boundary definitions, and vacant developed lot (VDL) status tracking. This data powers Houston CBAS's community-level new-home reporting but has no public equivalent.

Connector Platforms (for Scaling Beyond Houston)

Reusable Connector Patterns [Future]

Platform-level connectors for scaling local data ingestion to other metros.

Full source details
  • Socrata (SODA API): Common platform for city open-data portals. Uses $limit / $offset paging. Structured query language (SoQL).
  • ArcGIS REST (FeatureServer / MapServer): Common for county GIS data. Uses resultOffset / resultRecordCount paging. Supports spatial queries. This paging is already proven with the HCAD connector, which reads a MapServer endpoint.
  • Accela / Tyler / OpenGov: Permitting workflow systems used by many jurisdictions across the country. Access typically requires credentials or formal data-sharing agreements.
  • Strategy: Build reusable connector modules for each platform type, then instantiate per-metro configurations. One Socrata connector serves dozens of cities.
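
A reusable Socrata connector could page through a dataset with `$limit` and `$offset` as sketched below. The function names are hypothetical, `fetch_json` is a stand-in for the HTTP call, and the `$order` clause on the system `:id` field is an assumption made here to keep page boundaries stable between requests.

```python
from typing import Callable, Iterator, List
from urllib.parse import urlencode


def soda_page_url(base_url: str, limit: int, offset: int, order: str = ":id") -> str:
    """Build one page URL for a SODA resource endpoint."""
    query = urlencode({"$limit": limit, "$offset": offset, "$order": order})
    return f"{base_url}?{query}"


def fetch_all_rows(
    base_url: str,
    fetch_json: Callable[[str], List[dict]],
    page_size: int = 1000,
) -> Iterator[dict]:
    """Yield every row, requesting pages until a short page signals the end."""
    offset = 0
    while True:
        rows = fetch_json(soda_page_url(base_url, page_size, offset))
        yield from rows
        if len(rows) < page_size:
            break
        offset += page_size
```

Per-metro configuration then reduces to a base URL and a field mapping, which is what allows one connector module to serve many cities.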

Metric-to-Source Mapping

This table maps every Houston-style housing KPI to its best public data source, replicability class, and implementation difficulty.

Houston-Style KPI | Best Public Source(s) | Replicability | Difficulty (1-5) | Notes
Permits issued | Census BPS | Class A | 1-2 | Backbone metric. CBSA/county/place.
Quarterly starts | BPS + NRC calibration (modeled) | Class A/B | 2-4 | Federal-only: permit-lag model. With local data: first inspection = start.
Under construction | Modeled from starts pipeline | Class A/B | 3-4 | Stock-flow identity: UC = UC_prev + starts - completions.
Completions | Modeled from starts pipeline | Class A/B | 3-4 | Federal-only: lagged starts. With local data: CO/final inspection.
Finished vacant | Completions minus closings | Class B/C | 4-5 | Needs both completion AND sales signals. Most difficult.
Closings (financed) | HMDA purchase originations | Class A | 3 | Mortgage-only proxy. Annual cadence.
Closings (total) | Recorder deed transfers | Class C | 4-5 | Paywalled, per-county, normalization hard.
VDL / future lots | Plats + engineering permits | Class C | 5 | No federal equivalent. Requires local plat/subdivision data.
MLS resale stats | HAR (proprietary) | Class C | 5 | Not government data. No public API.
House prices | FHFA HPI | Class A | 1 | Repeat-sales index, good metro coverage.
Employment | BLS API | Class A | 2 | Monthly, good metro coverage.
Population | Census PEP | Class A | 2 | Annual, metro/county.
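
The federal-only approach for starts and under construction can be sketched in two steps: spread permits over later periods with a lag distribution, then roll the stock-flow identity forward. The function names are hypothetical, and the lag weights in the usage note are made-up placeholders, not the calibrated NRC values.

```python
from typing import List, Sequence


def modeled_starts(permits: Sequence[float], lag_weights: Sequence[float]) -> List[float]:
    """Permit-lag model: starts[t] = sum over k of lag_weights[k] * permits[t - k]."""
    starts: List[float] = []
    for t in range(len(permits)):
        total = 0.0
        for k, weight in enumerate(lag_weights):
            if t - k >= 0:  # ignore periods before the series begins
                total += weight * permits[t - k]
        starts.append(total)
    return starts


def roll_under_construction(
    uc_initial: float,
    starts: Sequence[float],
    completions: Sequence[float],
) -> List[float]:
    """Stock-flow identity: UC = UC_prev + starts - completions, per period."""
    uc = uc_initial
    series: List[float] = []
    for s, c in zip(starts, completions):
        uc = uc + s - c
        series.append(uc)
    return series
```

With placeholder weights `[0.5, 0.5]`, permits of `[100, 200]` give modeled starts of `[50.0, 150.0]`; the first period is understated because permits issued before the series begins are not available to the model.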

Research Documents

For the complete analysis of data sources, automation strategies, and modeling approaches, see our research documents.

Browse the full research library →