A comprehensive catalog of federal and local public data sources for building a national CBSA-level housing-market dashboard.
A national, automatable "CBSA dashboard" for the top 25 U.S. housing markets is feasible using public data, but you will almost certainly need a two-tier architecture: (a) a federal backbone that is consistent across geographies (permitting, prices, mortgage origination activity, national construction pipeline), and (b) a local-systems layer that is inconsistent but can unlock starts/under-construction/completions and "effective inventory" at metro scale when jurisdictions expose inspection and certificate-of-occupancy milestones.
The most automation-friendly, high-coverage backbone sources are: (i) the Building Permits Survey (BPS) micro/compiled files for permits at CBSA/county/place granularity, (ii) HMDA public APIs and file services for mortgage originations and lender activity by MSA/MD and county, (iii) FHFA HPI for home price dynamics at metro/county/ZIP/tract, and (iv) FRED APIs as a convenient distribution layer for many housing series and release calendars.
For the specific metrics you listed — permits, starts (quarterly), under construction, completions/closings, finished vacant inventory, and sales closings with mortgage vs. cash proxies — public sources map onto the two tiers above; the sections below walk through each source in turn.
Your stated target is the "top 25 U.S. housing markets (CBSAs)." That "top 25" definition is unspecified; in practice you should treat it as a parameterized cohort defined by a rule (population, housing stock, permit volume, transaction volume, etc.) so the pipeline remains stable when rankings change. The federal statistical definition and membership of CBSAs change over time, so your system needs a vintage-aware CBSA dimension.
A robust canonical geography layer usually needs a vintage-aware CBSA dimension, a CBSA-to-county bridge keyed by delineation vintage, and effective-dated cohort membership. The "top 25 CBSAs" list itself should be stored as a dataset with effective dating (effective_start/effective_end) rather than hardcoded.
Placeholder cohort (since the exact list is unspecified):
| Rank | CBSA code | CBSA name | Selection rule | Notes |
|---|---|---|---|---|
| 1 | <CBSA_01> | <CBSA_NAME_01> | <rule> | <placeholder> |
| 2 | <CBSA_02> | <CBSA_NAME_02> | <rule> | <placeholder> |
| … | … | … | … | … |
| 25 | <CBSA_25> | <CBSA_NAME_25> | <rule> | <placeholder> |
This section focuses on public, primary sources (or federal "official distribution layers") that are realistic to automate nationally.
Difficulty scoring is on a 1–5 scale (1 = straightforward bulk/API; 5 = restricted access, heavy normalization, or institution-by-institution constraints).
| Source | Primary metrics it supports | Cadence | Geo granularity | Automation interface | Automation difficulty |
|---|---|---|---|---|---|
| BPS (Building Permits) | Permit authorizations (monthly to quarterly; single vs multifamily typically available in BPS outputs) | Monthly revised; annual final | U.S., state, CBSA, county, place | Bulk downloads via Census files (Excel + ASCII); compiled master ZIP | 1–2 |
| NRC / Survey of Construction outputs | Starts, under construction, completions (authoritative) | Monthly; some quarterly tables | U.S. + Census regions (not CBSA) | Downloadable tables (XLSX) | 2 (national), 4 (metro via modeling) |
| NRS / Survey of Construction outputs | New homes sold; new-home inventory; "for sale by stage of construction" (incl. completed) | Monthly; some quarterly tables | U.S. + Census regions (not CBSA) | Downloadable tables (XLSX) | 2 (national), 4 (metro via modeling) |
| HUD–USPS vacancy/no-stat | Vacancy and "no-stat" counts; growth/decline signals; "no-stat includes under construction" | Quarterly | Typically very granular (down to neighborhood geographies in the HUD product), but access-gated | Portal download + license constraints | 5 (access restricted) |
| HMDA (public) | Mortgage originations/denials; lender activity; financed home-purchase proxy | Annual (filing year); query APIs support subsetting | Nationwide; MSA/MD, state, county via filters | Public file service + public query API; additional static datasets via S3 | 2–3 |
| FHFA HPI | Price index changes; metro/county/ZIP/tract price dynamics | Monthly + quarterly | National, division, state, metro, county, ZIP, tract | Direct downloads + dataset catalogs | 1–2 |
| FRED | Convenient API to many housing/macroeconomic series (including some NRC/NRS-derived series) | Varies by series | Varies (national; some regional/geo series) | REST API w/ key; multiple formats | 1–2 |
Below are concrete, automatable endpoints (or stable download locations) and the specific fields/series that tend to matter for your use case.
BPS explicitly states that data are available monthly/YTD/annual at CBSA (formerly MSA), county, and place levels, making it the primary national backbone for metro permitting.
BPS release cadence is operationally important: preliminary permits appear with the NRC press release on the 12th workday (U.S./region only), while revised permits by metro/county/place are released on the 17th workday; annual final is usually the first workday of May.
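The workday arithmetic above is easy to automate for scheduling. A minimal sketch that counts weekdays only — it ignores federal holidays, which the real Census release calendar observes, so treat it as an approximation:

```python
from datetime import date, timedelta

def nth_workday(year: int, month: int, n: int) -> date:
    """Return the nth weekday (Mon-Fri) of a month, ignoring holidays."""
    d = date(year, month, 1)
    count = 0
    while True:
        if d.weekday() < 5:  # Mon=0 .. Fri=4
            count += 1
            if count == n:
                return d
        d += timedelta(days=1)

# BPS preliminary permits ship with the NRC release (~12th workday);
# revised metro/county/place permits follow (~17th workday).
print(nth_workday(2025, 1, 12))  # 2025-01-16
print(nth_workday(2025, 1, 17))  # 2025-01-23
```

Schedule the ingestion job a day or two after these dates to absorb holiday slippage.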
Programmatic download surfaces:
# Census BPS bulk directories (machine-friendly listings)
https://www2.census.gov/econ/bps/
# Master compiled dataset (large ZIP) + doc + sample
https://www2.census.gov/econ/bps/Master%20Data%20Set/
- BPS Compiled_YYYYMM.zip
- Compiled Data Documentation.docx
- Compiled File Sample.csv
The public directory structure includes separate folders for CBSA, county, place, etc., which is ideal for a scheduled ingestion job.
Key transformations: parse the compiled files against the documentation layout, apply the vintage-aware CBSA/county mapping, separate single-family from multifamily units, and aggregate monthly issuance into quarterly totals.
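The quarterly aggregation step is trivial once rows are parsed; a minimal sketch (the column names here are assumptions — the real compiled file's layout is defined in the documentation DOCX shipped alongside the ZIP):

```python
from collections import defaultdict

def quarterly_permits(rows):
    """Aggregate monthly permit rows to (cbsa, year, quarter) totals.
    Each row: dict with assumed keys 'cbsa', 'year', 'month', 'units'."""
    out = defaultdict(int)
    for r in rows:
        q = (int(r["month"]) - 1) // 3 + 1  # 1..12 -> Q1..Q4
        out[(r["cbsa"], int(r["year"]), q)] += int(r["units"])
    return dict(out)

# 26420 = Houston CBSA; unit counts are illustrative
sample = [
    {"cbsa": "26420", "year": "2024", "month": "10", "units": "3500"},
    {"cbsa": "26420", "year": "2024", "month": "11", "units": "3200"},
    {"cbsa": "26420", "year": "2024", "month": "12", "units": "2900"},
]
print(quarterly_permits(sample))  # {('26420', 2024, 4): 9600}
```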
NRC outputs are the authoritative federal series for starts and completions and explicitly cover new, privately-owned units (including apartments/condos) and exclude HUD-code manufactured homes.
The historical series page provides downloadable tables for permits, starts, units under construction, and completions, plus quarterly tables by purpose/design.
Download endpoints:
# Examples of NRC historical XLSX tables (files are hosted under /construction/nrc/xls/)
https://www.census.gov/construction/nrc/xls/permits_cust.xlsx
https://www.census.gov/construction/nrc/xls/starts_cust.xlsx
https://www.census.gov/construction/nrc/xls/under_cust.xlsx
https://www.census.gov/construction/nrc/xls/comps_cust.xlsx
# Quarterly starts by purpose and design (example)
https://www.census.gov/construction/nrc/xls/starts_quarterly_cust.xlsx
NRC is indispensable for calibrating metro-level estimates, because it provides the national/region-level ground truth for the pipeline that permits eventually flow into.
The NRS program explicitly provides: new houses sold and for sale, and houses for sale by stage of construction, which is the closest federal proxy to "finished vacant new-home inventory" (completed, for-sale, unsold) in a consistent series.
Key downloadable tables include "sold and for sale by stage of construction."
# Examples of NRS historical XLSX tables (hosted under /construction/nrs/xls/)
https://www.census.gov/construction/nrs/xls/sold_cust.xlsx
https://www.census.gov/construction/nrs/xls/fsale_cust.xlsx
https://www.census.gov/construction/nrs/xls/stage_cust.xlsx
For your end metrics, NRS is best used in two ways: as national/region ground truth for calibrating metro-level models, and as the source of stage-of-construction shares (e.g., the completed share of for-sale inventory) to apply to metro pipelines.
HMDA public data is hosted on the FFIEC HMDA platform and includes loan-level information modified for privacy; it is designed for public disclosure and policy analysis.
The HMDA "file serving" documentation provides stable endpoints for retrieving institution-specific Modified LAR files and states that other files are served from a public S3 bucket prefix.
# Modified LAR file serving (institution-by-institution)
https://ffiec.cfpb.gov/file/modifiedLar/year/{year}/institution/{lei}/csv
https://ffiec.cfpb.gov/file/modifiedLar/year/{year}/institution/{lei}/csv/header
https://ffiec.cfpb.gov/file/modifiedLar/year/{year}/institution/{lei}/txt
https://ffiec.cfpb.gov/file/modifiedLar/year/{year}/institution/{lei}/txt/header
# Public S3 prefix for other HMDA publication files
https://files.ffiec.cfpb.gov/
The HMDA Data Browser API documentation is particularly valuable for your use case because it supports both nationwide and geography-filtered aggregations as well as CSV subsets:
# Aggregations
GET https://ffiec.cfpb.gov/v2/data-browser-api/view/nationwide/aggregations
GET https://ffiec.cfpb.gov/v2/data-browser-api/view/aggregations
# CSV subsets
GET https://ffiec.cfpb.gov/v2/data-browser-api/view/nationwide/csv
GET https://ffiec.cfpb.gov/v2/data-browser-api/view/csv
Geographic filters include msamds (5-digit MSA/MD), states, and counties (5-digit county FIPS) — meaning you can produce CBSA-level views either directly via MSA/MD or by aggregating counties that belong to the CBSA delineation.
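A small URL builder makes the county-aggregation path concrete. The parameter names (counties, years, actions_taken) follow the public Data Browser documentation, but verify them against the current API before relying on them:

```python
import urllib.parse

BASE = "https://ffiec.cfpb.gov/v2/data-browser-api/view"

def hmda_csv_url(counties, years, actions_taken=("1",)):
    """Build a Data Browser CSV-subset URL filtered to 5-digit county FIPS.
    actions_taken '1' = loan originated (financed-purchase proxy)."""
    params = {
        "counties": ",".join(counties),
        "years": ",".join(str(y) for y in years),
        "actions_taken": ",".join(actions_taken),
    }
    return BASE + "/csv?" + urllib.parse.urlencode(params)

# Harris County, TX (48201) -- one member county of the Houston CBSA
print(hmda_csv_url(["48201"], [2023]))
```

Summing results over a CBSA's member counties (from the bridge table) yields the CBSA-level view when a direct MSA/MD filter is not a clean match.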
FHFA HPI provides a broad, transparent methodology and explicitly offers geographic detail down to metro area, county, ZIP code, and census tract, alongside national/state/division levels.
A particularly automation-friendly direct download is the master CSV published by FHFA (also indexed in Data.gov's catalog).
# FHFA master HPI CSV
https://www.fhfa.gov/hpi/download/monthly/hpi_master.csv
FHFA also publishes its own release dates for monthly and quarterly updates, which helps you schedule ingestion and backfills.
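Filtering the master CSV down to one metro is a one-pass operation. The column names below (hpi_type, level, place_id, yr, period, index_nsa) are assumptions based on the published file; check the downloaded header before hardcoding them:

```python
import csv, io

def metro_hpi(csv_text, cbsa_code):
    """Extract (year, period, index_nsa) tuples for one metro from
    hpi_master.csv text. Column names are assumed; verify on download."""
    rows = csv.DictReader(io.StringIO(csv_text))
    return [
        (int(r["yr"]), int(r["period"]), float(r["index_nsa"]))
        for r in rows
        if r["level"] == "MSA" and r["place_id"] == cbsa_code
    ]

# Illustrative two-row sample in the assumed layout
sample = """hpi_type,level,place_name,place_id,yr,period,index_nsa
traditional,MSA,Houston TX,26420,2024,1,310.5
traditional,State,Texas,TX,2024,1,290.1
"""
print(metro_hpi(sample, "26420"))  # [(2024, 1, 310.5)]
```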
FRED's API requires an API key and provides standard endpoints such as fred/series/observations, which can return XML/JSON/XLSX/CSV depending on parameters.
# Example endpoint for time series observations
https://api.stlouisfed.org/fred/series/observations?series_id={SERIES_ID}&api_key={KEY}&file_type=json
FRED's API documentation enumerates release/series endpoints and also describes a Maps API for geographically keyed series where available.
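A request builder for the observations endpoint; series_id HOUST (total housing starts, SAAR) is a real FRED series, and observation_start is a documented parameter:

```python
import urllib.parse

def fred_observations_url(series_id, api_key, start=None):
    """Build a fred/series/observations request returning JSON."""
    params = {"series_id": series_id, "api_key": api_key, "file_type": "json"}
    if start:
        params["observation_start"] = start  # YYYY-MM-DD
    return ("https://api.stlouisfed.org/fred/series/observations?"
            + urllib.parse.urlencode(params))

print(fred_observations_url("HOUST", "YOUR_KEY", "2020-01-01"))
```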
A recurring theme in housing-related public data is that agencies publish data through federal catalogs and geospatial hubs, which are highly automatable if you standardize on a small set of access patterns.
Data.gov's catalog pages often include direct downloads (CSV/ZIP/GeoJSON/Shapefile) and stable "identifier" keys that link to upstream systems. For example, "Residential Construction Permits by County" is described on its catalog page and offers direct download resources.
The same page provides a machine-usable ArcGIS download URL pattern (format parameter, spatial reference, and where=1=1).
# ArcGIS Open Data download API pattern (example from the dataset page)
https://opendata.arcgis.com/api/v3/datasets/{ITEMID}_{LAYER}/downloads/data?format=fgdb&spatialRefId=4326&where=1%3D1
https://opendata.arcgis.com/api/v3/datasets/{ITEMID}_{LAYER}/downloads/data?format=shp&spatialRefId=4326&where=1%3D1
Data.gov itself is backed by CKAN; its user guide documents the CKAN API base URL and advises using package_search (since package_list is disabled). For higher-volume crawling of metadata, GSA also exposes an API gateway endpoint requiring an API key in headers.
Many federal and local open data portals use the same "query a feature layer" semantics. The ArcGIS REST API documents the canonical FeatureServer/<layerId>/query endpoint and how to query IDs vs. feature sets. Common parameters like output format f=json and optional tokens for authenticated resources are also standardized.
This matters because once you build a robust ArcGIS connector, you can reuse it across the many federal, state, and local portals that expose the same feature-service semantics.
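A generic connector largely reduces to a paged query builder. A minimal sketch using the documented resultOffset/resultRecordCount paging (the layer URL below is a placeholder; real servers cap page size via maxRecordCount and flag truncation with exceededTransferLimit in the response):

```python
import urllib.parse

def arcgis_query_url(layer_url, offset=0, page_size=1000, where="1=1"):
    """Build a FeatureServer layer query URL with offset paging.
    Check exceededTransferLimit in the JSON response to decide whether
    to fetch the next page."""
    params = {
        "where": where,
        "outFields": "*",
        "f": "json",
        "resultOffset": offset,
        "resultRecordCount": page_size,
    }
    return layer_url + "/query?" + urllib.parse.urlencode(params)

url = arcgis_query_url(
    "https://example.com/arcgis/rest/services/Permits/FeatureServer/0",
    offset=2000)
print(url)
```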
To automate construction pipeline metrics at metro scale, your biggest leverage comes from standardizing how you ingest permit lifecycle events (application → issuance → inspections → final/CO) from dozens of issuing authorities, even if each publishes differently.
Accela's REST "Getting Started" documentation is unusually explicit about required headers and auth patterns: it uses headers such as Authorization (access token), x-accela-appid, and environment/agency headers, with content negotiation via Content-Type/Accept.
It also documents offset-based pagination with max limits (up to 1000 per request) and provides request samples.
For automation governance, Accela's docs also mention response rate limit headers (x-ratelimit-limit, x-ratelimit-remaining, x-ratelimit-reset) and warn that agencies can customize record type definitions — meaning your normalization layer cannot assume consistent "permit type" taxonomies across jurisdictions.
Automation implication: Accela itself is "API-friendly," but your integration effort is dominated by per-agency configuration and taxonomy mapping.
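A connector skeleton for the header and pagination conventions the docs describe (the record-listing path itself varies per deployment and is deliberately not shown here):

```python
def accela_page_params(limit=1000, offset=0):
    """Offset pagination as documented, with the 1000-per-request cap."""
    return {"limit": min(limit, 1000), "offset": offset}

def accela_headers(access_token, app_id):
    """Headers named in Accela's Getting Started docs; environment/agency
    headers required by some deployments are omitted for brevity."""
    return {
        "Authorization": access_token,
        "x-accela-appid": app_id,
        "Accept": "application/json",
    }

print(accela_page_params(limit=5000, offset=2000))  # {'limit': 1000, 'offset': 2000}
print(accela_headers("tok", "my-app")["x-accela-appid"])  # my-app
```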
A key reality for automation is that many local governments run permitting workflows through enterprise suites; Tyler's API Catalog describes a "Permits and Code Enforcement API Toolkit" that provides programmatic access to building permits (including new construction/additions), inspections, and code enforcement resources and processes.
Even when full APIs are not openly documented, public-facing portals often expose consistent milestone interactions. A public PDF describing EnerGov and its portal capabilities notes that ePortal supports online permit and plan submission/payment and online inspection requests/cancellations — i.e., the same lifecycle events you need to reconstruct starts/under construction/completions.
Some jurisdictions also expose permitting-related datasets through ArcGIS feature services with vendor-branded service namespaces (illustrative for discovery, not universal).
OpenGov maintains a developer portal and publishes API-facing documentation (though some pages rely on client-side rendering). Their materials describe API usage in terms of product suites including Permitting & Licensing and API key handling.
Automation implication: you should treat OpenGov similarly to Accela: build a reusable connector if you can get credentials, but assume per-jurisdiction onboarding and schema exploration.
When a city publishes permit data publicly, it is often via a general-purpose platform rather than the permitting vendor API. Two high-yield patterns:
Socrata's documentation describes consistent endpoint construction using dataset identifiers and supports multiple formats (CSV/JSON/GeoJSON/XML).
It also provides a concrete approach to rate management: application tokens increase throttling limits and can be passed via X-App-Token or query parameters depending on API version.
Pagination and maximum limits depend on the endpoint version, and Socrata documents these constraints.
Automation implication: Socrata is one of the easiest sources to automate (once you locate the dataset and interpret fields), but coverage varies by city.
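A SoQL query-builder sketch; the domain and dataset identifier below are placeholders, and the field name in the $where example is an assumption that must be mapped per dataset:

```python
import urllib.parse

def socrata_url(domain, dataset_id, limit=1000, offset=0, where=None):
    """Build a SODA resource query with stable-order paging.
    Pass the app token via the X-App-Token header on the actual request."""
    params = {"$limit": limit, "$offset": offset, "$order": ":id"}
    if where:
        params["$where"] = where  # SoQL filter
    return (f"https://{domain}/resource/{dataset_id}.json?"
            + urllib.parse.urlencode(params))

# Placeholder domain/dataset; 'permit_type' is a hypothetical field name
print(socrata_url("data.example.gov", "abcd-1234",
                  where="permit_type = 'New Construction'"))
```

Ordering by the system field :id keeps offset paging stable across requests.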
Some local governments publish permit datasets conforming to the BLDS open data specification; the Data.gov listing for one city explicitly states its building permit application dataset conforms to BLDS and clarifies a real-world modeling issue: "applications" may map to multiple permits.
Automation implication: BLDS is a practical canonical schema for "issued permits" and "applications," but you will still need extensions for inspections, certificates of occupancy, and parcel/legal identifiers.
A scalable approach is to normalize everything into an event-sourced permit lifecycle, because different systems publish different "current status" fields but often expose timestamps for status changes.
Recommended canonical objects: a raw event stream (source system, jurisdiction, record ID, event type, timestamp, original payload) and a normalized lifecycle fact keyed by jurisdiction, record, event type, and timestamp — mirroring the raw_permit_events and fact_permit_lifecycle tables in the schema section.
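The normalization core is a per-jurisdiction status-to-event mapping. The status strings below are illustrative, not drawn from any real system; in practice this table is built during onboarding for each source:

```python
# Illustrative source statuses -> canonical event vocabulary
CANONICAL_EVENTS = {
    "application received": "applied",
    "permit issued": "issued",
    "inspection passed": "inspection_passed",
    "certificate of occupancy": "co_issued",
    "finaled": "finaled",
}

def to_event(raw_status: str) -> str:
    """Map a source-system status string to a canonical event type;
    unmapped statuses are surfaced as 'unknown' for triage."""
    return CANONICAL_EVENTS.get(raw_status.strip().lower(), "unknown")

print(to_event("Permit Issued"))   # issued
print(to_event("Closed - Misc"))   # unknown
```

Routing unmapped statuses to an "unknown" queue rather than dropping them keeps taxonomy drift visible.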
If your goal is true closings (not just financed originations), the most direct public signal is the recording of deeds and related instruments, but automation is uneven.
A county clerk-recorder description illustrates the "ideal" public-record model: the recorder preserves and archives documents relating to real property transactions; documents are recorded, indexed, digitally archived, and made available to the public (often with fees), and recording makes a document part of the public record.
However, automation constraints can be severe. One large county explicitly states it no longer offers online search of the official record index; records are only available for purchase in person or by mail for privacy protections.
National automation strategy: treat recorder ingestion as a CBSA-by-CBSA (and county-by-county) program with a prioritized rollout, rather than assuming a single national API.
Because NRC/NRS do not provide CBSA-level starts/under-construction/completions, you need a modeling layer that combines federal truth series with metro permit issuance and (where possible) local milestone data.
This approach respects the fact that NRC provides official pipeline totals while BPS provides local permitting volumes; the model is effectively a translation layer between "authorized" and "in progress/completed" at metro scale.
HUD's USPS data description notes that "no-stat" can include addresses for homes under construction and suggests comparing total address counts and no-stat changes to distinguish growth vs. decline areas.
If you qualify for access, you can use changes in "no-stat residential" counts as a supplementary signal for construction activity and demolition, but the dataset's access restrictions and methodological caveats (e.g., USPS operational changes) make it a high-friction dependency.
A workable "metro finished vacant new-home inventory" estimate often looks like: cumulative modeled completions minus cumulative absorptions (recorder closings or HMDA-derived financed purchases), cross-checked against the NRS completed share of for-sale inventory applied to the metro pipeline.
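The estimate above is a simple stock-flow recursion; a sketch with illustrative numbers (clamping at zero absorbs measurement error in either input series):

```python
def finished_vacant_series(completions, closings, initial=0):
    """Stock update: FV_t = max(FV_{t-1} + completions_t - closings_t, 0).
    Inputs are per-period counts, e.g. modeled completions and
    recorder/HMDA-derived absorptions."""
    fv, out = initial, []
    for c, s in zip(completions, closings):
        fv = max(fv + c - s, 0)
        out.append(fv)
    return out

# Illustrative quarterly counts, not real data
print(finished_vacant_series([300, 280, 310], [250, 320, 290], initial=120))
# [170, 130, 150]
```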
These are practical engineering estimates for building a reliable, monitored ingestion pipeline (not a one-off download script). Times assume one experienced data engineer plus basic infrastructure.
| Source / layer | Typical automation work | Difficulty | "Time to first reliable ingestion" |
|---|---|---|---|
| BPS bulk files | Scheduled downloads + parsing + CBSA vintaging handling + quarterly aggregation | 1–2 | ~2–5 days |
| NRC/NRS tables | Scheduled downloads + parsing + versioning + release lag handling | 2 | ~3–7 days |
| FHFA HPI master CSV | Scheduled pull + incremental update detection + geo key normalization | 1–2 | ~2–4 days |
| HMDA Data Browser API | Parameterized query builder; caching; streaming CSV handling; MSA/MD-to-CBSA mapping | 2–3 | ~1–3 weeks |
| Data.gov discovery | Metadata harvesting; rules to pick authoritative datasets per CBSA; monitoring link rot | 2–3 | ~1–2 weeks |
| ArcGIS Open Data ingestion | Generic ArcGIS connector (query + downloads) + schema inference + paging/stats | 2–3 | ~1–2 weeks |
| Socrata ingestion | Generic Socrata connector + dataset discovery + field mapping to permit schema | 2–3 | ~1–2 weeks |
| Vendor permitting APIs (Accela/OpenGov/Tyler) | Credentialed per-jurisdiction onboarding + taxonomy mapping + delta sync | 3–5 | ~2–6 weeks per platform + onboarding per agency |
| County recorder ingestion | Highly variable portal automation + legal/ToS review + per-county customization | 4–5 | ~2–8+ weeks per county (often not scalable purely by engineering) |
Accela's API is technically well specified (headers, pagination, rate-limit headers), but the hard part is agency-specific configuration and semantic normalization. Socrata is typically "easy-mode" from a pure API perspective (identifiers, multiple formats, app token throttling control).
A warehouse-friendly schema that supports both "federal backbone" and "local lifecycle" data:
-- Geography and cohort
create table dim_cbsa (
cbsa_code varchar primary key,
cbsa_name varchar,
delineation_vintage date,
effective_start date,
effective_end date
);
create table bridge_cbsa_county (
cbsa_code varchar,
county_fips char(5),
delineation_vintage date,
primary key (cbsa_code, county_fips, delineation_vintage)
);
-- Federal: permits (BPS)
create table fact_bps_permits (
ref_month date, -- first day of month
cbsa_code varchar,
county_fips char(5) null, -- null for CBSA-level rows
place_fips varchar null, -- null for CBSA/county-level rows
units_authorized integer,
source_file varchar,
revised_flag boolean,
load_ts timestamp
);
-- expressions are not allowed in a primary key, so enforce the grain
-- with a functional unique index instead
create unique index ux_fact_bps_permits on fact_bps_permits
(ref_month, cbsa_code, coalesce(county_fips, '-'), coalesce(place_fips, '-'), revised_flag);
-- Federal: construction pipeline (NRC)
create table fact_nrc_pipeline (
ref_month date,
geo_level varchar, -- US, REGION
geo_code varchar, -- e.g. US, NE, MW, S, W
metric varchar, -- permits, starts, under_construction, completions, auth_not_started
units_saar numeric,
units_nsa numeric null,
source_table varchar,
load_ts timestamp,
primary key (ref_month, geo_level, geo_code, metric)
);
-- Federal: new residential sales (NRS)
create table fact_nrs_inventory (
ref_month date,
geo_level varchar, -- US, REGION
geo_code varchar,
metric varchar, -- sold, for_sale, for_sale_completed, for_sale_under_construction, etc.
value numeric,
source_table varchar,
load_ts timestamp,
primary key (ref_month, geo_level, geo_code, metric)
);
-- Federal: HMDA aggregates (for CBSA dashboards)
create table fact_hmda_aggregations (
activity_year integer,
geo_level varchar, -- MSA_MD, STATE, COUNTY, CBSA_DERIVED
geo_code varchar,
filter_hash varchar, -- stable signature of parameter set
loans_count bigint,
loan_amount_sum numeric,
load_ts timestamp,
primary key (activity_year, geo_level, geo_code, filter_hash)
);
-- Local: event-sourced permitting lifecycle (raw + normalized)
create table raw_permit_events (
source_system varchar, -- Accela, Socrata, ArcGIS, etc.
jurisdiction_id varchar,
record_id varchar,
event_type varchar, -- applied, issued, inspection_scheduled, inspection_passed, co_issued, finaled
event_ts timestamp,
payload_json jsonb,
load_ts timestamp
);
create table fact_permit_lifecycle (
jurisdiction_id varchar,
record_id varchar,
cbsa_code varchar,
county_fips char(5),
event_type varchar,
event_ts timestamp,
work_type varchar,
units integer null,
valuation numeric null,
primary key (jurisdiction_id, record_id, event_type, event_ts)
);
-- Recorder / deeds (where available)
create table fact_deed_transfers (
county_fips char(5),
recording_date date,
document_type varchar,
sales_price numeric null,
loan_amount numeric null,
is_cash boolean null,
parcel_id varchar null,
load_ts timestamp
);
The following diagram describes the overall data flow from federal and local sources through the platform's ingestion, normalization, and serving layers.
flowchart LR
subgraph FederalBackbone
BPS[BPS permits files]
NRC[NRC tables]
NRS[NRS tables]
HMDA[HMDA APIs + files]
FHFA[FHFA HPI downloads]
FRED[FRED API]
CBSADef[CBSA delineation files]
end
subgraph LocalLayer
OPD["Open data portals (Socrata / ArcGIS / CKAN)"]
Vendor["Permitting vendor APIs (Accela / OpenGov / Tyler)"]
Recorder[County recorder portals]
end
subgraph Platform
Ingest[Scheduled ingestion + backfills]
Raw["Raw lake (original files + JSON)"]
Normalize["Normalization + entity resolution (address/parcel/geo keys)"]
Facts["Fact tables + aggregates (monthly/quarterly CBSA metrics)"]
Serve[API + dashboard layer]
Monitor[QA + anomaly detection, release lag + schema drift]
end
BPS --> Ingest
NRC --> Ingest
NRS --> Ingest
HMDA --> Ingest
FHFA --> Ingest
FRED --> Ingest
CBSADef --> Normalize
OPD --> Ingest
Vendor --> Ingest
Recorder --> Ingest
Ingest --> Raw --> Normalize --> Facts --> Serve
Facts --> Monitor
Raw --> Monitor
The diagram above is expressed in Mermaid syntax. The data flow proceeds from federal backbone and local-layer sources through scheduled ingestion into a raw lake, then through normalization into fact tables, and finally to an API/dashboard serving layer with QA monitoring.
Use release-aware scheduling to reduce churn. BPS documents the distinction between preliminary and revised permit releases and their typical workday timing, which is important for your "Q4 last year" reporting cutoffs. FHFA publishes a forward calendar of monthly/quarterly release dates.
gantt
title Typical public-data refresh cadence (conceptual)
dateFormat YYYY-MM-DD
axisFormat %b %Y
section Monthly
BPS revised permits (metro/county/place) :active, 2026-01-01, 2026-12-31
NRC monthly construction pipeline :active, 2026-01-01, 2026-12-31
NRS monthly new-home sales/inventory :active, 2026-01-01, 2026-12-31
FHFA monthly HPI :active, 2026-01-01, 2026-12-31
section Quarterly / annual
NRC quarterly purpose/design tables :active, 2026-01-01, 2026-12-31
FHFA quarterly HPI :active, 2026-01-01, 2026-12-31
HMDA annual filing year publication :active, 2026-01-01, 2026-12-31
HUD-USPS vacancy/no-stat (restricted) :active, 2026-01-01, 2026-12-31
The Gantt chart above is expressed in Mermaid syntax. It shows that BPS, NRC, NRS, and FHFA HPI refresh monthly, while NRC quarterly tables, FHFA quarterly HPI, HMDA annual publications, and HUD-USPS vacancy data refresh on longer cycles.
| Source | Cadence | Typical release timing |
|---|---|---|
| BPS revised permits (metro/county/place) | Monthly | 17th workday of month |
| NRC construction pipeline | Monthly | 12th workday of month |
| NRS new-home sales/inventory | Monthly | Monthly release |
| FHFA HPI | Monthly + Quarterly | Published release calendar |
| NRC quarterly tables (purpose/design) | Quarterly | Quarterly release |
| HMDA annual filing | Annual | Annual publication |
| HUD-USPS vacancy/no-stat | Quarterly | Access restricted |
Use this card to operationalize rollout across the top-25 CBSAs and keep a structured view of what is automatable.
cbsa_availability_card:
cbsa_code: "<CBSA>"
cbsa_name: "<NAME>"
delineation_vintage: "2023-07-01" # example
counties:
- county_fips: "_____" # list all member counties
county_name: "<COUNTY>"
recorder_access:
mode: ["open_search", "login_required", "paywall", "in_person_only"]
automation_risk: ["low", "medium", "high"]
notes: "<constraints>"
permitting_authorities:
- authority_name: "<CITY/COUNTY DEPT>"
system_type: ["Accela", "Tyler/EnerGov", "OpenGov", "custom", "unknown"]
public_data_surface:
- type: ["Socrata", "ArcGIS", "CKAN", "CSV", "HTML"]
endpoint: "<url>"
auth: ["none", "app_token", "token"]
fields_present:
permits_issued: true/false
valuation: true/false
units: true/false
inspections: true/false
certificate_of_occupancy_or_final: true/false
normalization_notes: "<permit types/status mapping>"
federal_backbone_coverage:
bps_cbsa_permits: true
hmda_msamd_or_county: true
fhfa_hpi_metro_or_county: true
usps_vacancy_restricted: "unknown/eligible/not_eligible"
computed_metrics_feasibility:
permits_qtr: "high"
starts_qtr: "medium (modeled)"
under_construction_stock: "medium (modeled)"
completions_qtr: "medium (modeled or local)"
finished_vacant_inventory: "medium/low (depends on local CO + closings)"
closings_total: "low/medium (depends on recorder)"
overall_score:
data_completeness: 0-100
automation_effort_weeks: "<estimate>"
Finally, for the original motivating example — "how many homes started in Q4 last year in Houston; how many closed; how many under construction; how many finished vacant" — the federal system can give you Q4 permits for the Houston CBSA directly (BPS), but Q4 starts/under-construction/completions and finished vacant inventory require either modeling calibrated to NRC/NRS or local milestone data plus recorder/HMDA-derived absorptions.