Reproducing the Houston CBAS Metrics Nationally With Public and Government-Adjacent Data

A comprehensive metric-by-metric reproduction plan mapping every Houston CBAS presentation metric to public data sources, with national replicability assessments, modeling strategies, and SQL implementation artifacts.


Table of Contents

  1. Executive Summary and Scope
  2. Houston Presentation Metric Inventory
  3. Public and Government-Adjacent Sources
  4. Metric-to-Source Mapping and National Replicability Assessment
  5. Inferred Houston-Specific Sources and Evidence
  6. Practical Automation Design for Local Permitting and Recorder Ingestion
  7. Modeling Approach to Estimate Starts, UC, Completions, and Finished Vacant Inventory
  8. National Rollout Plan, Top-25 Placeholders, and Implementation Artifacts
  9. Reference Implementation: SQL DDL and Sample Queries
  10. Closing Synthesis

Executive Summary and Scope

The Houston presentation you referenced (and the uploaded PDF dated 02/12/2026) is best understood as a hybrid of: (a) federal macro indicators that are easy to automate nationwide, (b) local government process data (permits, plats, inspections, certificates, recordings) that is automatable but fragmented and vendor-dependent, and (c) proprietary "new-home market intelligence" (community-level starts/closings, spec inventory, floorplan/base-price catalogs, vacant developed lots, future-lots pipelines) that is difficult to replicate purely from public sources without building a sophisticated local-data fusion + modeling layer.

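A national rollout across "top-25 CBSAs" (unspecified; placeholders provided later) is feasible if you design the product as a layered stack that mirrors those three tiers: a federal baseline, local government connectors, and a fusion and modeling layer on top.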

Important scope note: the Houston deck's new-home and lot pipeline metrics appear to have been produced by Community Builders Advisory Services ("Source: CBAS" appears repeatedly in the deck), implying an internal address/community database and field-verified pipeline. A public-data-only implementation will inevitably be more probabilistic unless you integrate deep local permitting/inspection/plat/parcel/recording data at address level.

Houston Presentation Metric Inventory

This section itemizes every metric/chart/table in the Houston presentation, focusing on: metric name, interpretation/definition, units, time window, and visual type. Where the deck does not define the metric precisely, it is flagged as ambiguous (per your instruction).

Catalog of Metrics and Visuals in the Houston Deck

Full metric inventory table
| Section in Deck | Slide Element (Metric/Table/Chart Name) | Definition as Shown or Inferred | Units | Time Window Shown | Visualization Type |
| --- | --- | --- | --- | --- | --- |
| Demographics | Total Population Change | Net population change (not decomposed on this slide) | People | 2001–2023 | Line chart + callouts |
| Demographics | Metro population growth ranking table | Rank + metro population + numeric/population change | People | 2023–2024 (vintage implied) | Table (top metros) |
| Demographics | County population growth ranking table | Rank + county population + numeric change + % change | People, % | 2023–2024 (vintage implied) | Table (top counties) |
| Demographics | Births–Deaths, Domestic Migration, International Migration | Components of population change | People | 2020–2024 | Multi-series chart |
| Demographics | "Metro Houston Population by Year" | Population level | People | 2015–2024 | Line chart |
| Employment | Employment growth trend ranking table | Total employment, annual job growth (# and %), unemployment rate | Jobs, %, % | "as of" ~Sep 2025 (implied by slide) | Table (top metros) |
| Employment | Houston annual job growth | Annual job growth as a time series | Jobs (or change) | Jan 2024–Sep 2025 | Line chart |
| Employment | Total nonfarm payroll employment (seasonally adjusted) | Employment level | Jobs | 2003–2025 | Line chart |
| Employment | "Metro Houston Employment – Select Industries" | Employment by industry | Jobs | A point-in-time (implied) | Table |
| Employment | Annual employment growth by sector | Sector contribution/growth | Jobs change | ~latest year | Bar chart |
| Employment | 2026 employment growth forecast | Forecasted job gains/losses by sector + total forecast | Jobs change | 2026 | Bar chart |
| Resale market | "Single Family True Resale Home Sales – MLS Stats" | Annual sales, monthly sales, average price, median price, active listings, days on market, sales/list price, and YoY % changes | Counts, $, days, % | December 2025 snapshot (annual + monthly) | Table (with callouts) |
| Resale market | "Resale Home Closings and Inventory" | Monthly closings and active listing inventory | Counts | Jan 2018–Nov 2025 | Dual-axis line chart |
| New home market | Houston starts and closings history | Ambiguous definition of "estimated starts" and "estimated closings" (likely for-sale single-family new construction) | Homes | 4Q 2000/2002–3Q 2025 | Dual-series chart (rolling annual by quarter) |
| New home market | "3Q 2025 New Home Market by the Numbers" | Quarterly starts; quarterly closings; homes under construction; finished vacant homes inventory; finished vacant months' supply | Homes, months | 3Q 2025 | KPI tiles |
| Lots / pipeline | "3Q 2025 New Home Market by the Number" (lots) | Vacant developed lots; VDL months' supply; future lots; future lots w/ active site work | Lots, months | 3Q 2025 | KPI tiles |
| New home segmentation | "Annual starts and closings by market area" | Starts and closings split by sub-areas (North/West/etc.) | Homes | "Annual" at 3Q 2025 (likely trailing 4 quarters) | Bar chart + map |
| New home segmentation | "Annual starts and closings by lot size program" | Starts and closings split by lot width/depth bands | Homes | Annual at 3Q 2025 | Bar chart |
| New home segmentation | High volume subdivisions (starts) | Count of subdivisions with 50+ annual starts | Subdivisions | Annual at 3Q 2025 | Map + count callout |
| New home segmentation | High volume subdivisions (closings) | Count of subdivisions with 50+ annual closings | Subdivisions | Annual at 3Q 2025 | Map + count callout |
| New home supply | Finished vacant homes by market area | Finished vacant inventory by area + months' supply | Homes, months | 3Q 2025 | Bar + line |
| New home supply | Submarkets with high finished vacant inventory | Finished vacant homes and months' supply for selected submarkets; includes selection criteria text | Homes, months | 3Q 2025 | Table |
| New home supply | Finished vacant inventory by lot size program | Finished vacant inventory + months' supply by lot-size band | Homes, months | 3Q 2025 | Bar + line |
| Pricing | Median new home price trend + price per square foot | Median $ and $/sf | $, $/sf | Through Dec 2025 (multi-year) | Dual-axis line chart |
| Pricing | Floorplan price direction | Count/% of floorplans with price decreases/no change/increases; median magnitude of change | Count, %, $ | QoQ (quarter-over-quarter) | KPI/stat block |
| Pricing | Annual starts/closings by base price band | Starts and closings by base price buckets | Homes | Annual at 3Q 2025 | Bar chart |
| Pricing | Finished vacant inventory by price band | Finished vacant inventory + months' supply by price bucket | Homes, months | 3Q 2025 | Bar + line |
| Floorplans | Least expensive base priced floor plan | Specific plan: base price, beds/baths, sqft; community + builder | $, count, sqft | Snapshot | Profile card |
| Floorplans | Most expensive base priced floor plan(s) | Plan specs and base price | $, count, sqft | Snapshot | Profile card(s) |
| Rankings | Top communities & neighborhoods ranked by annual starts | Rank + annual starts + annual closings by community | Homes | Annual at 3Q 2025 | Table (spans multiple slides) |
| Lot inventory | VDL inventory by market area + months' supply | Vacant developed lots by market area + months' supply | Lots, months | 3Q 2025 | Bar + line |
| Lot inventory | Submarkets with high VDL inventory | VDL inventory and months' supply by submarket + selection criteria | Lots, months | 3Q 2025 | Table |
| Lot inventory | High VDL subdivisions | Count of subdivisions with 100+ VDL | Subdivisions | 3Q 2025 | Map + count callout |
| Lot inventory | VDL by lot size program | VDL inventory + months' supply by lot-size band | Lots, months | 3Q 2025 | Bar + line |
| Future lots | Future lots by market area | Lots in future pipeline split by "raw land" vs "active site work" (labels inferred from legend) | Lots | 3Q 2025 | Stacked bar chart |
| Future lots | Future lots by status | Lots by status buckets (Vacant, Clearing, WS&D, Paving) | Lots | 3Q 2025 | Bar chart |
| Future lots | Future planned subdivision locations | Count of identified future planned subdivisions | Subdivisions | 3Q 2025 | Map + count callout |
| Focus area | 288 South corridor "Historical Community Activity" | Quarterly closings; models; complete vacant; under construction; total inventory; total supply; quarterly starts; VDL; VDL supply; future lots; lot deliveries | Homes/lots, months | 1Q 2025–4Q 2025 | Table |
| Focus area | 288 South "Top Communities" table | For each community: 4Q models; quarterly starts & annual; quarterly closings & annual | Homes, models | 1Q 2025–4Q 2025 + annual | Table |
| Focus area | 288 South "Top Builders" table | Annual starts, annual closings, 4Q market share, annual market share | Homes, % | Annual at 4Q 2025 | Table |
| Focus area | 288 South starts/closings by lot size | Starts and closings by lot-size band | Homes | Annual at 4Q 2025 | Bar chart |
| Focus area | 288 South starts/closings by base price band | Starts and closings by price bucket | Homes | Annual at 4Q 2025 | Bar chart |
| Focus area | 288 South VDL by lot size | VDL inventory + months' supply by lot size | Lots, months | 4Q 2025 | Bar + line |
| Focus area | 288 South future planned developments | "Future planned lots identified" | Lots | 4Q 2025 | KPI callout |
| Focus area | Grand Magnolia (map + community table) | Map of submarkets + community table as above | Homes, models | 1Q 2025–4Q 2025 + annual | Map + table |
| Focus area | River Ranch (map + community table) | Map of submarkets + community table as above | Homes, models | 1Q 2025–4Q 2025 + annual | Map + table |
| Focus area | Lago Mar East (map + community table) | Map of submarkets + community table as above | Homes, models | 1Q 2025–4Q 2025 + annual | Map + table |
| Conclusions | Houston starts forecast ranges | 2025 and 2026 forecast start ranges and % deltas | Homes, % | 2025–2026 | KPI callout |
| Conclusions | Permits vs estimated starts history + long-term average | Annual SF building permits vs estimated annual starts; includes long-term average line | Homes | 4Q 2000–4Q 2026 | Dual-axis bar+line |

Ambiguities in the Houston Deck to Treat as "Unspecified"

The following are central to reproducing the deck but not formally defined on-slide, so a national build should treat them as metric-spec decisions you must codify:

These are solvable, but they materially affect replication success.

Public and Government-Adjacent Sources to Power Houston-Style Metrics

This section is a catalog of the most relevant nationwide public sources, with endpoints/patterns, cadence, granularity, and automation difficulty.

Federal Datasets and APIs

Building Permits

The U.S. Census Bureau Building Permits Survey (BPS) is the most important uniform national feed for local-market construction activity. Revised permits are released on the 17th workday of each month and are published down to CBSA, county, and permit-issuing place.

BPS bulk files are distributed through Census "FTP-style" directories (public HTTP). The directory structure includes CBSA, County, Place, State, and a large "Master Data Set."

Concrete endpoint patterns (BPS):

# Directory landing (browseable)
https://www2.census.gov/econ/bps/

# CBSA revised monthly & year-to-date files (text)
https://www2.census.gov/econ/bps/CBSA%20%28beginning%20Jan%202024%29/cbsaYYMMc.txt
https://www2.census.gov/econ/bps/CBSA%20%28beginning%20Jan%202024%29/cbsaYYMMy.txt

# Example shown in directory listing
.../cbsa2512c.txt  (revised monthly, Dec 2025)
.../cbsa2512y.txt  (YTD, Dec 2025)

# Master compiled data documentation (notes it is extremely large)
https://www2.census.gov/econ/bps/Master%20Data%20Set/Compiled%20Data%20Documentation.docx

The Master Compiled Data Set is described as extremely large (millions of rows / multi-GB) and is usually unnecessary if you only need top CBSAs; it's typically easier to ingest the monthly CBSA files plus county/place as needed.
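As a minimal sketch of the naming convention above, the helper below builds the URL for one CBSA file. The function name and argument checks are illustrative; the base URL and folder name are copied from the endpoint patterns, and that folder only covers files from January 2024 onward.

```python
from urllib.parse import quote

# Base directory and folder name copied from the endpoint patterns above.
BPS_BASE = "https://www2.census.gov/econ/bps/"
CBSA_DIR = "CBSA (beginning Jan 2024)"


def bps_cbsa_url(year: int, month: int, kind: str = "c") -> str:
    """Build one CBSA file URL: kind 'c' = revised monthly, 'y' = year-to-date."""
    if kind not in ("c", "y"):
        raise ValueError("kind must be 'c' or 'y'")
    if not 1 <= month <= 12:
        raise ValueError("month must be in 1..12")
    filename = f"cbsa{year % 100:02d}{month:02d}{kind}.txt"
    # safe="" percent-encodes the spaces and parentheses in the folder name.
    return f"{BPS_BASE}{quote(CBSA_DIR, safe='')}/{filename}"


print(bps_cbsa_url(2025, 12, "c"))
```

Listing the directory first and then matching file names against this pattern is the safer production choice, since Census has renamed folders before.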

Automation difficulty: 1/5 (bulk file ingest; stable)
Time to automate ingestion: ~1–2 engineer-days for a robust downloader/parser + 2–3 days for QA and schema stabilization.

National Starts, Under Construction, Completions, and New-Home Inventory Stages (Calibration, Not Metro)

The Census "New Residential Construction" (NRC) and "New Residential Sales" (NRS) releases (from the Survey of Construction and BPS) provide national and regional estimates for: starts, under construction, completions, and stage-of-construction inventory, including "completed houses for sale."

These follow the Census release calendar: NRC typically on the 12th workday, and NRS and revised permits on the 17th workday.

Key constraint: NRC/NRS are not published at CBSA granularity for the "under construction / completions / stage inventory" measures, so they function mainly as macro calibration targets for any metro estimation model.

Automation difficulty: 1/5 (direct Excel downloads)
Time to automate ingestion: <1 week including field dictionary mapping.

Mortgage-Backed "Closings" Proxy (HMDA)

HMDA is maintained and published through the FFIEC HMDA platform (with CFPB stewardship). The dataset is a powerful proxy for mortgage-financed purchase originations and can be summarized by MSA/MD (metro). The Federal Financial Institutions Examination Council (FFIEC) provides a "Data Browser API" that returns either aggregated JSON or raw CSV subsets, filtered by filing year and geography.

Key endpoints from the official documentation include:

# Aggregations (JSON)
GET https://ffiec.cfpb.gov/v2/data-browser-api/view/aggregations?years=YYYY&msamds=#####&actions_taken=...

# Raw streamed CSV (careful: can be huge)
GET https://ffiec.cfpb.gov/v2/data-browser-api/view/csv?years=YYYY&msamds=#####&actions_taken=...

The documentation specifies required parameters (year + at least one HMDA data filter) and geographic filters including msamds and counties.
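A small sketch of a query builder for the aggregation endpoint follows. The `years`, `msamds`, and `actions_taken` parameters come from the patterns above; the `loan_purposes` filter and the code values (`1` for originated, `1` for home purchase) are assumptions to verify against the HMDA documentation, and `26420` is used only as an illustrative Houston MSA/MD code.

```python
from urllib.parse import urlencode

HMDA_AGG = "https://ffiec.cfpb.gov/v2/data-browser-api/view/aggregations"


def hmda_purchase_originations_url(year: int, msamd: str) -> str:
    """Build an aggregation query for originated home-purchase loans in one metro."""
    params = {
        "years": year,
        "msamds": msamd,
        "actions_taken": "1",  # assumed code: loan originated
        "loan_purposes": "1",  # assumed parameter and code: home purchase
    }
    return f"{HMDA_AGG}?{urlencode(params)}"


print(hmda_purchase_originations_url(2023, "26420"))
```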

For bulk files, "HMDA File Serving" documents institution-level modified LAR endpoints (CSV/TXT per LEI) and notes that other files are served from a public bucket with a fixed prefix.

Automation difficulty: 3/5

Time to automate ingestion: ~2–4 weeks for a production-grade pipeline (query builder, paging/streaming, retries, audit logs, and a stable metric layer).

Prices: FHFA HPI

The Federal Housing Finance Agency publishes a "master" House Price Index file as direct CSV; it contains CBSA-level series (among many levels).

Concrete endpoint (FHFA HPI):

https://www.fhfa.gov/hpi/download/monthly/hpi_master.csv

Automation difficulty: 1/5
Time to automate ingestion: ~2–4 engineer-days including a CBSA filter/extract and QA.

"Macro Convenience" Replication via FRED

Many of the BPS permit series and other macro series are mirrored in FRED. FRED is particularly useful when you want simple CBSA series IDs and consistent format options, and you're comfortable depending on FRED as a distributor.

Example: Houston 1-unit permits series includes a definition that 1-unit structures correspond to single-family homes (including certain attached forms if separated by ground-to-roof walls).

FRED Web Services provide a stable API with a required API key parameter and support JSON/CSV output via file_type.
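For illustration, a request URL for one series could be assembled as below. This is a sketch: the series ID and API key are placeholders, and `observation_start` is an optional filter you may not need.

```python
from typing import Optional
from urllib.parse import urlencode

FRED_OBS = "https://api.stlouisfed.org/fred/series/observations"


def fred_observations_url(series_id: str, api_key: str,
                          start: Optional[str] = None,
                          file_type: str = "json") -> str:
    """Build a FRED series/observations request URL."""
    params = {"series_id": series_id, "api_key": api_key, "file_type": file_type}
    if start:
        params["observation_start"] = start  # ISO date, e.g. "2018-01-01"
    return f"{FRED_OBS}?{urlencode(params)}"


# Placeholder values only; substitute a real series ID and your own key.
print(fred_observations_url("SERIES_ID_PLACEHOLDER", "YOUR_API_KEY", start="2018-01-01"))
```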

Automation difficulty: 1/5
Time to automate ingestion: <1 week.

Population Levels and Migration Components

The Census Population Estimates Program publishes metro/micro population totals and components of change (including natural change and net domestic/international migration components) in downloadable files for 2020–2024 vintages.

This is a direct match for the deck's "births–deaths / domestic migration / international migration" visuals at the metro level.

Automation difficulty: 2/5 (file layout + annual refresh)
Time to automate ingestion: ~1–2 weeks including crosswalk stabilization across vintages.

HUD Distribution Layers and ArcGIS Endpoints

HUD has two major "distribution surfaces" relevant here:

HUD–USPS Vacancy / No-Stat Data (Restricted Access)

HUD's aggregated USPS administrative data is extremely relevant to your "vacancy / under construction" problem because it includes quarterly counts and defines "No-Stat" as including addresses like homes under construction and not yet occupied.

However, HUD states that under its agreement with USPS it can make the data accessible only to governmental entities and registered nonprofits, and access requires registration.

HUD also describes a newer "Neighborhood Change Web Map API" and a HUD User dataset API tester that requires an access token; again the access model is restricted.

Automation difficulty: 5/5 (because access eligibility and licensing are the gating factors)
Time to automate ingestion: Engineering time ~2–4 weeks once access is granted, but "time to access" is organizational/legal and may dominate.

Local/County/City Permitting and Open-Data Systems

This is where national replication becomes "connector-driven." You should assume each CBSA will require a portfolio of permitting sources (city + county + special districts) rather than one.

Open-Data Portals

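Two common patterns: Socrata-hosted portals and ArcGIS Hub/FeatureServer endpoints, both covered in the connector patterns later in this document.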

Permitting Workflow Platforms

Houston Examples Visible in Public Web

Metric-to-Source Mapping and National Replicability Assessment

This section maps each Houston metric family to the most plausible public sources, plus an explicit replicability assessment (difficulty 1–5). Where a metric is likely CBAS-proprietary, the mapping focuses on best-effort public substitutes and modeling strategies.

Summary: What You Can Replicate "Cleanly" Nationwide vs What Becomes Probabilistic

| Metric Family | Examples from Houston Deck | Best Public Source(s) | National Coverage | Difficulty (1–5) | Notes |
| --- | --- | --- | --- | --- | --- |
| Permits issued | "Annual SF building permits," implicit in starts forecast | Census BPS (CBSA/county/place); FRED permit series | Excellent | 1 | BPS is your backbone. |
| Population + migration | Components of change, population levels | Census metro population estimates + components files | Excellent | 2 | Annual refresh; stable file layouts. |
| Employment + unemployment | Metro tables and time series | BLS time series (LAUS/CES); BLS public API | Excellent | 2 | Requires series-id management and caching. |
| House price index | Price trend proxy for resale/new home | FHFA HPI (CBSA series); optionally Freddie Mac FMHPI | Strong | 1–2 | FHFA is fully public and downloadable. |
| Mortgage-financed sales/closings | Closings proxy | HMDA Data Browser API aggregations by msamds | Good | 3 | Mortgage-only; does not cover cash. |
| Total sales/closings (incl cash) | Resale closings counts | County recorder deed transfers (varies), assessor sales files (varies) | Fragmented | 4–5 | Many portals are paywalled or scraping-only; normalization hard. |
| Starts/UC/completions at metro | Quarterly starts, UC stock, completions | Requires local permit + inspection + CO event fusion; calibrate to NRC/SOC | Fragmented | 4 | NRC/SOC is national/regional only; use it for calibration. |
| Finished vacant new-home inventory | "Finished Vacant Homes in Inventory" | Best-effort: CO/completion minus recorded/financed sales; alternative: HUD-USPS no-stat/vacancy (restricted) | Fragmented | 4–5 | "Finished vacant" is conceptually computable if you can unify CO and sales, but it's data-work heavy. |
| VDL + future lots pipeline | VDL supply, future lots, "active site work" | Plats + land-development permits + parcel subdivision buildout heuristics | Highly fragmented | 5 | Closest public analogs are plat agendas/approvals and infrastructure permits, but definitions differ. |
| MLS stats | Active listings, DOM, sale/list ratio | Requires MLS (proprietary) | Not public | 5 | Public proxies exist but are not government-adjacent; your best "adjacent" route is recorder+permit+HMDA. |

Per-Metric Reproduction Recipes

Below are the key Houston metrics you called out (permits, quarterly starts, UC, completions/closings, finished vacant inventory), each with a reproduction plan.

Permits Issued (metro and sub-areas)

Source(s): Census BPS revised CBSA and county/place files.
Cadence: monthly revised (17th workday).
Granularity: CBSA/county/place; by unit type and structure where provided (depends on file).
Difficulty: 1/5 nationally.

Automation plan (robust + incremental):

  1. Scrape or hardcode the file naming convention (cbsaYYMMc.txt and cbsaYYMMy.txt) from the Census directory listing.
  2. Download newest month on a schedule aligned to "17th workday."
  3. Parse as delimited text; store raw with (source_file, ingest_timestamp, row_hash) for audit.
  4. Normalize into fact_permits(cbsa, month, unit_type, units_authorized, ...).
  5. For "top-25 CBSAs," filter on CBSA codes; optionally keep full file for historical comparability.
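Step 3 of the plan above might be sketched like this. It assumes the raw file is comma-delimited text, which should be confirmed against the actual BPS layout; the record keys beyond `source_file`, `ingest_timestamp`, and `row_hash` are illustrative.

```python
import hashlib
from datetime import datetime, timezone


def audit_rows(source_file: str, text: str, delimiter: str = ","):
    """Yield one audit record per non-empty raw line, before any normalization."""
    ingest_ts = datetime.now(timezone.utc).isoformat()
    for line_no, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # skip blank separator lines
        yield {
            "source_file": source_file,
            "ingest_timestamp": ingest_ts,
            "line_no": line_no,
            # Hash of the untouched raw line, so later re-ingests can detect revisions.
            "row_hash": hashlib.sha256(line.encode("utf-8")).hexdigest(),
            "fields": [field.strip() for field in line.split(delimiter)],
        }


# Synthetic two-row sample, not real BPS content.
rows = list(audit_rows("cbsa2512c.txt", "A, B\n\n1, 2\n"))
print(len(rows), rows[1]["fields"])
```

Hashing the raw line rather than the parsed fields keeps the audit trail independent of parser changes.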

Quarterly Starts (Houston "Quarterly Starts")

Public-source reality check: There is no uniform federal CBSA "starts" series comparable to NRC/SOC starts; NRC is national/regional.

Therefore: You either (A) treat permits as an approximation of starts, or (B) build a local lifecycle model from permits + inspections.

Option A (fast, permits-as-starts):

  • Define starts_qtr = sum(bps_permits_1unit) by quarter and CBSA.
  • Calibrate using national ratio (starts/permits) from NRC to adjust.

Difficulty: 2/5 (simple, but conceptually imperfect).
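A minimal sketch of Option A, assuming monthly 1-unit permit counts keyed by `(year, month)` and a single `starts_per_permit` calibration ratio that you would derive from NRC (the default of 1.0 is a placeholder, not an estimate):

```python
from collections import defaultdict


def quarterly_starts(monthly_permits: dict, starts_per_permit: float = 1.0) -> dict:
    """Map {(year, month): permits} to {(year, quarter): estimated starts}."""
    out = defaultdict(float)
    for (year, month), permits in monthly_permits.items():
        quarter = (month - 1) // 3 + 1
        out[(year, quarter)] += permits * starts_per_permit
    return dict(out)


# Synthetic permit counts for illustration only.
sample = {(2025, 1): 100, (2025, 2): 200, (2025, 3): 300, (2025, 4): 50}
print(quarterly_starts(sample, starts_per_permit=0.5))
# → {(2025, 1): 300.0, (2025, 2): 25.0}
```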

Option B (Houston-style, lifecycle-derived starts):

  • Ingest jurisdiction-level permits + inspection event logs.
  • Define "start date" as first foundation/slab inspection pass or first inspection after permit issuance (your metric spec decision).
  • Aggregate to CBSA-quarter.

Difficulty: 4/5 (requires local connectors).
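The start-date rule in Option B could be expressed as below. This is a sketch under assumptions: the inspection-type labels (`foundation`, `slab`) and the `pass` result value are hypothetical and will differ by jurisdiction.

```python
from datetime import date
from typing import Dict, List, Optional

START_TYPES = {"foundation", "slab"}  # assumed labels; map per jurisdiction


def derive_start_date(permit_issued: date, inspections: List[Dict]) -> Optional[date]:
    """First passed foundation/slab inspection; else first inspection after issuance."""
    after_issue = [i for i in inspections if i["date"] >= permit_issued]
    passed = [i["date"] for i in after_issue
              if i["type"] in START_TYPES and i["result"] == "pass"]
    if passed:
        return min(passed)
    # Fallback named in the text: first inspection of any kind after permit issuance.
    return min((i["date"] for i in after_issue), default=None)


events = [
    {"type": "plumbing", "result": "pass", "date": date(2025, 1, 20)},
    {"type": "foundation", "result": "fail", "date": date(2025, 1, 25)},
    {"type": "foundation", "result": "pass", "date": date(2025, 2, 1)},
]
print(derive_start_date(date(2025, 1, 10), events))  # → 2025-02-01
```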

Homes Under Construction (stock at quarter end)

Option A (modeled stock from starts):

  • Treat UC stock as a pipeline inventory derived from starts convolved with a time-to-completion distribution.
  • Calibrate the distribution using national NRC "under construction" vs starts/completions relationships.

Difficulty: 3/5 (no local data, but statistical).
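To make the convolution concrete, here is a sketch that derives both end-of-period UC stock and completions from a starts series. The `completion_pmf` (share of a start cohort completing k periods after it started) is a hypothetical input you would calibrate from NRC; the numbers in the example are synthetic.

```python
def pipeline_from_starts(starts, completion_pmf):
    """Return (uc_stock, completions) per period from starts and a completion PMF."""
    # survival[k] = share of a cohort still under construction k periods after start
    survival, remaining = [], 1.0
    for share in completion_pmf:
        remaining -= share
        survival.append(max(remaining, 0.0))

    uc_stock, completions = [], []
    for t in range(len(starts)):
        # Every earlier cohort that is still within the PMF horizon.
        cohorts = [(starts[t - k], k) for k in range(len(completion_pmf)) if t - k >= 0]
        completions.append(sum(s * completion_pmf[k] for s, k in cohorts))
        uc_stock.append(sum(s * survival[k] for s, k in cohorts))
    return uc_stock, completions


# Half of each cohort completes one period after start, the rest two periods after.
uc, done = pipeline_from_starts([100, 100, 100], [0.0, 0.5, 0.5])
print(uc, done)  # → [100.0, 150.0, 150.0] [0.0, 50.0, 100.0]
```

A useful check is the stock-flow identity: each period's stock equals the prior stock plus starts minus completions.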

Option B (observed stock from local lifecycle):

  • UC stock at quarter end = count of addresses with start_date ≤ quarter_end and completion_date > quarter_end (or null).
  • completion_date derived from CO or final inspection.

Difficulty: 4/5.
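The observed-stock rule translates directly into a filter. In this sketch each unit is a `(start_date, completion_date)` pair, with `None` standing for a missing completion date; the dates are synthetic.

```python
from datetime import date
from typing import List, Optional, Tuple


def uc_stock_at(quarter_end: date, units: List[Tuple[date, Optional[date]]]) -> int:
    """Count units started on or before quarter_end and not yet completed by then."""
    return sum(
        1
        for start, completion in units
        if start <= quarter_end and (completion is None or completion > quarter_end)
    )


units = [
    (date(2025, 5, 1), None),               # started, still open
    (date(2025, 6, 1), date(2025, 11, 1)),  # completes after quarter end
    (date(2025, 2, 1), date(2025, 8, 1)),   # already completed
    (date(2025, 10, 5), None),              # starts after quarter end
]
print(uc_stock_at(date(2025, 9, 30), units))  # → 2
```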

Closings / Sales

Mortgage-financed closings proxy (nationally uniform):

HMDA "originations" filtered to purchase loans in a CBSA gives a mortgage closing proxy (not total). Use the Data Browser aggregation endpoint by msamds and relevant filters.

Difficulty: 3/5.

Total closings (mortgage + cash):

  • Recorder deeds ("Warranty Deed," "Deed," etc.) provide transfer events, but each county differs and many portals are not API-first and may require accounts/fees.

Difficulty: 5/5.

Hybrid "cash closings" estimate:

  • Compute: cash_proxy = recorder_sales - hmda_purchase_originations (after aligning geographies and time windows).
  • Expect mismatch from refis, investor conveyances, delayed recordings, and non-arm's-length transfers; you need filtering rules.
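A sketch of the subtraction with a guard for the mismatch cases just listed. The clamp to zero and the `flag_negative` field are illustrative choices, and the inputs are assumed to be already aligned on geography and time window.

```python
def cash_closings_proxy(recorder_sales: int, hmda_purchase_originations: int) -> dict:
    """Estimate cash closings as recorder sales minus HMDA purchase originations."""
    raw = recorder_sales - hmda_purchase_originations
    return {
        "cash_proxy": max(raw, 0),   # a negative count is not meaningful
        "raw_difference": raw,       # keep the unclamped value for QA
        "flag_negative": raw < 0,    # signals misaligned windows or filters
    }


# Synthetic counts for illustration only.
print(cash_closings_proxy(1200, 900))
# → {'cash_proxy': 300, 'raw_difference': 300, 'flag_negative': False}
```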

Finished Vacant New-Home Inventory

There is no federal CBSA "finished vacant new homes" statistic. The closest public analogs are national stage-of-construction series and, potentially, USPS vacancy/no-stat signals (but restricted).

Lifecycle-based computation (closest to the Houston KPI concept):

  1. Identify new-home completions (CO issued or final inspection pass).
  2. Identify "sold/closed" (deed transfer date and/or HMDA purchase origination date).
  3. Finished vacant at quarter end = completed ≤ quarter_end AND not sold by quarter_end (excluding model homes if you tag them).

Difficulty: 4–5/5 depending on recorder availability.
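The three steps above reduce to a point-in-time count. In this sketch the record keys (`completion_date`, `closing_date`, `is_model`) are hypothetical names for fields in your unified unit table, and the sample rows are synthetic.

```python
from datetime import date
from typing import Dict, List


def finished_vacant_at(quarter_end: date, units: List[Dict]) -> int:
    """Count units completed by quarter_end, not yet sold, and not tagged as models."""
    count = 0
    for u in units:
        completed = u["completion_date"] is not None and u["completion_date"] <= quarter_end
        sold = u["closing_date"] is not None and u["closing_date"] <= quarter_end
        if completed and not sold and not u.get("is_model", False):
            count += 1
    return count


units = [
    {"completion_date": date(2025, 8, 1), "closing_date": None},
    {"completion_date": date(2025, 7, 1), "closing_date": date(2025, 10, 15)},
    {"completion_date": date(2025, 6, 1), "closing_date": date(2025, 9, 1)},
    {"completion_date": date(2025, 5, 1), "closing_date": None, "is_model": True},
    {"completion_date": None, "closing_date": None},
]
print(finished_vacant_at(date(2025, 9, 30), units))  # → 2
```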

USPS-based proxy (if eligible for HUD–USPS access):

  • Use "No-Stat residential addresses" to approximate "under construction/not yet occupied," and vacancy counts to approximate vacancy dynamics. HUD explicitly notes "No-Stat" can include homes under construction and not yet occupied.
  • Aggregate tract-level data to CBSA using crosswalks.

Difficulty: 5/5 (access-limited).

Inferred Houston-Specific Sources and Evidence

This section addresses: "Which local Houston sources or vendor systems did the presenter likely use for each non-federal metric, and why?"

MLS Resale Metrics

The deck explicitly cites "HAR.com" as the source for the resale table and the resale closings/inventory time series. This strongly indicates the presenter used data from Houston Association of Realtors (HAR MLS data distribution), which is proprietary and not a government dataset.

Implication for national rollout: you cannot reproduce "Active Listings," "Days on Market," or "Sales/List Price" purely from government sources; you need MLS partnerships or accept alternative proxies.

New-Home Starts, Closings, Inventory, VDL, and Future Lots

Across the new-home and lot pipeline slides, the deck repeatedly cites "Source: CBAS," and many visuals are at community/subdivision granularity rather than jurisdiction granularity. That pattern is consistent with a proprietary internal "new-home market census" (address/community-level tracking). The data elements that are especially indicative of private tracking rather than a single public feed include community-level starts and closings, spec inventory, floorplan/base-price catalogs, vacant developed lots, and future-lot pipelines.

Public-data inference: CBAS likely used a blend of (1) permit data and inspection milestones from local permitting agencies, (2) parcel/subdivision geometry from county appraisal/GIS sources, and (3) field validation / builder portal scraping for floorplans and prices.

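Houston-area publicly visible building-process systems that could contribute include the City of Houston permit portal, the Harris County ePermits system, the City planning department's plat activity reports, and HCAD GIS and property data downloads.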

Closings and Transfers

If CBAS produced new-home "closings," they likely relied on either: (a) builder-reported closings, (b) deed recording data, or (c) MLS new construction closings (if covered). Public recorder systems exist in the Houston region, but they commonly have account requirements and transaction frictions. Example: Harris County Clerk real property records portal emphasizes account creation and paid copies.

For national replication, this is one of the hardest domains to automate cleanly.

Practical Automation Design for Local Permitting and Recorder Ingestion

This section gives you the "connector playbook" for city/county permitting, inspection, platting, and recording systems, including typical API patterns, auth/rate limits, and normalization.

Connector Patterns You Should Build

Socrata Connector

Typical query pattern:

GET https://{domain}/resource/{dataset_id}.json?$select=...&$where=...&$order=...&$limit=1000&$offset=...
Header: X-App-Token: <token>
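Paging over that pattern might look like the sketch below, which only builds URLs and leaves the HTTP call to your client. The domain and dataset ID are placeholders, and a stable `$order` is included because offset paging is only repeatable with a deterministic sort.

```python
from urllib.parse import urlencode


def socrata_page_urls(domain: str, dataset_id: str, where: str, order: str,
                      page_size: int = 1000, max_pages: int = 3):
    """Yield request URLs for successive $limit/$offset pages of one dataset."""
    base = f"https://{domain}/resource/{dataset_id}.json"
    for page in range(max_pages):
        params = {
            "$where": where,
            "$order": order,
            "$limit": page_size,
            "$offset": page * page_size,
        }
        yield f"{base}?{urlencode(params)}"  # "$" is percent-encoded as %24


# Placeholder domain, dataset ID, and column names.
for url in socrata_page_urls("data.example.gov", "abcd-1234",
                             "issue_date >= '2025-01-01'", "permit_id"):
    print(url)
```

In practice you would stop as soon as a page returns fewer than `page_size` rows rather than relying on `max_pages`.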

ArcGIS FeatureServer Connector

Typical query pattern:

GET https://{host}/ArcGIS/rest/services/{svc}/FeatureServer/{layer}/query
  ?where=1%3D1
  &outFields=*
  &f=json
  &resultRecordCount=2000
  &resultOffset=0
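One way to drive `resultOffset` is to inspect each response and decide whether another page is needed. This sketch treats the response as an already-parsed JSON dict and relies on the `exceededTransferLimit` flag that ArcGIS query responses commonly include; the fallback on a full page is an assumption for servers that omit the flag.

```python
from typing import Optional


def next_arcgis_offset(response: dict, offset: int, page_size: int) -> Optional[int]:
    """Return the next resultOffset, or None when no further page is expected."""
    features = response.get("features", [])
    if not features:
        return None
    # Continue if the server says more rows exist, or if the page came back full.
    if response.get("exceededTransferLimit") or len(features) >= page_size:
        return offset + len(features)
    return None


# Synthetic responses for illustration.
full_page = {"features": [{}] * 2000, "exceededTransferLimit": True}
last_page = {"features": [{}] * 10}
print(next_arcgis_offset(full_page, 0, 2000), next_arcgis_offset(last_page, 2000, 2000))
# → 2000 None
```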

Accela (When You Can Obtain API Credentials)

Accela publishes an API with explicit offset/limit pagination and rate limit headers (x-ratelimit-*). This is a "best case" compared with scraping public portals.

Tyler / EnerGov and OpenGov

Tyler describes an API toolkit for permits and code enforcement, implying access to building permits and inspections data programmatically. OpenGov provides a developer portal and Permitting & Licensing API catalog access, again typically gated.

For both, your connector strategy should assume:

County Recorder / Assessor Ingestion

Recorder portals vary widely, often require user accounts, and may not have stable JSON APIs. Harris County Clerk's real property records portal emphasizes portal login/account and copy purchasing. Texas also has county-aggregating portals (e.g., Tyler-hosted "countygovernmentrecords" experience), indicating vendor concentration but not necessarily a public API.

Practical automation stance:

Recommended Normalization Schema (BLDS-Inspired Plus Lifecycle)

The Building & Land Development Specification (BLDS) exists specifically for standardizing building permit open data. It is a useful reference point for your schema layer.

A Houston-style dashboard, however, needs more than permit issuance: it needs a unit lifecycle. Recommended canonical entities:

Modeling Approach to Estimate Starts, Under Construction, Completions, and Finished Vacant Inventory

This section gives detailed practical methods when direct measurements are not uniformly available.

Method 1: "Federal-First" Estimation (Fast, Uniform, Approximate)

Inputs:

  • CBSA single-family permits from BPS or FRED.
  • National/regional starts/UC/completions from NRC/SOC for calibration of pipeline dynamics.

Core idea:

  • Starts are modeled as a function of local permits and a calibration factor derived from NRC.
  • Under-construction stock is computed as a convolution of starts with a time-to-completion distribution.
  • Completions are the outflow of that pipeline.

Pros: quick to nationalize; stable inputs.
Cons: cannot reproduce community-level tables; finished vacant inventory is weakly identified.

Method 2: "Lifecycle-Fusion" Estimation (Closest to Houston Deck)

Inputs per jurisdiction:

  • Permit records with address/parcel, issue date, type, units.
  • Inspection events (foundation, framing, final).
  • CO / final pass date.
  • Recorder deed transfer events (or assessor sales).
  • HMDA mortgage-financed purchase originations (metro proxy).

Definitions (suggested defaults; you can change):

  • Start = first foundation/slab inspection pass date (fallback: permit issue date + lag).
  • Completion = CO issue date (fallback: final inspection pass date).
  • Closing = deed recordation date (fallback: HMDA origination date for financed purchases).
  • Finished vacant inventory = completed but not closed, excluding model homes.
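The default definitions and their fallbacks can be captured in one resolver. This is a sketch: the 30-day permit-to-start lag is a hypothetical placeholder to calibrate locally, and a unit-level HMDA date is treated as an optional input even though HMDA is mainly a metro-level proxy here.

```python
from datetime import date, timedelta
from typing import Dict, Optional

ASSUMED_PERMIT_TO_START_LAG = timedelta(days=30)  # hypothetical; calibrate per market


def lifecycle_dates(permit_issued: date,
                    foundation_pass: Optional[date] = None,
                    co_issued: Optional[date] = None,
                    final_pass: Optional[date] = None,
                    deed_recorded: Optional[date] = None,
                    hmda_origination: Optional[date] = None) -> Dict[str, Optional[date]]:
    """Resolve start, completion, and closing dates using the suggested fallbacks."""
    return {
        "start": foundation_pass or (permit_issued + ASSUMED_PERMIT_TO_START_LAG),
        "completion": co_issued or final_pass,
        "closing": deed_recorded or hmda_origination,
    }


# Synthetic unit: no foundation inspection or CO on file, deed not yet recorded.
print(lifecycle_dates(date(2025, 1, 10), final_pass=date(2025, 7, 1)))
```

A unit whose resolved `completion` is set while `closing` is still `None` is a finished vacant candidate, subject to the model-home exclusion.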

Calibration and QA:

  • Aggregate your derived "starts" to CBSA month/quarter and compare trends to BPS permits.
  • Reconcile macro totals to NRC/SOC national/regional shares to detect drift.
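A simple drift check for the first QA step could compare the starts-to-permits ratio per period against its own median. The 25% tolerance and the period labels are illustrative assumptions, and the counts are synthetic.

```python
from statistics import median


def starts_to_permits_drift(derived_starts: dict, bps_permits: dict,
                            tolerance: float = 0.25) -> dict:
    """Return periods whose starts/permits ratio deviates from the median ratio."""
    ratios = {
        period: derived_starts[period] / bps_permits[period]
        for period in derived_starts
        if bps_permits.get(period)  # skip missing or zero permit counts
    }
    if not ratios:
        return {}
    mid = median(ratios.values())
    return {p: r for p, r in ratios.items() if abs(r - mid) > tolerance * mid}


starts = {"2025Q1": 90, "2025Q2": 95, "2025Q3": 40}
permits = {"2025Q1": 100, "2025Q2": 100, "2025Q3": 100}
print(starts_to_permits_drift(starts, permits))  # → {'2025Q3': 0.4}
```

A flagged period usually points to a connector outage or a jurisdiction dropping out of the feed rather than a real market shift, so it is worth checking ingestion logs first.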

Method 3: USPS Vacancy/No-Stat Proxy (Restricted, but Powerful if Eligible)

If you can meet HUD/USPS access requirements, tract-level quarterly "No-Stat" and vacancy counts can serve as an additional signal for "under construction" addresses, because HUD notes "No-Stat" includes homes under construction not yet occupied.

This method is most useful as a sanity-check layer rather than a sole estimator.

National Rollout Plan, Top-25 Placeholders, and Implementation Artifacts

Top-25 CBSAs Placeholder List

You requested that the exact top-25 list be treated as unspecified. Use placeholders until you define whether "top" means population, new-home starts, transaction volume, or another criterion:

<CBSA_01>, <CBSA_02>, …, <CBSA_25>

Prioritized Connector Roadmap (Actionable)

| Connector | Scope | What It Unlocks | Est. Effort (weeks) | Risk Notes |
| --- | --- | --- | --- | --- |
| Census BPS ingest | Federal baseline | Permits (CBSA/county/place) | 1–2 | Low risk; stable files. |
| FRED ingest | Federal convenience | Permits & macro series by CBSA | 1 | Requires API key. |
| FHFA HPI ingest | Pricing trend | CBSA HPI series | 1 | Simple CSV. |
| Census PEP metro ingest | Pop + migration | Population and components | 2 | Annual refresh; vintage changes. |
| BLS API ingest | Labor market | Employment/unemployment series | 2–3 | Series-id management + rate limits. |
| HMDA Data Browser API | Mortgage closings proxy | Purchase originations counts/sums by metro | 3–5 | Large payloads; query constraints. |
| ArcGIS FeatureServer connector | Local open data | Permits/inspections/CO/plats where exposed | 3–6 | Endpoint variability; pagination quirks. |
| Socrata connector | Local open data | Same as above | 2–4 | Dataset discovery and churn. |
| Accela API connector | Permitting workflow | Permit + inspection lifecycle where credentialed | 4–8 | Requires credentials/agency cooperation. |
| Recorder/assessor connector patterns | Closings (cash+financed) | Total transfer counts and prices | 6–12 | Most difficult; legal + paywalls + normalization. |
| HUD–USPS vacancy (if eligible) | Vacancy/under-construction proxies | Tract-level vacancy/no-stat trend | 4–8 | Access eligibility dominates. |

Rollout Prioritization Heuristic for CBSAs

A practical heuristic is to prioritize CBSAs where you can achieve a "minimum viable Houston" with high automation:

  1. Permits coverage: large share of new construction occurs in jurisdictions with open-data exports or ArcGIS/Socrata.
  2. Recorder accessibility: county recorder exposes bulk exports or modern searchable systems with predictable patterns.
  3. Parcel layer availability: county appraisal district provides GIS downloads and parcel identifiers, enabling address-to-parcel joins. (Houston example: HCAD provides quarterly GIS downloads.)
  4. Complexity: fewer jurisdictions dominate the CBSA's construction volume.
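
One way to make the four criteria comparable across CBSAs is a weighted score. The sketch below is illustrative only: the field names, the recorder-access scores, and the weights are all assumptions chosen for the example, not values taken from the Houston deck or any source.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class CbsaReadiness:
    name: str
    permit_open_data_share: float  # 0..1, share of construction in open-data jurisdictions
    recorder_access: str           # "bulk export" | "searchable portal" | "paywall" | "unknown"
    parcel_gis_available: bool
    top_jurisdiction_share: float  # 0..1, volume share of the few largest jurisdictions


# Assumed scores and weights; tune these against your own rollout experience.
RECORDER_SCORE = {
    "bulk export": 1.0,
    "searchable portal": 0.6,
    "paywall": 0.2,
    "unknown": 0.0,
}
WEIGHTS = {"permits": 0.4, "recorder": 0.3, "parcels": 0.2, "complexity": 0.1}


def readiness_score(c: CbsaReadiness) -> float:
    """Return a 0..1 score; higher means easier to automate."""
    return (
        WEIGHTS["permits"] * c.permit_open_data_share
        + WEIGHTS["recorder"] * RECORDER_SCORE.get(c.recorder_access, 0.0)
        + WEIGHTS["parcels"] * (1.0 if c.parcel_gis_available else 0.0)
        + WEIGHTS["complexity"] * c.top_jurisdiction_share
    )


def rank(cbsas: Iterable[CbsaReadiness]) -> List[CbsaReadiness]:
    """Order CBSAs from most to least automation-ready."""
    return sorted(cbsas, key=readiness_score, reverse=True)
```

For example, a CBSA with 80% open-data permit coverage, a searchable recorder portal, parcel GIS downloads, and half its volume in a few jurisdictions scores 0.75 under these assumed weights.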

CBSA Data Availability Card Template (Prefilled with Houston Examples)

CBSA: <CBSA_NAME>
Counties (dominant): <COUNTY_LIST>
Primary cities/jurisdictions for permits: <CITY_LIST>
Permitting system(s) observed: <Accela / EnerGov(Tyler) / OpenGov / custom / unknown>
Open-data portals present: <ArcGIS Hub / Socrata / none / unknown>
Recorder access pattern: <bulk export / searchable portal / paywall / unknown>
Parcel / assessor layer: <GIS downloads available? yes/no>

Key metrics support:

  • Permits: <yes> (BPS CBSA files; baseline)
  • Starts/UC/Completions: <modeled / lifecycle>
  • Closings: <HMDA-only / recorder>
  • Finished vacant inventory: <derived / not supported>
  • VDL/future lots: <modeled from plats / not supported>

Houston example notes (from public sources):

  • City permit portal exists for City of Houston.
  • Harris County ePermits system exists.
  • City planning provides plat activity reports in Excel (entitlement pipeline signal).
  • HCAD provides GIS downloads and public property data downloads (parcel base).
  • Harris County Clerk portal provides real property record search and copy purchasing (recording friction).

Data Flow and ETL Diagram (Mermaid)

flowchart LR
  subgraph Federal[Federal baseline]
    BPS[Census BPS permits files]
    PEP[Census metro pop + components]
    BLS[BLS employment/unemployment]
    FHFA[FHFA HPI]
    HMDA[FFIEC HMDA Data Browser API]
    FRED[FRED series API]
  end

  subgraph Local[Local government-adjacent]
    PERMITS[City/County permit systems]
    INSP[Inspections + CO events]
    PLATS[Plats / entitlement agendas]
    PARCELS[Assessor / parcel GIS]
    REC[Recorder deeds / transfers]
  end

  subgraph ETL[Ingestion + normalization]
    RAW[(Raw landing tables)]
    STG[(Staging/standardization)]
    CORE[(Canonical warehouse)]
    METRICS[(Metric marts)]
  end

  subgraph Outputs[Products]
    DASH[CBSA dashboards]
    API[Public/internal metrics API]
  end

  BPS-->RAW
  PEP-->RAW
  BLS-->RAW
  FHFA-->RAW
  HMDA-->RAW
  FRED-->RAW

  PERMITS-->RAW
  INSP-->RAW
  PLATS-->RAW
  PARCELS-->RAW
  REC-->RAW

  RAW-->STG-->CORE-->METRICS
  METRICS-->DASH
  METRICS-->API

Release Timeline Gantt (Conceptual, Based on Official Schedules)

gantt
  title Typical monthly release cadence (conceptual)
  dateFormat  YYYY-MM-DD
  axisFormat  %d

  section Census/SOC releases
  New Residential Construction (12th workday) :a1, 2026-03-17, 1d
  New Residential Sales (17th workday)        :a2, 2026-03-24, 1d
  Revised Building Permits (17th workday)     :a3, 2026-03-24, 1d

  section Other feeds
  FHFA HPI (monthly, lagged)                  :b1, 2026-03-28, 1d
  BLS metro series (monthly cycles vary)      :b2, 2026-03-15, 2d
  HMDA (annual publication window varies)     :b3, 2026-04-01, 30d

Census releases New Residential Construction (NRC) on the 12th workday of the month, and New Residential Sales (NRS) and revised building permits on the 17th workday; both schedules are documented on the Census SOC and BPS pages. HMDA is published annually rather than on a monthly schedule; public announcements confirm annual availability for each filing year.
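
Because the Census cadence is defined in workdays rather than calendar dates, a scheduler needs to translate "Nth workday" into a date. A minimal sketch, assuming workdays are Monday through Friday and that you supply federal holidays yourself (the `holidays` parameter is a hypothetical hook; the function does not know any holiday calendar):

```python
from datetime import date, timedelta
from typing import AbstractSet


def nth_workday(
    year: int,
    month: int,
    n: int,
    holidays: AbstractSet[date] = frozenset(),
) -> date:
    """Return the n-th Monday-to-Friday date of the month, skipping holidays."""
    if n < 1:
        raise ValueError("n must be >= 1")
    day = date(year, month, 1)
    count = 0
    while day.month == month:
        # weekday() is 0 for Monday through 4 for Friday.
        if day.weekday() < 5 and day not in holidays:
            count += 1
            if count == n:
                return day
        day += timedelta(days=1)
    raise ValueError(f"{year}-{month:02d} has fewer than {n} workdays")
```

For March 2026, which contains no federal holidays, the 12th workday falls on 2026-03-17 and the 17th on 2026-03-24. Treat the result as an expected date only and confirm it against the published release calendar, since agencies occasionally reschedule.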

Reference Implementation: SQL DDL and Sample Queries

Below are SQL DDL examples (PostgreSQL-style) for a Houston-like pipeline, followed by sample queries that compute Q4 starts, completions, closings, under-construction stock, and finished vacant inventory.

Canonical Tables

Full SQL DDL for canonical tables:
-- Jurisdictions (cities, counties, agencies)
CREATE TABLE dim_jurisdiction (
  jurisdiction_id      BIGSERIAL PRIMARY KEY,
  name                 TEXT NOT NULL,
  jurisdiction_type    TEXT NOT NULL,   -- city, county, agency, special_district
  state_fips           TEXT,
  county_fips          TEXT,
  source_system        TEXT,            -- accela, energov, opengov, socrata, arcgis, custom
  source_base_url      TEXT,
  created_at           TIMESTAMPTZ NOT NULL DEFAULT now()
);

-- Core CBSA dimension (your own, seeded from Census delineation files)
CREATE TABLE dim_cbsa (
  cbsa_code            TEXT PRIMARY KEY, -- e.g., '26420'
  cbsa_name            TEXT NOT NULL,
  delineation_year     INT NOT NULL,     -- e.g., 2023
  is_metro             BOOLEAN NOT NULL
);

-- Address/parcel entity (normalized)
CREATE TABLE dim_property (
  property_id          BIGSERIAL PRIMARY KEY,
  address_full         TEXT,
  address_norm         TEXT,             -- normalized (USPS style)
  city                 TEXT,
  state                TEXT,
  postal_code          TEXT,
  parcel_id_raw        TEXT,
  latitude             NUMERIC(10,7),
  longitude            NUMERIC(10,7),
  county_fips          TEXT,
  cbsa_code            TEXT REFERENCES dim_cbsa(cbsa_code),
  created_at           TIMESTAMPTZ NOT NULL DEFAULT now()
);

-- Permit header (raw ingest may be separate; this is standardized)
CREATE TABLE fact_permit (
  permit_id            BIGSERIAL PRIMARY KEY,
  jurisdiction_id      BIGINT REFERENCES dim_jurisdiction(jurisdiction_id),
  permit_number        TEXT NOT NULL,
  permit_type          TEXT,            -- building, electrical, plumbing, etc.
  work_class           TEXT,            -- new, addition, alteration, demo, etc
  residential_flag     BOOLEAN,
  single_family_flag   BOOLEAN,
  units_authorized     INT,
  declared_value_usd   NUMERIC(14,2),
  application_date     DATE,
  issue_date           DATE,
  status               TEXT,
  source_record_id     TEXT,            -- vendor/system id
  property_id          BIGINT REFERENCES dim_property(property_id),
  updated_at           TIMESTAMPTZ NOT NULL DEFAULT now(),
  UNIQUE(jurisdiction_id, permit_number)
);

-- Inspection/event log (foundation, framing, final, CO, etc)
CREATE TABLE fact_inspection_event (
  inspection_event_id  BIGSERIAL PRIMARY KEY,
  jurisdiction_id      BIGINT REFERENCES dim_jurisdiction(jurisdiction_id),
  permit_number        TEXT NOT NULL,
  event_type           TEXT NOT NULL,   -- foundation_pass, framing_pass, final_pass, co_issued, etc
  event_status         TEXT,            -- pass/fail/issued/scheduled
  event_date           DATE NOT NULL,
  source_record_id     TEXT,
  property_id          BIGINT REFERENCES dim_property(property_id),
  created_at           TIMESTAMPTZ NOT NULL DEFAULT now()
);

-- Recorder transfers (deeds / conveyances)
CREATE TABLE fact_property_transfer (
  transfer_id          BIGSERIAL PRIMARY KEY,
  county_fips          TEXT NOT NULL,
  instrument_number    TEXT,
  document_type        TEXT,            -- warranty deed, deed, etc
  record_date          DATE NOT NULL,    -- recording date
  sale_date            DATE,             -- if available; else null
  sale_price_usd       NUMERIC(14,2),    -- if available
  grantor              TEXT,
  grantee              TEXT,
  property_id          BIGINT REFERENCES dim_property(property_id),
  created_at           TIMESTAMPTZ NOT NULL DEFAULT now()
);

-- Derived unit lifecycle table (one row per "housing unit observation key")
-- If you track at address+unit granularity, add unit_number or a unit_id.
CREATE TABLE fact_unit_lifecycle (
  unit_id              BIGSERIAL PRIMARY KEY,
  property_id          BIGINT REFERENCES dim_property(property_id),
  cbsa_code            TEXT REFERENCES dim_cbsa(cbsa_code),

  start_date           DATE,             -- your chosen start definition
  completion_date      DATE,             -- CO or final pass
  closing_date         DATE,             -- deed record date or best proxy

  is_model_home         BOOLEAN DEFAULT FALSE,
  source_confidence     NUMERIC(3,2) DEFAULT 0.70, -- 0..1
  created_at            TIMESTAMPTZ NOT NULL DEFAULT now()
);

-- Optional: monthly permits mart (BPS)
CREATE TABLE fact_cbsa_permits_bps (
  cbsa_code            TEXT REFERENCES dim_cbsa(cbsa_code),
  ym                  DATE NOT NULL,     -- first day of month
  units_1unit          INT,
  units_total          INT,
  updated_at           TIMESTAMPTZ NOT NULL DEFAULT now(),
  PRIMARY KEY (cbsa_code, ym)
);

Sample Queries for Q4 Metrics

The queries below assume you have derived start_date, completion_date, and closing_date in fact_unit_lifecycle.
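
How those dates get derived is a modeling choice. The sketch below shows one possible rule over the `event_type` values listed in the `fact_inspection_event` comments: it assumes the first `foundation_pass` marks the start, and that a `co_issued` date is preferred over `final_pass` for completion. Both rules are assumptions for illustration, not the Houston deck's definitions.

```python
from datetime import date
from typing import NamedTuple, Optional, Sequence, Tuple


class InspectionEvent(NamedTuple):
    event_type: str   # e.g. foundation_pass, framing_pass, final_pass, co_issued
    event_date: date


# Assumed definitions; swap in your own start and completion rules.
START_EVENT_TYPES = {"foundation_pass"}
COMPLETION_PRIORITY = ("co_issued", "final_pass")


def derive_dates(
    events: Sequence[InspectionEvent],
) -> Tuple[Optional[date], Optional[date]]:
    """Return (start_date, completion_date) for one unit's event history."""
    start = min(
        (e.event_date for e in events if e.event_type in START_EVENT_TYPES),
        default=None,
    )
    completion = None
    for event_type in COMPLETION_PRIORITY:
        dates = [e.event_date for e in events if e.event_type == event_type]
        if dates:
            # Take the earliest date of the highest-priority event type present.
            completion = min(dates)
            break
    return start, completion
```

A unit with no qualifying events returns `(None, None)`, which maps naturally onto the nullable `start_date` and `completion_date` columns.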

Q4 Starts, Completions, and Closings (by CBSA)

-- Parameterize these in your application
-- Example for Q4 2025:
WITH params AS (
  SELECT DATE '2025-10-01' AS q_start,
         DATE '2025-12-31' AS q_end
)
SELECT
  ul.cbsa_code,
  COUNT(*) FILTER (WHERE ul.start_date BETWEEN p.q_start AND p.q_end)      AS q4_starts,
  COUNT(*) FILTER (WHERE ul.completion_date BETWEEN p.q_start AND p.q_end) AS q4_completions,
  COUNT(*) FILTER (WHERE ul.closing_date BETWEEN p.q_start AND p.q_end)    AS q4_closings
FROM fact_unit_lifecycle ul
CROSS JOIN params p
GROUP BY ul.cbsa_code
ORDER BY ul.cbsa_code;
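
To parameterize the quarter window from application code, a small helper can compute the inclusive bounds that the query's `BETWEEN` expects. This is a sketch; the function name is hypothetical.

```python
from datetime import date, timedelta
from typing import Tuple


def quarter_bounds(year: int, quarter: int) -> Tuple[date, date]:
    """Return the inclusive (first_day, last_day) of a calendar quarter."""
    if quarter not in (1, 2, 3, 4):
        raise ValueError("quarter must be 1, 2, 3, or 4")
    start = date(year, 3 * (quarter - 1) + 1, 1)
    # The quarter ends the day before the next quarter begins.
    if quarter == 4:
        next_start = date(year + 1, 1, 1)
    else:
        next_start = date(year, 3 * quarter + 1, 1)
    return start, next_start - timedelta(days=1)
```

`quarter_bounds(2025, 4)` yields 2025-10-01 and 2025-12-31, matching the literal dates in the query above.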

Under-Construction Stock at Quarter End

WITH params AS (
  SELECT DATE '2025-12-31' AS q_end
)
SELECT
  ul.cbsa_code,
  COUNT(*) AS under_construction_stock
FROM fact_unit_lifecycle ul
CROSS JOIN params p
WHERE ul.start_date IS NOT NULL
  AND ul.start_date <= p.q_end
  AND (ul.completion_date IS NULL OR ul.completion_date > p.q_end)
GROUP BY ul.cbsa_code;

Finished Vacant Inventory at Quarter End

WITH params AS (
  SELECT DATE '2025-12-31' AS q_end
)
SELECT
  ul.cbsa_code,
  COUNT(*) AS finished_vacant_inventory
FROM fact_unit_lifecycle ul
CROSS JOIN params p
WHERE ul.completion_date IS NOT NULL
  AND ul.completion_date <= p.q_end
  AND (ul.closing_date IS NULL OR ul.closing_date > p.q_end)
  AND COALESCE(ul.is_model_home, FALSE) = FALSE
GROUP BY ul.cbsa_code;
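
The under-construction and finished-vacant predicates can also be expressed as a single classification rule, which is handy for unit-testing edge cases before running the SQL at scale. The sketch below mirrors the two WHERE clauses; the state labels other than those two are names invented for the example.

```python
from datetime import date
from typing import Optional


def unit_state(
    as_of: date,
    start_date: Optional[date],
    completion_date: Optional[date],
    closing_date: Optional[date],
    is_model_home: bool = False,
) -> str:
    """Classify one unit as of a given date, mirroring the SQL predicates."""
    if completion_date is not None and completion_date <= as_of:
        if closing_date is not None and closing_date <= as_of:
            return "closed"
        # Completed but not yet closed: models are excluded from inventory.
        return "model_home" if is_model_home else "finished_vacant"
    if start_date is not None and start_date <= as_of:
        # Started, and either not completed or completed after as_of.
        return "under_construction"
    return "not_started"
```

A unit completed after the as-of date still counts as under construction on that date, which matches the `completion_date > p.q_end` branch of the stock query.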

Months' Supply of Finished Vacant Inventory (Example Spec)

A common definition is: months supply = (finished vacant inventory) / (average monthly closings over trailing 3 months).

months_supply = finished_vacant_inventory / (closings_3mo / 3.0)

WITH params AS (
  SELECT DATE '2025-12-31' AS q_end,
         DATE '2025-10-01' AS trailing_start
),
finished_vacant AS (
  SELECT ul.cbsa_code, COUNT(*) AS fv
  FROM fact_unit_lifecycle ul, params p
  WHERE ul.completion_date <= p.q_end
    AND (ul.closing_date IS NULL OR ul.closing_date > p.q_end)
    AND COALESCE(ul.is_model_home, FALSE) = FALSE
  GROUP BY ul.cbsa_code
),
trailing_closings AS (
  SELECT ul.cbsa_code, COUNT(*) AS closings_3mo
  FROM fact_unit_lifecycle ul, params p
  WHERE ul.closing_date BETWEEN p.trailing_start AND p.q_end
  GROUP BY ul.cbsa_code
)
SELECT
  fv.cbsa_code,
  fv.fv AS finished_vacant_inventory,
  COALESCE(tc.closings_3mo, 0) AS closings_3mo,
  -- A CBSA with no trailing closings has no row in trailing_closings, so the
  -- LEFT JOIN yields NULL; NULLIF also guards against a literal zero.
  fv.fv::NUMERIC / (NULLIF(tc.closings_3mo, 0)::NUMERIC / 3.0) AS months_supply_finished_vacant
FROM finished_vacant fv
LEFT JOIN trailing_closings tc USING (cbsa_code)
ORDER BY fv.cbsa_code;
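
As a worked check of the arithmetic, the same formula in application code might look like the sketch below. The input figures in the usage note are made up for illustration and are not Houston values.

```python
from typing import Optional


def months_supply(
    finished_vacant: int,
    closings_trailing: int,
    months: int = 3,
) -> Optional[float]:
    """Finished vacant inventory divided by the average monthly closings."""
    if closings_trailing <= 0:
        # No closings in the window: the ratio is undefined, not infinite.
        return None
    return finished_vacant / (closings_trailing / months)
```

With 1,200 finished vacant units and 900 closings over the trailing three months, average monthly closings are 300 and the result is 4.0 months of supply. Returning `None` for a window with no closings corresponds to the NULL the SQL produces.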

Closing Synthesis

A Houston-style national product is achievable if you explicitly separate:

The biggest strategic decision is whether your national MVP should:

Either way, the sources and connector patterns above provide a realistic, automatable path grounded in primary federal and government-adjacent systems.
