Benchmarks methodology

How BuildCalc derives typical duration + cost ranges for 15 standardized US residential construction project types from federal data. Full derivation framework, 4-level confidence taxonomy, federal-source matrix, BuildCalc-curated scope norms, methodology versioning, legal posture, and 7 known limitations.

Overview

The benchmarks vertical surfaces typical duration (duration_days_p25/p50/p75) and cost ranges (cost_usd_p25/p50/p75) for 15 standardized US residential construction project types across 5 geography tiers (national + msa_large + msa_medium + msa_small + rural).

All numerical values are derived from federal sources only. We do NOT scrape NAHB Cost vs Value tables, RSMeans Light Construction, Dodge Data, Angi cost guides, or any other paywalled / copyrighted industry benchmarks. We DO reference their methodology concepts (e.g. "midrange kitchen remodel" as a recognizable scope tier) from publicly-available descriptions only, and apply that scope to federal data to derive our own numbers.

See ADR-0016 (in repo at docs/adr/0016-benchmarks-vertical-legal-posture.md) for the full three-prong legal framework (Feist + ToS + CFAA) + per-source legal tier classification.

Sources matrix

7 federal sources + 1 internal scope-norms identifier feed Phase 8:

Source	Tier	Refresh	Used for
Census C30	T0	Monthly + annual rollups	Per-permit SFH valuations (new construction)
Census BPS	T0	Monthly	Alteration + addition valuations (14 remodel / replacement / addition project types)
Census BCC Price Index	T0	Quarterly	Base-year cost escalation for all rows
BLS OEWS	T0	Annual (April release)	Per-trade hourly wages (via Phase 6 `cost_labor_oews` table — no re-fetch)
BLS Productivity	T0	Annual	Labor-share derivation for project types where applicable
EIA RECS	T0	Every 5 years	HVAC install cost regional data (hvac_replacement only)
HUD SOC (via Census)	T0	Annual	Start-to-completion duration (new_construction_sfh). HUD SOC is co-published on `census.gov/construction/soc/` — the URL is Census even though the survey is HUD's.
`industry_norm` (internal)	n/a	n/a	Source-type identifier for the BuildCalc-authored `scope_norms.json` entries (BuildCalc's own creative-work classification of typical scopes). NOT a federal source; cites methodology concepts only, not numerical values from third parties. See "Legal posture" below + §3.7-3.8 of the spec.

Federal sources (Census C30, BPS, BCC, BLS OEWS, BLS Productivity, EIA RECS, HUD SOC via Census) are Tier T0 (federal public-domain) per ADR-0016 classification — no scraping, no auth bypass, no ToS conflicts.

industry_norm is BuildCalc's internal source label for citations pointing at our own scope_norms.json (not a third-party source).

Confidence taxonomy

Every benchmark row carries a confidence field with one of 4 values:

Level	When applied
`measured`	Direct federal data row (e.g. Census C30 per-MSA average)
`derived`	Computed from federal data via documented formula
`derived_industry_method`	Applies BuildCalc-curated hour-per-task norms + scope definitions; numerical values from federal data
`interpolated`	Regional interpolation from adjacent MSAs or geography_tier fallback (e.g. rural row interpolated from `msa_small`)

Every row also has a confidence_reason text field explaining which sources fed it and any interpolation applied.
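As a minimal sketch, the four levels could be modeled as a string enum so that an unknown value fails early. The class name `Confidence` and the helper `validate_row` are hypothetical; only the four level strings and the `confidence` / `confidence_reason` field names come from this page.

```python
from enum import Enum


class Confidence(str, Enum):
    """The four confidence levels from the taxonomy table."""
    MEASURED = "measured"
    DERIVED = "derived"
    DERIVED_INDUSTRY_METHOD = "derived_industry_method"
    INTERPOLATED = "interpolated"


def validate_row(row: dict) -> Confidence:
    """Hypothetical check: the level must be known and the reason non-empty."""
    level = Confidence(row["confidence"])  # raises ValueError on an unknown level
    reason = row.get("confidence_reason")
    if not isinstance(reason, str) or not reason.strip():
        raise ValueError("confidence_reason must be a non-empty string")
    return level
```

A caller can branch on the returned level, for example to warn users whenever a row is `Confidence.INTERPOLATED`.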

Why this differs from Phase 6 costs taxonomy

Phase 6 costs taxonomy (measured / computed / computed_with_fallback / computed_low_coverage) describes geographic scaling provenance — does this MSA have direct OEWS rows, or did we fall back to county QCEW × parent NAICS scaling?

Phase 8 benchmarks taxonomy describes derivation type — was this row direct federal data, or was it computed via a documented formula? Different domain, same spirit (4 levels + honest about confidence + _reason field).

ADR-0016 documents this intentional divergence.

Geography tiers

Tier	Population threshold	`area_code`	Notes
`national`	US 50 states + DC	`US00000`	Direct federal national average
`msa_large`	> 1M pop	(MSA code)	BuildCalc-defined; ~30 MSAs
`msa_medium`	250k - 1M pop	(MSA code)	BuildCalc-defined; ~30 MSAs
`msa_small`	< 250k pop	(MSA code)	Default for unlisted MSAs; ~330 MSAs
`rural`	Non-MSA counties	`RURAL00`	Interpolated from `msa_small`

Tier thresholds are BuildCalc-defined, NOT Census-endorsed. Federal delineations (set by OMB and published by the Census Bureau) distinguish only Metropolitan (urban core ≥50k) from Micropolitan (urban core 10k-50k) areas; there is no published threshold for "small/medium/large" MSAs. Our tier mapping lives at app/benchmarks/seed_data/msa_tiers.json and the disclaimer is written into every msa-tier row's confidence_reason.
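The population thresholds in the table can be sketched as a small classifier. This is illustrative only: the function name `msa_tier` is hypothetical, the authoritative mapping is the msa_tiers.json file, and the handling of the exact boundaries (250,000 and 1,000,000 both fall in `msa_medium`) is our reading of the ">" and "<" signs in the table. The `rural` tier is assigned by non-MSA county status rather than population, so it is not covered here.

```python
def msa_tier(population: int) -> str:
    """Map an MSA population to a BuildCalc tier label (sketch, not the real lookup)."""
    if population > 1_000_000:
        return "msa_large"
    if population >= 250_000:
        return "msa_medium"
    return "msa_small"
```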

For rural rows we apply:

cost_usd × 0.85 (rural labor wage discount, typical national-vs-rural spread per BLS OEWS)
duration × 1.10 (+10% extension for rural logistics)

This makes rural a confidence: "interpolated" row, not a measurement.
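A minimal sketch of the rural adjustment, assuming a benchmark row is a plain dict keyed by the column names used on this page. The function name, the rounding precision, and the wording of the generated `confidence_reason` are assumptions for illustration.

```python
RURAL_COST_FACTOR = 0.85      # rural labor wage discount
RURAL_DURATION_FACTOR = 1.10  # +10% extension for rural logistics

COST_FIELDS = ("cost_usd_p25", "cost_usd_p50", "cost_usd_p75")
DURATION_FIELDS = ("duration_days_p25", "duration_days_p50", "duration_days_p75")


def rural_row_from_msa_small(msa_small_row: dict) -> dict:
    """Return a new rural row derived from an msa_small row; the input is not mutated."""
    rural = dict(msa_small_row)
    for field in COST_FIELDS:
        rural[field] = round(msa_small_row[field] * RURAL_COST_FACTOR, 2)
    for field in DURATION_FIELDS:
        rural[field] = round(msa_small_row[field] * RURAL_DURATION_FACTOR, 1)
    rural["geography_tier"] = "rural"
    rural["area_code"] = "RURAL00"
    rural["confidence"] = "interpolated"
    rural["confidence_reason"] = "Interpolated from msa_small: cost x 0.85, duration x 1.10"
    return rural
```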

Derivation formulas

New construction SFH (sub-del 2)

Census C30 per-permit SFH valuation (latest year)
  → escalate to cost_usd_base_year via Census BCC Price Index
  → split into p25/p50/p75 with ±25% IQR
  → MSA tiers: × 1.20 (large) / × 1.00 (medium) / × 0.85 (small)
  → rural: msa_small × 0.85

Duration: HUD Survey of Construction annual avg
  → p25/p50/p75 from HUD percentiles
  → rural: × 1.10
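The cost flow above can be sketched as follows. Several details are assumptions, not confirmed behavior: escalation is modeled as a simple index ratio, "±25% IQR" is read as p25 = 0.75 × p50 and p75 = 1.25 × p50, and the `national` tier is treated as the unscaled base. All names are hypothetical.

```python
TIER_MULTIPLIER = {"msa_large": 1.20, "msa_medium": 1.00, "msa_small": 0.85}
RURAL_FACTOR = 0.85  # applied on top of the msa_small multiplier
IQR_SPREAD = 0.25    # assumed reading of "±25% IQR"


def escalate(value_usd: float, index_source_year: float, index_base_year: float) -> float:
    """Scale a dollar value by the ratio of two price-index readings."""
    return value_usd * index_base_year / index_source_year


def new_construction_cost(c30_valuation_usd: float, index_source_year: float,
                          index_base_year: float, tier: str) -> dict:
    """Return p25/p50/p75 cost for one geography tier (sketch only)."""
    p50 = escalate(c30_valuation_usd, index_source_year, index_base_year)
    if tier == "rural":
        p50 *= TIER_MULTIPLIER["msa_small"] * RURAL_FACTOR
    elif tier != "national":
        p50 *= TIER_MULTIPLIER[tier]  # KeyError on an unknown tier
    return {
        "cost_usd_p25": p50 * (1 - IQR_SPREAD),
        "cost_usd_p50": p50,
        "cost_usd_p75": p50 * (1 + IQR_SPREAD),
    }
```

Because every step is a multiplication, applying the tier multiplier before or after the percentile split gives the same result.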

Remodels + replacements + additions (sub-dels 3 + 4a + 4b + 5)

Census BPS alterations (or EIA RECS for HVAC)
  → scope_norms.json materials_share × BPS avg
  + BLS OEWS labor wages (from cost_labor_oews table)
    × scope_norms.json expected_total_labor_hours
  → BuildCalc-curated labor_pct + materials_pct split
  → tier multipliers (same as new construction)
  → rural: × 0.85

All 14 remodel/replacement/addition project types share the _compute_remodel_or_replacement() orchestrator in app/etl/benchmarks.py with per-project labor_pct / materials_pct splits.
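One reading of the remodel flow, as a hedged sketch: materials cost comes from the scope's `materials_share` applied to the BPS average, labor cost from a wage times `expected_total_labor_hours`, and the tier multiplier scales the sum. This is not the actual `_compute_remodel_or_replacement()` implementation; the function name is hypothetical, and a single blended hourly wage stands in for the per-trade OEWS wages.

```python
def remodel_cost_p50(bps_avg_valuation_usd: float, materials_share: float,
                     hourly_wage_usd: float, expected_total_labor_hours: float,
                     tier_multiplier: float = 1.0) -> dict:
    """Combine a materials component and a labor component into a p50 cost (sketch)."""
    materials = materials_share * bps_avg_valuation_usd
    labor = hourly_wage_usd * expected_total_labor_hours
    return {
        "materials_usd": materials * tier_multiplier,
        "labor_usd": labor * tier_multiplier,
        "cost_usd_p50": (materials + labor) * tier_multiplier,
    }
```

For instance, `remodel_cost_p50(50_000, 0.4, 30.0, 400, tier_multiplier=1.20)` combines 20,000 USD of materials with 12,000 USD of labor before scaling by 1.20; the inputs are made-up numbers, not benchmark values.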

Methodology versioning

Each annual cron refresh stamps every benchmark row with a methodology_version value (e.g. "2026-07-15-v1"). A snapshot file at app/benchmarks/seed_data/methodology_history/<version>.json preserves the exact derivation formula used to compute that row.

This enables audit reproducibility — external auditors can verify any DB row against the snapshot of the methodology that produced it, even years after we refine the formula.

If the formula changes in a later version (e.g., v2 adds a 5-percentage-point methodology buffer), old benchmark rows keep their old methodology_version value and reflect the original derivation. Rows written after the refresh are stamped with the new version.
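A minimal sketch of stamping and snapshot lookup, assuming rows are plain dicts. The helper names are hypothetical; the directory and the `<version>.json` naming come from the paragraph above.

```python
from pathlib import Path

HISTORY_DIR = Path("app/benchmarks/seed_data/methodology_history")


def snapshot_path(methodology_version: str) -> Path:
    """Locate the snapshot file that documents a given methodology version."""
    return HISTORY_DIR / f"{methodology_version}.json"


def stamp_new_rows(new_rows: list, methodology_version: str) -> list:
    """Stamp only freshly computed rows; previously stored rows are never touched."""
    return [{**row, "methodology_version": methodology_version} for row in new_rows]
```

An auditor holding a row stamped `"2026-07-15-v1"` would open `snapshot_path("2026-07-15-v1")` to see the formula in force when that row was computed.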

Sub-deliverable scope (v1)

Phase 8 ships in 6 sub-deliverables (sub-del 4 is split into 4a and 4b), all fully wired in v0.3.0:

Sub-del	Project types	Status
1	Infrastructure (schema + 4 endpoints + seeds + ETL framework)	✅
2	new_construction_sfh	✅
3	kitchen + bath remodels (4 types)	✅
4a	roof / siding / window / exterior_painting (4 types)	✅
4b	flooring / HVAC / electrical_panel (3 types)	✅
5	deck / garage / basement_finish (3 types)	✅
6	v0.3.0 release (version bump + tag)	✅

Live federal-data fetcher implementations land at first deploy of the annual cron.

Known limitations

Like Phase 6's costs methodology page, we enumerate the gotchas honestly.

1. MSA tier boundaries are BuildCalc-defined

The thresholds >1M / 250k-1M / <250k are not Census-published. We chose them to balance internal-revenue-impact tier spread against MSA count per tier. Different thresholds (e.g., top-25 vs top-50 vs top-100 MSAs) would yield meaningfully different msa_large benchmarks. Every msa-tier row carries this disclaimer in confidence_reason.

2. Rural rows are interpolated, not measured

Federal data does not consistently report at the non-MSA county level for project-level benchmarks (Census C30/BPS report MSA + national; HUD SOC reports national + region). Our rural rows are computed from msa_small rows: cost × 0.85 (labor discount) and duration × 1.10 (+10% extension). This is an industry-standard heuristic, but it IS interpolation, not measurement.

3. Scope norms are BuildCalc-authored

We do NOT reproduce NAHB Cost vs Value or RSMeans Light Construction scope tables. Our scope_norms.json is BuildCalc's creative work drawing on general methodology concepts (e.g. "150-250 sqft kitchen, semi-stock cabinets"). Numerical values are federal-data-derived. AI agents using these scopes should validate against the source field scope_authored_by and the legal-note in scope_norms.json.

4. EIA RECS HVAC data has 5-year refresh lag

EIA RECS publishes every ~5 years. For hvac_replacement, our cost data may be up to 5 years older than the current year, even after BCC escalation. We flag this in the row's confidence_reason.

5. BLS Productivity reports lag annual benchmarks

BLS Productivity data for NAICS 238 (specialty trade contractors, within the construction sector) is published annually, but several months after year-end. For the labor-share split on remodels, we use the productivity ratio from the most recent available year.

6. Period vs cost_usd_base_year may diverge

period is the year the benchmark applies to (e.g. '2026'). cost_usd_base_year is the dollar year for the cost columns (typically the same year). When the source data comes from a prior year (e.g. a Census C30 2024 release), the sources row's source_period records the data year, and the cost is BCC-escalated from source_period to cost_usd_base_year.

7. Methodology versioning prevents drift but doesn't prevent error

If the v1 methodology has a subtle bug (wrong tier multiplier, wrong labor split), all v1 rows reflect that bug. We surface methodology_version so callers can identify which derivation produced their result, but the version stamp does not itself catch logic errors. External audit subagents validate against canonical sources to catch these.

Legal posture summary

(Full detail in ADR-0016.)

Feist (copyright): under Feist Publications v. Rural Telephone Service (1991), facts are not copyrightable, and numerical benchmarks are facts. Specific scope descriptors may be creative selections, so we author our own scope_norms.
ToS (contract): NAHB Cost vs Value is paywalled (Zonda); we DO NOT access it. The same applies to RSMeans and Dodge. We reference methodology concepts from publicly-available descriptions only.
CFAA: zero authentication bypass. All federal sources are public CSVs / Excel / APIs.

Per-source legal tiers

The full T0-T4 framework lives in ADR-0016 (predecessor framework established for Phase 7 products). For Phase 8, all sources we DO access are Tier T0 (federal public-domain). The T3 + T4 rows below are listed as negative examples — sources we explicitly DO NOT access; they show up only to document the boundary.

Tier	Definition	Phase 8 inclusion / exclusion
T0	Federal public-domain (USG works under 17 U.S.C. §105)	Included: Census C30/BPS/BCC, BLS OEWS/Productivity, EIA RECS, HUD SOC
T1	Federal-recognized cert registry (research-permitted)	None in Phase 8 (Phase 7 uses these for AHRI/NFRC)
T2	Mfr-published catalog PDF	None in Phase 8 (Phase 7 uses these for mfr specs)
T3	Private paywalled report	Excluded — DO NOT access: NAHB Cost vs Value, RSMeans Light, Dodge
T4	Publicly-published industry guides with copyright	Excluded — DO NOT republish tables: ICC Cost-vs-Quality, AGC labor guides (methodology concepts only)

DMCA process

DMCA framework inherits from Phase 7 ADR-0015:

USCO DMCA agent registration (gated on LLC formation)
DMCA contact email route (gated on LLC formation)
24h ack SLA
§512(c)(3) takedown procedure
Repeat-infringer policy
Annual audit

Phase 8 inherits the framework — no benchmarks-specific DMCA process needed.

Refresh cadence

bcapi-etl-benchmarks (annual, July 15) — runs after:

BLS OEWS April release
Census C30 May release
BLS Productivity annual release

The Better Stack heartbeat fires only on cron success, with a period of 365 days and a grace window of 7 days.
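The success-only heartbeat can be sketched as a wrapper that pings strictly after the job returns without raising. Both callables are placeholders: in practice `ping` would issue an HTTP request to the monitor's heartbeat URL, which is not given on this page, and `run_then_ping` is a hypothetical name.

```python
from typing import Callable


def run_then_ping(job: Callable[[], None], ping: Callable[[], None]) -> None:
    """Run the ETL job, then send the heartbeat only if the job did not raise."""
    job()   # any exception propagates here, so the ping below is skipped
    ping()  # reached only on success
```

With this shape, a failed run sends no heartbeat, and the monitor is expected to alert once the period plus grace window elapses.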

Sources

Census C30 New Residential Construction: https://www.census.gov/construction/nrc/
Census BPS Building Permits Survey: https://www.census.gov/construction/bps/
Census BCC Price Indexes for New One-Family Houses Under Construction: https://www.census.gov/construction/cpi/
BLS OEWS Occupational Employment and Wage Statistics: https://www.bls.gov/oes/
BLS Productivity Industry Productivity: https://www.bls.gov/productivity/
EIA RECS Residential Energy Consumption Survey: https://www.eia.gov/consumption/residential/
HUD SOC Survey of Construction: https://www.census.gov/construction/soc/
