Products methodology
How BuildCalc API's products vertical sources data, normalizes specs, classifies confidence, and handles per-source legal posture across the 10 categories.
The products vertical is a federated factual catalog of construction
SKUs across 10 categories. This page documents the per-category source
authorities, the normalization choices baked into the JSONB spec keys,
the 4-level confidence taxonomy, and the legal posture per source
(ADR-0015).
Category coverage matrix
| Category | Source authority | Status | Spec key examples |
|---|---|---|---|
| hvac | ENERGY STAR Central AC + Heat Pumps + Geothermal + Boilers + Furnaces | live | specs.seer2, specs.eer2, specs.hspf2, specs.btu_h_cooling, specs.afue |
| windows | ENERGY STAR Storm Windows (NFRC residential pending) | live (subset) | specs.u_factor, specs.shgc, specs.vt, specs.air_leakage_cfm_sf |
| plumbing | EPA WaterSense (toilets, faucets, showerheads, urinals, flushometers, irrigation, sprinklers, RO) | live | specs.fixture_type, specs.gpf, specs.gpm, specs.ada |
| insulation | ENERGY STAR Certified Insulation | live | specs.r_value, specs.material, specs.thickness_in, specs.fire_class |
| electrical | ENERGY STAR Ceiling Fans + Ventilating Fans | live (subset) | specs.device_type, specs.watts_high, specs.efficiency_cfm_per_w |
| doors | NFRC CPD | coming soon (Phase 7.5) | |
| roofing | per-mfr PDFs (GAF, CertainTeed, OC, IKO, Tamko) | coming soon (Phase 7.5) | |
| drywall | per-mfr PDFs (USG, CertainTeed, NatGyp, Continental) | coming soon (Phase 7.5) | |
| lumber | per-mfr engineered-wood PDFs | coming soon (Phase 7.5) | |
| hardware | ICC-ES ESR + Simpson Strong-Tie | deferred (LLC + USCO DMCA designation; see ADR-0015) | |
4-level confidence taxonomy
Every product row carries a confidence value indicating how much
trust the caller should put in the spec values:
| Value | Meaning | Used for |
|---|---|---|
| certified | Issued by a federal/national authority (ENERGY STAR, AHRI, NFRC, WaterSense, ICC-ES) | All live verticals; every spec value cites the certification authority |
| mfr_published | Extracted deterministically (Tier-A pdfplumber rules) from a mfr official PDF catalog | Reserved for sub-dels 7/9/10 when mfr-PDF parsers land |
| ollama_extracted | Extracted via the local Qwen 2.5-VL vision fallback (Tier-B) when Tier-A rules couldn't recognize the PDF shape | Reserved; values should be cross-checked against a primary source |
| legacy_or_stale | Source was authoritative when fetched but is no longer current (e.g., SEER1 ratings post-2023 SEER2 transition) | Marker for catalog reconciliation |
Live verticals today all land as certified. The agent SHOULD trust
these for spec selection but MUST always cite the source_url to the
end user (it links back to the authority of record).
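A minimal sketch of how a caller might act on the confidence value, assuming each product row arrives as a dict. Only the confidence and source_url fields come from this page; the name field, the dict shape, and the describe_for_user helper are hypothetical.

```python
from enum import Enum


class Confidence(str, Enum):
    """The four confidence levels from the taxonomy table."""

    CERTIFIED = "certified"
    MFR_PUBLISHED = "mfr_published"
    OLLAMA_EXTRACTED = "ollama_extracted"
    LEGACY_OR_STALE = "legacy_or_stale"


def describe_for_user(row: dict) -> str:
    """Build a user-facing line that always cites source_url.

    `row` is a hypothetical product dict; `name` is an illustrative field.
    """
    # Enum lookup raises ValueError for a value outside the taxonomy.
    level = Confidence(row["confidence"])
    # Flag anything that is not certified so the reader can weigh it.
    note = "" if level is Confidence.CERTIFIED else f" (confidence: {level.value})"
    return f"{row['name']}{note}. Source: {row['source_url']}"
```

Rejecting unknown confidence values rather than passing them through is a design choice in this sketch; a caller could equally treat them as the lowest trust level.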
Per-category source detail
HVAC
Four ENERGY STAR Socrata datasets feed the hvac category:
| Dataset | Subset | URL |
|---|---|---|
| 83eb-xbyy | Central AC + Air-Source Heat Pumps | data.energystar.gov/api/views/83eb-xbyy/rows.csv |
| acvd-5wvz | Geothermal Heat Pumps | data.energystar.gov/api/views/acvd-5wvz/rows.csv |
| 6rww-hpns | Boilers | data.energystar.gov/api/views/6rww-hpns/rows.csv |
| i97v-e8au | Furnaces | data.energystar.gov/api/views/i97v-e8au/rows.csv |
The cron streams each CSV to a tempfile and upserts on
UNIQUE (mfr, model_number, category='hvac', revision='current').
When the same outdoor unit pairs with multiple certified indoor coils,
the rated SEER2 can differ. The UNIQUE constraint collapses these
pairings into one canonical product row plus one product_certifications
row per AHRI cert pairing. Agents that need per-pairing efficiency
should read the rows with product_certifications.cert_type='ahri' and
look up each AHRI Reference Number in the AHRI Directory directly.
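The collapse described above can be sketched in plain Python. This is an illustration under stated assumptions, not the cron's actual code: the input row shape, the ahri_ref and seer2 field names, and the collapse_pairings helper are all hypothetical. Only the uniqueness key and the cert_type='ahri' value come from this page.

```python
from collections import defaultdict


def collapse_pairings(rows):
    """Collapse per-pairing rows into one product plus many cert rows.

    `rows` is a hypothetical list of dicts, one per outdoor/indoor pairing.
    """
    products = {}
    certs = defaultdict(list)
    for r in rows:
        # Mirrors UNIQUE (mfr, model_number, category='hvac', revision='current').
        key = (r["mfr"], r["model_number"], "hvac", "current")
        # The first pairing seen creates the canonical product row.
        products.setdefault(key, {"mfr": r["mfr"], "model_number": r["model_number"]})
        # Every pairing keeps its own rating on a certification row.
        certs[key].append(
            {"cert_type": "ahri", "ahri_ref": r["ahri_ref"], "seer2": r["seer2"]}
        )
    return products, dict(certs)
```

With two pairings of the same outdoor unit, this yields one product entry and two certification entries, each carrying its own SEER2 value.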
Plumbing (WaterSense)
EPA's public JSON API at api.epa.gov/watersense. The endpoints use a
mix of lowercase (/products/toilets/) and camelCase
(/products/IrrigationControllers/, /products/reverseOsmosisSystems/)
slugs — the cron paginates each in turn. WBIC (Weather-Based) +
SMS (Soil-Moisture) controllers share the IrrigationControllers
endpoint; the productType field per row disambiguates.
specs.fixture_type discriminates the 8 ingested types: toilet,
faucet, showerhead, urinal, flushometer_valve,
irrigation_controller, spray_sprinkler, reverse_osmosis.
Windows (Storm subset)
Only the Storm Windows subset (qaxz-ikcb) ships in v1. The
residential primary-window market (sliders, casements, double-hung)
lives at NFRC CPD; that ingest requires driving an ASP.NET WebForms
ViewState flow, which is deferred to Phase 7.5.
Spec keys mirror the NFRC five-rating system (U-factor, SHGC, VT, AL, CR) plus the storm-window specifics (frame material, glazing layers, emissivity, solar transmittance).
Electrical (Fans subset)
Two ENERGY STAR Socrata datasets:
| Dataset | Subset |
|---|---|
| 2te3-nmxp | Ceiling Fans |
| 8dv7-nngq | Ventilating Fans |
Breaker/panel/switch coverage (Eaton, Square D, Siemens, Leviton) requires mfr PDF parsing and ships in Phase 7.5.
Insulation
ENERGY STAR Certified Insulation (kphf-22jd): primarily bag and batt
R-value ratings. The dataset is thin (~36 SKUs currently); broader
insulation coverage lives in mfr PDFs (OC, Knauf, JM, Rockwool,
CertainTeed), whose parsing is Phase 7.5 work.
JSONB filter envelope
Per-category spec keys are declared in
app/routes/v1/products/_filters.py.
The endpoint validates each filter[specs.<field>.<op>]=<value> against
the allowlist; unknown keys return HTTP 400 with invalid_filter_key.
Supported ops: gte, lte, gt, lt, eq. Numeric values are cast
to double inside the jsonb_path_exists predicate; booleans are
serialized as JSON literals.
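A rough sketch of the validation step, assuming a small in-memory allowlist. The real allowlist lives in app/routes/v1/products/_filters.py and is not reproduced here; ALLOWED_SPEC_KEYS, InvalidFilterKey, and parse_filter are hypothetical names. Rejecting an unsupported op with the same error, and passing non-numeric values through as strings, are assumptions of this sketch.

```python
import re

SUPPORTED_OPS = {"gte", "lte", "gt", "lt", "eq"}

# Hypothetical allowlist built from the spec key examples in the coverage matrix.
ALLOWED_SPEC_KEYS = {
    "hvac": {"seer2", "eer2", "hspf2", "btu_h_cooling", "afue"},
    "plumbing": {"fixture_type", "gpf", "gpm", "ada"},
}

_FILTER_RE = re.compile(r"^filter\[specs\.([a-z0-9_]+)\.([a-z]+)\]$")


class InvalidFilterKey(Exception):
    """Stands in for the HTTP 400 invalid_filter_key response."""


def parse_filter(category: str, param: str, raw: str):
    """Validate one query parameter and return (field, op, typed value)."""
    match = _FILTER_RE.match(param)
    if match is None:
        raise InvalidFilterKey(param)
    field, op = match.groups()
    if field not in ALLOWED_SPEC_KEYS.get(category, set()):
        raise InvalidFilterKey(param)
    if op not in SUPPORTED_OPS:  # assumption: same error as an unknown key
        raise InvalidFilterKey(param)
    if raw in ("true", "false"):  # booleans arrive as JSON literals
        return field, op, raw == "true"
    try:
        return field, op, float(raw)  # numeric values compared as doubles
    except ValueError:
        return field, op, raw  # assumption: other values stay strings
```

For example, `parse_filter("hvac", "filter[specs.seer2.gte]", "16")` returns `("seer2", "gte", 16.0)`, while a key outside the allowlist raises InvalidFilterKey.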
Source attribution + crawler policy
Every row's source_url resolves to the authority's deep-link page or
dataset root. The BuildCalcAPI-Crawler/1.0 user-agent (/crawler
page) self-throttles to 1 req/sec per host and honors robots.txt
unconditionally. DMCA notices go to [email protected] with a 24h
ack + 72h removal SLA.
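One way to picture the crawler policy is a per-host throttle paired with a robots.txt check from the Python standard library. This is a sketch under assumptions, not the crawler's actual implementation: HostThrottle and allowed_by_robots are hypothetical names, and the injectable clock and sleep parameters exist only to make the behavior easy to test.

```python
import time
from urllib.parse import urlsplit
from urllib.robotparser import RobotFileParser

USER_AGENT = "BuildCalcAPI-Crawler/1.0"
MIN_INTERVAL_S = 1.0  # 1 req/sec per host


class HostThrottle:
    """Spaces requests to the same host at least MIN_INTERVAL_S apart."""

    def __init__(self, clock=time.monotonic, sleep=time.sleep):
        self._clock = clock
        self._sleep = sleep
        self._last = {}  # host -> time of the most recent request

    def wait(self, url: str) -> float:
        """Sleep if needed before requesting `url`; return the delay applied."""
        host = urlsplit(url).netloc
        last = self._last.get(host)
        delay = 0.0
        if last is not None:
            delay = max(0.0, MIN_INTERVAL_S - (self._clock() - last))
        if delay > 0:
            self._sleep(delay)
        self._last[host] = self._clock()
        return delay


def allowed_by_robots(robots_txt: str, url: str) -> bool:
    """Check `url` against already-fetched robots.txt content."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(USER_AGENT, url)
```

A fetch loop would call `allowed_by_robots` first and skip disallowed URLs, then call `wait` immediately before each request so that different hosts never block one another.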
Known limitations
- HVAC same-model AHRI cert collapse. A single outdoor unit model with multiple indoor-coil pairings gets one product row; pairing-specific SEER2 ratings live on the cert rows, not the product spec.
- Storm windows only. Residential primary-window catalog (sliders, casements, etc.) is pending NFRC ingest.
- Plumbing IrrigationControllers consolidation. WBIC + SMS share fixture_type='irrigation_controller'; agents needing the distinction look at specs.product_type.
- Electrical fans only. Breaker/panel/switch SKUs await mfr PDF parsers.
- Insulation thin. ENERGY STAR's dataset covers ~36 SKUs of fiberglass + rigid board; not exhaustive for all insulation chemistries.
- 1 category remains gated. Hardware (Simpson Strong-Tie + ICC-ES ESR drill-down) is unblocked by USCO DMCA designation (DMCA-1073500 ACTIVE 2026-05-28) but awaits LLC formation for ICC-ES vendor master agreement signing. Other 4 categories (doors, roofing, drywall, lumber) now have real SKUs shipping monthly via NFRC dynamic discovery and per-mfr PDF parsers (post-Wave-8 2026-05-28). Live total: 9 of 10 categories, 40,723 SKUs as of 2026-05-29.