BuildCalc API

Crawler & DMCA policy

How the BuildCalcAPI-Crawler operates, robots.txt rules, citation requirements, and DMCA contact

The BuildCalcAPI-Crawler is the automated agent that populates the products vertical (/v1/products/*) by fetching spec data from public federal certification directories and manufacturer-published catalog PDFs. This page documents the crawler's operational rules, identification, rate behavior, and the DMCA takedown channel for rightsholders.

Identification

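| Field            | Value                                                        |
| ---------------- | ------------------------------------------------------------ |
| User-Agent       | BuildCalcAPI-Crawler/1.0 (+https://buildcalcapi.dev/crawler) |
| Operator contact | [email protected]                                            |
| Documentation    | This page (/crawler)                                         |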

If you see requests with this User-Agent in your logs and want to confirm that they come from BuildCalc API, check the source IP: it will fall within a Cloudflare or Render egress range. The User-Agent string, not the IP address, is the canonical identifier.

Rate behavior

The crawler self-throttles to a maximum of 1 request per second per host. The throttle is enforced in code (app.etl._products_common._host_throttle); it is a hard limit, not a best-effort target. The intent is to keep load below the threshold at which even a small public directory would notice the crawler.
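
As an illustration of a hard per-host limit, here is a minimal sketch of a throttle that sleeps until at least one second has passed since the previous request to the same host. It is not the actual app.etl._products_common._host_throttle code; the HostThrottle class, its parameters, and the injectable clock and sleep functions are assumptions made for this example.

```python
import time
from urllib.parse import urlsplit


class HostThrottle:
    """Hypothetical sketch: enforce a minimum interval between requests per host."""

    def __init__(self, min_interval=1.0, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval  # seconds; 1.0 means at most 1 request/s
        self._clock = clock
        self._sleep = sleep
        self._last_request = {}  # host -> clock reading at the most recent request

    def wait(self, url):
        """Block until the host of `url` may be contacted; return seconds slept."""
        host = urlsplit(url).netloc.lower()
        last = self._last_request.get(host)
        delay = 0.0
        if last is not None:
            elapsed = self._clock() - last
            delay = max(0.0, self.min_interval - elapsed)
        if delay > 0:
            self._sleep(delay)
        # Record the time after any sleep so the next call measures from here.
        self._last_request[host] = self._clock()
        return delay
```

A fetch loop would call `throttle.wait(url)` immediately before every request. Because the wait happens unconditionally inside the fetch path, the limit holds regardless of how the calling code is written.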

robots.txt

robots.txt is honored unconditionally. The crawler fetches and parses each host's robots.txt with Python's standard urllib.robotparser once, on first contact, and caches the result for the crawler's lifetime. A Disallow rule that matches the requested path raises an internal RobotsDisallowed exception, and the fetch never happens.
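
The check described above can be sketched roughly as follows. Only urllib.robotparser and the RobotsDisallowed name come from this page; the RobotsGate class, its method names, and the injectable `load` function are hypothetical, and the real crawler may be structured differently.

```python
from urllib.parse import urlsplit
from urllib.robotparser import RobotFileParser

USER_AGENT = "BuildCalcAPI-Crawler"


class RobotsDisallowed(Exception):
    """Raised when robots.txt forbids fetching a URL."""


class RobotsGate:
    """Hypothetical sketch: load robots.txt once per host, then gate every fetch."""

    def __init__(self, load=None):
        self._parsers = {}  # host -> RobotFileParser, kept for the crawler lifetime
        self._load = load or self._load_from_network

    @staticmethod
    def _load_from_network(origin):
        parser = RobotFileParser(origin + "/robots.txt")
        parser.read()  # performs the HTTP request for robots.txt
        return parser

    def check(self, url):
        """Raise RobotsDisallowed if `url` is blocked; otherwise return None."""
        parts = urlsplit(url)
        host = parts.netloc.lower()
        if host not in self._parsers:  # first contact with this host
            self._parsers[host] = self._load(f"{parts.scheme}://{host}")
        if not self._parsers[host].can_fetch(USER_AGENT, url):
            raise RobotsDisallowed(url)
```

Calling `gate.check(url)` before the HTTP request means a disallowed URL raises before any connection to the target path is made.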

To block the crawler from a specific path, add to your robots.txt:

User-agent: BuildCalcAPI-Crawler
Disallow: /path/to/block

To block it entirely:

User-agent: BuildCalcAPI-Crawler
Disallow: /
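
Site operators who want to confirm that a rule has the intended effect can test it locally with the same standard-library parser. This is a minimal sketch; the crawler_may_fetch helper and the sample paths are illustrative and not part of BuildCalc API.

```python
from urllib.robotparser import RobotFileParser


def crawler_may_fetch(robots_txt, path, agent="BuildCalcAPI-Crawler"):
    """Return True if `agent` is allowed to fetch `path` under `robots_txt`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, path)


rules = """\
User-agent: BuildCalcAPI-Crawler
Disallow: /path/to/block
"""

print(crawler_may_fetch(rules, "/path/to/block/spec.pdf"))  # False
print(crawler_may_fetch(rules, "/catalog/index.html"))      # True
```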

What we extract — and what we don't

The crawler extracts factual spec values — SEER2 numbers, U-factor numbers, model numbers, certification IDs — and stores them in a JSONB column keyed by canonical spec name. We do not store:

  • Descriptive marketing prose from product catalogs
  • Photographs, renders, or other image bytes (we store only a URL into the original source CDN in products.image_url)
  • Full table layouts as compiled works
  • Pricing data (out of scope for v1)
  • Reseller or distributor information

Every product row carries a source_url and (where applicable) a certification_number so downstream consumers can verify against the authority of record.
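
To make the extraction boundary concrete, the sketch below keeps only allow-listed spec values and provenance fields and drops everything else. The source_url, certification_number, and image_url names come from this page; the ProductRow class, the build_row helper, the canonical spec keys, and the sample record are assumptions for illustration, not the actual schema.

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical canonical spec names; the real registry is not documented here.
CANONICAL_SPECS = {"seer2", "u_factor"}


@dataclass
class ProductRow:
    model_number: str
    source_url: str                              # where the facts were read from
    specs: dict = field(default_factory=dict)    # stands in for the JSONB column
    certification_number: Optional[str] = None
    image_url: Optional[str] = None              # a URL only, never image bytes


def build_row(raw, source_url):
    """Keep allow-listed spec values; marketing prose, pricing, etc. are dropped."""
    specs = {name: raw[name] for name in CANONICAL_SPECS if name in raw}
    return ProductRow(
        model_number=raw["model_number"],
        source_url=source_url,
        specs=specs,
        certification_number=raw.get("certification_number"),
        image_url=raw.get("image_url"),
    )


# Made-up sample record, not real product data.
raw = {
    "model_number": "EXAMPLE-100",
    "seer2": 16.5,
    "description": "Whisper-quiet comfort for the whole family",
    "price": "1999.00",
}
row = build_row(raw, "https://example.org/catalog.pdf")
print(row.specs)  # {'seer2': 16.5}
```

An allow-list, rather than a block-list, means a field that nobody anticipated is dropped by default.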

Source classification

The crawler operates only against sources classified T0-T2 plus one T3 (ICC-ES ESR, deferred until LLC formation). Sources behind authentication walls or with anti-automation infrastructure (UL Prospector, UL Certifications, AHRI's paid subscription API) are flagged scrapeable: false in our internal registry and rejected at the framework level — no per-source code can fetch them.
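
A framework-level rejection of this kind could look roughly like the sketch below, in which the registry is consulted before any per-source fetch function runs. The scrapeable flag comes from this page; the Source class, the registry keys, the SourceNotScrapeable exception, and the fetch_for_source helper are hypothetical names chosen for the example.

```python
from dataclasses import dataclass


class SourceNotScrapeable(Exception):
    """Raised before any network I/O for a source flagged scrapeable: false."""


@dataclass(frozen=True)
class Source:
    name: str
    scrapeable: bool


# Hypothetical registry keys; the two source names appear in the text above.
REGISTRY = {
    "energy_star_hvac": Source("ENERGY STAR HVAC", scrapeable=True),
    "ul_prospector": Source("UL Prospector", scrapeable=False),
}


def fetch_for_source(source_key, url, fetch):
    """Run `fetch(url)` only if the registry marks the source as scrapeable."""
    source = REGISTRY[source_key]
    if not source.scrapeable:
        raise SourceNotScrapeable(source.name)
    return fetch(url)
```

Because per-source code receives its fetch capability only through this gate, a source marked scrapeable: false cannot be fetched even if someone writes an extractor for it.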

Active sources (as of 2026-05-28)

| Source | Category | Tier | Endpoint |
| ------ | -------- | ---- | -------- |
| ENERGY STAR (Heat Pumps + Geothermal + Boilers + Furnaces) | hvac | T1 | data.energystar.gov/api/views/*/rows.csv |
| ENERGY STAR Storm Windows | windows | T1 | data.energystar.gov/api/views/qaxz-ikcb/rows.csv |
| ENERGY STAR Insulation | insulation | T1 | data.energystar.gov/api/views/kphf-22jd/rows.csv |
| ENERGY STAR Ceiling + Vent Fans | electrical | T1 | data.energystar.gov/api/views/{2te3-nmxp,8dv7-nngq}/rows.csv |
| EPA WaterSense | plumbing | T1 | api.epa.gov/watersense/products/{type}/?offset=N |

AHRI Reference Numbers are recorded as product_certifications.cert_type='ahri' rows when present in the ENERGY STAR HVAC CSV — this gives agents a cross-link into the AHRI Directory without us needing the paid AHRI Data Subscription Program license.
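
As a rough sketch of that cross-link, the function below reads a CSV export and emits one certification record per row that carries an AHRI reference. Only cert_type='ahri' comes from this page; the CSV column headers, the cert_number and model_number keys, and the sample rows are assumptions, so check them against the real ENERGY STAR export before relying on them.

```python
import csv
import io

# Assumed column headers; the actual CSV may label these differently.
AHRI_COLUMN = "AHRI Reference Number"
MODEL_COLUMN = "Model Number"


def ahri_certifications(csv_text):
    """Return one certification dict per CSV row that has an AHRI reference."""
    rows = []
    for record in csv.DictReader(io.StringIO(csv_text)):
        reference = (record.get(AHRI_COLUMN) or "").strip()
        if not reference:
            continue  # no cross-link available for this product
        rows.append({
            "model_number": (record.get(MODEL_COLUMN) or "").strip(),
            "cert_type": "ahri",
            "cert_number": reference,  # hypothetical column name
        })
    return rows


# Made-up sample rows, not real certification data.
sample = (
    "Model Number,AHRI Reference Number\n"
    "HP-EXAMPLE-1,000000001\n"
    "HP-EXAMPLE-2,\n"
)
print(ahri_certifications(sample))
```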

See ADR-0015 for the full per-source tier table and the three-prong legal framework (Feist + ToS + CFAA) that grounds the crawler's posture.

DMCA takedown

If you believe a specific product row infringes your copyright (e.g., we extracted protected expression rather than fact), send a §512(c)(3)-compliant notice to:

[email protected]

Acknowledgement SLA: 24 hours. Removal SLA: 72 hours from receipt of a valid, complete notice. We follow the §512(g) counter-notice process and maintain a repeat-infringer policy per ADR-0015.
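
For rightsholders who track these windows, the arithmetic can be sketched as below. The sketch assumes that both clocks start when a valid, complete notice is received; the dmca_deadlines helper and its dictionary keys are illustrative names only.

```python
from datetime import datetime, timedelta, timezone

ACK_SLA = timedelta(hours=24)      # acknowledgement window stated above
REMOVAL_SLA = timedelta(hours=72)  # removal window stated above


def dmca_deadlines(received_at):
    """Return both deadlines, assuming each starts at receipt of a valid notice."""
    return {
        "acknowledge_by": received_at + ACK_SLA,
        "remove_by": received_at + REMOVAL_SLA,
    }


received = datetime(2026, 5, 28, 12, 0, tzinfo=timezone.utc)
deadlines = dmca_deadlines(received)
print(deadlines["acknowledge_by"].isoformat())  # 2026-05-29T12:00:00+00:00
print(deadlines["remove_by"].isoformat())       # 2026-05-31T12:00:00+00:00
```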

To expedite review, include in your notice:

  • The specific product_id (visible in /v1/products/{id} responses) or the source_url of the row in question
  • A description of the protected work
  • A signed statement of good-faith belief that the material is infringing and not authorized by you, your agent, or the law
  • Your contact information

We respond to all complete notices regardless of the requesting party's size or jurisdiction.

Reporting other concerns

For non-copyright concerns (a fact that's incorrect, a model number that no longer exists, a source URL that 404s), email [email protected] — these route to a different inbox and a faster, non-legal review.
