Export DEF CON conference data from Firestore, normalize it into deterministic JSON artifacts, and ship a client-friendly dataset optimized for IndexedDB-first consumption. This pipeline exists to make frontend clients fast and reliable by precomputing joins, indexes, and UI-ready views on the server side.
The tradeoff is deliberate: we spend more time during export to avoid runtime joins, scans, and costly data shaping on the client. That keeps network payloads small and predictable, and it makes client behavior deterministic across platforms.
High-level flow:
Firestore -> derived entities/indexes/views -> client IndexedDB (raw JSON is optional for debugging/auditing)
Explicit goals:
- correctness and referential integrity
- minimal network payloads
- offloading expensive work from the client
- deterministic builds with stable ordering and stable JSON key sorting
Design principles:
- Canonical entities vs views
  - Entities are the authoritative, normalized records.
  - Views are pre-shaped, UI-ready projections that avoid runtime joins.
- { allIds, byId } maps for O(1) access
  - Clients can fetch any record in constant time without scanning arrays.
- Precomputed indexes
  - Common queries (day, location, tag, person) are computed once in the exporter.
- Timezone correctness
  - All schedule bucketing uses htConf.timezone so day/minute keys match local event time.
- Minimal payload philosophy
  - Prefer IDs over embedded objects; include strings only when they are directly needed by UI.
- Logic lives in the exporter
  - The client should not have to reason about Firestore shapes or perform joins/scans.
Output layout:

    out/ht/<conference>/
      manifest.json
      entities/
      indexes/
      views/
      raw/        (only when --emit-raw is enabled)

Production artifacts live at the conference root (manifest.json, entities/, indexes/, views/); raw/ is optional debug/audit output behind --emit-raw. Production artifacts are normalized for size; raw (when emitted) is unmodified.
- raw/ (optional)
  - Firestore snapshots written as JSON for debugging and auditing.
  - Emitted only when --emit-raw is used.
  - Includes conference.json plus all source collections: articles, content, documents, events, locations, menus, organizations, speakers, tagtypes.
- manifest.json
  - Build manifest used for cache invalidation (see Manifest & Versioning).
- entities/
  - Canonical, normalized stores intended for IndexedDB.
- indexes/
  - Precomputed query accelerators keyed by day/minute/location/person/tag.
- views/
  - UI-specific, render-ready models (minimal schedule view models, etc.).
Every entity file follows the same contract:

    {
      "allIds": [123, 456],
      "byId": {
        "123": { "id": 123, "...": "..." },
        "456": { "id": 456, "...": "..." }
      }
    }

Why it is used:
- Deterministic ordering via allIds.
- O(1) access via byId.
- Simple IndexedDB storage and lookup.
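
As an illustration of the contract above, here is a minimal sketch of turning an array of records into an { allIds, byId } store. The function name `toEntityStore` is hypothetical, and sorting allIds numerically by ID is an assumption; the exporter may use a different stable order.

```javascript
// Sketch only: `toEntityStore` is a hypothetical helper, not the exporter's API.
// Assumption: every record carries a numeric `id`.
function toEntityStore(records) {
  const byId = {};
  for (const record of records) {
    // JSON object keys are strings, so the ID is stringified explicitly.
    byId[String(record.id)] = record;
  }
  // Sort numerically so allIds does not depend on input order.
  const allIds = Object.keys(byId)
    .map(Number)
    .sort((a, b) => a - b);
  return { allIds, byId };
}

const store = toEntityStore([
  { id: 456, title: "B" },
  { id: 123, title: "A" },
]);
console.log(store.allIds); // [ 123, 456 ]
console.log(store.byId[123].title); // A
```

Iterating allIds and dereferencing through byId gives a stable order and constant-time lookup from the same file.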
Canonical entities:
- entities/events.json
  - The authoritative event model and the join target for schedule/details.
Views are intentionally minimal and UI-specific. Unless otherwise noted, they are arrays (not { allIds, byId } maps).
- views/eventCardsById.json (schedule)
  - Shape: { [eventId]: EventCard } (byId map only)
  - EventCard fields: id, content_id, begin, end, title, color, location, speakers, tags
  - tags: { id, label, color_background, color_foreground }[]
- views/organizationsCards.json
  - Array of { id, name, logoUrl? } (logoUrl omitted when missing)
- views/peopleCards.json
  - Array of { id, name, affiliations }
  - affiliations is a string array of organization names
- views/tagTypesBrowse.json
  - Array of { id, label, category, sort_order, tags }
  - tags: { id, label, color_background, color_foreground, sort_order }[]
  - Only includes tag types where is_browsable == true, category == "content", and tags.length > 0
- views/documentsList.json
  - Array of { id, title_text, updatedAtMs } (no body_text)
- views/contentCards.json
  - Array of { id, title, tags }
  - tags: { id, label, color_background, color_foreground }[]
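
To make the "pre-joined view" idea concrete, the sketch below shapes a contentCards-style array from two entity stores. The function name `buildContentCards`, the `tag_ids` field on content records, and the existence of a tags store are assumptions for illustration; only the output shape { id, title, tags } comes from the list above.

```javascript
// Sketch only: input field names (`tag_ids`) and the tags store are assumed.
function buildContentCards(content, tags) {
  return content.allIds.map((contentId) => {
    const item = content.byId[contentId];
    const cardTags = (item.tag_ids ?? [])
      .map((tagId) => tags.byId[tagId])
      .filter(Boolean) // skip tag IDs that do not resolve
      .map((tag) => ({
        id: tag.id,
        label: tag.label,
        color_background: tag.color_background,
        color_foreground: tag.color_foreground,
      }));
    // Only the fields the UI needs are copied; long text stays in the entity.
    return { id: item.id, title: item.title, tags: cardTags };
  });
}

// Made-up sample data.
const sampleContent = {
  allIds: [1],
  byId: { 1: { id: 1, title: "Sample talk", description: "long text", tag_ids: [7, 99] } },
};
const sampleTags = {
  allIds: [7],
  byId: { 7: { id: 7, label: "Talk", color_background: "#000", color_foreground: "#fff", sort_order: 1 } },
};
console.log(JSON.stringify(buildContentCards(sampleContent, sampleTags)));
```

The client receives the tag label and colors inline, so rendering a card needs no lookup in a separate tag store.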
Indexes are maps from a key to a list of event/content IDs. They are sorted by event start time (or ID where appropriate) for stable client rendering.
- eventsByDay
  - Key: YYYY-MM-DD (conference timezone)
  - Solves: day-based schedule grouping without scanning all events.
- eventsByStartMinute
  - Key: YYYY-MM-DDTHH:MM (conference timezone)
  - Solves: minute-precision schedules and timelines.
  - This is the minute-bucket equivalent of an epoch-minute index; add a separate epoch-minute index only if a client actually needs numeric minute keys.
- eventsByLocation
  - Key: location_id
  - Solves: room-based schedules and filters.
- eventsByPerson
  - Key: person_id
  - Solves: speaker/person pages and filters.
- eventsByTag
  - Key: tag_id
  - Solves: tag-based filters and tag landing pages.
- contentByTag (if present in content)
  - Key: tag_id
  - Solves: tagged content filtering without scanning all content.
The client should never recompute these indexes. They are part of the contract and kept stable by the exporter.
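
For reference, day and minute keys in the conference timezone can be derived with the built-in Intl API. This is a sketch under assumptions, not the exporter's code: the helper name `localKeys` is hypothetical, event start times are assumed to be ISO 8601 strings, and America/Los_Angeles is used only as an example timezone.

```javascript
// Sketch only: `localKeys` is a hypothetical helper.
// Returns null for unparseable dates so the caller can emit a warning.
function localKeys(isoString, timeZone) {
  const date = new Date(isoString);
  if (Number.isNaN(date.getTime())) return null;
  const parts = new Intl.DateTimeFormat("en-US", {
    timeZone,
    year: "numeric",
    month: "2-digit",
    day: "2-digit",
    hour: "2-digit",
    minute: "2-digit",
    hourCycle: "h23", // 00-23, avoids "24" at midnight
  }).formatToParts(date);
  const get = (type) => parts.find((part) => part.type === type).value;
  const day = `${get("year")}-${get("month")}-${get("day")}`;
  return { day, minute: `${day}T${get("hour")}:${get("minute")}` };
}

// 05:30 UTC on Aug 9 is still the evening of Aug 8 in Pacific time.
console.log(localKeys("2025-08-09T05:30:00Z", "America/Los_Angeles"));
// { day: '2025-08-08', minute: '2025-08-08T22:30' }
console.log(localKeys("not a date", "America/Los_Angeles")); // null
```

The second-to-last call shows why bucketing by UTC date (or by the viewer's local time) would place a late-evening event on the wrong day.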
manifest.json contains:
- code: conference code
- name: display name
- timezone: IANA timezone string
- buildTimestamp: the build-time timestamp (UTC)
buildTimestamp is generated at export time and is intentionally timestamp-based (not a hash) to keep clients simple. If the timestamp changes, clients should invalidate and reload all derived artifacts. It is expected to change on every export run.
Expected client behavior:
- Fetch manifest.json on page load.
- Compare buildTimestamp to the version stored in IndexedDB.
- If changed (or missing locally):
  - download entities/*, indexes/*, views/*
  - store each file as an IndexedDB object store
- Use IndexedDB for all queries thereafter.
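
The client steps above can be sketched as follows. Everything here is illustrative: `syncArtifacts`, `needsRefresh`, the `storage` interface, and the artifact file names are hypothetical, and storage is injected so the logic can run outside a browser. A real client would back `storage` with IndexedDB and `fetchJson` with `fetch`.

```javascript
// Hypothetical artifact list; the real file names come from the export output.
const ARTIFACTS = [
  "entities/events.json",
  "indexes/eventsByDay.json",
  "views/eventCardsById.json",
];

// True when nothing is stored locally or the build timestamp differs.
function needsRefresh(manifest, storedTimestamp) {
  return storedTimestamp == null || storedTimestamp !== manifest.buildTimestamp;
}

// `fetchJson(path)` and `storage` are injected, assumed interfaces.
async function syncArtifacts({ fetchJson, storage }) {
  const manifest = await fetchJson("manifest.json");
  const stored = await storage.getBuildTimestamp();
  if (!needsRefresh(manifest, stored)) return { refreshed: false };

  const payloads = await Promise.all(ARTIFACTS.map((path) => fetchJson(path)));
  for (let i = 0; i < ARTIFACTS.length; i++) {
    await storage.putArtifact(ARTIFACTS[i], payloads[i]);
  }
  // Record the timestamp last so an interrupted sync is retried next load.
  await storage.setBuildTimestamp(manifest.buildTimestamp);
  return { refreshed: true };
}
```

Because there are no partial updates, a changed timestamp always triggers a full reload of every artifact; the comparison is a plain equality check rather than an ordering.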
Typical queries:
- Schedule view
  - Read indexes/eventsByDay[YYYY-MM-DD], then dereference IDs from entities/events.byId and/or views/eventCardsById.
- Event detail
  - Look up the event in entities/events.byId by event ID, then join as needed with entities/locations, entities/people, entities/content.
- Tag/person filters
  - Use indexes/eventsByTag[tagId] or indexes/eventsByPerson[personId] and resolve IDs from entities/views.
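
A minimal sketch of these lookups, using an in-memory object in place of IndexedDB. The `dataset` layout, the helper names, and the sample events are assumptions for illustration; a real client would read the same structures from its object stores.

```javascript
// Made-up sample data standing in for the downloaded artifacts.
const dataset = {
  indexes: {
    eventsByDay: { "2025-08-08": [10, 11] },
    eventsByTag: { 7: [11] },
  },
  views: {
    eventCardsById: {
      10: { id: 10, title: "Opening" },
      11: { id: 11, title: "Workshop" },
    },
  },
};

// Resolve an ID list against a byId map, keeping the index order.
function resolveCards(ids, cardsById) {
  return ids.map((id) => cardsById[id]).filter(Boolean);
}

function scheduleForDay(data, day) {
  return resolveCards(data.indexes.eventsByDay[day] ?? [], data.views.eventCardsById);
}

function eventsForTag(data, tagId) {
  return resolveCards(data.indexes.eventsByTag[tagId] ?? [], data.views.eventCardsById);
}

console.log(scheduleForDay(dataset, "2025-08-08").map((card) => card.title));
// [ 'Opening', 'Workshop' ]
console.log(eventsForTag(dataset, 7).map((card) => card.title));
// [ 'Workshop' ]
```

Neither query scans the full event list or sorts anything; the order comes from the index.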
The exporter guarantees:
- All references are validated and missing refs are counted.
- Missing or invalid references are handled deterministically (filtered out during entity build).
- Stable ordering for entities and indexes.
- Timezone-correct day/minute keys using htConf.timezone.
- Warnings emitted for bad data (missing refs, invalid dates, invalid timezone).
- Warnings are informational and do not fail the export.
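
One way to picture the reference handling described above is a filter that drops IDs missing from the target store and counts each miss. This is a sketch, not the exporter's implementation: `filterRefs`, the counter label, and the `speaker_ids` field are hypothetical.

```javascript
// Sketch only: keeps IDs that exist in `target.byId`, counts the rest.
function filterRefs(ids, target, counters, label) {
  const kept = [];
  for (const id of ids) {
    if (Object.hasOwn(target.byId, String(id))) {
      kept.push(id);
    } else {
      counters[label] = (counters[label] ?? 0) + 1;
    }
  }
  return kept;
}

// Made-up sample data: speaker 99 does not exist.
const people = { allIds: [1, 2], byId: { 1: { id: 1 }, 2: { id: 2 } } };
const missing = {};
const event = { id: 10, speaker_ids: [1, 99, 2] };
const cleaned = {
  ...event,
  speaker_ids: filterRefs(event.speaker_ids, people, missing, "events.speaker_ids"),
};

console.log(cleaned.speaker_ids); // [ 1, 2 ]
console.log(missing); // { 'events.speaker_ids': 1 }
```

The counters can feed the warning output at the end of a run without turning bad data into a failed export.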
Prerequisites:
- Node.js >= 18
- Firebase config in src/config.js
Run a local export:

    npm install
    npm run export -- DEFCON33

CLI options:

    npm run export -- --conf DEFCON33 --out ./out/ht --emit-raw

Outputs are written to:
- out/ht/<conference>/ (conference code is lowercased)
- out/ht/<conference>/raw/ (only when raw emission is enabled)
Raw outputs are optional now because the website consumes derived artifacts only. Enable raw snapshots when debugging or auditing:

    npm run export -- --emit-raw DEFCON33

The --emit-raw flag (or -r) writes:
- out/ht/<conference>/raw/conference.json
- out/ht/<conference>/raw/{articles,content,documents,events,locations,menus,organizations,speakers,tagtypes}.json
Migration note: older builds wrote production artifacts under out/ht/<conference>/derived/.
Typical runtime characteristics:
- Fetches all collections in parallel.
- Deterministic writes via stable key ordering.
- Large datasets will primarily be bound by Firestore read time.
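
Stable key ordering is commonly achieved by sorting object keys recursively before serializing. The sketch below shows that general technique; `sortKeysDeep` and `stableStringify` are hypothetical names, and the exporter may implement this differently.

```javascript
// Sketch only: recursively rebuild objects with keys in sorted order.
function sortKeysDeep(value) {
  if (Array.isArray(value)) return value.map(sortKeysDeep); // array order is data, keep it
  if (value !== null && typeof value === "object") {
    const sorted = {};
    for (const key of Object.keys(value).sort()) {
      sorted[key] = sortKeysDeep(value[key]);
    }
    return sorted;
  }
  return value;
}

function stableStringify(value) {
  return JSON.stringify(sortKeysDeep(value), null, 2);
}

const a = stableStringify({ title: "x", id: 1, tags: [{ label: "t", id: 7 }] });
const b = stableStringify({ id: 1, tags: [{ id: 7, label: "t" }], title: "x" });
console.log(a === b); // true
```

Two exports of the same data then produce byte-identical files, which keeps diffs between builds meaningful. Note that JavaScript engines list small integer-like keys (as in byId maps) in ascending numeric order regardless of insertion order, so those maps serialize numerically rather than lexicographically; the output is still deterministic.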
Add a new entity:
- Fetch the new collection in src/conf.js.
- Normalize it in src/build/entities.js.
- Emit a new entities/<name>.json file.
Add a new index:
- Implement the index in src/build/indexes.js.
- Keep keys minimal and values ID-only.
- Sort deterministically (by start time or ID).
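
As a starting point for a new index, the sketch below groups event IDs under arbitrary keys after a deterministic sort. It is an illustration under assumptions: `buildIndex` is a hypothetical helper, and events are assumed to carry an ISO 8601 `begin` string plus `location_id` and `tag_ids` fields.

```javascript
// Sketch only: `keysOf(event)` returns the index keys an event belongs to.
function buildIndex(events, keysOf) {
  // Sort by start time, then by ID as a tie-breaker, so output is stable.
  const sorted = [...events].sort(
    (a, b) => Date.parse(a.begin) - Date.parse(b.begin) || a.id - b.id
  );
  const index = {};
  for (const event of sorted) {
    for (const key of keysOf(event)) {
      (index[key] ??= []).push(event.id); // values are IDs only
    }
  }
  return index;
}

// Made-up sample data.
const sampleEvents = [
  { id: 3, begin: "2025-08-08T18:00:00Z", location_id: 5, tag_ids: [7] },
  { id: 2, begin: "2025-08-08T17:00:00Z", location_id: 5, tag_ids: [7, 8] },
  { id: 1, begin: "2025-08-08T17:00:00Z", location_id: 6, tag_ids: [] },
];

const byLocation = buildIndex(sampleEvents, (e) => (e.location_id == null ? [] : [e.location_id]));
const byTag = buildIndex(sampleEvents, (e) => e.tag_ids ?? []);
console.log(byLocation); // { '5': [ 2, 3 ], '6': [ 1 ] }
console.log(byTag); // { '7': [ 2, 3 ], '8': [ 2 ] }
```

Passing the key function in keeps each new index to a one-line definition while the sort and the ID-only rule stay in one place.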
Add a new view:
- Use src/build/views.js for UI-ready models.
- Prefer a view when the UI needs a trimmed, pre-joined shape.
- Extend an entity only when the data is truly canonical.
Payload discipline:
- Prefer IDs over embedded objects.
- Avoid denormalized copies of large text.
- Keep view models minimal and UI-specific.
Common mistakes to avoid:
- Over-denormalizing entities (bloats payloads and breaks cache invalidation).
- Recomputing indexes on the client (wastes CPU and risks divergence).
- Using local time for schedule bucketing (must use htConf.timezone).
This pipeline does NOT try to provide:
- partial updates or incremental diffs
- fine-grained patching or merge logic
- client-side joins as a primary mechanism
- runtime schedule computation or grouping
Release steps:
- Run export: npm run export -- --conf <CONF>
- Publish artifacts from out/ht/<conf>/