Build GitHub activity extract and snapshot pipeline
Problem
Build the GitHub GraphQL and REST extraction flow that produces versioned CSV or Parquet snapshots for static Dataface dashboard builds, including long-term traffic history capture.
Context
- The initiative spec recommends a hybrid data source:
  - GitHub GraphQL for contributor-centric history
  - GitHub REST for repo traffic and selected repo-level endpoints
- GitHub traffic endpoints only retain 14 days of views / clones / referrers / paths, so long-term charts require scheduled snapshot persistence.
- The dashboard pack is intended for GitHub Pages static hosting, so the pipeline must materialize all data ahead of build time.
- dft build --mode html --bake-data already gives Dataface a static-output path; this task needs to produce the input data contract for that build.
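
Because the traffic endpoints expose only a rolling 14-day window, each scheduled run has to fold the fresh window into the history already on disk. A minimal sketch of that merge follows. It assumes the payload shape returned by the REST traffic views endpoint (a `views` list of daily entries with `timestamp`, `count`, and `uniques`). The function name `merge_traffic_views` and the output column names are hypothetical, not part of the spec.

```python
from typing import Dict, Iterable, List


def merge_traffic_views(history: Iterable[Dict], payload: Dict) -> List[Dict]:
    """Merge one traffic-views response into accumulated daily history.

    Rows are keyed by day. A newer fetch overwrites the same day, since the
    most recent day's counts may still be growing when they are fetched.
    Column names here are illustrative assumptions.
    """
    by_day = {row["date"]: dict(row) for row in history}
    for entry in payload.get("views", []):
        day = entry["timestamp"][:10]  # "2024-05-01T00:00:00Z" -> "2024-05-01"
        by_day[day] = {
            "date": day,
            "views": entry["count"],
            "unique_visitors": entry["uniques"],
        }
    # Sort by day so the persisted history is stable from run to run.
    return [by_day[day] for day in sorted(by_day)]
```

Days that have already aged out of the API window stay in `history` untouched, which is what lets long-term charts extend past 14 days.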
Possible Solutions
- Pull only current-state API data at build time
  - Simplest extraction logic.
  - Loses long-term traffic and trend history; not enough for the intended dashboard story.
- Recommended: hybrid API extraction with git-backed history snapshots
  - Fetch GitHub data on a schedule, normalize into CSV or Parquet, and merge into durable history tables.
  - Preserves traffic beyond the 14-day API window.
  - Fits static hosting and avoids introducing a separate database.
- Use an external warehouse or analytics store
  - Richest long-term analytics option.
  - Violates the "easy add-on" goal and makes OSS adoption much harder.
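
For the git-backed option, snapshot files are easier to review and diff when unchanged data always serializes to identical bytes. The sketch below shows one way to get that for CSV: a fixed column order, rows sorted by key, and a fixed line terminator. `render_snapshot_csv` and its parameters are hypothetical names used only for illustration.

```python
import csv
import io
from typing import Dict, List, Sequence


def render_snapshot_csv(
    rows: List[Dict], columns: Sequence[str], key: Sequence[str]
) -> str:
    """Render rows as CSV text with a fixed column order and key-sorted rows.

    Sorting on the key columns means the input order of `rows` does not
    affect the output, so commits only change when the data changes.
    """
    ordered = sorted(rows, key=lambda r: tuple(str(r[k]) for k in key))
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(columns), lineterminator="\n")
    writer.writeheader()
    writer.writerows(ordered)
    return buf.getvalue()
```

With the default `DictWriter` settings, a row carrying a column not listed in `columns` raises `ValueError`, which doubles as a cheap guard against silent schema drift.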
Plan
- Define the normalized tables and their grains for repos, contributors, PRs, issues, reviews, and traffic history.
- Implement the extraction flow with bounded incremental sync windows.
- Persist snapshots to a machine-managed history location that Actions can read and update.
- Emit stable CSV or Parquet outputs for the dashboard pack.
- Validate that the resulting data contract supports all v1 boards without hidden joins at render time.
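
Two of the plan steps can be sketched in a few lines: choosing a bounded incremental sync window, and checking that a table really has the grain it declares. Everything here is an assumption made for illustration, including the function names, the 30-day cap, the one-day overlap, and the 14-day first-run lookback.

```python
from datetime import datetime, timedelta
from typing import Dict, Iterable, List, Optional, Sequence, Tuple


def next_sync_window(
    last_synced: Optional[datetime],
    now: datetime,
    max_span: timedelta = timedelta(days=30),
    overlap: timedelta = timedelta(days=1),
    default_lookback: timedelta = timedelta(days=14),
) -> Tuple[datetime, datetime]:
    """Return the (start, end) range the next extraction run should cover.

    The window re-reads a small overlap before the last sync point to catch
    late updates, and is capped at max_span so a long outage is caught up
    over several bounded runs instead of one unbounded one.
    """
    if last_synced is None:
        start = now - default_lookback
    else:
        start = last_synced - overlap
    end = min(now, start + max_span)
    return start, end


def duplicate_keys(rows: Iterable[Dict], grain: Sequence[str]) -> List[tuple]:
    """Return the key tuples that occur more than once for the given grain."""
    seen, dupes = set(), set()
    for row in rows:
        key = tuple(row[col] for col in grain)
        if key in seen:
            dupes.add(key)
        seen.add(key)
    return sorted(dupes)
```

An empty result from `duplicate_keys` is one cheap signal that a table matches its stated grain, for example one row per repo per day for traffic history.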
Implementation Progress
QA Exploration
- [ ] QA exploration completed (or N/A for non-UI tasks)
N/A for browser QA. Verification should happen through extraction tests and static build checks.
Review Feedback
- [ ] Review cleared