Dataface Tasks

Build GitHub activity extract and snapshot pipeline

ID: DASHBOARD_FACTORY-BUILD_GITHUB_ACTIVITY_EXTRACT_AND_SNAPSHOT_PIPELINE
Status: not_started
Priority: p1
Milestone: m4-v1-0-launch
Owner: data-analysis-evangelist-ai-training
Initiative: github-oss-activity-dashboards

Problem

Build the GitHub GraphQL and REST extraction flow that produces versioned CSV or Parquet snapshots for static Dataface dashboard builds, including long-term traffic history capture.

Context

  • The initiative spec recommends a hybrid data source:
      • GitHub GraphQL for contributor-centric history
      • GitHub REST for repo traffic and selected repo-level endpoints
  • GitHub traffic endpoints only retain 14 days of views / clones / referrers / paths, so long-term charts require scheduled snapshot persistence.
  • The dashboard pack is intended for GitHub Pages static hosting, so the pipeline must materialize all data ahead of build time.
  • dft build --mode html --bake-data already gives Dataface a static-output path; this task needs to produce the input data contract for that build.
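
Because the traffic endpoints only cover a 14-day window, each scheduled run has to fold its fresh rows into a longer-lived history. A minimal sketch of that merge, assuming a grain of one row per repo per day and hypothetical column names (repo, date, views, unique_visitors) that are not taken from the initiative spec:

```python
import csv
import io

# Assumed grain: one row per repo per day. Column names are illustrative only.
KEY_FIELDS = ("repo", "date")
FIELDS = ("repo", "date", "views", "unique_visitors")


def merge_traffic_history(history_rows, fresh_rows):
    """Upsert fresh rows into history; the newer fetch wins for overlapping days."""
    merged = {tuple(row[k] for k in KEY_FIELDS): row for row in history_rows}
    for row in fresh_rows:
        merged[tuple(row[k] for k in KEY_FIELDS)] = row
    # Sort by (repo, date) so the committed file produces stable diffs.
    return [merged[key] for key in sorted(merged)]


def to_csv(rows):
    """Serialize rows with a fixed column order and Unix line endings."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

Letting the newer fetch overwrite an overlapping day is a design choice here: the most recent day may still be incomplete when it is first captured, so a later run can correct it.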

Possible Solutions

  1. Pull only current-state API data at build time
      • Simplest extraction logic.
      • Loses long-term traffic and trend history; not enough for the intended dashboard story.

  2. Recommended: hybrid API extraction with git-backed history snapshots
      • Fetch GitHub data on a schedule, normalize into CSV or Parquet, and merge into durable history tables.
      • Preserves traffic beyond the 14-day API window.
      • Fits static hosting and avoids introducing a separate database.

  3. Use an external warehouse or analytics store
      • Richest long-term analytics option.
      • Violates the "easy add-on" goal and makes OSS adoption much harder.
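
To illustrate the "normalize" step in option 2, the sketch below flattens one response from the REST traffic views endpoint (GET /repos/{owner}/{repo}/traffic/views) into daily rows. The payload shape follows GitHub's documented response; the function name and output column names are hypothetical.

```python
from datetime import datetime


def normalize_views(repo, payload):
    """Flatten one traffic/views response into one row per repo per UTC day."""
    rows = []
    for entry in payload.get("views", []):
        # Timestamps arrive as e.g. "2024-01-02T00:00:00Z"; keep only the date.
        day = datetime.strptime(entry["timestamp"], "%Y-%m-%dT%H:%M:%SZ").date()
        rows.append({
            "repo": repo,
            "date": day.isoformat(),
            "views": entry["count"],
            "unique_visitors": entry["uniques"],
        })
    return rows


# Example payload, trimmed to a single day.
sample = {
    "count": 3,
    "uniques": 2,
    "views": [{"timestamp": "2024-01-02T00:00:00Z", "count": 3, "uniques": 2}],
}
rows = normalize_views("octo/demo", sample)
```

The top-level count and uniques totals are deliberately dropped: they describe the rolling window rather than a single day, so they cannot be merged into a daily history table.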

Plan

  1. Define the normalized tables and their grains for repos, contributors, PRs, issues, reviews, and traffic history.
  2. Implement the extraction flow with bounded incremental sync windows.
  3. Persist snapshots to a machine-managed history location that Actions can read and update.
  4. Emit stable CSV or Parquet outputs for the dashboard pack.
  5. Validate that the resulting data contract supports all v1 boards without hidden joins at render time.
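
Step 2 can be sketched as a generator that splits the span since the last successful sync into bounded windows. The 7-day cap and 1-hour overlap are illustrative assumptions, not values from the spec, and sync_windows is a hypothetical helper name.

```python
from datetime import datetime, timedelta, timezone


def sync_windows(last_synced, now,
                 max_window=timedelta(days=7),
                 overlap=timedelta(hours=1)):
    """Yield contiguous (start, end) windows from last_synced to now.

    Each window is at most max_window long. The first window starts
    slightly before last_synced so items updated near the previous
    cutoff are fetched again rather than missed.
    """
    start = last_synced - overlap
    while start < now:
        end = min(start + max_window, now)
        yield start, end
        start = end


last = datetime(2024, 1, 1, tzinfo=timezone.utc)
now = datetime(2024, 1, 20, tzinfo=timezone.utc)
windows = list(sync_windows(last, now))
```

Re-fetching a small overlap only works if the downstream merge is idempotent, that is, if rows are upserted by their grain key rather than appended.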

Implementation Progress

QA Exploration

  • [ ] QA exploration completed (or N/A for non-UI tasks)

N/A for browser QA. Verification should happen through extraction tests and static build checks.
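
One possible shape for such an extraction test is a contract check over the emitted CSV files. In the sketch below the table names, grain columns, and required columns are hypothetical placeholders; the real contract would come from step 1 of the plan. It checks only column presence and grain uniqueness, not whether every v1 board can render without joins.

```python
import csv
import io

# Hypothetical contract: table name -> (grain columns, required columns).
CONTRACT = {
    "traffic_daily": (
        ("repo", "date"),
        ("repo", "date", "views", "unique_visitors"),
    ),
    "pull_requests": (
        ("repo", "pr_number"),
        ("repo", "pr_number", "author", "created_at", "state"),
    ),
}


def check_table(name, csv_text):
    """Return a list of contract violations for one CSV table (empty means OK)."""
    grain, required = CONTRACT[name]
    reader = csv.DictReader(io.StringIO(csv_text))
    header = reader.fieldnames or []
    missing = [column for column in required if column not in header]
    if missing:
        return [f"{name}: missing columns {missing}"]
    problems = []
    seen = set()
    # Data rows start on line 2 because line 1 is the header.
    for line_no, row in enumerate(reader, start=2):
        key = tuple(row[column] for column in grain)
        if key in seen:
            problems.append(f"{name}: duplicate grain key {key} at line {line_no}")
        seen.add(key)
    return problems
```

A check like this could run in the scheduled workflow before the static build, so a broken snapshot fails the job instead of producing a misleading dashboard.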

Review Feedback

  • [ ] Review cleared