tasks/workstreams/dashboard-factory/initiatives/github-oss-activity-dashboards/spec.md

Spec

Product summary

Build a GitHub OSS analytics add-on for Dataface that lets an open-source project publish a static dashboard site with:

The v1 product should be simple enough that a maintainer can add it to an OSS repo with minimal custom code and no persistent backend.


Goals

Primary goals

Non-goals


Target users

Maintainer

Wants:

Contributor

Wants:

Community / prospective contributor

Wants:


Functional scope

V1 board set

The recommended v1 site contains five boards:

  1. Project overview
  2. Contribution flow
  3. Review and collaboration
  4. Contributor directory
  5. Contributor detail

Required behavior

Nice-to-have if time allows


Dashboard topology

1. Project overview

Questions answered:

Required modules:

Required links:

2. Contribution flow

Questions answered:

Required modules:

Required links:

3. Review and collaboration

Questions answered:

Required modules:

Required links:

4. Contributor directory

Questions answered:

Required modules:

Required links:

5. Contributor detail

Questions answered:

Required modules:

Required links:


Use the existing Dataface dashboard-linking direction:

Recommended board path convention:

When a board needs filtered return links, pass:


Data contract

Core entities

| Entity | Grain | Purpose |
| --- | --- | --- |
| repositories | one row per repo | repo metadata and health rollups |
| contributors | one row per GitHub login | stable contributor dimension |
| contribution_daily | contributor x day | trend and activity calendar |
| pull_requests | one row per PR | flow, size, timing, authorship |
| pull_request_reviews | one row per review event | review throughput and lag |
| issues | one row per issue | issue flow and aging |
| repo_traffic_daily | repo x day | views and clones history |
| repo_referrers_daily | repo x referrer x snapshot_day | traffic sources |
| repo_paths_daily | repo x path x snapshot_day | popular content |

Required dimensions

Required derived metrics

Identity rule

Use github_login as the canonical contributor key for v1.
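
A minimal sketch of how this rule might be applied during normalization. It assumes logins are compared case-insensitively (GitHub treats them that way) and that GitHub App accounts carry a `[bot]` suffix. The function names and the `deny` parameter are hypothetical and not part of Dataface.

```python
def contributor_key(login: str) -> str:
    """Return a canonical key for a GitHub login.

    Assumed convention: trim whitespace, drop a leading "@", lowercase.
    """
    return login.strip().lstrip("@").lower()


def is_bot(login: str, deny: frozenset = frozenset()) -> bool:
    """Return True for GitHub App accounts or logins on an explicit deny list.

    `deny` is a hypothetical set of already-canonical keys to exclude.
    """
    key = contributor_key(login)
    return key.endswith("[bot]") or key in deny
```

Keying every table on the same canonical form means joins between contributors, pull_requests, and pull_request_reviews do not split one person across differently cased logins.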


Data acquisition strategy

  1. Load repo config.
  2. Resolve repo set and contributor allow/exclude rules.
  3. Fetch incremental raw data from GitHub APIs.
  4. Normalize into CSV or Parquet snapshots.
  5. Merge into history tables.
  6. Build dashboards from those materialized tables.
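
Step 3 can be sketched as a cursor computation: each run fetches from the last successful sync, but never reaches back further than a fixed window. This is only an illustration. The name `sync_since` and the 90-day default are assumptions, not values defined by this spec.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def sync_since(
    last_synced: Optional[datetime],
    now: datetime,
    max_lookback_days: int = 90,  # hypothetical default
) -> datetime:
    """Return the lower time bound for the next incremental fetch."""
    floor = now - timedelta(days=max_lookback_days)
    if last_synced is None:
        # First run: fetch only the bounded window.
        return floor
    # Later runs: resume from the last sync, clamped to the lookback bound.
    return max(last_synced, floor)


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    print(sync_since(None, now).isoformat())
```

The clamp trades completeness for predictable API usage: if a run is skipped for longer than the window, older activity stays missing until a deliberate backfill.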

History persistence

Traffic metrics and trend continuity require persistence beyond the current run.
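
One way to picture that persistence is an upsert of each run's snapshot into the stored history, keyed on the table's grain. The sketch below is a simplified illustration using plain dictionaries. The column names (`repo`, `day`, `views`) and the function name are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple

Row = Dict[str, str]


def merge_history(
    history: Iterable[Row],
    snapshot: Iterable[Row],
    key_cols: Tuple[str, ...],
) -> List[Row]:
    """Upsert snapshot rows into history, keyed on the table grain."""
    merged: Dict[Tuple[str, ...], Row] = {}
    for row in history:
        merged[tuple(row[c] for c in key_cols)] = row
    for row in snapshot:
        # A newer snapshot overwrites the stored row for the same key.
        merged[tuple(row[c] for c in key_cols)] = row
    # Sort by key so the output file is stable between runs.
    return [merged[k] for k in sorted(merged)]


if __name__ == "__main__":
    old = [{"repo": "a/b", "day": "2024-05-01", "views": "10"}]
    new = [{"repo": "a/b", "day": "2024-05-02", "views": "7"}]
    print(merge_history(old, new, ("repo", "day")))
```

Letting the newer row win is a design assumption: the most recent day in a traffic window may still be partial when first captured, so a later snapshot is likely to hold the more complete count.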

Recommended v1 approach:

Why this is recommended:


Packaging and repository layout

Ship the initiative as:

Suggested layout in an adopting repo

.github/workflows/dataface-github-oss.yml
.github/dataface-github-oss.yml
faces/github_oss/
  overview.yml
  flow.yml
  reviews.yml
  contributors.yml
  contributor_detail.yml
dataface.yml

Config fields

Suggested config surface:


GitHub Actions and Pages flow

Build workflow

Recommended workflow stages:

  1. checkout
  2. setup Python / uv
  3. setup Dataface / project dependencies
  4. fetch previous history branch
  5. run GitHub extraction job
  6. materialize normalized data files
  7. run dft build --mode html --bake-data
  8. upload Pages artifact
  9. deploy via actions/deploy-pages

Required GitHub Pages deployment shape

Use GitHub's current custom workflow model:

Required permissions include:

Refresh cadence

Default cadence:


Static build expectations

The pack assumes Dataface's existing static build path:

This initiative should avoid introducing a special runtime-only mode. The deliverable is a true static site that can be served by GitHub Pages alone.


Security and permissions

Token model

V1 should use a repository secret or the default Actions token where possible, but it will likely need a token with enough GitHub API scope for:

Public-safe posture

Because Pages is public, the pack should only publish metrics safe for public viewing by default:

Do not publish sensitive internal review metadata or private-repo data in v1.


Risks and mitigations

| Risk | Impact | Mitigation |
| --- | --- | --- |
| API rate limits | incomplete refreshes | incremental sync, bounded lookback, caching snapshots |
| Traffic-only 14 day API window | broken long-term charts | persist daily snapshots in history branch |
| Contributor identity noise | misleading people metrics | bot exclusion, explicit allow / deny lists |
| Overly ambitious metric scope | initiative stalls | keep v1 to five boards and GitHub-native metrics |
| Linking immaturity | weak drill-through UX | depend on dashboard-linking-v1 and keep URL contract simple |

Acceptance criteria

The initiative is successful when:

  1. A maintainer can adopt the pack with a documented workflow and config file.
  2. GitHub Actions can refresh the site without manual data wrangling.
  3. GitHub Pages hosts a static multi-board dashboard site.
  4. Contributor names on summary boards link through to the contributor detail board.
  5. At least one real OSS repo or organization is used as a proof deployment.
  6. The docs clearly explain which metrics are trustworthy in v1 and which are intentionally out of scope.

Suggested execution order

  1. Finalize board topology and metric contract.
  2. Build extraction and history snapshot pipeline.
  3. Package the GitHub Actions and Pages flow.
  4. Implement the dashboard pack and publish a reference deployment.