Spec
Product summary
Build a GitHub OSS analytics add-on for Dataface that lets an open-source project publish a static dashboard site with:
- project-level activity and health views
- contributor drill-through dashboards
- scheduled refresh via GitHub Actions
- hosting on GitHub Pages
The v1 product should be simple enough that a maintainer can add it to an OSS repo with minimal custom code and no persistent backend.
Goals
Primary goals
- Make GitHub activity more visible to OSS maintainers and contributors than GitHub's native Insights does.
- Give projects a polished, public dashboard site rather than a single stats card.
- Support cross-board links and a canonical per-contributor dashboard.
- Use only static hosting and scheduled GitHub automation.
- Keep the setup path easy enough for community projects to adopt.
Non-goals
- hosted multi-tenant analytics service
- private-repo analytics in v1
- full enterprise developer productivity platform
- exact DORA semantics without trustworthy deployment signals
- bespoke per-project modeling before the default pack is proven
Target users
Maintainer
Wants:
- a public status site for contributor and project activity
- visibility into review load, backlog, and contribution trends
- setup that does not require running infrastructure
Contributor
Wants:
- a visible profile of their contributions to the project
- easy paths from a summary board to their own dashboard
Prospective contributor or adopter
Wants:
- evidence that the project is active
- visibility into where help is needed
- a quick read on who is active and how the project works
Functional scope
V1 board set
The recommended v1 site contains five boards:
- Project overview
- Contribution flow
- Review and collaboration
- Contributor directory
- Contributor detail
Required behavior
- Every board can be built to static HTML with baked data.
- Every board shares a common project header and navigation rail.
- Every board that mentions a person can link to the contributor detail board.
- The contributor detail board accepts a stable contributor key:
contributor=<github_login>
Nice-to-have if time allows
- repository detail board
- reach/adoption board using traffic and referrer history
Dashboard topology
1. Project overview
Questions answered:
- Is the project active?
- Is work moving?
- Are contributors concentrated or broadening?
Required modules:
- activity trend
- active contributors
- open PR / issue backlog
- recent release cadence
- top contributors table
Required links:
- contributor names -> contributor detail
- overview nav -> flow, review, contributors
2. Contribution flow
Questions answered:
- How quickly are PRs and issues moving?
- Where is work piling up?
Required modules:
- PR opened vs merged trend
- issue opened vs closed trend
- time to first review
- time open to merge
- stale PR / issue tables
- PR size distribution
Required links:
- author / reviewer names -> contributor detail
3. Review and collaboration
Questions answered:
- Who is reviewing?
- Who is overloaded?
- Where are collaboration bottlenecks?
Required modules:
- reviews submitted by contributor
- review requests received vs completed
- comment intensity
- reviewer / author matrix or ranked table
Required links:
- person cells -> contributor detail
4. Contributor directory
Questions answered:
- Who is active right now?
- Who contributes in what ways?
Required modules:
- contributor leaderboard table
- contribution type mix
- activity status categories
Required links:
- contributor name -> contributor detail
5. Contributor detail
Questions answered:
- What has this person contributed recently?
- Which repos or work areas are they active in?
- Do they mostly author PRs, review, comment, or open issues?
Required modules:
- summary KPIs
- contribution calendar or trend
- authored PR list
- review activity list
- issue activity list
- repository mix
Required links:
- back to overview
- back to directory
Link model
Use the existing Dataface dashboard-linking direction:
- human-readable board paths
- query parameters for board variables
- contributor drill-through based on contributor=<github_login>
Recommended board path convention:
/github-oss/overview
/github-oss/flow
/github-oss/reviews
/github-oss/contributors
/github-oss/contributor-detail
When a board needs filtered return links, pass:
- repo=<repo_name> when applicable
- contributor=<github_login> for person drill-through
- window=<time_window> for date-range consistency if supported
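The link model above can be sketched as a small URL builder. This is only an illustration: the helper name `board_url`, the `base_path` argument (standing in for the `site.base_path` config value), and the example `window` value are assumptions, not part of Dataface.

```python
from urllib.parse import urlencode


def board_url(board, base_path="", **params):
    """Build a board URL with query parameters for board variables.

    Hypothetical helper: parameters whose value is None are omitted, so
    callers can pass repo/contributor/window unconditionally.
    """
    path = f"{base_path.rstrip('/')}/github-oss/{board}"
    query = urlencode({k: v for k, v in params.items() if v is not None})
    return f"{path}?{query}" if query else path


# "90d" is an illustrative window value only.
print(board_url("contributor-detail", contributor="octocat", window="90d"))
# -> /github-oss/contributor-detail?contributor=octocat&window=90d
print(board_url("overview", base_path="/my-project"))
# -> /my-project/github-oss/overview
```

Using `urlencode` keeps logins and repo names safely escaped even if they contain characters that are not URL-safe.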
Data contract
Core entities
| Entity | Grain | Purpose |
| --- | --- | --- |
| repositories | one row per repo | repo metadata and health rollups |
| contributors | one row per GitHub login | stable contributor dimension |
| contribution_daily | contributor x day | trend and activity calendar |
| pull_requests | one row per PR | flow, size, timing, authorship |
| pull_request_reviews | one row per review event | review throughput and lag |
| issues | one row per issue | issue flow and aging |
| repo_traffic_daily | repo x day | views and clones history |
| repo_referrers_daily | repo x referrer x snapshot_day | traffic sources |
| repo_paths_daily | repo x path x snapshot_day | popular content |
Required dimensions
repo_owner
repo_name
github_login
is_bot
date
created_at
closed_at
merged_at
state
Required derived metrics
pr_cycle_hours
hours_to_first_review
pr_size
issue_close_hours
is_stale
active_days_last_28
repositories_contributed_to_count
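As a rough sketch of how a few of these derived metrics could be computed from one pull request row, assuming ISO 8601 timestamps as returned by the GitHub APIs. The `first_review_at`, `updated_at`, `additions`, and `deletions` fields, the 30-day staleness threshold, and the definition of `pr_size` as added plus deleted lines are all assumptions made for illustration; the spec does not fix them.

```python
from datetime import datetime, timedelta, timezone


def parse_ts(value):
    """Parse a timestamp such as 2024-05-01T12:00:00Z; None passes through."""
    return datetime.fromisoformat(value.replace("Z", "+00:00")) if value else None


def hours_between(start, end):
    """Elapsed hours between two timestamps, or None if either is missing."""
    if start is None or end is None:
        return None
    return (end - start).total_seconds() / 3600


def derive_pr_metrics(pr, now, stale_after=timedelta(days=30)):
    """Compute derived metrics for one PR row (assumed field names)."""
    created = parse_ts(pr["created_at"])
    merged = parse_ts(pr.get("merged_at"))
    first_review = parse_ts(pr.get("first_review_at"))  # hypothetical field
    updated = parse_ts(pr.get("updated_at")) or created  # hypothetical field
    return {
        "pr_cycle_hours": hours_between(created, merged),
        "hours_to_first_review": hours_between(created, first_review),
        # Assumed definition: lines added plus lines deleted.
        "pr_size": pr.get("additions", 0) + pr.get("deletions", 0),
        # Assumed definition: still open and untouched for `stale_after`.
        "is_stale": pr["state"] == "open" and now - updated > stale_after,
    }


example = {
    "created_at": "2024-05-01T00:00:00Z",
    "merged_at": "2024-05-03T12:00:00Z",
    "first_review_at": "2024-05-01T06:00:00Z",
    "updated_at": "2024-05-03T12:00:00Z",
    "state": "closed",
    "additions": 40,
    "deletions": 10,
}
metrics = derive_pr_metrics(example, now=datetime(2024, 6, 15, tzinfo=timezone.utc))
print(metrics)
```

Returning `None` for missing timestamps keeps unmerged or unreviewed PRs out of averages instead of counting them as zero.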
Identity rule
Use github_login as the canonical contributor key for v1.
Data acquisition strategy
Recommended source mix
- GraphQL for contributor-centric contribution data and per-user contribution history
- REST for repo traffic, referrers, paths, releases, and convenient repo-level endpoints
Recommended extraction flow:
1. Load repo config.
2. Resolve repo set and contributor allow/exclude rules.
3. Fetch incremental raw data from GitHub APIs.
4. Normalize into CSV or Parquet snapshots.
5. Merge into history tables.
6. Build dashboards from those materialized tables.
History persistence
Traffic metrics and trend continuity require persistence beyond the current run.
Recommended v1 approach:
- maintain a machine-managed history branch such as dataface-history
- store raw or normalized snapshots there
- keep the Pages deployment artifact separate from the history store
Why this is recommended:
- no external database
- durable git-backed history
- easy debugging
- works with public OSS repos
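A minimal sketch of the merge step for `repo_traffic_daily`, assuming each run produces a list of rows keyed by `repo_name` and `date` and that the stored history has the same shape. The function name and the `views` column are illustrative; reading and writing the history branch itself is left out.

```python
def merge_traffic_history(history, snapshot):
    """Merge a fresh traffic snapshot into stored history.

    Rows are keyed by (repo_name, date). A row in the new snapshot replaces
    any stored row with the same key, so re-fetched days pick up updated
    counts, while days no longer returned by the API are kept from history.
    """
    merged = {(row["repo_name"], row["date"]): row for row in history}
    merged.update({(row["repo_name"], row["date"]): row for row in snapshot})
    return [merged[key] for key in sorted(merged)]


history = [
    {"repo_name": "app", "date": "2024-05-01", "views": 10},
    {"repo_name": "app", "date": "2024-05-02", "views": 4},
]
snapshot = [
    {"repo_name": "app", "date": "2024-05-02", "views": 7},
    {"repo_name": "app", "date": "2024-05-03", "views": 3},
]
merged = merge_traffic_history(history, snapshot)
print([row["views"] for row in merged])  # -> [10, 7, 3]
```

Because the merge is idempotent, rerunning a failed refresh against the same history does not duplicate rows.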
Packaging and repository layout
Recommended v1 shape
Ship the initiative as:
- a starter pack in the Dataface repo
- a starter workflow or reusable workflow example
- a minimal config file for selecting repos and filters
- one real example deployment for proof
Suggested layout in an adopting repo
.github/workflows/dataface-github-oss.yml
.github/dataface-github-oss.yml
faces/github_oss/
overview.yml
flow.yml
reviews.yml
contributors.yml
contributor_detail.yml
dataface.yml
Config fields
Suggested config surface:
owner: org or user login
repos.include
repos.exclude
contributors.exclude_bots
contributors.exclude
refresh.schedule
traffic.enabled
history_branch
site.base_path
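One way the repo and contributor filters might be applied once the config is loaded is sketched below. Everything here is an assumption for illustration: the `PackConfig` class, its defaults, the use of glob patterns in `repos.include` / `repos.exclude`, and the bot check based on the `[bot]` login suffix that GitHub App accounts carry.

```python
from dataclasses import dataclass, field
from fnmatch import fnmatch


@dataclass
class PackConfig:
    """Hypothetical in-memory form of part of the config surface."""
    owner: str
    repos_include: list = field(default_factory=lambda: ["*"])
    repos_exclude: list = field(default_factory=list)
    contributors_exclude_bots: bool = True
    contributors_exclude: list = field(default_factory=list)


def resolve_repos(config, available):
    """Keep repos matching an include pattern and no exclude pattern."""
    return [
        name
        for name in available
        if any(fnmatch(name, pattern) for pattern in config.repos_include)
        and not any(fnmatch(name, pattern) for pattern in config.repos_exclude)
    ]


def keep_contributor(config, login):
    """Apply the explicit exclude list, then the optional bot filter."""
    if login in config.contributors_exclude:
        return False
    if config.contributors_exclude_bots and login.endswith("[bot]"):
        return False
    return True


config = PackConfig(owner="example-org", repos_exclude=["*-archive"])
print(resolve_repos(config, ["core", "docs", "core-archive"]))  # -> ['core', 'docs']
print(keep_contributor(config, "dependabot[bot]"))  # -> False
```

A suffix check will miss bots that run under ordinary user accounts, which is why the explicit `contributors.exclude` list is applied as well.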
GitHub Actions and Pages flow
Build workflow
Recommended workflow stages:
- checkout
- setup Python / uv
- setup Dataface / project dependencies
- fetch previous history branch
- run GitHub extraction job
- materialize normalized data files
- run dft build --mode html --bake-data
- upload Pages artifact
- deploy via actions/deploy-pages
Required GitHub Pages deployment shape
Use GitHub's current custom workflow model:
actions/configure-pages
actions/upload-pages-artifact
actions/deploy-pages
Required permissions include:
pages: write
id-token: write
Refresh cadence
Default cadence:
- daily scheduled run
- manual dispatch
- run on changes to dashboard or workflow files
Static build expectations
The pack assumes Dataface's existing static build path:
dft build ... --mode html
dft build ... --mode html --bake-data
This initiative should avoid introducing a special runtime-only mode. The deliverable is a true static site that can be served by GitHub Pages alone.
Security and permissions
Token model
V1 should rely on a repository secret or the default Actions token where possible. In practice it will likely need a token with enough GitHub API scope for:
- GraphQL queries
- traffic endpoints when available
Public-safe posture
Because Pages is public, the pack should only publish metrics safe for public viewing by default:
- counts
- timings
- traffic summaries
- contributor names already visible in the public project
Do not publish sensitive internal review metadata or private-repo data in v1.
Risks and mitigations
| Risk | Impact | Mitigation |
| --- | --- | --- |
| API rate limits | incomplete refreshes | incremental sync, bounded lookback, caching snapshots |
| Traffic API limited to a 14-day window | broken long-term charts | persist daily snapshots in history branch |
| Contributor identity noise | misleading people metrics | bot exclusion, explicit allow / deny lists |
| Overly ambitious metric scope | initiative stalls | keep v1 to five boards and GitHub-native metrics |
| Linking immaturity | weak drill-through UX | depend on dashboard-linking-v1 and keep URL contract simple |
Acceptance criteria
The initiative is successful when:
- A maintainer can adopt the pack with a documented workflow and config file.
- GitHub Actions can refresh the site without manual data wrangling.
- GitHub Pages hosts a static multi-board dashboard site.
- Contributor names on summary boards click through to a contributor detail board.
- At least one real OSS repo or organization is used as a proof deployment.
- The docs clearly explain what metrics are trustworthy in v1 and what is intentionally out of scope.
Suggested execution order
1. Finalize board topology and metric contract.
2. Build extraction and history snapshot pipeline.
3. Package the GitHub Actions and Pages flow.
4. Implement the dashboard pack and publish a reference deployment.