Research
Why this initiative exists
GitHub already exposes useful but fragmented activity views:
- Pulse summarizes open and merged PRs, issues, and top committers, but only as a narrow repo-level view with limited drill-down.
- Traffic exposes views, clones, referrers, and popular paths, but only for the last 14 days and only for users with write access.
- Insights / graphs remain mostly repo-local and are not shaped as a reusable, public-facing multi-dashboard site.
That creates a clear opportunity for Dataface:
- turn GitHub activity into a static, linkable dashboard site
- give maintainers a cross-repo and cross-contributor view
- preserve longer-term history than GitHub's native traffic window
- publish a polished artifact on GitHub Pages with no separate backend
The bar is not "more charts than GitHub." The bar is:
- better visibility
- better storytelling
- better navigation
- easy adoption for real OSS maintainers
Research questions
This research focused on four questions:
- What can GitHub already show natively, and where are the visibility gaps?
- Which GitHub APIs expose enough data to build static dashboards without a live backend?
- What do OSS analytics and developer productivity tools already do well?
- Which dashboard set would feel immediately useful for open-source projects, especially with contributor drill-through?
What GitHub already shows
Native surfaces
| Surface | What it shows | Gap for this initiative |
| --- | --- | --- |
| Pulse | Open/merged PRs, open/closed issues, top 15 committers for a period | No durable site, limited contributor detail, weak multi-board storytelling |
| Traffic | Views, clones, referrers, popular paths | Last 14 days only; no long-term trend unless someone snapshots it |
| Contributors / graphs | Commit history and repo activity views | Mostly repo-local; limited cross-repo or role-based narrative |
| Community profile | Docs / governance checklist | Good health signal, but not activity or contribution flow analytics |
Key takeaway
GitHub's native features are good operational hints, but weak as a public analytics product for a project community. Dataface can win by packaging the same underlying activity into:
- more durable history
- clearer metric families
- consistent board layout
- interlinked drill-down paths
Feasible GitHub data sources
GraphQL: contributor activity and collaboration
GitHub GraphQL's ContributionsCollection is the most important source for this initiative. It exposes:
- contributionCalendar
- commitContributionsByRepository
- pullRequestContributions
- pullRequestContributionsByRepository
- contributor-scoped ranges via from / to
User.repositoriesContributedTo also helps answer "where does this person contribute?" across repos.
Why this matters: contributor detail boards should not be approximated from repo commits alone. The product needs user-centric history, repo mix, and contribution type mix.
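As a rough illustration of the user-centric query this implies, the sketch below builds a GraphQL request body for one contributor and one date range. The field names come from GitHub's GraphQL schema; the particular selection, the `maxRepositories` value, and the helper name `build_contributions_payload` are assumptions made for this example, not a settled design.

```python
import json
from datetime import datetime, timezone

# Illustrative selection only: which fields the pack really needs is still open.
CONTRIBUTIONS_QUERY = """
query($login: String!, $from: DateTime!, $to: DateTime!) {
  user(login: $login) {
    contributionsCollection(from: $from, to: $to) {
      contributionCalendar {
        totalContributions
        weeks { contributionDays { date contributionCount } }
      }
      commitContributionsByRepository(maxRepositories: 25) {
        repository { nameWithOwner }
        contributions { totalCount }
      }
      pullRequestContributionsByRepository(maxRepositories: 25) {
        repository { nameWithOwner }
        contributions { totalCount }
      }
    }
  }
}
"""


def _to_utc_string(moment: datetime) -> str:
    """Format a timezone-aware datetime as an ISO 8601 UTC timestamp."""
    if moment.tzinfo is None:
        raise ValueError("expected a timezone-aware datetime")
    return moment.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def build_contributions_payload(login: str, start: datetime, end: datetime) -> str:
    """Return the JSON body to POST to the GraphQL endpoint (hypothetical helper)."""
    if end <= start:
        raise ValueError("end must be after start")
    variables = {
        "login": login,
        "from": _to_utc_string(start),
        "to": _to_utc_string(end),
    }
    return json.dumps({"query": CONTRIBUTIONS_QUERY, "variables": variables})
```

The workflow would send this body with an authenticated POST and write the response to a snapshot file. Keeping the query in one constant makes it easy to review which contribution types the boards depend on.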
GitHub REST fills several gaps:
- repository metadata, stars, forks, releases, issues, PRs, workflow runs
- traffic endpoints for:
- page views
- clones
- top referrers
- top paths
The traffic endpoints are explicitly limited to the last 14 days and require write access. That means long-term traffic charts are only possible if the workflow snapshots those endpoints on a schedule and stores history.
Practical boundary for v1
Recommended v1 scope:
- public repositories
- public-safe metrics only
- scheduled daily or twice-daily refresh
- no external database
- git-backed history snapshots for metrics GitHub does not retain long enough
This keeps the add-on easy to adopt and compatible with GitHub Pages.
Hard constraints and design implications
1. Static hosting means data must be precomputed
GitHub Pages serves static assets. There is no request-time API proxy or secure server-side join layer. So the site must be built from materialized snapshots.
Implication:
- do metric joins and enrichment in Actions
- output CSV or Parquet for Dataface
- run dft build --mode html --bake-data
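To make the "precompute in Actions" step concrete, here is a minimal sketch of one materialization: merged PRs aggregated per repository and ISO week, then serialized as CSV. The record shape (`repo`, `merged_at`), the column names, and both function names are assumptions for illustration; the real pack may use a different layout or Parquet.

```python
import csv
import io
from collections import Counter
from datetime import datetime


def weekly_merged_counts(prs: list[dict]) -> list[dict]:
    """Count merged PRs per (repo, ISO week); unmerged PRs are skipped."""
    counts: Counter = Counter()
    for pr in prs:
        merged_at = pr.get("merged_at")
        if not merged_at:
            continue
        # Accept the trailing "Z" that GitHub timestamps use.
        merged = datetime.fromisoformat(merged_at.replace("Z", "+00:00"))
        iso = merged.isocalendar()
        counts[(pr["repo"], f"{iso[0]}-W{iso[1]:02d}")] += 1
    return [
        {"repo": repo, "week": week, "merged_prs": n}
        for (repo, week), n in sorted(counts.items())
    ]


def rows_to_csv(rows: list[dict]) -> str:
    """Serialize aggregated rows to CSV text with a fixed column order."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=["repo", "week", "merged_prs"], lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

A workflow step could write the returned text to a data file before the build command runs, so the static site only ever reads precomputed tables.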
2. Traffic metrics need durable snapshots
The GitHub REST traffic endpoints return only the last 14 days of data.
Implication:
- maintain a machine-managed history store
- easiest v1 path is a repo-managed history branch or dedicated data directory updated by Actions
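One way to picture the history store is a merge step that folds each new traffic response into the accumulated daily rows. The sketch below assumes the response shape of the repository views endpoint (a `views` list of entries with `timestamp`, `count`, and `uniques`); the function names and the in-memory dictionary are hypothetical stand-ins for whatever file format the pack ends up using.

```python
def merge_daily_views(history: dict[str, dict], response: dict) -> dict[str, dict]:
    """Fold one traffic/views response into a history keyed by day timestamp."""
    merged = dict(history)
    for day in response.get("views", []):
        # Newer snapshots win: the count for the current day can still grow,
        # so the latest observation for a timestamp replaces the earlier one.
        merged[day["timestamp"]] = {
            "count": day["count"],
            "uniques": day["uniques"],
        }
    return merged


def history_rows(history: dict[str, dict]) -> list[dict]:
    """Return the history as rows sorted by day, ready to be written out."""
    return [
        {"timestamp": ts, **history[ts]}
        for ts in sorted(history)
    ]
```

Because days that fall out of the 14-day window are never deleted from the history, the stored series keeps growing as long as the workflow runs at least once per window.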
3. Contributor drill-through depends on stable identity
Contributor boards need a stable key. GitHub usernames work well for public repos, but there are still edge cases:
- renamed accounts
- bot/service accounts
- contributors with activity split across repos or organizations
Implication:
- use GitHub login as the v1 identity key
- explicitly support bot exclusion / allowlists
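A small sketch of what config-driven bot handling could look like follows. It relies on the convention that GitHub App accounts have logins ending in `[bot]`, which is only a heuristic; the function name and the denylist/allowlist parameters are assumptions for this example.

```python
def is_excluded(
    login: str,
    denylist: frozenset[str] = frozenset(),
    allowlist: frozenset[str] = frozenset(),
) -> bool:
    """Decide whether a login should be hidden from contributor boards.

    Lists are expected to hold lowercase logins. The allowlist wins over
    everything, then the denylist, then the "[bot]" suffix heuristic.
    """
    normalized = login.lower()
    if normalized in allowlist:
        return False
    if normalized in denylist:
        return True
    return normalized.endswith("[bot]")
```

For example, a project that wants to credit its release automation could allowlist that one bot while still dropping every other `[bot]` account and any service users named in the denylist.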
4. Some developer productivity metrics are expensive or ambiguous
Commercial tools popularize metrics like review time, pickup time, cycle time, deploy frequency, and change failure rate. Some of these are feasible for a static GitHub-only pack; others are nontrivial.
Implication:
- prefer metrics with clear GitHub-native semantics in v1
- avoid fake precision for DORA-like metrics unless the underlying event boundaries are reliable
Competitive landscape
Category 1: GitHub-native stats cards and static pages
| Tool | What it does | Takeaway |
| --- | --- | --- |
| github-statistics / similar README-card projects | User-centric contribution cards via GraphQL | Strong proof that lightweight scheduled GitHub stats are feasible, but the UX is too shallow for project-level analytics |
| github-repo-traffic-stats and related traffic collectors | Snapshot GitHub traffic beyond the 14-day window and publish via GitHub Pages | Strong validation for the "capture history in Actions, publish static dashboards" model |
| GitHub stats/profile sites | Embeddable cards and vanity summaries | Good for social sharing, weak for maintainers who need board-to-board navigation |
Category 2: OSS health analytics frameworks
| Tool | What it does | Takeaway |
| --- | --- | --- |
| CHAOSS / GrimoireLab | Broad OSS community analytics across many systems | Best source for metric taxonomy and board ideas; too heavy for a drop-in single-repo add-on |
| Augur | Open-source health and sustainability metrics platform | Confirms demand for contributor retention, review flow, and sustainability signals, but involves much more infrastructure than GitHub Pages |
| DevPulse | Community health analytics for GitHub orgs and repos | Good signal that OSS maintainers want project health dashboards, not just vanity charts |
Category 3: Commercial engineering intelligence
| Tool | What it does | Takeaway |
| --- | --- | --- |
| LinearB | Benchmarks for review time, PR size, cycle time, deploy frequency | Useful metric vocabulary, but much of it assumes richer engineering system integrations |
| Athenian / Plandek / similar | Cross-tool engineering analytics, delivery health, team performance | Useful framing for productivity dashboards; too heavyweight for a GitHub-only v1 |
Competitive conclusion
The strongest whitespace is not "enterprise productivity platform in GitHub Pages." It is:
- open-source project visibility
- maintainer-friendly dashboard pack
- easy setup
- public, linkable pages
- contributor-centric drill-through
That points away from a huge platform and toward a curated board suite with strong defaults.
Recommended dashboard suite
The suite should feel like a small analytics product, not a pile of disconnected charts.
1. Project overview
Purpose:
- answer "what is happening in this project overall?"
Candidate sections:
- contribution volume trend
- merged PRs / opened PRs / closed issues over time
- active contributors
- review backlog
- release cadence
- stars / forks / watchers trend if historical capture is available
- traffic summary if write access exists
Primary clicks:
- to repository detail
- to contributor leaderboard
- to review/collaboration board
2. Contribution flow
Purpose:
- answer "how does work move through the PR and issue flow?"
Candidate sections:
- PR opened vs merged trend
- median time to first review
- median time open to merge
- PR size distribution
- stale PR and issue backlog
- issue close time bands
Primary clicks:
- to contributor detail for PR authors and reviewers
- to repository detail
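The two median metrics in this board reduce to the same calculation over different timestamp pairs. The sketch below assumes each PR has already been normalized into a record with `created_at`, `first_review_at`, and `merged_at` fields; `first_review_at` is a hypothetical derived field (the earliest review on the PR), and the function name is illustrative.

```python
from datetime import datetime
from statistics import median
from typing import Optional


def _parse(timestamp: str) -> datetime:
    """Parse a GitHub-style ISO 8601 timestamp, including a trailing "Z"."""
    return datetime.fromisoformat(timestamp.replace("Z", "+00:00"))


def median_hours(prs: list[dict], start_key: str, end_key: str) -> Optional[float]:
    """Median elapsed hours between two timestamps, ignoring incomplete PRs."""
    durations = [
        (_parse(pr[end_key]) - _parse(pr[start_key])).total_seconds() / 3600
        for pr in prs
        if pr.get(start_key) and pr.get(end_key)
    ]
    return median(durations) if durations else None
```

Calling it with `("created_at", "first_review_at")` gives time to first review, and with `("created_at", "merged_at")` gives time open to merge. PRs that were never reviewed or never merged drop out of the respective metric rather than being counted as zero.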
3. Review and collaboration
Purpose:
- answer "who reviews, who is overloaded, and where collaboration is concentrated"
Candidate sections:
- review requests by contributor
- completed reviews by contributor
- comments / review discussion intensity
- reviewer-author interaction matrix
- first-response lag distribution
Primary clicks:
- from any person row to contributor detail
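The reviewer-author interaction matrix can be stored in long form as one row per (reviewer, author) pair. This is a minimal sketch assuming review records that carry `reviewer` and `author` logins; those field names, the output columns, and the choice to drop self-reviews are assumptions for illustration.

```python
from collections import Counter


def interaction_rows(reviews: list[dict]) -> list[dict]:
    """Count submitted reviews per (reviewer, author) pair, busiest pairs first."""
    pairs: Counter = Counter()
    for review in reviews:
        reviewer, author = review["reviewer"], review["author"]
        if reviewer == author:
            # Comments on one's own PR are not collaboration signal here.
            continue
        pairs[(reviewer, author)] += 1
    ordered = sorted(pairs.items(), key=lambda item: (-item[1], item[0]))
    return [
        {"reviewer": reviewer, "author": author, "reviews": n}
        for (reviewer, author), n in ordered
    ]
```

Long-form rows keep the table small for sparse communities and let a heatmap or a ranked table be drawn from the same file.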
4. Contributor directory / leaderboard
Purpose:
- give a navigable landing page for people
Candidate sections:
- contributor table with commits, PRs, issues, reviews, merged PRs, repositories contributed to
- activity trend sparkline
- current status buckets: active, cooling off, newly active
Primary clicks:
- every contributor name links to the contributor detail board
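The status buckets need explicit definitions before they can be computed. The sketch below shows one possible rule set; the 30-day and 90-day thresholds, the extra "inactive" bucket, and the function name are all assumptions that a maintainer would likely want to tune.

```python
from datetime import date


def status_bucket(
    first_seen: date,
    last_seen: date,
    today: date,
    recent_days: int = 30,
    cooling_days: int = 90,
) -> str:
    """Classify a contributor by how recently they first and last contributed."""
    days_since_last = (today - last_seen).days
    if days_since_last > cooling_days:
        return "inactive"  # hypothetical bucket for long-dormant contributors
    if days_since_last > recent_days:
        return "cooling off"
    if (today - first_seen).days <= recent_days:
        return "newly active"
    return "active"
```

Computing the bucket at build time from the first and last activity dates keeps the directory table a plain static file with no client-side date logic.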
5. Contributor detail
Purpose:
- answer "what does this specific person contribute, where, and how recently?"
Candidate sections:
- profile header KPIs
- contribution calendar / recent activity trend
- contribution mix by type
- repositories contributed to
- authored PRs
- submitted reviews
- issue participation
- release / milestone involvement if inferable
Primary clicks:
- back to overview
- to repository detail filtered to that person's activity
6. Repository or area detail
Purpose:
- answer "which repos, packages, or subprojects are healthy vs overloaded?"
Candidate sections:
- repo-level throughput
- active contributors
- review lag
- issue backlog
- release cadence
- traffic if available
Primary clicks:
- to contributor detail for top contributors and reviewers
7. Reach and adoption
Purpose:
- answer "is the project being discovered and consumed?"
Candidate sections:
- views and clones trend
- top referrers
- top pages / docs paths
- stars / forks / watchers trend
- release download or package signals if available from outside GitHub in a later phase
Primary clicks:
Board recommendation for v1
V1 should likely ship five boards:
- Project overview
- Contribution flow
- Review and collaboration
- Contributor directory
- Contributor detail
Repository detail and reach/adoption can be v1.1 if time or data complexity becomes a risk.
Inter-linking model
The dashboard pack should behave like a small documentation site:
- overview -> flow
- overview -> contributors
- contributors -> contributor detail
- flow tables -> contributor detail
- review tables -> contributor detail
- contributor detail -> repository detail or filtered overview slices
Contributor drill-through rule
Every board that names a person should link with the same key:
contributor=<github_login>
That keeps URL semantics simple and lets the contributor detail board become the canonical "person" destination.
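A shared link helper is one way to keep the key consistent across boards. In this sketch the page name `contributor-detail.html` and the use of a URL query string are assumptions; only the `contributor` parameter name comes from the rule above.

```python
from urllib.parse import urlencode


def contributor_link(login: str, base_path: str = "contributor-detail.html") -> str:
    """Build the drill-through URL for one contributor (hypothetical page name)."""
    # urlencode escapes characters such as the brackets in bot logins.
    return f"{base_path}?{urlencode({'contributor': login})}"
```

If every table that names a person generates its link through one function like this, a later change to the URL scheme touches a single place.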
Metric families to prioritize
Safe and high-value for v1
- PRs opened / merged / closed
- issues opened / closed
- commits
- active contributors
- reviews submitted
- repositories contributed to
- time to first review
- time open to merge
- PR size
- stale PR / issue counts
- views / clones / referrers / popular paths
Promising but likely phase 2
- contributor retention cohorts
- first-time contributor conversion
- maintainer load / bus-factor style views
- release note coverage
- comment sentiment or discussion quality
- deploy frequency / failure rate unless another system is integrated
Recommended packaging model
The easiest maintainable pattern is:
- a starter dashboard pack in-repo
- a scheduled GitHub Actions workflow
- a machine-managed history snapshot store
- dft build output deployed to GitHub Pages
This is easier to reason about than a hosted service and more durable than README cards.
Major risks
API rate and query complexity
Cross-repo contributor dashboards can become expensive if the workflow naively walks every PR, review, and issue every run.
Mitigation:
- incremental snapshots
- bounded lookback windows
- materialized history tables
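The first two mitigations can be combined into a single cutoff calculation: fetch only what changed since the last snapshot, with a small overlap, and never look back further than a fixed window. The defaults below (6-hour overlap, 90-day window) and the function name are illustrative assumptions.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def next_fetch_since(
    last_snapshot: Optional[datetime],
    now: datetime,
    overlap: timedelta = timedelta(hours=6),
    max_lookback: timedelta = timedelta(days=90),
) -> datetime:
    """Return the earliest update time the next workflow run should request."""
    floor = now - max_lookback
    if last_snapshot is None:
        # First run: backfill the whole bounded window.
        return floor
    # Re-read a short overlap so items updated mid-run are not missed,
    # but never go past the lookback floor.
    return max(last_snapshot - overlap, floor)
```

The returned timestamp can be passed to list endpoints that accept a "since" style filter, and rows fetched in the overlap are deduplicated when merged into the history tables.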
Identity quality
Contributor identity is cleaner on GitHub than in multi-system OSS analytics, but bot noise and renamed users still exist.
Mitigation:
- config-driven exclusions
- explicit contributor dimension tables
Scope creep into enterprise productivity
It is easy to drift from OSS visibility into a full developer productivity platform.
Mitigation:
- keep v1 focused on GitHub-native data
- favor public OSS use cases over internal SDLC governance
Recommendation
Proceed with a dashboard-factory M4 initiative focused on a static GitHub OSS analytics pack.
The strongest product shape is:
- Dataface dashboards, not README cards
- GitHub Actions snapshot pipeline, not a hosted backend
- GitHub Pages deployment, not a custom app
- contributor detail as the central drill-through surface
- scope tuned for OSS maintainers, not enterprise management dashboards
References