tasks/workstreams/dashboard-factory/initiatives/github-oss-activity-dashboards/research.md

Research

Why this initiative exists

GitHub already exposes useful but fragmented activity views:

That creates a clear opportunity for Dataface:

The bar is not "more charts than GitHub." The bar is:

  1. better visibility
  2. better storytelling
  3. better navigation
  4. easy adoption for real OSS maintainers

Research questions

This research focused on four questions:

  1. What can GitHub already show natively, and where are the visibility gaps?
  2. Which GitHub APIs expose enough data to build static dashboards without a live backend?
  3. What do OSS analytics and developer productivity tools already do well?
  4. Which dashboard set would feel immediately useful for open-source projects, especially with contributor drill-through?

What GitHub already shows

Native surfaces

| Surface | What it shows | Gap for this initiative |
| --- | --- | --- |
| Pulse | Open/merged PRs, open/closed issues, top 15 committers for a period | No durable site, limited contributor detail, weak multi-board storytelling |
| Traffic | Views, clones, referrers, popular paths | Last 14 days only; no long-term trend unless someone snapshots it |
| Contributors / graphs | Commit history and repo activity views | Mostly repo-local; limited cross-repo or role-based narrative |
| Community profile | Docs / governance checklist | Good health signal, but not activity or contribution flow analytics |

Key takeaway

GitHub's native features are good operational hints, but weak as a public analytics product for a project community. Dataface can win by packaging the same underlying activity into:


Feasible GitHub data sources

GraphQL: contributor activity and collaboration

GitHub GraphQL's ContributionsCollection is the most important source for this initiative. It exposes:

User.repositoriesContributedTo also helps answer "where does this person contribute?" across repos.

Why this matters: contributor detail boards should not be approximated from repo commits alone. The product needs user-centric history, repo mix, and contribution type mix.
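As a minimal sketch of what a user-centric request could look like, the snippet below builds a GraphQL payload around `contributionsCollection`. The field names are taken from GitHub's public GraphQL schema; the helper name `build_contributions_payload`, the selection of fields, and the example login are assumptions for illustration, not the pack's actual query.

```python
import json

# Fields requested here exist in GitHub's GraphQL schema; which ones the
# dashboards really need is an assumption made for this sketch.
CONTRIBUTIONS_QUERY = """
query($login: String!, $from: DateTime!, $to: DateTime!) {
  user(login: $login) {
    login
    contributionsCollection(from: $from, to: $to) {
      totalCommitContributions
      totalIssueContributions
      totalPullRequestContributions
      totalPullRequestReviewContributions
      commitContributionsByRepository(maxRepositories: 25) {
        repository { nameWithOwner }
        contributions { totalCount }
      }
    }
  }
}
"""


def build_contributions_payload(login, start_iso, end_iso):
    """Return the JSON-serializable body for a POST to the GraphQL endpoint."""
    return {
        "query": CONTRIBUTIONS_QUERY,
        "variables": {"login": login, "from": start_iso, "to": end_iso},
    }


# Hypothetical usage: the login and date window are placeholders.
payload = build_contributions_payload(
    "octocat", "2024-01-01T00:00:00Z", "2024-12-31T23:59:59Z"
)
body = json.dumps(payload)
```

The payload covers both the contribution type mix (the four totals) and the repo mix (commits grouped by repository), which is the shape the contributor detail board is described as needing.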

REST: repo metadata and traffic

GitHub REST fills several gaps:

The traffic endpoints return only the last 14 days of data and require push (write) access to the repository. Long-term traffic charts are therefore possible only if the workflow snapshots those endpoints on a schedule and stores the history.
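The snapshot-and-store idea can be sketched as a merge step that folds each scheduled traffic response into a date-keyed history. The response shape follows the REST `traffic/views` endpoint (a `views` list of daily entries with `timestamp`, `count`, and `uniques`); the function name and the in-memory dict standing in for the history store are assumptions.

```python
def merge_traffic_views(history, response):
    """Fold one traffic/views response into a date-keyed history dict.

    Overlapping days are overwritten by the newer snapshot, because the
    most recent day in an earlier snapshot may have been incomplete.
    """
    merged = dict(history)  # leave the caller's history untouched
    for day in response.get("views", []):
        date_key = day["timestamp"][:10]  # "2024-05-01T00:00:00Z" -> "2024-05-01"
        merged[date_key] = {"count": day["count"], "uniques": day["uniques"]}
    return merged


# Hypothetical data: one stored day plus a newer snapshot that overlaps it.
stored = {"2024-05-01": {"count": 10, "uniques": 4}}
snapshot = {
    "count": 30,
    "uniques": 9,
    "views": [
        {"timestamp": "2024-05-01T00:00:00Z", "count": 12, "uniques": 5},
        {"timestamp": "2024-05-02T00:00:00Z", "count": 18, "uniques": 6},
    ],
}
history = merge_traffic_views(stored, snapshot)
```

Because the merge is idempotent, running the workflow more often than strictly necessary does not duplicate days; it only refreshes the overlap.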

Practical boundary for v1

Recommended v1 scope:

This keeps the add-on easy to adopt and compatible with GitHub Pages.


Hard constraints and design implications

1. Static hosting means data must be precomputed

GitHub Pages serves static assets. There is no request-time API proxy or secure server-side join layer. So the site must be built from materialized snapshots.

Implication:

2. Traffic metrics need durable snapshots

The GitHub REST traffic endpoints return only the most recent 14 days of data.

Implication:

3. Contributor drill-through depends on stable identity

Contributor boards need a stable key. GitHub usernames work well for public repos, but there are still edge cases:

Implication:

4. Some developer productivity metrics are expensive or ambiguous

Commercial tools popularize metrics like review time, pickup time, cycle time, deploy frequency, and change failure rate. Some of these are feasible for a static GitHub-only pack; others are not trivial to compute from GitHub data alone.
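To illustrate the feasible end of that range, here is a minimal sketch that derives two timing metrics from pull request timestamps. It assumes one common reading of the terms (pickup time as open-to-first-review, merge time as open-to-merge); vendors define these differently, and the function and key names are hypothetical.

```python
from datetime import datetime


def parse_ts(value):
    """Parse a GitHub-style UTC timestamp such as 2024-05-01T12:00:00Z."""
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ")


def pr_timing_hours(created_at, first_review_at=None, merged_at=None):
    """Return hours from PR creation to first review and to merge.

    A value is None when the PR has not been reviewed or merged yet.
    """
    created = parse_ts(created_at)

    def hours_since_created(value):
        if value is None:
            return None
        return (parse_ts(value) - created).total_seconds() / 3600

    return {
        "pickup_hours": hours_since_created(first_review_at),
        "merge_hours": hours_since_created(merged_at),
    }


# Hypothetical PR: reviewed after 6 hours, merged after 24 hours.
timing = pr_timing_hours(
    "2024-05-01T00:00:00Z", "2024-05-01T06:00:00Z", "2024-05-02T00:00:00Z"
)
```

Metrics such as deploy frequency or change failure rate have no equivalent timestamp pair in pull request data, which is why they sit on the harder side of the split.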

Implication:


Competitive landscape

Category 1: GitHub-native stats cards and static pages

| Tool | What it does | Takeaway |
| --- | --- | --- |
| github-statistics / similar README-card projects | User-centric contribution cards via GraphQL | Strong proof that lightweight scheduled GitHub stats are feasible, but the UX is too shallow for project-level analytics |
| github-repo-traffic-stats and related traffic collectors | Snapshot GitHub traffic beyond the 14-day window and publish via GitHub Pages | Strong validation for the "capture history in Actions, publish static dashboards" model |
| GitHub stats/profile sites | Embeddable cards and vanity summaries | Good for social sharing, weak for maintainers who need board-to-board navigation |

Category 2: OSS health analytics frameworks

| Tool | What it does | Takeaway |
| --- | --- | --- |
| CHAOSS / GrimoireLab | Broad OSS community analytics across many systems | Best source for metric taxonomy and board ideas; too heavy for a drop-in single-repo add-on |
| Augur | Open-source health and sustainability metrics platform | Confirms demand for contributor retention, review flow, and sustainability signals, but involves much more infrastructure than GitHub Pages |
| DevPulse | Community health analytics for GitHub orgs and repos | Good signal that OSS maintainers want project health dashboards, not just vanity charts |

Category 3: Commercial engineering intelligence

| Tool | What it does | Takeaway |
| --- | --- | --- |
| LinearB | Benchmarks for review time, PR size, cycle time, deploy frequency | Useful metric vocabulary, but much of it assumes richer engineering system integrations |
| Athenian / Plandek / similar | Cross-tool engineering analytics, delivery health, team performance | Useful framing for productivity dashboards; too heavyweight for a GitHub-only v1 |

Competitive conclusion

The strongest whitespace is not "enterprise productivity platform in GitHub Pages." It is:

That points away from a huge platform and toward a curated board suite with strong defaults.


The suite should feel like a small analytics product, not a pile of disconnected charts.

1. Project overview

Purpose:

Candidate sections:

Primary clicks:

2. Contribution flow

Purpose:

Candidate sections:

Primary clicks:

3. Review and collaboration

Purpose:

Candidate sections:

Primary clicks:

4. Contributor directory / leaderboard

Purpose:

Candidate sections:

Primary clicks:

5. Contributor detail

Purpose:

Candidate sections:

Primary clicks:

6. Repository or area detail

Purpose:

Candidate sections:

Primary clicks:

7. Reach and adoption

Purpose:

Candidate sections:

Primary clicks:

Board recommendation for v1

V1 should likely ship five boards:

  1. Project overview
  2. Contribution flow
  3. Review and collaboration
  4. Contributor directory
  5. Contributor detail

Repository detail and reach/adoption can be v1.1 if time or data complexity becomes a risk.


Inter-linking model

The dashboard pack should behave like a small documentation site:

Contributor drill-through rule

Every board that names a person should link with the same key:

That keeps URL semantics simple and lets the contributor detail board become the canonical "person" destination.
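A single link helper is one way to enforce that rule across boards. The sketch below is an assumption throughout: the `/contributors/` path, the function name, and the use of a lowercased login as the example key are placeholders, not the pack's actual URL scheme.

```python
from urllib.parse import quote


def contributor_href(key, base="/contributors/"):
    """Build the canonical contributor-detail link for a contributor key.

    The key is lowercased (GitHub logins are case-insensitive) and
    percent-encoded so every board emits the same URL for the same person.
    """
    return f"{base}{quote(key.lower(), safe='')}/"


# Hypothetical usage: two boards naming the same person produce one URL.
leaderboard_link = contributor_href("Octocat")
review_board_link = contributor_href("octocat")
```

Routing every board through one helper means a later change to the key or the path is made in one place rather than per board.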


Metric families to prioritize

Safe and high-value for v1

Promising but likely phase 2


The easiest maintainable pattern is:

  1. a starter dashboard pack in-repo
  2. a scheduled GitHub Actions workflow
  3. a machine-managed history snapshot store
  4. dft build output deployed to GitHub Pages

This is easier to reason about than a hosted service and more durable than README cards.


Major risks

API rate and query complexity

Cross-repo contributor dashboards can become expensive if the workflow naively walks every PR, review, and issue every run.

Mitigation:

Identity quality

Contributor identity is cleaner on GitHub than in multi-system OSS analytics, but bot noise and renamed users still exist.
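One possible shape for that cleanup, as a minimal sketch: group activity by the numeric user id (which survives a rename) and drop accounts that look like bots. The record layout with `id`, `login`, and `type` mirrors GitHub REST user objects, but the function name and the input list are hypothetical.

```python
def canonical_contributors(events):
    """Group activity records by stable numeric user id.

    Records are assumed to be ordered oldest to newest, so the login kept
    for display is the most recent one seen for each id.
    """
    people = {}
    for event in events:
        # Skip app/bot accounts: REST marks them with type "Bot", and
        # their logins end with "[bot]".
        if event.get("type") == "Bot" or event["login"].endswith("[bot]"):
            continue
        person = people.setdefault(event["id"], {"login": event["login"], "events": 0})
        person["login"] = event["login"]  # latest login wins after a rename
        person["events"] += 1
    return people


# Hypothetical activity: one renamed user and one bot.
activity = [
    {"id": 1, "login": "old-name", "type": "User"},
    {"id": 2, "login": "dependabot[bot]", "type": "Bot"},
    {"id": 1, "login": "new-name", "type": "User"},
]
people = canonical_contributors(activity)
```

In this example the renamed user collapses into a single entry with two events, and the bot is excluded from the contributor set.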

Mitigation:

Scope creep into enterprise productivity

It is easy to drift from OSS visibility into a full developer productivity platform.

Mitigation:


Recommendation

Proceed with a dashboard-factory M4 initiative focused on a static GitHub OSS analytics pack.

The strongest product shape is:


References