Research

Why this initiative exists

GitHub already exposes useful but fragmented activity views:

Pulse summarizes open and merged PRs, issues, and top committers, but only as a narrow repo-level view with limited drill-down.
Traffic exposes views, clones, referrers, and popular paths, but only for the last 14 days and only for users with write access.
Insights / graphs remain mostly repo-local and are not shaped as a reusable, public-facing multi-dashboard site.

That creates a clear opportunity for Dataface:

turn GitHub activity into a static, linkable dashboard site
give maintainers a cross-repo and cross-contributor view
preserve longer-term history than GitHub's native traffic window
publish a polished artifact on GitHub Pages with no separate backend

The bar is not "more charts than GitHub." The bar is:

better visibility
better storytelling
better navigation
easy adoption for real OSS maintainers

Research questions

This research focused on four questions:

What can GitHub already show natively, and where are the visibility gaps?
Which GitHub APIs expose enough data to build static dashboards without a live backend?
What do OSS analytics and developer productivity tools already do well?
Which dashboard set would feel immediately useful for open-source projects, especially with contributor drill-through?

What GitHub already shows

Native surfaces

Surface	What it shows	Gap for this initiative
Pulse	Open/merged PRs, open/closed issues, top 15 committers for a period	No durable site, limited contributor detail, weak multi-board storytelling
Traffic	Views, clones, referrers, popular paths	Last 14 days only; no long-term trend unless someone snapshots it
Contributors / graphs	Commit history and repo activity views	Mostly repo-local; limited cross-repo or role-based narrative
Community profile	Docs / governance checklist	Good health signal, but not activity or contribution flow analytics

Key takeaway

GitHub's native features are good operational hints, but weak as a public analytics product for a project community. Dataface can win by packaging the same underlying activity into:

more durable history
clearer metric families
consistent board layout
interlinked drill-down paths

Feasible GitHub data sources

GraphQL: contributor activity and collaboration

GitHub GraphQL's ContributionsCollection is the most important source for this initiative. It exposes:

contributionCalendar
commitContributionsByRepository
pullRequestContributions
pullRequestContributionsByRepository
contributor-scoped ranges via from / to (a single query can span at most one year)

User.repositoriesContributedTo also helps answer "where does this person contribute?" across repos.

Why this matters: contributor detail boards should not be approximated from repo commits alone. The product needs user-centric history, repo mix, and contribution type mix.
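A minimal sketch of what a per-contributor request could look like. The field names come from GitHub's public GraphQL schema, but the selection is only an illustrative subset, and the helper name `build_contributions_payload` is hypothetical. It builds the JSON body only; sending it to the GraphQL endpoint with a token is left to the workflow.

```python
import json
from datetime import datetime, timezone

# Illustrative subset of ContributionsCollection; extend as boards require.
CONTRIBUTIONS_QUERY = """
query($login: String!, $from: DateTime!, $to: DateTime!) {
  user(login: $login) {
    contributionsCollection(from: $from, to: $to) {
      contributionCalendar {
        totalContributions
        weeks { contributionDays { date contributionCount } }
      }
      commitContributionsByRepository(maxRepositories: 25) {
        repository { nameWithOwner }
        contributions { totalCount }
      }
      pullRequestContributionsByRepository(maxRepositories: 25) {
        repository { nameWithOwner }
        contributions { totalCount }
      }
    }
  }
}
"""


def _iso_utc(moment: datetime) -> str:
    # Expects a timezone-aware datetime; naive values are read as local time.
    return moment.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def build_contributions_payload(login: str, start: datetime, end: datetime) -> str:
    """Return the JSON request body for one contributor and one date range."""
    if start >= end:
        raise ValueError("start must be before end")
    variables = {"login": login, "from": _iso_utc(start), "to": _iso_utc(end)}
    return json.dumps({"query": CONTRIBUTIONS_QUERY, "variables": variables})
```

Longer histories would need one payload per window, stitched together in the snapshot step.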

REST: repo metadata and traffic

GitHub REST fills several gaps:

repository metadata, stars, forks, releases, issues, PRs, workflow runs
traffic endpoints for page views, clones, top referrers, and top paths

The traffic endpoints are explicitly limited to the last 14 days and require write access. That means long-term traffic charts are only possible if the workflow snapshots those endpoints on a schedule and stores history.
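As a rough sketch, the four traffic endpoints can be addressed with one small request builder. The endpoint paths are GitHub's documented REST routes; the helper name `traffic_request` and the `kind` labels are assumptions made for illustration. Nothing is sent until the caller passes the request to `urllib.request.urlopen`.

```python
import urllib.request

API_ROOT = "https://api.github.com"

# Short labels (hypothetical) mapped to the documented traffic routes.
TRAFFIC_ROUTES = {
    "views": "traffic/views",
    "clones": "traffic/clones",
    "referrers": "traffic/popular/referrers",
    "paths": "traffic/popular/paths",
}


def traffic_request(owner: str, repo: str, kind: str, token: str) -> urllib.request.Request:
    """Build (but do not send) a GET request for one traffic endpoint."""
    if kind not in TRAFFIC_ROUTES:
        raise ValueError(f"unknown traffic kind: {kind}")
    url = f"{API_ROOT}/repos/{owner}/{repo}/{TRAFFIC_ROUTES[kind]}"
    headers = {
        "Accept": "application/vnd.github+json",
        # The token must belong to an identity with write access to the repo.
        "Authorization": f"Bearer {token}",
    }
    return urllib.request.Request(url, headers=headers)
```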

Practical boundary for v1

Recommended v1 scope:

public repositories
public-safe metrics only
scheduled daily or twice-daily refresh
no external database
git-backed history snapshots for metrics GitHub does not retain long enough

This keeps the add-on easy to adopt and compatible with GitHub Pages.

Hard constraints and design implications

1. Static hosting means data must be precomputed

GitHub Pages serves static assets. There is no request-time API proxy or secure server-side join layer. So the site must be built from materialized snapshots.

Implication:

do metric joins and enrichment in Actions
output CSV or Parquet for Dataface
run dft build --mode html --bake-data
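The CSV step can stay very small. The sketch below assumes the Actions job has already produced metric rows as dictionaries; the column names in the usage example are hypothetical, and the sort is only there so that committed files diff cleanly between runs.

```python
import csv
import io


def rows_to_csv(rows, fieldnames):
    """Serialize metric rows to CSV text with a stable row order."""
    ordered = sorted(rows, key=lambda row: [str(row[name]) for name in fieldnames])
    buffer = io.StringIO(newline="")
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(ordered)
    return buffer.getvalue()
```

For example, `rows_to_csv([{"date": "2024-05-01", "repo": "acme/widgets", "merged_prs": 3}], ["date", "repo", "merged_prs"])` yields a header line followed by one data line, ready to be written to the data directory the build reads from.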

2. Traffic metrics need durable snapshots

GitHub's REST traffic endpoints return only the most recent 14 days of data.

Implication:

maintain a machine-managed history store
easiest v1 path is a repo-managed history branch or dedicated data directory updated by Actions
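A minimal sketch of the history merge, assuming the stored history is a mapping from date to daily counts and the snapshot has the shape of the traffic views response (a `views` array of entries with `timestamp`, `count`, and `uniques`). The function name and the storage layout are assumptions, not a prescribed format.

```python
def merge_daily_views(history, snapshot):
    """Fold one traffic snapshot into the stored per-day history.

    history:  {"YYYY-MM-DD": {"count": int, "uniques": int}}
    snapshot: parsed JSON from the traffic views endpoint
    """
    merged = dict(history)
    for day in snapshot.get("views", []):
        date_key = day["timestamp"][:10]  # keep only the calendar date
        # A later reading replaces the stored one, on the assumption that
        # it is at least as complete as what was captured earlier.
        merged[date_key] = {"count": day["count"], "uniques": day["uniques"]}
    return dict(sorted(merged.items()))
```

Because each run overlaps the previous one by up to 14 days, a missed run or two does not leave a hole in the history as long as the schedule recovers within the window.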

3. Contributor drill-through depends on stable identity

Contributor boards need a stable key. GitHub usernames work well for public repos, but there are still edge cases:

renamed accounts
bot/service accounts
contributors with activity split across repos or organizations

Implication:

use GitHub login as the v1 identity key
explicitly support bot exclusion / allowlists
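One way the exclusion rule could work is sketched below. It treats a `[bot]` suffix on the login as a heuristic bot signal (GitHub App accounts such as `dependabot[bot]` commonly appear this way in REST responses) and lets configuration override it in both directions. The function name and the two list parameters are hypothetical.

```python
def is_excluded(login, denylist=frozenset(), allowlist=frozenset()):
    """Decide whether a login should be dropped from contributor boards.

    denylist and allowlist are expected to hold lowercase logins.
    """
    key = login.lower()
    if key in allowlist:
        # An explicit allow always wins, even for bot-looking logins.
        return False
    return key.endswith("[bot]") or key in denylist
```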

4. Some developer productivity metrics are expensive or ambiguous

Commercial tools have popularized metrics such as review time, pickup time, cycle time, deploy frequency, and change failure rate. Some of these can be derived from GitHub data alone; others are hard to compute reliably in a static, GitHub-only pack.

Implication:

prefer metrics with clear GitHub-native semantics in v1
avoid fake precision for DORA-like metrics unless the underlying event boundaries are reliable

Competitive landscape

Category 1: GitHub-native stats cards and static pages

Tool	What it does	Takeaway
`github-statistics` / similar README-card projects	User-centric contribution cards via GraphQL	Strong proof that lightweight scheduled GitHub stats are feasible, but the UX is too shallow for project-level analytics
`github-repo-traffic-stats` and related traffic collectors	Snapshot GitHub traffic beyond the 14-day window and publish via GitHub Pages	Strong validation for the "capture history in Actions, publish static dashboards" model
GitHub stats/profile sites	Embeddable cards and vanity summaries	Good for social sharing, weak for maintainers who need board-to-board navigation

Category 2: OSS health analytics frameworks

Tool	What it does	Takeaway
CHAOSS / GrimoireLab	Broad OSS community analytics across many systems	Best source for metric taxonomy and board ideas; too heavy for a drop-in single-repo add-on
Augur	Open-source health and sustainability metrics platform	Confirms demand for contributor retention, review flow, and sustainability signals, but involves much more infrastructure than GitHub Pages
DevPulse	Community health analytics for GitHub orgs and repos	Good signal that OSS maintainers want project health dashboards, not just vanity charts

Category 3: Commercial engineering intelligence

Tool	What it does	Takeaway
LinearB	Benchmarks for review time, PR size, cycle time, deploy frequency	Useful metric vocabulary, but much of it assumes richer engineering system integrations
Athenian / Plandek / similar	Cross-tool engineering analytics, delivery health, team performance	Useful framing for productivity dashboards; too heavyweight for a GitHub-only v1

Competitive conclusion

The strongest whitespace is not "enterprise productivity platform in GitHub Pages." It is:

open-source project visibility
maintainer-friendly dashboard pack
easy setup
public, linkable pages
contributor-centric drill-through

That points away from a huge platform and toward a curated board suite with strong defaults.

Recommended dashboard suite

The suite should feel like a small analytics product, not a pile of disconnected charts.

1. Project overview

Purpose:

answer "what is happening in this project overall?"

Candidate sections:

contribution volume trend
merged PRs / opened PRs / closed issues over time
active contributors
review backlog
release cadence
stars / forks / watchers trend if historical capture is available
traffic summary if write access exists

Primary clicks:

to repository detail
to contributor leaderboard
to review/collaboration board

2. Contribution flow

Purpose:

answer "how does work move through the PR and issue flow?"

Candidate sections:

PR opened vs merged trend
median time to first review
median time open to merge
PR size distribution
stale PR and issue backlog
issue close time bands

Primary clicks:

to contributor detail for PR authors and reviewers
to repository detail

3. Review and collaboration

Purpose:

answer "who reviews, who is overloaded, and where is collaboration concentrated?"

Candidate sections:

review requests by contributor
completed reviews by contributor
comments / review discussion intensity
reviewer-author interaction matrix
first-response lag distribution

Primary clicks:

from any person row to contributor detail
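The reviewer-author interaction matrix listed above reduces to counting pairs. A minimal sketch, assuming review events have already been flattened into records with hypothetical `reviewer` and `author` keys:

```python
from collections import Counter


def interaction_matrix(reviews):
    """Count submitted reviews per (reviewer, author) pair."""
    pairs = Counter()
    for review in reviews:
        reviewer, author = review["reviewer"], review["author"]
        if reviewer == author:
            continue  # self-reviews say nothing about collaboration
        pairs[(reviewer, author)] += 1
    return pairs
```

`pairs.most_common(10)` then gives the strongest collaboration edges, and each login in a pair can link to the contributor detail board.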

4. Contributor directory / leaderboard

Purpose:

give a navigable landing page for people

Candidate sections:

contributor table with commits, PRs, issues, reviews, merged PRs, repositories contributed to
activity trend sparkline
current status buckets: active, cooling off, newly active

Primary clicks:

every contributor name links to the contributor detail board
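The status buckets need explicit thresholds to be reproducible. The sketch below uses 30 and 90 days purely as placeholder assumptions, and adds a fourth `inactive` bucket (not named above) for contributors who fall outside both windows.

```python
from datetime import date, timedelta


def status_bucket(first_seen, last_seen, today, active_days=30, cooling_days=90):
    """Classify a contributor by first and most recent activity dates."""
    active_cutoff = today - timedelta(days=active_days)
    cooling_cutoff = today - timedelta(days=cooling_days)
    if last_seen >= active_cutoff:
        # Recent activity: new if the first activity is also recent.
        return "newly active" if first_seen >= active_cutoff else "active"
    if last_seen >= cooling_cutoff:
        return "cooling off"
    return "inactive"
```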

5. Contributor detail

Purpose:

answer "what does this specific person contribute, where, and how recently?"

Candidate sections:

profile header KPIs
contribution calendar / recent activity trend
contribution mix by type
repositories contributed to
authored PRs
submitted reviews
issue participation
release / milestone involvement if inferable

Primary clicks:

back to overview
to repository detail filtered to that person's activity

6. Repository or area detail

Purpose:

answer "which repos, packages, or subprojects are healthy vs overloaded?"

Candidate sections:

repo-level throughput
active contributors
review lag
issue backlog
release cadence
traffic if available

Primary clicks:

to contributor detail for top contributors and reviewers

7. Reach and adoption

Purpose:

answer "is the project being discovered and consumed?"

Candidate sections:

views and clones trend
top referrers
top pages / docs paths
stars / forks / watchers trend
release download or package signals if available from outside GitHub in a later phase

Primary clicks:

to repository detail

Board recommendation for v1

V1 should likely ship five boards:

Project overview
Contribution flow
Review and collaboration
Contributor directory
Contributor detail

Repository detail and reach/adoption can move to v1.1 if time or data complexity becomes a risk.

Inter-linking model

The dashboard pack should behave like a small documentation site:

overview -> flow
overview -> contributors
contributors -> contributor detail
flow tables -> contributor detail
review tables -> contributor detail
contributor detail -> repository detail or filtered overview slices

Contributor drill-through rule

Every board that names a person should link with the same key:

contributor=<github_login>

That keeps URL semantics simple and lets the contributor detail board become the canonical "person" destination.
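A small sketch of a shared link helper that every board could use. The page name `contributor.html` is a placeholder assumption; only the `contributor` query key comes from the rule above. Encoding matters because bot logins can contain brackets.

```python
from urllib.parse import urlencode


def contributor_link(login, base="contributor.html"):
    """Return the drill-through URL for one contributor."""
    return f"{base}?{urlencode({'contributor': login})}"
```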

Metric families to prioritize

Safe and high-value for v1

PRs opened / merged / closed
issues opened / closed
commits
active contributors
reviews submitted
repositories contributed to
time to first review
time open to merge
PR size
stale PR / issue counts
views / clones / referrers / popular paths
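The two timing metrics in this list can share one helper. This is a sketch under stated assumptions: each PR record carries ISO 8601 timestamps, `created_at` and `merged_at` follow the REST field names, and `first_review_at` is a hypothetical field the snapshot step would derive from review events.

```python
from datetime import datetime
from statistics import median


def _parse(timestamp):
    # Accept the trailing "Z" that GitHub uses for UTC timestamps.
    return datetime.fromisoformat(timestamp.replace("Z", "+00:00"))


def median_hours(prs, start_key, end_key):
    """Median elapsed hours between two timestamps, skipping incomplete PRs."""
    durations = [
        (_parse(pr[end_key]) - _parse(pr[start_key])).total_seconds() / 3600
        for pr in prs
        if pr.get(start_key) and pr.get(end_key)
    ]
    return median(durations) if durations else None
```

`median_hours(prs, "created_at", "first_review_at")` gives time to first review and `median_hours(prs, "created_at", "merged_at")` gives time open to merge. PRs that are still open are skipped, which biases both numbers downward, so a board should show the count of included PRs next to each median.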

Promising but likely phase 2

contributor retention cohorts
first-time contributor conversion
maintainer load / bus-factor style views
release note coverage
comment sentiment or discussion quality
deploy frequency / failure rate unless another system is integrated

Recommended packaging model

The easiest maintainable pattern is:

a starter dashboard pack in-repo
a scheduled GitHub Actions workflow
a machine-managed history snapshot store
dft build output deployed to GitHub Pages

This is easier to reason about than a hosted service and more durable than README cards.

Major risks

API rate limits and query complexity

Cross-repo contributor dashboards can become expensive if the workflow naively walks every PR, review, and issue every run.

Mitigation:

incremental snapshots
bounded lookback windows
materialized history tables
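The first two mitigations combine into a single cursor calculation. The sketch below assumes the workflow records the timestamp of its last successful snapshot; the 90-day lookback and 2-day overlap are placeholder defaults, not recommendations.

```python
from datetime import datetime, timedelta, timezone


def fetch_since(last_snapshot, now, max_lookback=timedelta(days=90), overlap=timedelta(days=2)):
    """Return the earliest timestamp the next run needs to re-fetch."""
    floor = now - max_lookback
    if last_snapshot is None:
        # First run: backfill only as far as the bounded lookback allows.
        return floor
    # Re-read a small overlap so late edits to recent items are picked up,
    # but never reach further back than the lookback bound.
    return max(last_snapshot - overlap, floor)
```

The returned value can be passed as the lower bound of whatever listing queries the workflow runs, with results upserted into the materialized history tables.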

Identity quality

Contributor identity is cleaner on GitHub than in multi-system OSS analytics, but bot noise and renamed users still exist.

Mitigation:

config-driven exclusions
explicit contributor dimension tables

Scope creep into enterprise productivity

It is easy to drift from OSS visibility into a full developer productivity platform.

Mitigation:

keep v1 focused on GitHub-native data
favor public OSS use cases over internal SDLC governance

Recommendation

Proceed with a dashboard-factory M4 initiative focused on a static GitHub OSS analytics pack.

The strongest product shape is:

Dataface dashboards, not README cards
GitHub Actions snapshot pipeline, not a hosted backend
GitHub Pages deployment, not a custom app
contributor detail as the central drill-through surface
scope tuned for OSS maintainers, not enterprise management dashboards

References

GitHub Docs: Pulse
GitHub Docs: Traffic view
GitHub Docs: Custom workflows with GitHub Pages
GitHub Docs: GraphQL objects
GitHub Docs: REST traffic endpoints
GrimoireLab: project site
Augur: repository
GitHub repo traffic history pattern: piebro/github-repo-traffic-stats
GitHub stats card pattern: tanjeffreyz/github-statistics
LinearB benchmarks: engineering metrics benchmarks
