# Plan activity page performance improvements with incremental GitHub sync and caching

## Problem
Create an implementation plan to reduce the tasks activity page load time, currently around 10 seconds. Investigate bottlenecks in GitHub data fetch and rendering, and design a smart strategy that avoids full-history queries on every load. Candidate approach: persist normalized activity history locally, fetch only new deltas since the last cursor or timestamp, and serve page data from cached snapshots with background refresh and explicit staleness indicators. The plan must also answer whether the activity UI should paginate by time window (for example, one week at a time by default) and how cached activity should be stored on disk so it remains bounded, inspectable, and cheap to update. Define measurable targets, instrumentation, rollout steps, and fallback behavior when GitHub is unavailable.
## Context
- `/tasks/activity` is currently useful but slow enough to break normal operator flow.
- The activity page (`tasks/activity/index.md`) renders a day-by-day record of completed tasks and merged PRs via the `{{ daily_activity_rollup() }}` MkDocs macro, which calls `render_daily_activity()` in `tasks/activity.py`.
- The page appears to be doing too much work per build, likely combining GitHub fetch, normalization, and render time in one pass.
- A cache design that only speeds up fetch but still renders an unbounded full-history page may leave the page feeling slow or visually overloaded.
- The plan should prefer a bounded, inspectable local format over a single ever-growing opaque blob.
## Current data flow
- `collect_merged_prs()` (around line 365) runs `git log --first-parent --max-count=5000` to get merge commits, then for each PR calls `_changed_paths_for_commit()`, which spawns a `git diff-tree` subprocess. With a few hundred merged PRs this becomes a few hundred subprocess calls just for path detection.
- `_pull_request_title()` (around line 138) may spawn an additional `git show` subprocess per merge-commit PR to extract the real title from the commit body.
- `collect_completed_tasks()` (around line 513) reads every task file in `workstreams/*/tasks/*.md`, then for each completed task without `completed_at` frontmatter calls `_blame_completed_line()`, which spawns `git blame --line-porcelain` and may also call `_commit_committer_name()` with another `git show`.
- No caching means every MkDocs build, including `mkdocs serve` live reload, reruns the full pipeline from scratch.
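The per-PR subprocess fan-out described above can usually be collapsed into a single `git log` pass, since `--first-parent --name-only` emits each merge's changed paths inline. A minimal sketch, assuming a hypothetical `MergeRecord` shape (none of these helper names exist in `tasks/activity.py`):

```python
import subprocess
from dataclasses import dataclass, field

RECORD_SEP = "\x1e"  # ASCII record separator; will not appear in commit subjects
FIELD_SEP = "\x1f"   # ASCII unit separator between SHA and subject


@dataclass
class MergeRecord:
    sha: str
    subject: str
    paths: list[str] = field(default_factory=list)


def parse_name_only_log(out: str) -> list[MergeRecord]:
    """Split `git log --name-only` output into one record per merge commit."""
    records = []
    for chunk in out.split(RECORD_SEP):
        if not chunk.strip():
            continue
        header, *rest = chunk.splitlines()
        sha, _, subject = header.partition(FIELD_SEP)
        records.append(MergeRecord(sha, subject, [p for p in rest if p.strip()]))
    return records


def collect_merges_batched(repo_dir: str = ".", max_count: int = 5000) -> list[MergeRecord]:
    """One subprocess for all merge commits plus their first-parent changed paths."""
    out = subprocess.run(
        ["git", "log", "--first-parent", "--merges", f"--max-count={max_count}",
         "--name-only", f"--pretty=format:{RECORD_SEP}%H{FIELD_SEP}%s"],
        cwd=repo_dir, capture_output=True, text=True, check=True,
    ).stdout
    return parse_name_only_log(out)
```

Parsing is split out so it can be unit-tested without a repository; with this shape, `_changed_paths_for_commit()` and its per-PR `git diff-tree` calls disappear entirely.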
## Key files
- `tasks/activity.py` - collection and rendering logic
- `tasks/activity/index.md` - page template
- `tasks/macros.py` - macro registration for `daily_activity_rollup`
## Constraints
- MkDocs macro context runs during static site build, not a Django request.
- Git subprocess calls are the dominant cost; task file parsing is comparatively cheap.
- The page is read-only, so caching is operationally safe.
- The plan should explicitly compare storage shapes such as:
    - one normalized append-only activity log plus derived indexes
    - time-bucketed cache files such as weekly JSON
    - a small embedded database or SQLite table if materially cleaner
- The final design must remain correct after new PRs merge or tasks complete.
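To make the time-bucketed option concrete, here is one possible on-disk shape: one JSON file per ISO week under a local cache directory, inspectable with any text tool and cheap to rewrite one bucket at a time. The `CACHE_DIR` path and field names below are illustrative assumptions, not existing code:

```python
import json
from pathlib import Path

# Hypothetical cache location; would need a .gitignore entry if kept untracked.
CACHE_DIR = Path(".activity-cache")


def bucket_path(iso_year: int, iso_week: int) -> Path:
    """One file per ISO week keeps growth bounded and refreshes cheap."""
    return CACHE_DIR / f"{iso_year}-W{iso_week:02d}.json"


def write_bucket(iso_year: int, iso_week: int, entries: list[dict]) -> Path:
    """Rewrite a single week's bucket; sorted keys keep diffs stable."""
    CACHE_DIR.mkdir(exist_ok=True)
    path = bucket_path(iso_year, iso_week)
    path.write_text(json.dumps({"entries": entries}, indent=2, sort_keys=True))
    return path
```

Because only the current week's bucket churns after a sync, older files stay byte-identical, which also makes staleness easy to reason about per file.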
## Possible Solutions
- Recommended: combine incremental GitHub sync with bounded weekly pagination in the UI. Keep a normalized local activity store, fetch deltas only, precompute week buckets or equivalent indexes, and render one recent week by default with explicit previous/next navigation. This reduces both request-time fetch work and page weight while keeping the model simple.
- Batch git operations without a bounded cache. This removes the worst subprocess hot spot, but warm loads still recompute everything and the page can still grow into an unbounded render.
- Keep the full unpaginated page but cache the whole payload. Faster data fetch, but page size and visual noise continue to grow with history.
- Use infinite scroll over a local cache. Potentially workable, but adds UI complexity and makes explicit time-window navigation less clear.
- Use a single large JSON or Markdown cache file. Simple initially, but likely degrades as the file grows and becomes harder to inspect or refresh incrementally.
## Plan
- Identify current hot spots separately:
    - Git history/query time
    - normalization and aggregation time
    - template/render time
- Use batched git reads to remove per-PR subprocess calls where possible:
    - update `collect_merged_prs()` to consume `git log --name-only`-style output instead of calling `_changed_paths_for_commit()` per PR
    - avoid extra `git show` title lookups when the log body already contains enough information
- Decide the canonical local activity store and justify it with bounded-growth rules:
    - define the on-disk format
    - define invalidation or refresh keys
    - keep storage inspectable and cheap to update
- Decide the activity page windowing model:
    - default recent week view
    - previous/next week navigation
    - optional broader views only if they remain cheap
- Define freshness semantics:
    - what is loaded synchronously on page build
    - what is refreshed incrementally
    - what staleness timestamp or badge the page shows
- Capture implementation touch points:
    - `tasks/activity.py`
    - `tasks/activity/index.md`
    - `tasks/macros.py`
    - `tests/tasks/test_activity.py`
    - `.gitignore` (only if a local cache path needs ignoring)
- Include rollout guidance so the page can switch to the faster path without losing historical correctness.
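The windowing decisions above reduce to a small amount of date arithmetic. A sketch using ISO weeks (helper names are hypothetical), which yields stable week keys for the default view, the previous/next links, and cache bucket names alike:

```python
import datetime as dt


def week_key(day: dt.date) -> tuple[int, int]:
    """Stable (ISO year, ISO week) key for a calendar date."""
    iso = day.isocalendar()
    return (iso.year, iso.week)


def shift_week(key: tuple[int, int], delta: int) -> tuple[int, int]:
    """Move delta weeks forward or backward, handling year boundaries."""
    year, week = key
    monday = dt.date.fromisocalendar(year, week, 1) + dt.timedelta(weeks=delta)
    return week_key(monday)


def week_bounds(key: tuple[int, int]) -> tuple[dt.date, dt.date]:
    """Inclusive Monday..Sunday date range for a week key."""
    year, week = key
    start = dt.date.fromisocalendar(year, week, 1)
    return start, start + dt.timedelta(days=6)
```

For example, `week_key(dt.date.today())` would drive the default view, and `shift_week(current, -1)` the "previous week" link; using ISO years avoids off-by-one bugs in the late-December/early-January weeks.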
## Implementation Progress
- 2026-03-24: Scope clarified to require a recommendation on time-window pagination (likely weekly) and on the on-disk cache or storage shape, not just generic incremental sync.
- 2026-03-24: Analysis completed for the current `tasks/activity.py` data flow. Confirmed the main hot spot is per-PR subprocess fan-out from `_changed_paths_for_commit()`, with secondary cost from blame-based task completion backfills and no caching layer for repeated MkDocs builds.
- 2026-03-24: Recommended direction is to combine batched git collection with bounded local caching and a weekly time-windowed activity view, preserving the richer framing from `main` while keeping the concrete findings from the original planning work.
## QA Exploration
N/A - this is a planning task, not an implementation or UI validation task.
- [x] QA exploration completed (or N/A for non-UI tasks)
## Review Feedback
- [ ] Review cleared