Task Manager Friction Log — 2026-03-24
Scope
Focused on task-manager stale/attention/escalation cases observed in the tasks server status UI and heartbeat output on 2026-03-24.
Incidents
1. Split manager identity (dave vs dave.fowler)
- Symptom:
- tasks server showed stale attention/escalation counts that did not match the latest heartbeat text
- separate register/snapshot files existed for task-manager-dave.* and task-manager-dave-fowler.*
- Evidence:
- .tasks/task_manager/task-manager-dave.snapshot.json
- .tasks/task_manager/task-manager-dave-fowler.snapshot.json
- .tasks/task_manager/task-manager-dave.register.json
- .tasks/task_manager/task-manager-dave-fowler.register.json
- Root cause:
- owner identity was not canonicalized before snapshot/register path selection
- different runs used different owner strings for the same human/operator
- Effect:
- stale UI
- duplicate or conflicting worker/register state
- tasks appeared stuck when the active heartbeat was actually elsewhere
- Prevention:
- canonicalize manager owner before any file path lookup/write
- make dave, davefowler, and dave.fowler collapse to one canonical manager key
- surface current snapshot age/owner clearly in the UI
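A minimal sketch of the canonicalization step, assuming an alias table keyed by the normalized owner string. The names OWNER_ALIASES, canonical_manager_key, and snapshot_path are hypothetical and not taken from the existing code; only the path layout follows the files listed under Evidence.

```python
import re

# Hypothetical alias table: keys are already normalized (lowercase, separators removed).
OWNER_ALIASES = {
    "dave": "dave",
    "davefowler": "dave",
}


def canonical_manager_key(owner: str) -> str:
    """Collapse spellings such as 'dave', 'davefowler', and 'dave.fowler' to one key."""
    normalized = re.sub(r"[^a-z0-9]", "", owner.strip().lower())
    if not normalized:
        raise ValueError("owner must not be empty")
    # Unknown owners keep their normalized form instead of being guessed.
    return OWNER_ALIASES.get(normalized, normalized)


def snapshot_path(owner: str) -> str:
    """Build the snapshot path only from the canonical key (assumed layout)."""
    key = canonical_manager_key(owner)
    return f".tasks/task_manager/task-manager-{key}.snapshot.json"
```

With this in place, every run resolves to the same snapshot/register pair regardless of which owner string it was started with.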
2. Ready task dispatched in worktree, but root task file never reconciled
- Symptom:
- task still showed ready in the main repo while the worktree task file had in_progress, PR metadata, and completed planning content
- Evidence:
- task: plan-activity-page-performance-improvements-with-incremental-github-sync-and-caching
- worktree file had status: in_progress, pr_url, pr_number, started_at, pr_created_at
- root file still showed status: ready with no PR metadata
- dispatch log ended successfully with dispatch_exit_code=0
- Root cause:
- worker changed the task file inside the task worktree only
- manager had no mandatory reconciliation step to sync task metadata back into the root repo after successful worker completion
- Effect:
- false pickup_overdue
- status/queue drift
- duplicate work risk
- Prevention:
- add explicit post-dispatch reconciliation:
- copy frontmatter/task updates from worktree task file back to root
- or make the task file authoritative only inside the task worktree and merge that branch immediately after PR creation/update
- heartbeat should detect a registered worktree with newer task metadata than root and emit metadata_drift
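A sketch of the drift check and the copy-back option, assuming the task frontmatter has already been parsed into plain dicts. The field list comes from the evidence above; the function names and the "worktree wins" merge rule are assumptions, not the current behavior.

```python
from __future__ import annotations

from typing import Any, Mapping

# Fields observed in the worktree task file but missing from root.
RECONCILED_FIELDS = ("status", "pr_url", "pr_number", "started_at", "pr_created_at")


def metadata_drift(
    root: Mapping[str, Any], worktree: Mapping[str, Any]
) -> dict[str, tuple[Any, Any]]:
    """Return {field: (root_value, worktree_value)} for every field that differs."""
    return {
        field: (root.get(field), worktree.get(field))
        for field in RECONCILED_FIELDS
        if root.get(field) != worktree.get(field)
    }


def reconcile(root: Mapping[str, Any], worktree: Mapping[str, Any]) -> dict[str, Any]:
    """Return a copy of root with the worktree's values for the reconciled fields."""
    merged = dict(root)
    for field in RECONCILED_FIELDS:
        if field in worktree:
            merged[field] = worktree[field]
    return merged
```

A non-empty result from metadata_drift is what the heartbeat would report as metadata_drift; an empty result after reconcile confirms the root file caught up.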
3. Task implementation/review completed, but task left in_progress
- Symptom:
- task showed stuck_in_progress even though worker log showed implementation and review-response progress
- Evidence:
- task: wire-markdown-svg-to-dataface-font-config-and-vendored-inter
- dispatch log shows implementation completed, review feedback addressed, no final root metadata update
- Root cause:
- no strict end-of-run completion handshake from worker to manager
- task can do code work but stop before:
- marking task completed
- setting PR metadata
- clearing register state
- Effect:
- false stall signal after real progress
- manager/operator cannot trust in_progress
- Prevention:
- require dispatch runner to finish in one of a small set of terminal states:
- completed_with_pr
- blocked
- needs_review_fix
- failed
- reject silent success with no task-state mutation
- add automatic stale_in_progress_reconcile check against:
- open PR on branch
- fresh commits in worktree
- review artifact presence
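One way the end-of-run handshake could look, as a sketch. The four state names are the ones listed above; TerminalState, SilentSuccessError, and finish_dispatch are hypothetical names, and treating a non-zero exit with no report as failed is an assumption.

```python
from enum import Enum
from typing import Optional


class TerminalState(Enum):
    COMPLETED_WITH_PR = "completed_with_pr"
    BLOCKED = "blocked"
    NEEDS_REVIEW_FIX = "needs_review_fix"
    FAILED = "failed"


class SilentSuccessError(RuntimeError):
    """Worker exited 0 but never reported a terminal state."""


def finish_dispatch(exit_code: int, reported_state: Optional[str]) -> TerminalState:
    """Map a worker's exit into exactly one terminal state, or refuse."""
    if reported_state is None:
        if exit_code == 0:
            # This is the "silent success with no task-state mutation" case.
            raise SilentSuccessError("worker exited 0 without a terminal state")
        return TerminalState.FAILED
    # Enum lookup raises ValueError for any state outside the allowed set.
    return TerminalState(reported_state)
```

The manager would only clear register state after finish_dispatch returns, so a raised error leaves the task visible for reconciliation instead of silently in_progress.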
4. worker_gone and retry noise for a single task
- Symptom:
- one task surfaced multiple signals:
- pickup_overdue
- dispatch_worker_gone
- dispatch_retry_cooldown
- Evidence:
- task: make-tasks-server-status-page-use-fresh-heartbeat-data-and-auto-refresh
- Root cause:
- dispatch process disappeared before manager recorded a clean terminal outcome
- raw escalation list emitted each signal independently
- Effect:
- UI felt like 3 separate incidents when it was 1 task lifecycle problem
- Prevention:
- group signals by task in operator UI
- preserve raw signals only in detail/debug view
- add a dispatch tombstone file with last known state so worker_gone can be distinguished from completed_but_unreconciled
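A sketch of the per-task grouping for the operator view, assuming the raw escalation list can be read as (task_id, signal_name) pairs. That input shape and the function name are assumptions; the real signal records may carry more fields.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def group_signals_by_task(signals: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Collapse raw signals into one entry per task, keeping first-seen order."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for task_id, signal_name in signals:
        # Repeated emissions of the same signal add no information to the summary row.
        if signal_name not in grouped[task_id]:
            grouped[task_id].append(signal_name)
    return dict(grouped)
```

The UI would render one row per key and keep the untouched raw list for the detail/debug view.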
5. Register cleanup drift
- Symptom:
- completed tasks still appeared under active/register state
- Evidence:
- heartbeat text showed completed tasks with registered=yes
- register JSON retained older entries long after completion
- Root cause:
- register entries not pruned reliably after task completion/merge
- Effect:
- active count inflated
- old worktrees looked live
- Prevention:
- prune register entries automatically when:
- task status is completed in root
- PR is merged
- worktree is missing
- add register_orphan cleanup pass in heartbeat
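A sketch of the prune pass under the three conditions above. The RegisterEntry shape is hypothetical (the actual register JSON schema is not shown in this log), and the worktree check is injectable so the pass can be exercised without touching disk.

```python
import os
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List, Optional, Tuple


@dataclass(frozen=True)
class RegisterEntry:
    # Hypothetical fields; the real register schema may differ.
    task_id: str
    worktree: str
    root_status: str
    pr_merged: bool


def prune_reason(
    entry: RegisterEntry, worktree_exists: Callable[[str], bool] = os.path.isdir
) -> Optional[str]:
    """Return why an entry should be pruned, or None if it is still live."""
    if entry.root_status == "completed":
        return "completed_in_root"
    if entry.pr_merged:
        return "pr_merged"
    if not worktree_exists(entry.worktree):
        return "worktree_missing"
    return None


def prune_register(
    entries: Iterable[RegisterEntry],
    worktree_exists: Callable[[str], bool] = os.path.isdir,
) -> Tuple[List[RegisterEntry], Dict[str, str]]:
    """Split entries into (kept, {task_id: reason}) without mutating the input."""
    kept: List[RegisterEntry] = []
    pruned: Dict[str, str] = {}
    for entry in entries:
        reason = prune_reason(entry, worktree_exists)
        if reason is None:
            kept.append(entry)
        else:
            pruned[entry.task_id] = reason
    return kept, pruned
```

Returning the reasons alongside the kept entries lets the heartbeat log each removal as a register_orphan event rather than dropping entries silently.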
6. Over-broad owner routing
- Symptom:
- Dave-managed queue picked up role-owned tasks that were not intended for Dave
- Evidence:
- owner_matches() in scripts/task_manager_lib.py treated Dave as “anything not RJ”
- Root cause:
- owner routing was exclusion-based instead of explicit ownership-based
- Effect:
- wrong tasks entered Dave queue
- role/persona tasks could be auto-started unintentionally
- Prevention:
- route by explicit human ownership or role-owner mapping
- unknown/unmapped role owners must be non-dispatchable by default
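A sketch of explicit, allow-list routing. This is not the existing owner_matches() implementation, whose signature is not shown in this log; HUMAN_OWNERS, ROLE_OWNERS, the role name, and both function names are hypothetical placeholders.

```python
from typing import Optional

# Hypothetical tables; real owners and roles would come from project config.
HUMAN_OWNERS = {"dave", "rj"}
ROLE_OWNERS = {"release-captain": "rj"}  # role -> responsible human


def resolve_human_owner(task_owner: str) -> Optional[str]:
    """Map a task owner to a human manager, or None if it is unmapped."""
    owner = task_owner.strip().lower()
    if owner in HUMAN_OWNERS:
        return owner
    return ROLE_OWNERS.get(owner)


def is_dispatchable_by(task_owner: str, manager: str) -> bool:
    """True only when the owner explicitly resolves to this manager."""
    # An unmapped owner resolves to None, which never equals a manager name,
    # so unknown roles are non-dispatchable by default.
    return resolve_human_owner(task_owner) == manager.strip().lower()
```

The design point is that membership is asserted rather than inferred: a new role does nothing until someone adds it to the mapping.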
Recommended Fix Order
- Canonicalize manager owner identity (dave vs dave.fowler)
- Add explicit role-owner routing
- Add root/worktree metadata reconciliation after dispatch completion
- Add terminal-state handshake for workers
- Prune register drift automatically
- Group UI attention/escalation by task rather than raw signals
New Guardrails Worth Adding
- Snapshot freshness indicator in /status
- metadata_drift signal when root task file lags behind worktree task file
- register_orphan signal when register points to missing/completed worktree
- dispatch_completed_unreconciled signal when worker exits 0 but task remains ready/in_progress
- one canonical operator incident view: one row per task, expandable raw signals