Implement two-tier task manager escalation watchdog and optional manager
Problem
Running a permanent Claude “manager” in tmux is optional for most work: heartbeat already dispatches ready tasks and writes snapshots. Operators still want clarity on when a human or high-context agent should intervene. Today there is no first-class escalation path from dumb automation to the manager.
Context
- Tier 1 (default): scripts/task-manager-heartbeat, task-manager-run, scripts/dispatch, tasks/logs/task_manager/*.snapshot.json, register JSON, dispatch logs under tasks/logs/dispatch-*.log.
- Tier 2 (escalation): Optional tmux manager (just manage / Jared) or a one-shot claude session with a bundled context pack, only when signals fire.
- Related exploration: upgrade-jared-task-manager-visibility-orchestration-and-operator-ux.
Possible Solutions
- A - Signal-only (docs + snapshot fields): Define escalation signals and surface them in snapshot.json / heartbeat text; operator spawns manager manually. Lowest code, less automation.
- B - Watchdog flags + notification: Tier 1 sets needs_escalation (or extends needs_attention) with reasons; optional desktop notification or log line; still manual manager.
- C - Automated manager spawn (opt-in): On critical signals, run a configured command (e.g. open tmux window or claude with prompt file). Recommended direction only if guarded (rate limit, dry-run mode) to avoid spawn storms.
Tag B as the pragmatic first pass unless product explicitly wants C.
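The guard that option C calls for (rate limit plus dry-run mode) could look roughly like the sketch below. It is not part of this task's implementation; SpawnGuard, maybe_spawn, the 15-minute default interval and the injected runner are all hypothetical names and values chosen for illustration.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class SpawnGuard:
    """Hypothetical guard for option C: rate limit plus dry-run mode."""

    command: List[str]                 # configured manager command (assumed shape)
    min_interval_s: float = 900.0      # assumed default: at most one spawn per 15 min
    dry_run: bool = True               # safe default: report, do not spawn
    clock: Callable[[], float] = time.monotonic
    _last_spawn: Optional[float] = field(default=None, init=False)

    def maybe_spawn(self, runner: Callable[[List[str]], None]) -> str:
        """Return 'rate_limited', 'dry_run' or 'spawned'."""
        now = self.clock()
        if self._last_spawn is not None and now - self._last_spawn < self.min_interval_s:
            return "rate_limited"
        # Dry runs also consume the rate-limit slot, so logs show realistic pacing.
        self._last_spawn = now
        if self.dry_run:
            return "dry_run"
        runner(self.command)
        return "spawned"
```

In real use the runner might be subprocess.run; injecting it keeps the guard testable without opening a tmux window.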
Plan
- [x] Document canonical escalation signals (e.g. repeated dispatch non-zero for same slug, in_progress stale by task mtime / started_at, dispatch log idle, register/worktree mismatch).
- [x] Implement detection in task_manager_lib.py / task-manager-heartbeat (or small helper module); append structured reasons to snapshot payload and text summary.
- [x] Make optional tmux manager start path clearly non-default or triggered only when TASK_MANAGER_ESCALATION=1 / config flag (exact mechanism TBD).
- [x] Add tests under tests/scripts/test_task_manager_scripts.py for at least one escalation signal.
- [x] Update .codex/skills/task-manager/SKILL.md (or AGENTS) to describe two-tier operator model.
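As a rough illustration of the detection and gating steps in the plan, the sketch below derives reason codes for one task and checks the TASK_MANAGER_ESCALATION flag. The function names, the task dict fields (dispatch_failures, last_activity_ts) and the thresholds are assumptions made for this example, not the actual task_manager_lib.py API.

```python
import os
from typing import Dict, List, Mapping, Optional


def detect_reasons(
    task: Dict,
    now: float,
    idle_threshold_s: float = 1800.0,  # assumed idle threshold
    max_failures: int = 2,             # assumed "repeated" cutoff
) -> List[str]:
    """Return escalation reason codes for one task (hypothetical field names)."""
    reasons: List[str] = []
    # Repeated non-zero dispatch exits for the same slug.
    if task.get("dispatch_failures", 0) >= max_failures:
        reasons.append("dispatch_failed")
    # in_progress but no activity (e.g. task mtime / started_at) within the threshold.
    last_activity = task.get("last_activity_ts")
    if (
        task.get("status") == "in_progress"
        and last_activity is not None
        and now - last_activity > idle_threshold_s
    ):
        reasons.append("stuck_in_progress")
    return reasons


def escalation_enabled(env: Optional[Mapping[str, str]] = None) -> bool:
    """Manager start path stays off unless TASK_MANAGER_ESCALATION=1 is set."""
    env = os.environ if env is None else env
    return env.get("TASK_MANAGER_ESCALATION") == "1"
```

Passing now and env explicitly keeps both functions deterministic under test.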
Implementation Progress
- Added structured escalation signaling in scripts/task_manager_lib.py:
  - per-task escalation_reasons[] and needs_attention[] in snapshot tasks
  - top-level escalation.required, escalation.signals[], counts.escalation_signals
  - text heartbeat now includes an Escalation signals section when active
- Implemented minimal watchdog detections for first pass:
  - dispatch_failed / dispatch_interrupted / dispatch_stalled
  - stuck_in_progress (task is in_progress and stale by idle threshold)
  - retained worker_idle / pickup_overdue as structured reason codes
- Added merge-conflict prevention for started_at churn:
  - scripts/task-manager-heartbeat --dispatch-ready no longer edits task frontmatter by default
  - opt-in remains available via --mark-started-on-dispatch or TASK_MANAGER_MARK_STARTED_ON_DISPATCH=1
- Added/updated tests in tests/scripts/test_task_manager_scripts.py:
  - structured escalation snapshot + text assertions
  - default dispatch no longer mutates task status/frontmatter
  - explicit opt-in path still marks task started
- Follow-on merge context:
  - PR #705 conflict resolution kept this task's escalation schema and --mark-started-on-dispatch opt-in semantics while adding PR-CI probe + .worktrees defaults.
  - Task-manager heartbeat now reflects both layers on main.
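For readers who want a feel for the snapshot shape described above, here is a minimal sketch that rolls per-task escalation_reasons[] up into escalation.required, escalation.signals[] and counts.escalation_signals, and renders the text section only when signals are active. The function names and the layout of each signals entry (slug plus reason) are assumptions; the real schema in scripts/task_manager_lib.py may differ.

```python
from typing import Dict, List


def build_snapshot(tasks: List[Dict]) -> Dict:
    """Aggregate per-task reasons into top-level escalation fields (assumed layout)."""
    signals = [
        {"slug": task["slug"], "reason": reason}
        for task in tasks
        for reason in task.get("escalation_reasons", [])
    ]
    return {
        "tasks": tasks,
        "escalation": {"required": bool(signals), "signals": signals},
        "counts": {"escalation_signals": len(signals)},
    }


def render_text(snapshot: Dict) -> str:
    """Text heartbeat: the escalation section appears only when signals exist."""
    lines = [f"tasks: {len(snapshot['tasks'])}"]
    if snapshot["escalation"]["required"]:
        lines.append("Escalation signals:")
        for signal in snapshot["escalation"]["signals"]:
            lines.append(f"- {signal['slug']}: {signal['reason']}")
    return "\n".join(lines)
```

Deriving escalation.required from the signal list, instead of storing it separately, keeps the flag and the count from drifting apart.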
QA Exploration
- N/A (non-UI infra/scripts task)
- [x] QA exploration completed (or N/A for non-UI tasks)
Review Feedback
- Review rounds completed via just review; follow-up fixes applied before merge.
- [x] Review cleared