Build tmux task manager orchestration loop and task metadata
Problem
The real problem is that task execution is still too manual. Even after work is planned, the operator still has to start new Codex/Claude threads by hand, remember which tasks are ready, check in on each one, notice when something has stalled, ask for status, decide when to PR, and then clean up the work afterward. That is noisy, easy to forget, and does not scale once multiple tasks are moving at once. The desired operating model is: mark a task ready, and the system picks it up, launches work, monitors it, escalates only when something needs human attention, and records enough state to make the whole flow observable.
The current tasks system does not support that operating model. It tracks only coarse execution state (`not_started`, `in_progress`, `completed`, etc.) and does not model the queueing and orchestration workflow we want to run. There is no first-class notion of a task being ready for pickup, no dependency metadata to prevent premature execution, and no timing metadata that lets us measure how long execution actually took from start of work to PR handoff. At the orchestration layer, the current repo guidance is centered on one-off `scripts/dispatch` launches and explicitly says "no tmux", while the desired manager is a long-lived host-side Claude process running under a stable script wrapper, monitoring work continuously and handling most routine task management automatically.
Without these capabilities, the manager has to rediscover work state by hand, cannot safely gate ready tasks on prerequisites, cannot tell the difference between healthy work and stalled work without manual inspection, and cannot produce lightweight operational reporting such as "how long has this task been active?" or "which ready task has been waiting too long without pickup?" The result is unnecessary manual coordination, poor reliability, and weak feedback loops for improving the task system itself. The tasks UI also does not know how to display these richer task states and metadata, so any schema change needs a corresponding UI update rather than stopping at CLI/frontmatter support.
Context
- Current task schema and validation live in `tasks/tools/plans_cli.py`. Tasks UI rendering also needs to be updated to display new task fields and statuses. Relevant files likely include:
  - `tasks/macros.py`
  - `tasks/stylesheets/tasks.css`
  - `tasks/javascripts/tasks.js`
  - task/workstream pages that roll up task status
- Existing task files use YAML frontmatter and are created/updated through the plans CLI; direct edits must preserve the scaffold and pass `just task validate`.
- Current statuses are hard-coded in `plans_cli.py` as `not_started`, `in_progress`, `completed`, `blocked`, `cancelled`, and `planned`.
- Completion metadata already exists as `completed_at` and `completed_by`, stamped when status becomes `completed`.
- Current dispatch flow is host-side and already writes per-task log/status artifacts:
  - `scripts/dispatch`
  - `scripts/dispatch-watch`
  - `tasks/logs/dispatch-<task-slug>.log`
  - `tasks/logs/dispatch-<task-slug>.status.json`
- Current task-manager repo skill at `.codex/skills/task-manager/SKILL.md` explicitly says "No containers, no tmux", so this work intentionally changes the documented operating model.
- Relevant prior CBox manager artifacts exist and should be mined for useful ideas, especially logging and issue capture:
  - `tasks/logs/cbox-task-status-history.md`
  - `tasks/logs/cbox-execution-issues.md`
  - `libs/cbox/skills/master-plan-cbox-manager/SKILL.md`
- The desired manager should stay host-side for now. No container dependency is required for the first version.
- The manager should remain an orchestrator. Task implementation should still happen in isolated worktrees/branches, preserving `1 task = 1 worktree = 1 branch = 1 PR`.
- Success condition for this task: once a task is marked `ready`, the manager can pick it up and carry most of the flow automatically, with human intervention reserved for blockers, policy decisions, or abnormal recovery.
Detailed Requirements
Task metadata
- Add a `ready` task status that means "eligible for pickup by the manager".
- In v1, task pickup routing is based on `owner`.
- A manager should be configured with an owner-routing rule and only consider `ready` tasks that match that rule.
- Task `owner` remains task frontmatter. If a task is missing `owner`, fall back to the milestone/workstream default owner behavior already used by the task CLI.
- For the initial manager deployment, treat role-based owners other than `rj` as Dave-routable. In practice, Dave's manager should watch all eligible `ready` tasks except tasks explicitly owned by `rj` or another RJ-specific identity.
- Do not implement a distributed claim/lock protocol in v1. If duplicate pickup across managers becomes a real problem later, add explicit claiming as a follow-on hardening step.
- Add `dependencies` to task frontmatter as a YAML list.
- Dependencies should use task slugs as the canonical reference format in v1. The CLI may accept convenient input forms, but stored frontmatter should be normalized to a YAML list of task slugs.
- A `ready` task with incomplete dependencies remains visible as ready in the UI, but the manager must not start it. It should be reported as dependency-blocked.
- Add execution timing fields:
  - `started_at`: set when execution actually begins on the task
  - `pr_created_at`: set when the executor creates the PR
  - keep existing `completed_at` semantics for full completion/closure
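Putting the metadata requirements together, a task's frontmatter under the new schema might look like the following sketch. Only the fields discussed above are shown; the slug values and owner are illustrative, not real tasks.

```yaml
# Illustrative frontmatter sketch; slugs, owner, and null placeholders are made up.
status: ready
owner: analyst
dependencies:
  - add-task-dependency-schema    # hypothetical prerequisite slug
  - update-tasks-ui-rollups       # hypothetical prerequisite slug
started_at: null      # stamped when execution actually begins
pr_created_at: null   # stamped when the executor creates the PR
completed_at: null    # existing semantics, stamped on full completion
```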
Manager register
- Keep the register minimal and canonical only for manager-local assignment facts.
- The register should live under `tasks/logs/` or another clearly manager-owned path in the repo, not in task frontmatter.
- The register should contain one entry per active or tracked task with:
- task identifier
- task file path
- worktree path
- registration timestamp
- launch timestamp if different from registration
- Do not store volatile derived state such as "last known task status", "stalled", or "time since update" as canonical register fields unless heartbeat is the explicit single writer for a generated snapshot file.
Heartbeat sources of truth
Each heartbeat should derive state from canonical sources in this order:
- task file frontmatter/body
- manager register
- dispatch status/log files when present
- optional deeper worktree/git checks only when needed
The first version should avoid depending on expensive or noisy probes unless they materially improve decisions.
The candidate task set should first be filtered by matching owner before pickup logic is applied.
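The owner filter can be sketched as a small pure function. The routing rule mirrors the requirements above (Dave's manager takes all eligible `ready` tasks except RJ-owned ones; an RJ manager matches `rj` only); the function name and dict keys are hypothetical.

```python
def routable_tasks(tasks, manager_owner, excluded_owners=("rj",)):
    """Filter candidate tasks by owner before any pickup logic runs.

    `tasks` is a list of dicts with at least "slug", "status", "owner".
    This is a sketch of the v1 routing rule, not the repo's real API.
    """
    if manager_owner == "rj":
        # RJ manager: RJ-owned ready tasks only.
        return [t for t in tasks
                if t["status"] == "ready" and t.get("owner") == "rj"]
    # Dave-style manager: every ready task not explicitly RJ-owned.
    return [t for t in tasks
            if t["status"] == "ready" and t.get("owner") not in excluded_owners]
```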
Heartbeat outputs
Heartbeat should produce two kinds of output:
- Durable local artifacts every run:
  - append-only history log
  - latest generated snapshot file
- Manager-facing textual summary for the long-lived manager agent
The local artifacts should be written every heartbeat. The manager-facing summary should be structured enough to support decisions without having to inspect files manually.
Manager-facing heartbeat message
Do not optimize for "minimal output" at the expense of utility. The heartbeat message should always contain enough information for the manager to decide whether to do nothing, pick up a task, recover a task, or escalate.
The default manager-facing heartbeat message should include these sections when non-empty:
- Ready to pick up: tasks in `ready` with dependencies satisfied and not yet registered
- Waiting on dependencies: tasks in `ready` that are blocked by incomplete dependencies
- Active tasks: registered tasks currently in flight with concise state
- Needs attention: stalled, failed, interrupted, inconsistent, or ambiguous tasks
- Recently changed: tasks whose state changed since the last heartbeat
For each task listed in the heartbeat, include at least:
- task identifier or short name
- task file path or slug
- current task status
- dependency state if relevant
- whether it is registered
- worktree path if assigned
- PR state if known
- time since last task-file update
- a one-line recommended manager action when attention is required
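One way to render a per-task heartbeat line covering the fields above; every key name here is a placeholder, since the heartbeat's real data model is not fixed by this spec.

```python
def format_task_line(t):
    """Render one heartbeat line for a task dict; keys are illustrative."""
    parts = [
        t.get("slug", "?"),
        f"status={t.get('status', '?')}",
        f"registered={'yes' if t.get('registered') else 'no'}",
    ]
    if t.get("deps_blocked"):
        parts.append("deps=blocked")
    if t.get("worktree"):
        parts.append(f"worktree={t['worktree']}")
    if t.get("pr_state"):
        parts.append(f"pr={t['pr_state']}")
    if t.get("minutes_since_update") is not None:
        parts.append(f"updated={t['minutes_since_update']}m ago")
    if t.get("action"):
        # One-line recommended manager action when attention is required.
        parts.append(f"-> {t['action']}")
    return "  ".join(parts)
```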
Heartbeat emission logic
- Heartbeat local files should be updated every cycle.
- Manager-facing output should be sent every cycle to the manager process, but the content should be decision-oriented and stable:
- if nothing changed and nothing needs attention, emit a short "no action needed" summary rather than a long repeated dump
- if something changed, include the changed tasks plus the current actionable queues
- if something needs attention, surface that section first
- Do not suppress the heartbeat entirely just because nothing changed; the manager should still receive a lightweight proof-of-life summary.
- Also record a machine-readable snapshot with enough information to detect whether the manager itself has stopped heartbeating.
- Add an explicit operator/manager control to pause or stop the heartbeat loop when the system is intentionally blocked or needs manual intervention, so tokens are not wasted during known-bad states.
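The emission rules above reduce to a small decision function: attention first, then deltas, otherwise a short proof-of-life. A sketch with hypothetical inputs (the real heartbeat script's interface is not specified here):

```python
def heartbeat_message(changed, needs_attention, queues):
    """Decide what the manager-facing summary contains this cycle.

    `changed` and `needs_attention` are lists of task slugs; `queues` is
    prebuilt section text for the actionable queues. Always returns a
    message, so a quiet cycle still yields proof-of-life.
    """
    if needs_attention:
        # Surface the attention section first.
        return "NEEDS ATTENTION: " + ", ".join(needs_attention) + "\n" + queues
    if changed:
        # Changed tasks plus the current actionable queues.
        return "Changed: " + ", ".join(changed) + "\n" + queues
    # Nothing changed, nothing wrong: short but never silent.
    return "heartbeat ok: no action needed"
```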
Stall detection
Task-level stall detection should distinguish at least:
- not picked up: task is `ready`, dependencies are satisfied, but it has not been registered within the pickup threshold
- idle: registered task has had no meaningful progress signal within the idle threshold
- failed/interrupted: dispatch status/log indicates failure or interruption
- dependency blocked: task is `ready` but should not start yet
Default v1 thresholds:
- pickup overdue: 10 minutes
- idle task: 20 minutes
- manager stale: no heartbeat snapshot update for 2 heartbeat intervals (default 6 minutes if heartbeat interval is 3 minutes)
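The task-level classification and default thresholds above can be sketched as one function. Signature and inputs are assumptions; only the rules and the 10/20-minute thresholds come from this spec.

```python
PICKUP_OVERDUE_MIN = 10  # v1 default: pickup overdue threshold
IDLE_MIN = 20            # v1 default: idle task threshold


def classify_task(status, deps_done, registered, minutes_ready, minutes_idle,
                  dispatch_failed=False):
    """Classify one task against the v1 stall rules. Sketch only."""
    if dispatch_failed:
        return "failed/interrupted"
    if status == "ready" and not deps_done:
        return "dependency blocked"
    if status == "ready" and not registered:
        if minutes_ready > PICKUP_OVERDUE_MIN:
            return "not picked up"
        return "awaiting pickup"
    if registered and minutes_idle > IDLE_MIN:
        return "idle"
    return "healthy"
```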
Manager-level stall/backlog detection should distinguish at least:
- manager heartbeat missing or stale
- ready tasks accumulating without registration
- tasks remaining in ambiguous state for too long
Manager stale detection should be based on the generated heartbeat snapshot/log timestamp, not on whether messages are visibly arriving in the chat. The system should be able to detect staleness from local artifacts alone.
When manager staleness is detected, v1 should support automatic recovery through the manager wrapper script:
- detect stale/missing heartbeat
- verify whether the underlying manager process/tmux session is absent or unhealthy
- restart or re-ensure the manager when safe
- log that recovery occurred
If automatic recovery cannot determine a safe action, escalate clearly instead of looping.
Deeper status command
Add a separate deeper status command so the manager can inspect a task on demand without bloating every heartbeat. Example inputs:
- task path
- task slug
- worktree path
The deeper command can include:
- git clean/dirty summary
- dispatch-watch classification
- time since latest file change in worktree
- last few relevant log lines
- PR metadata if available
UI behavior
The tasks UI should expose the new task fields clearly enough that an operator can scan the board without opening raw task files.
At minimum the UI should surface:
- `ready` as a distinct visible state
- dependency-blocked indication
- `started_at` and `pr_created_at`
- any useful rollup/filtering that helps identify:
- ready tasks
- blocked-on-dependency tasks
- active tasks
- tasks that have reached PR stage
Possible Solutions
- Recommended: extend the existing plans/dispatch model with richer task metadata, a minimal manager register, UI support, and a tmux launcher hidden behind scripts.
  - Add a new pickup status such as `ready`.
  - Add dependency metadata and normalized lifecycle timestamps in task frontmatter.
  - Keep using host-side worktrees plus `scripts/dispatch` / `scripts/dispatch-watch` as the worker execution substrate.
  - Add a tmux-hosted manager launcher plus a heartbeat script that consumes task/worktree state and maintains a minimal manager register/log.
  - Add explicit stall-detection and escalation rules for both task execution and the manager itself.
  - Keep the register authoritative only for manager-local assignment facts such as task path, worktree path, and launch bookkeeping. Derivable status should be recomputed by heartbeat or cached as an explicit snapshot if needed.
  - Add a targeted status-inspection command for deeper per-task debugging instead of overloading the default heartbeat with every possible diagnostic.
  - Make the heartbeat change-aware so it emits compact deltas or periodic summaries instead of wasting tokens on "nothing changed" reports every cycle.
  - Pros: reuses current repo patterns, preserves worktree/PR isolation, minimizes duplicate orchestration code, keeps tmux behind a stable script interface, and gives a straightforward migration path from today's dispatch model.
  - Cons: requires updating several docs/tests and carefully separating canonical task truth from manager-local runtime state while still providing reliable escalation behavior.
- Build a new manager runtime with its own independent worker/session model and treat the current dispatch scripts as legacy.
  - Pros: clean-slate design, potentially simpler conceptual model.
  - Cons: duplicates behavior that already exists in dispatch scripts, increases migration risk, and makes observability/tooling drift more likely.
- Keep the existing status model and layer all extra state into a standalone manager register without changing task frontmatter.
  - Pros: smaller schema change.
  - Cons: task truth becomes split across task files and manager logs, dependency state is harder to inspect from the task itself, and basic task list/show commands remain underpowered.
The recommended path is to make task files richer and keep them as the primary source of truth for queueability and lifecycle state, while the manager register captures runtime assignment and feedback that is inherently manager-local.
Plan
- Extend `tasks/tools/plans_cli.py` and related docs/tests to support:
  - a `ready` status
  - dependency metadata
  - lifecycle timestamps for work start and PR handoff
  - list/show/validate behavior for the new fields
- Decide and document the canonical dependency reference format, then implement normalization and validation for it in the task CLI.
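The normalization step could be as small as the sketch below; the function name is hypothetical and `plans_cli.py`'s real option handling is not shown.

```python
def normalize_dependencies(raw):
    """Normalize CLI dependency input to a YAML-ready list of task slugs.

    Accepts a list, a comma-separated string, or None; strips whitespace,
    drops empties, and de-duplicates while preserving order.
    """
    if raw is None:
        items = []
    elif isinstance(raw, str):
        items = raw.split(",")
    else:
        items = list(raw)
    seen, slugs = set(), []
    for item in items:
        slug = str(item).strip()
        if slug and slug not in seen:
            seen.add(slug)
            slugs.append(slug)
    return slugs
```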
- Update tasks UI rollups and task rendering so the new fields are visible and useful in the planning surface, not just in raw frontmatter.
- Decide final field names for execution timing. Preferred default:
  - `started_at` for first active execution
  - `pr_created_at` for PR handoff time
  - keep `completed_at` for fully completed tasks
- Normalize dependencies as YAML list data in frontmatter, even if CLI input accepts comma-separated values for convenience.
- Add a tmux-based manager launch path that starts Claude in dangerous mode as a long-lived host-side orchestrator, but expose it through a stable script such as `scripts/task-manager-start` or `scripts/task-manager-ensure` so operators and docs do not need to speak in raw tmux commands.
- Add a minimal manager register file that tracks canonical manager-local assignment data only:
  - task file path
  - worktree path
  - manager launch / registration timestamps
  - optional task title as a convenience copy if it materially helps readability
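The wrapper can keep tmux fully internal by assembling the commands in one place. A sketch: the session name and inner command are placeholders for however the long-lived manager process is actually launched.

```python
def tmux_start_command(session="task-manager", owner="dave"):
    """Build the argv for launching the manager in a detached tmux session.

    The inner command is a placeholder; callers never type tmux directly.
    """
    inner = f"scripts/task-manager-run --owner {owner}"
    return ["tmux", "new-session", "-d", "-s", session, inner]


def tmux_has_session_command(session="task-manager"):
    """Build the argv for checking whether the manager session exists."""
    return ["tmux", "has-session", "-t", session]
```

An `ensure` script would run the `has-session` check first and only start a new session when it fails, which keeps restarts idempotent.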
- Implement a heartbeat script with a default 3-minute interval that:
  - discovers `ready` tasks owned by the configured manager owner-routing rule and not yet registered
  - blocks pickup when dependencies are incomplete
  - inspects registered task/worktree state
  - writes an append-only history log every run
  - writes a generated latest-snapshot file every run
  - emits a manager-facing textual summary every run with explicit sections for:
    - ready to pick up
    - waiting on dependencies
    - active tasks
    - needs attention
    - recently changed
  - keeps no-op heartbeats short but still present as proof-of-life
  - auto-dispatches immediately when a task is eligible for pickup; no extra waiting/review queue in v1
- Implement explicit stall classification and escalation rules for:
  - ready but unregistered tasks past the 10-minute pickup threshold
  - registered but idle tasks past the 20-minute idle threshold
  - failed/interrupted tasks
  - manager missed-heartbeat / backlog conditions using the heartbeat snapshot/log timestamps
- Add manager controls for pause/stop/restart behavior:
  - explicit way to pause the heartbeat loop
  - explicit way to stop the manager cleanly
  - `ensure` or restart path for stale-manager recovery
- Ensure manager configuration includes an explicit owner identity/routing rule and that docs/scripts make this routing model clear:
  - Dave default routing should include role-owned tasks except RJ-owned tasks
  - RJ manager routing should include RJ-owned tasks only unless configured otherwise
- Add a targeted status-inspection script/command for on-demand deeper diagnostics, for example by task path or worktree path. Candidate extra details:
  - git dirty/clean summary
  - time since worktree file changes
  - dispatch/worker status when present
  These are valuable, but they do not have to be part of the default heartbeat if they add latency or noise.
- Decide and codify PR ownership policy for the new loop. Current leaning:
  - executor creates the PR inside the task worktree
  - manager supervises, records feedback, and decides what needs intervention
- Update `.codex/skills/task-manager/SKILL.md`, `.claude/commands/task-dispatch.md`, and any related operator docs so the new manager flow is the documented default without requiring every surface to know about tmux internals.
- Add focused tests for schema validation, dependency normalization by slug, owner-based pickup routing, timestamp transitions, heartbeat behavior, auto-dispatch, manager-launch command assembly, stale-manager recovery, pause/stop controls, stall detection/escalation, manager-output formatting, and UI rendering/rollup behavior for the new task fields.
Implementation Progress
- 2026-03-18: Task created from the plans CLI after working around a current `current_milestone()` bug in `plans_cli.py`. `yaml.safe_load` on `tasks/mkdocs.yml` currently fails on a Python-tagged MkDocs config value, so task creation required an explicit `--milestone m1-ft-analytics-analyst-pilot`.
- 2026-03-18: Scope updated to explicitly include tasks UI support for the new task metadata/statuses.
- 2026-03-18: Register-file design adjusted toward a minimal canonical assignment record plus heartbeat-derived snapshots/logging, to avoid duplicating stale status data in multiple places.
- 2026-03-18: Manager ergonomics direction updated to prefer wrapper scripts such as `task-manager-start` / `task-manager-ensure` so the rest of the system does not need to care whether the manager is implemented via tmux internally.
- 2026-03-18: Deeper git/worktree diagnostics moved from default-heartbeat requirements to an on-demand status inspection command unless they prove cheap enough to include by default.
- 2026-03-18: Problem framing broadened to cover the actual product goal: reduce manual task-management overhead so marking a task `ready` is enough to get it picked up, monitored, and carried toward PR with minimal operator babysitting.
- 2026-03-18: Robustness requirements added for task-level stall detection, manager-level missed-heartbeat detection, and low-noise heartbeat reporting that avoids wasting tokens when nothing meaningful changed.
- 2026-03-18: Task spec made more explicit around heartbeat behavior, including sources of truth, durable artifacts, manager-facing output sections, per-task fields to report, and proof-of-life behavior for no-op heartbeats.
- 2026-03-18: Shared distributed claiming was intentionally deferred. V1 pickup routing should use `owner` matching only; duplicate-pickup prevention via pushed claims can be added later if it becomes necessary.
- 2026-03-18: Concrete v1 decisions captured:
- dependency format = task slug
- routing = owner-based, with Dave handling non-RJ role-owned tasks
- auto-dispatch immediately when eligible
- idle threshold = 20 minutes
- manager stale threshold = 2 heartbeat intervals
- manager wrapper should support pause/stop/restart semantics
- 2026-03-18: Implemented task schema and CLI changes in `tasks/tools/plans_cli.py`:
  - added `ready` status
  - normalized `dependencies` to YAML list of task slugs
  - stamped `started_at` on `in_progress`
  - added PR metadata support (`pr_created_at`, `pr_number`, `pr_url`)
  - fixed `current_milestone()` fallback so tagged MkDocs YAML no longer breaks task creation
- 2026-03-18: Updated tasks UI rollups in `tasks/macros.py` and `tasks/stylesheets/tasks.css`:
  - distinct `ready` state rendering
  - dependency-blocked badge
  - started / PR metadata badges in milestone task rollups
- 2026-03-18: Added manager scripts under `scripts/`:
  - `task-manager-heartbeat`
  - `task-manager-check-status`
  - `task-manager-run`
  - `task-manager-start`
  - `task-manager-ensure`
  - `task-manager-pause`
  - `task-manager-stop`
  - shared `task_manager_lib.py`
- 2026-03-18: Updated task-manager docs/commands and `tasks/tools/cli-reference.md` to reflect owner-routed `ready` pickup, tmux wrapper scripts, and executor-owned PR creation with PR metadata stamping.
- 2026-03-18: Validation completed:
  - `uv run pytest tests/core/test_plans_cli.py tests/core/test_tasks_macros.py tests/scripts/test_task_manager_scripts.py`
  - `just task validate tasks/workstreams/infra-tooling/tasks/build-tmux-task-manager-orchestration-loop-and-task-metadata.md`
- 2026-03-18: Direct script sanity checks completed:
  - `scripts/task-manager-heartbeat --owner dave --format text`
  - `scripts/task-manager-start --owner dave --dry-run`
  - `scripts/task-manager-check-status build-tmux-task-manager-orchestration-loop-and-task-metadata --owner dave`
- 2026-03-18: Initial design review completed against:
  - `tasks/tools/plans_cli.py`
  - `scripts/dispatch`
  - `scripts/dispatch-watch`
  - `.codex/skills/task-manager/SKILL.md`
  - `.claude/commands/task-dispatch.md`
  - prior CBox manager notes/logs
- 2026-03-18: PR ownership decision implemented in docs/tooling direction: executor creates the PR from the task worktree, and `scripts/pr-create` stamps PR metadata back into the task file.
QA Exploration
N/A. This is orchestration/CLI/task-schema work, not a browser-facing feature.
- [x] QA exploration completed (or N/A for non-UI tasks)
Review Feedback
- [ ] Review cleared
- Review attempt note: `scripts/review` was invoked from this branch but did not produce a usable result during this implementation session, so review is not being claimed as complete here.