Build tmux task manager orchestration loop and task metadata
Problem
The real problem is that task execution is still too manual. Even after work is planned, the operator still has to start new Codex/Claude threads by hand, remember which tasks are ready, check in on each one, notice when something has stalled, ask for status, decide when to PR, and then clean up the work afterward. That is noisy, easy to forget, and does not scale once multiple tasks are moving at once. The desired operating model is: mark a task ready, and the system picks it up, launches work, monitors it, escalates only when something needs human attention, and records enough state to make the whole flow observable.
The current tasks system does not support that operating model. It tracks only coarse execution state (`not_started`, `in_progress`, `completed`, etc.) and does not model the queueing and orchestration workflow we want to run. There is no first-class notion of a task being ready for pickup, no dependency metadata to prevent premature execution, and no timing metadata that lets us measure how long execution actually took from start of work to PR handoff. At the orchestration layer, the current repo guidance is centered on one-off `scripts/dispatch` launches and explicitly says "no tmux", while the desired manager is a long-lived host-side Claude process running under a stable script wrapper, monitoring work continuously and handling most routine task management automatically.
Without these capabilities, the manager has to rediscover work state by hand, cannot safely gate ready tasks on prerequisites, cannot tell the difference between healthy work and stalled work without manual inspection, and cannot produce lightweight operational reporting such as "how long has this task been active?" or "which ready task has been waiting too long without pickup?" The result is unnecessary manual coordination, poor reliability, and weak feedback loops for improving the task system itself. The tasks UI also does not know how to display these richer task states and metadata, so any schema change needs a corresponding UI update rather than stopping at CLI/frontmatter support.
Context
- Current task schema and validation live in `tasks/tools/plans_cli.py`. Tasks UI rendering also needs to be updated to display new task fields and statuses. Relevant files likely include:
  - `tasks/macros.py`
  - `tasks/stylesheets/tasks.css`
  - `tasks/javascripts/tasks.js`
  - task/workstream pages that roll up task status
- Existing task files use YAML frontmatter and are created/updated through the plans CLI; direct edits must preserve the scaffold and pass `just task validate`.
- Current statuses are hard-coded in `plans_cli.py` as `not_started`, `in_progress`, `completed`, `blocked`, `cancelled`, and `planned`.
- Completion metadata already exists as `completed_at` and `completed_by`, stamped when status becomes `completed`.
- Current dispatch flow is host-side and already writes per-task log/status artifacts:
  - `scripts/dispatch`
  - `scripts/dispatch-watch`
  - `tasks/logs/dispatch-<task-slug>.log`
  - `tasks/logs/dispatch-<task-slug>.status.json`
- Current task-manager repo skill at `.codex/skills/task-manager/SKILL.md` explicitly says "No containers, no tmux", so this work intentionally changes the documented operating model.
- Relevant prior CBox manager artifacts exist and should be mined for useful ideas, especially logging and issue capture:
  - `tasks/logs/cbox-task-status-history.md`
  - `tasks/logs/cbox-execution-issues.md`
  - `libs/cbox/skills/master-plan-cbox-manager/SKILL.md`
- The desired manager should stay host-side for now. No container dependency is required for the first version.
- The manager should remain an orchestrator. Task implementation should still happen in isolated worktrees/branches, preserving `1 task = 1 worktree = 1 branch = 1 PR`.
- Success condition for this task: once a task is marked `ready`, the manager can pick it up and carry most of the flow automatically, with human intervention reserved for blockers, policy decisions, or abnormal recovery.
Detailed Requirements
Task metadata
- Add a `ready` task status that means "eligible for pickup by the manager".
- In v1, task pickup routing is based on `owner`.
- A manager should be configured with an owner-routing rule and only consider `ready` tasks that match that rule.
- Task `owner` remains task frontmatter. If a task is missing `owner`, fall back to the milestone/workstream default owner behavior already used by the task CLI.
- For the initial manager deployment, treat role-based owners other than `rj` as Dave-routable. In practice, Dave's manager should watch all eligible `ready` tasks except tasks explicitly owned by `rj` or another RJ-specific identity.
- Do not implement a distributed claim/lock protocol in v1. If duplicate pickup across managers becomes a real problem later, add explicit claiming as a follow-on hardening step.
- Add `dependencies` to task frontmatter as a YAML list.
- Dependencies should use task slugs as the canonical reference format in v1. The CLI may accept convenient input forms, but stored frontmatter should be normalized to a YAML list of task slugs.
- A `ready` task with incomplete dependencies remains visible as ready in the UI, but the manager must not start it. It should be reported as dependency-blocked.
- Add execution timing fields:
  - `started_at`: set when execution actually begins on the task
  - `pr_created_at`: set when the executor creates the PR
  - keep existing `completed_at` semantics for full completion/closure
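Putting the metadata requirements together, a task's frontmatter under the new schema might look like the following sketch. Only the fields discussed above are shown; the slug values and owner are illustrative, not real tasks.

```yaml
# Illustrative frontmatter sketch; slugs, owner, and null placeholders are made up.
status: ready
owner: analyst
dependencies:
  - add-task-dependency-schema    # hypothetical prerequisite slug
  - update-tasks-ui-rollups       # hypothetical prerequisite slug
started_at: null      # stamped when execution actually begins
pr_created_at: null   # stamped when the executor creates the PR
completed_at: null    # existing semantics, stamped on full completion
```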
Manager register
- Keep the register minimal and canonical only for manager-local assignment facts.
- The register should live under `tasks/logs/` or another clearly manager-owned path in the repo, not in task frontmatter.
- The register should contain one entry per active or tracked task with:
- task identifier
- task file path
- worktree path
- registration timestamp
- launch timestamp if different from registration
- Do not store volatile derived state such as "last known task status", "stalled", or "time since update" as canonical register fields unless heartbeat is the explicit single writer for a generated snapshot file.
Heartbeat sources of truth
Each heartbeat should derive state from canonical sources in this order:
- task file frontmatter/body
- manager register
- dispatch status/log files when present
- optional deeper worktree/git checks only when needed
The first version should avoid depending on expensive or noisy probes unless they materially improve decisions.
The candidate task set should first be filtered by matching owner before pickup logic is applied.
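The owner filter can be sketched as a small pure function. The routing rule mirrors the requirements above (Dave's manager takes all eligible `ready` tasks except RJ-owned ones; an RJ manager matches `rj` only); the function name and dict keys are hypothetical.

```python
def routable_tasks(tasks, manager_owner, excluded_owners=("rj",)):
    """Filter candidate tasks by owner before any pickup logic runs.

    `tasks` is a list of dicts with at least "slug", "status", "owner".
    This is a sketch of the v1 routing rule, not the repo's real API.
    """
    if manager_owner == "rj":
        # RJ manager: RJ-owned ready tasks only.
        return [t for t in tasks
                if t["status"] == "ready" and t.get("owner") == "rj"]
    # Dave-style manager: every ready task not explicitly RJ-owned.
    return [t for t in tasks
            if t["status"] == "ready" and t.get("owner") not in excluded_owners]
```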
Heartbeat outputs
Heartbeat should produce two kinds of output:
- Durable local artifacts every run:
  - append-only history log
  - latest generated snapshot file
- Manager-facing textual summary for the long-lived manager agent
The local artifacts should be written every heartbeat. The manager-facing summary should be structured enough to support decisions without having to inspect files manually.
Manager-facing heartbeat message
Do not optimize for "minimal output" at the expense of utility. The heartbeat message should always contain enough information for the manager to decide whether to do nothing, pick up a task, recover a task, or escalate.
The default manager-facing heartbeat message should include these sections when non-empty:
- Ready to pick up: tasks in `ready` with dependencies satisfied and not yet registered
- Waiting on dependencies: tasks in `ready` that are blocked by incomplete dependencies
- Active tasks: registered tasks currently in flight with concise state
- Needs attention: stalled, failed, interrupted, inconsistent, or ambiguous tasks
- Recently changed: tasks whose state changed since the last heartbeat
For each task listed in the heartbeat, include at least:
- task identifier or short name
- task file path or slug
- current task status
- dependency state if relevant
- whether it is registered
- worktree path if assigned
- PR state if known
- time since last task-file update
- a one-line recommended manager action when attention is required
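One way to render a per-task heartbeat line covering the fields above; every key name here is a placeholder, since the heartbeat's real data model is not fixed by this spec.

```python
def format_task_line(t):
    """Render one heartbeat line for a task dict; keys are illustrative."""
    parts = [
        t.get("slug", "?"),
        f"status={t.get('status', '?')}",
        f"registered={'yes' if t.get('registered') else 'no'}",
    ]
    if t.get("deps_blocked"):
        parts.append("deps=blocked")
    if t.get("worktree"):
        parts.append(f"worktree={t['worktree']}")
    if t.get("pr_state"):
        parts.append(f"pr={t['pr_state']}")
    if t.get("minutes_since_update") is not None:
        parts.append(f"updated={t['minutes_since_update']}m ago")
    if t.get("action"):
        # One-line recommended manager action when attention is required.
        parts.append(f"-> {t['action']}")
    return "  ".join(parts)
```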
Heartbeat emission logic
- Heartbeat local files should be updated every cycle.
- Manager-facing output should be sent every cycle to the manager process, but the content should be decision-oriented and stable:
- if nothing changed and nothing needs attention, emit a short "no action needed" summary rather than a long repeated dump
- if something changed, include the changed tasks plus the current actionable queues
- if something needs attention, surface that section first
- Do not suppress the heartbeat entirely just because nothing changed; the manager should still receive a lightweight proof-of-life summary.
- Also record a machine-readable snapshot with enough information to detect whether the manager itself has stopped heartbeating.
- Add an explicit operator/manager control to pause or stop the heartbeat loop when the system is intentionally blocked or needs manual intervention, so tokens are not wasted during known-bad states.
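The emission rules above reduce to a small decision function: attention first, then deltas, otherwise a short proof-of-life. A sketch with hypothetical inputs (the real heartbeat script's interface is not specified here):

```python
def heartbeat_message(changed, needs_attention, queues):
    """Decide what the manager-facing summary contains this cycle.

    `changed` and `needs_attention` are lists of task slugs; `queues` is
    prebuilt section text for the actionable queues. Always returns a
    message, so a quiet cycle still yields proof-of-life.
    """
    if needs_attention:
        # Surface the attention section first.
        return "NEEDS ATTENTION: " + ", ".join(needs_attention) + "\n" + queues
    if changed:
        # Changed tasks plus the current actionable queues.
        return "Changed: " + ", ".join(changed) + "\n" + queues
    # Nothing changed, nothing wrong: short but never silent.
    return "heartbeat ok: no action needed"
```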
Stall detection
Task-level stall detection should distinguish at least:
- not picked up: task is `ready`, dependencies are satisfied, but it has not been registered within the pickup threshold
- idle: registered task has had no meaningful progress signal within the idle threshold
- failed/interrupted: dispatch status/log indicates failure or interruption
- dependency blocked: task is `ready` but should not start yet
Default v1 thresholds:
- pickup overdue: 10 minutes
- idle task: 20 minutes
- manager stale: no heartbeat snapshot update for 2 heartbeat intervals (default 6 minutes if heartbeat interval is 3 minutes)
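The task-level classification and default thresholds above can be sketched as one function. Signature and inputs are assumptions; only the rules and the 10/20-minute thresholds come from this spec.

```python
PICKUP_OVERDUE_MIN = 10  # v1 default: pickup overdue threshold
IDLE_MIN = 20            # v1 default: idle task threshold


def classify_task(status, deps_done, registered, minutes_ready, minutes_idle,
                  dispatch_failed=False):
    """Classify one task against the v1 stall rules. Sketch only."""
    if dispatch_failed:
        return "failed/interrupted"
    if status == "ready" and not deps_done:
        return "dependency blocked"
    if status == "ready" and not registered:
        if minutes_ready > PICKUP_OVERDUE_MIN:
            return "not picked up"
        return "awaiting pickup"
    if registered and minutes_idle > IDLE_MIN:
        return "idle"
    return "healthy"
```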
Manager-level stall/backlog detection should distinguish at least:
- manager heartbeat missing or stale
- ready tasks accumulating without registration
- tasks remaining in ambiguous state for too long
Manager stale detection should be based on the generated heartbeat snapshot/log timestamp, not on whether messages are visibly arriving in the chat. The system should be able to detect staleness from local artifacts alone.
When manager staleness is detected, v1 should support automatic recovery through the manager wrapper script:
- detect stale/missing heartbeat
- verify whether the underlying manager process/tmux session is absent or unhealthy
- restart or re-ensure the manager when safe
- log that recovery occurred
If automatic recovery cannot determine a safe action, escalate clearly instead of looping.
Deeper status command
Add a separate deeper status command so the manager can inspect a task on demand without bloating every heartbeat. Example inputs:
- task path
- task slug
- worktree path
The deeper command can include:
- git clean/dirty summary
- dispatch-watch classification
- time since latest file change in worktree
- last few relevant log lines
- PR metadata if available
UI behavior
The tasks UI should expose the new task fields clearly enough that an operator can scan the board without opening raw task files.
At minimum the UI should surface:
- `ready` as a distinct visible state
- dependency-blocked indication
- `started_at` and `pr_created_at`
- any useful rollup/filtering that helps identify:
- ready tasks
- blocked-on-dependency tasks
- active tasks
- tasks that have reached PR stage
Possible Solutions
- Recommended: extend the existing plans/dispatch model with richer task metadata, a minimal manager register, UI support, and a tmux launcher hidden behind scripts.
  - Add a new pickup status such as `ready`.
  - Add dependency metadata and normalized lifecycle timestamps in task frontmatter.
  - Keep using host-side worktrees plus `scripts/dispatch` / `scripts/dispatch-watch` as the worker execution substrate.
  - Add a tmux-hosted manager launcher plus a heartbeat script that consumes task/worktree state and maintains a minimal manager register/log.
  - Add explicit stall-detection and escalation rules for both task execution and the manager itself.
  - Keep the register authoritative only for manager-local assignment facts such as task path, worktree path, and launch bookkeeping. Derivable status should be recomputed by heartbeat or cached as an explicit snapshot if needed.
  - Add a targeted status-inspection command for deeper per-task debugging instead of overloading the default heartbeat with every possible diagnostic.
  - Make the heartbeat change-aware so it emits compact deltas or periodic summaries instead of wasting tokens on "nothing changed" reports every cycle.
  - Pros: reuses current repo patterns, preserves worktree/PR isolation, minimizes duplicate orchestration code, keeps tmux behind a stable script interface, and gives a straightforward migration path from today's dispatch model.
  - Cons: requires updating several docs/tests and carefully separating canonical task truth from manager-local runtime state while still providing reliable escalation behavior.
- Build a new manager runtime with its own independent worker/session model and treat the current dispatch scripts as legacy.
  - Pros: clean-slate design, potentially simpler conceptual model.
  - Cons: duplicates behavior that already exists in dispatch scripts, increases migration risk, and makes observability/tooling drift more likely.
- Keep the existing status model and layer all extra state into a standalone manager register without changing task frontmatter.
  - Pros: smaller schema change.
  - Cons: task truth becomes split across task files and manager logs, dependency state is harder to inspect from the task itself, and basic task list/show commands remain underpowered.
The recommended path is to make task files richer and keep them as the primary source of truth for queueability and lifecycle state, while the manager register captures runtime assignment and feedback that is inherently manager-local.
Plan
- Extend `tasks/tools/plans_cli.py` and related docs/tests to support:
  - a `ready` status
  - dependency metadata
  - lifecycle timestamps for work start and PR handoff
  - list/show/validate behavior for the new fields
- Decide and document the canonical dependency reference format, then implement normalization and validation for it in the task CLI.
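The normalization step could be as small as the sketch below; the function name is hypothetical and `plans_cli.py`'s real option handling is not shown.

```python
def normalize_dependencies(raw):
    """Normalize CLI dependency input to a YAML-ready list of task slugs.

    Accepts a list, a comma-separated string, or None; strips whitespace,
    drops empties, and de-duplicates while preserving order.
    """
    if raw is None:
        items = []
    elif isinstance(raw, str):
        items = raw.split(",")
    else:
        items = list(raw)
    seen, slugs = set(), []
    for item in items:
        slug = str(item).strip()
        if slug and slug not in seen:
            seen.add(slug)
            slugs.append(slug)
    return slugs
```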
- Update tasks UI rollups and task rendering so the new fields are visible and useful in the planning surface, not just in raw frontmatter.
- Decide final field names for execution timing. Preferred default:
  - `started_at` for first active execution
  - `pr_created_at` for PR handoff time
  - keep `completed_at` for fully completed tasks
- Normalize dependencies as YAML list data in frontmatter, even if CLI input accepts comma-separated values for convenience.
- Add a tmux-based manager launch path that starts Claude in dangerous mode as a long-lived host-side orchestrator, but expose it through a stable script such as `scripts/task-manager-start` or `scripts/task-manager-ensure` so operators and docs do not need to speak in raw tmux commands.
- Add a minimal manager register file that tracks canonical manager-local assignment data only:
  - task file path
  - worktree path
  - manager launch / registration timestamps
  - optional task title as a convenience copy if it materially helps readability
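The wrapper can keep tmux fully internal by assembling the commands in one place. A sketch: the session name and inner command are placeholders for however the long-lived manager process is actually launched.

```python
def tmux_start_command(session="task-manager", owner="dave"):
    """Build the argv for launching the manager in a detached tmux session.

    The inner command is a placeholder; callers never type tmux directly.
    """
    inner = f"scripts/task-manager-run --owner {owner}"
    return ["tmux", "new-session", "-d", "-s", session, inner]


def tmux_has_session_command(session="task-manager"):
    """Build the argv for checking whether the manager session exists."""
    return ["tmux", "has-session", "-t", session]
```

An `ensure` script would run the `has-session` check first and only start a new session when it fails, which keeps restarts idempotent.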
- Implement a heartbeat script with a default 3-minute interval that:
  - discovers `ready` tasks owned by the configured manager owner-routing rule and not yet registered
  - blocks pickup when dependencies are incomplete
  - inspects registered task/worktree state
  - writes an append-only history log every run
  - writes a generated latest-snapshot file every run
  - emits a manager-facing textual summary every run with explicit sections for:
    - ready to pick up
    - waiting on dependencies
    - active tasks
    - needs attention
    - recently changed
  - keeps no-op heartbeats short but still present as proof-of-life
  - auto-dispatches immediately when a task is eligible for pickup; no extra waiting/review queue in v1
- Implement explicit stall classification and escalation rules for:
  - ready but unregistered tasks past the 10-minute pickup threshold
  - registered but idle tasks past the 20-minute idle threshold
  - failed/interrupted tasks
  - manager missed-heartbeat / backlog conditions using the heartbeat snapshot/log timestamps
- Add manager controls for pause/stop/restart behavior:
  - explicit way to pause the heartbeat loop
  - explicit way to stop the manager cleanly
  - `ensure` or restart path for stale-manager recovery
- Ensure manager configuration includes an explicit owner identity/routing rule and that docs/scripts make this routing model clear:
  - Dave default routing should include role-owned tasks except RJ-owned tasks
  - RJ manager routing should include RJ-owned tasks only unless configured otherwise
- Add a targeted status-inspection script/command for on-demand deeper diagnostics, for example by task path or worktree path. Candidate extra details:
  - git dirty/clean summary
  - time since worktree file changes
  - dispatch/worker status when present
  These are valuable, but they do not have to be part of the default heartbeat if they add latency or noise.
- Decide and codify PR ownership policy for the new loop. Current leaning:
  - executor creates the PR inside the task worktree
  - manager supervises, records feedback, and decides what needs intervention
- Update `.codex/skills/task-manager/SKILL.md`, `.claude/commands/task-dispatch.md`, and any related operator docs so the new manager flow is the documented default without requiring every surface to know about tmux internals.
- Add focused tests for schema validation, dependency normalization by slug, owner-based pickup routing, timestamp transitions, heartbeat behavior, auto-dispatch, manager-launch command assembly, stale-manager recovery, pause/stop controls, stall detection/escalation, manager-output formatting, and UI rendering/rollup behavior for the new task fields.
Implementation Progress
- 2026-03-18: Task created from the plans CLI after working around a current `current_milestone()` bug in `plans_cli.py`. `yaml.safe_load` on `tasks/mkdocs.yml` currently fails on a Python-tagged MkDocs config value, so task creation required an explicit `--milestone m1-ft-analytics-analyst-pilot`.
- 2026-03-18: Scope updated to explicitly include tasks UI support for the new task metadata/statuses.
- 2026-03-18: Register-file design adjusted toward a minimal canonical assignment record plus heartbeat-derived snapshots/logging, to avoid duplicating stale status data in multiple places.
- 2026-03-18: Manager ergonomics direction updated to prefer wrapper scripts such as `task-manager-start` / `task-manager-ensure` so the rest of the system does not need to care whether the manager is implemented via tmux internally.
- 2026-03-18: Deeper git/worktree diagnostics moved from default-heartbeat requirements to an on-demand status inspection command unless they prove cheap enough to include by default.
- 2026-03-18: Problem framing broadened to cover the actual product goal: reduce manual task-management overhead so marking a task `ready` is enough to get it picked up, monitored, and carried toward PR with minimal operator babysitting.
- 2026-03-18: Robustness requirements added for task-level stall detection, manager-level missed-heartbeat detection, and low-noise heartbeat reporting that avoids wasting tokens when nothing meaningful changed.
- 2026-03-18: Task spec made more explicit around heartbeat behavior, including sources of truth, durable artifacts, manager-facing output sections, per-task fields to report, and proof-of-life behavior for no-op heartbeats.
- 2026-03-18: Shared distributed claiming was intentionally deferred. V1 pickup routing should use `owner` matching only; duplicate-pickup prevention via pushed claims can be added later if it becomes necessary.
- 2026-03-18: Concrete v1 decisions captured:
- dependency format = task slug
- routing = owner-based, with Dave handling non-RJ role-owned tasks
- auto-dispatch immediately when eligible
- idle threshold = 20 minutes
- manager stale threshold = 2 heartbeat intervals
- manager wrapper should support pause/stop/restart semantics
- 2026-03-18: Implemented task schema and CLI changes in `tasks/tools/plans_cli.py`:
  - added `ready` status
  - normalized `dependencies` to YAML list of task slugs
  - stamped `started_at` on `in_progress`
  - added PR metadata support (`pr_created_at`, `pr_number`, `pr_url`)
  - fixed `current_milestone()` fallback so tagged MkDocs YAML no longer breaks task creation
- 2026-03-18: Updated tasks UI rollups in `tasks/macros.py` and `tasks/stylesheets/tasks.css`:
  - distinct `ready` state rendering
  - dependency-blocked badge
  - started / PR metadata badges in milestone task rollups
- 2026-03-18: Added manager scripts under `scripts/`:
  - `task-manager-heartbeat`
  - `task-manager-check-status`
  - `task-manager-run`
  - `task-manager-start`
  - `task-manager-ensure`
  - `task-manager-pause`
  - `task-manager-stop`
  - shared `task_manager_lib.py`
- 2026-03-18: Updated task-manager docs/commands and `tasks/tools/cli-reference.md` to reflect owner-routed `ready` pickup, tmux wrapper scripts, and executor-owned PR creation with PR metadata stamping.
- 2026-03-18: Validation completed:
  - `uv run pytest tests/core/test_plans_cli.py tests/core/test_tasks_macros.py tests/scripts/test_task_manager_scripts.py`
  - `just task validate tasks/workstreams/infra-tooling/tasks/build-tmux-task-manager-orchestration-loop-and-task-metadata.md`
- 2026-03-18: Direct script sanity checks completed:
  - `scripts/task-manager-heartbeat --owner dave --format text`
  - `scripts/task-manager-start --owner dave --dry-run`
  - `scripts/task-manager-check-status build-tmux-task-manager-orchestration-loop-and-task-metadata --owner dave`
- 2026-03-18: Initial design review completed against:
  - `tasks/tools/plans_cli.py`
  - `scripts/dispatch`
  - `scripts/dispatch-watch`
  - `.codex/skills/task-manager/SKILL.md`
  - `.claude/commands/task-dispatch.md`
  - prior CBox manager notes/logs
- 2026-03-18: PR ownership decision implemented in docs/tooling direction: executor creates the PR from the task worktree, and `scripts/pr-create` stamps PR metadata back into the task file.
QA Exploration
N/A. This is orchestration/CLI/task-schema work, not a browser-facing feature.
- [x] QA exploration completed (or N/A for non-UI tasks)
Review Feedback
- [ ] Review cleared
- Review attempt note: `scripts/review` was invoked from this branch but did not produce a usable result during this implementation session, so review is not being claimed as complete here.