Dataface Tasks

Add task-manager reconciliation and cleanup passes for stale register and metadata drift

ID: INFRA_TOOLING-ADD_TASK_MANAGER_RECONCILIATION_AND_CLEANUP_PASSES_FOR_STALE_REGISTER_AND_METADATA_DRIFT
Status: completed
Priority: p1
Milestone: m1-ft-analytics-analyst-pilot
Owner: sr-engineer-architect
Completed by: davefowler
Completed: 2026-03-24

Problem

Add periodic reconciliation so the task manager checks for stale register entries, completed-but-unreconciled worktrees, PR-created tasks whose root task file is stale, and post-pull changes from main. Clean up or flag orphaned worktrees/register entries, emit explicit metadata_drift signals, and add multi-layer completion detection instead of relying on one real-time event path.

Context

The task manager heartbeat (scripts/task-manager-heartbeat) collects tasks, classifies them into buckets (ready, active, waiting, needs_attention), and emits a snapshot. The core logic lives in scripts/task_manager_lib.py with tests in tests/scripts/test_task_manager_scripts.py.

The 2026-03-24 friction log (tasks/logs/task-manager-friction-log-2026-03-24.md) documents five concrete incidents, including:

  1. Register orphan drift — completed tasks retained registered=yes in the register long after completion/merge, inflating active counts and making old worktrees look live.
  2. Root/worktree metadata drift — worker updates the task file in its worktree (status, PR metadata) but the root task file on main is never reconciled, causing false pickup_overdue and queue drift.
  3. Dispatch completed but unreconciled — worker exits 0 but task remains ready/in_progress because no completion handshake exists.

The register lives at .tasks/task_manager/task-manager-{owner}.register.json with entries keyed by slug. Each entry records slug, task_path, worktree_path, registered_at, launched_at, started_at. The classify_tasks function already handles dispatch state signals (failed, stalled, worker_gone) and idle detection, but has no reconciliation or cleanup pass.

Key constraint: heartbeat must never mutate task frontmatter. Reconciliation signals are advisory (added to needs_attention / escalation_reasons), and register pruning is the only write side-effect.
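The register layout described above can be pictured with a small read-only loader. This is a minimal sketch, not the code in scripts/task_manager_lib.py: the RegisterEntry class, the register_path and load_register helpers, and the assumption that the JSON file is a single object keyed by slug are all hypothetical. Only the path pattern and the field names come from the description above.

```python
import json
from dataclasses import dataclass, fields
from pathlib import Path
from typing import Dict, Optional


@dataclass
class RegisterEntry:
    """One register entry; field names follow the Context section."""
    slug: str
    task_path: str = ""
    worktree_path: Optional[str] = None
    registered_at: Optional[str] = None
    launched_at: Optional[str] = None
    started_at: Optional[str] = None


def register_path(repo_root: Path, owner: str) -> Path:
    """Build .tasks/task_manager/task-manager-{owner}.register.json under repo_root."""
    return repo_root / ".tasks" / "task_manager" / f"task-manager-{owner}.register.json"


def load_register(repo_root: Path, owner: str) -> Dict[str, RegisterEntry]:
    """Read the register without writing anything (hypothetical helper).

    Assumes the file holds one JSON object keyed by slug. Unknown keys are
    ignored so that extra bookkeeping fields do not break loading.
    """
    path = register_path(repo_root, owner)
    if not path.exists():
        return {}
    raw = json.loads(path.read_text(encoding="utf-8"))
    known = {f.name for f in fields(RegisterEntry)}
    entries: Dict[str, RegisterEntry] = {}
    for slug, raw_entry in raw.items():
        data = {k: v for k, v in raw_entry.items() if k in known}
        data["slug"] = slug  # the dictionary key is treated as authoritative
        entries[slug] = RegisterEntry(**data)
    return entries
```

Keeping the loader read-only matches the constraint that register pruning is the only write side-effect. Any write would live in the pruning pass alone.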

Possible Solutions

A. Inline reconciliation in classify_tasks

Add reconciliation checks directly into the existing classification loop. Simple but bloats an already complex function.

B. Dedicated reconciliation functions run after classification

Add three focused functions:

  • reconcile_register_orphans(owner, tasks): prune register entries for completed+merged tasks or missing worktrees
  • detect_metadata_drift(tasks): compare root task mtime vs worktree task mtime, flag drift
  • detect_dispatch_completed_unreconciled(tasks): flag tasks where dispatch exited 0 but status is not completed

These run as a post-classification pass in heartbeat, keeping classify_tasks unchanged. Each function is independently testable and can be toggled via env vars if needed.
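One way the post-classification pass and its env-var toggles could be arranged is sketched below. The TASK_MANAGER_SKIP_<NAME> variable names, the pass_enabled and run_reconciliation_passes helpers, and the idea that every pass is a callable taking the task list are illustrative assumptions. The source only says the passes "can be toggled via env vars if needed".

```python
import os
from typing import Callable, Dict, List, Mapping

# Assumed shape: a pass takes the classified task list and returns affected slugs.
ReconciliationPass = Callable[[List[dict]], List[str]]


def pass_enabled(name: str, env: Mapping[str, str] = os.environ) -> bool:
    """A pass runs unless TASK_MANAGER_SKIP_<NAME> is truthy (hypothetical variable name)."""
    value = env.get(f"TASK_MANAGER_SKIP_{name.upper()}", "")
    return value.strip().lower() not in {"1", "true", "yes"}


def run_reconciliation_passes(
    tasks: List[dict],
    passes: Dict[str, ReconciliationPass],
    env: Mapping[str, str] = os.environ,
) -> Dict[str, List[str]]:
    """Run each enabled pass in order and collect the slugs it reports."""
    results: Dict[str, List[str]] = {}
    for name, run_pass in passes.items():
        if pass_enabled(name, env):
            results[name] = run_pass(tasks)
    return results
```

A pass that needs extra arguments, such as the owner for register pruning, could be bound with functools.partial before being added to the passes mapping.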

C. Background reconciliation daemon

Overkill for current scale. The heartbeat loop already runs every 3 minutes.

Plan

Files to modify:

  • scripts/task_manager_lib.py: add reconciliation functions + new signal codes
  • tests/scripts/test_task_manager_scripts.py: TDD tests for each reconciliation pass
  • scripts/task-manager-heartbeat: call reconciliation after classification

Implementation steps:

  1. Write failing tests for reconcile_register_orphans (prune completed tasks with missing worktrees)
  2. Write failing tests for detect_metadata_drift (root older than worktree → signal)
  3. Write failing tests for detect_dispatch_completed_unreconciled (exit 0 + non-completed → signal)
  4. Implement reconcile_register_orphans in lib
  5. Implement detect_metadata_drift in lib
  6. Implement detect_dispatch_completed_unreconciled in lib
  7. Wire into heartbeat script
  8. Run just ci to validate

Implementation Progress

Functions added to scripts/task_manager_lib.py

  1. reconcile_register_orphans(owner, tasks) — Prunes register entries where: (a) the task file no longer exists and the worktree is gone, or (b) the task status is completed and the worktree is gone. Returns list of pruned slugs. Writes the cleaned register back to disk.

  2. detect_metadata_drift(tasks) — For each registered task with a worktree, compares root task file mtime vs worktree task file mtime. If the worktree file is newer, adds metadata_drift escalation signal to the task. Returns list of drifted slugs.

  3. detect_dispatch_completed_unreconciled(tasks) — For non-completed tasks whose dispatch state is exited (exit code 0), adds dispatch_completed_unreconciled escalation signal. Returns list of flagged slugs.
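The rules in the three items above can be sketched as follows. These are simplified stand-ins, not the functions in scripts/task_manager_lib.py. They assume tasks are plain dicts with hypothetical keys (slug, status, task_path, worktree_task_path, dispatch_state, dispatch_exit_code, escalation_reasons). The pruning rule is shown as a pure find_register_orphans helper that takes an in-memory register instead of reading and writing the file for an owner.

```python
import os
from pathlib import Path
from typing import Dict, List


def add_signal(task: dict, code: str) -> None:
    """Append an advisory signal code once; task frontmatter on disk is never touched."""
    reasons = task.setdefault("escalation_reasons", [])
    if code not in reasons:
        reasons.append(code)


def find_register_orphans(register: Dict[str, dict], tasks: List[dict]) -> List[str]:
    """Return slugs to prune: worktree gone and (task file missing or task completed)."""
    status_by_slug = {t["slug"]: t.get("status") for t in tasks}
    orphans = []
    for slug, entry in register.items():
        worktree = entry.get("worktree_path")
        # Assumption: an entry with no recorded worktree counts as "worktree gone".
        worktree_gone = not worktree or not Path(worktree).exists()
        task_path = entry.get("task_path")
        task_missing = not task_path or not Path(task_path).exists()
        completed = status_by_slug.get(slug) == "completed"
        if worktree_gone and (task_missing or completed):
            orphans.append(slug)
    return orphans


def detect_metadata_drift(tasks: List[dict]) -> List[str]:
    """Flag tasks whose worktree copy of the task file is newer than the root copy."""
    drifted = []
    for task in tasks:
        root_file = task.get("task_path")
        worktree_file = task.get("worktree_task_path")
        if not root_file or not worktree_file:
            continue
        if not os.path.exists(root_file) or not os.path.exists(worktree_file):
            continue
        if os.path.getmtime(worktree_file) > os.path.getmtime(root_file):
            add_signal(task, "metadata_drift")
            drifted.append(task["slug"])
    return drifted


def detect_dispatch_completed_unreconciled(tasks: List[dict]) -> List[str]:
    """Flag non-completed tasks whose dispatch exited with code 0."""
    flagged = []
    for task in tasks:
        if task.get("status") == "completed":
            continue
        if task.get("dispatch_state") == "exited" and task.get("dispatch_exit_code") == 0:
            add_signal(task, "dispatch_completed_unreconciled")
            flagged.append(task["slug"])
    return flagged
```

Separating "which entries are orphans" from "rewrite the register file" keeps the decision logic testable without touching disk. A caller would apply the returned slugs to the register and write it back.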

Integration in scripts/task-manager-heartbeat

All three reconciliation functions are called after apply_pr_ci_attention in the heartbeat main loop, so signals appear in both snapshot JSON and text summary output.

Tests added to tests/scripts/test_task_manager_scripts.py

  • test_reconcile_register_orphans_prunes_completed_with_missing_worktree
  • test_reconcile_register_orphans_keeps_active_task
  • test_reconcile_register_orphans_prunes_entry_with_missing_worktree_and_no_task
  • test_detect_metadata_drift_flags_worktree_newer_than_root
  • test_detect_metadata_drift_no_signal_when_root_is_current
  • test_detect_dispatch_completed_unreconciled_flags_exit_0_non_completed
  • test_detect_dispatch_completed_unreconciled_ignores_completed_task
  • test_detect_dispatch_completed_unreconciled_ignores_failed_dispatch
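Tests that depend on mtime ordering are more reliable when they set file timestamps explicitly instead of sleeping between writes. The sketch below shows that pattern with pytest's tmp_path fixture. The worktree_is_newer and make_pair helpers are hypothetical stand-ins for the drift comparison, and the test bodies are not copied from tests/scripts/test_task_manager_scripts.py.

```python
import os
from pathlib import Path
from typing import Tuple


def worktree_is_newer(root_file: Path, worktree_file: Path) -> bool:
    """Stand-in for the drift check: True when the worktree copy has a later mtime."""
    return worktree_file.stat().st_mtime > root_file.stat().st_mtime


def make_pair(base: Path, root_mtime: float, worktree_mtime: float) -> Tuple[Path, Path]:
    """Create root and worktree copies of a task file with fixed mtimes."""
    root_file = base / "root" / "task.md"
    worktree_file = base / "worktree" / "task.md"
    for path, mtime in ((root_file, root_mtime), (worktree_file, worktree_mtime)):
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text("status: in_progress\n", encoding="utf-8")
        os.utime(path, (mtime, mtime))  # set atime and mtime deterministically
    return root_file, worktree_file


def test_flags_worktree_newer_than_root(tmp_path: Path) -> None:
    root_file, worktree_file = make_pair(tmp_path, 1_000.0, 2_000.0)
    assert worktree_is_newer(root_file, worktree_file)


def test_no_signal_when_root_is_current(tmp_path: Path) -> None:
    root_file, worktree_file = make_pair(tmp_path, 2_000.0, 1_000.0)
    assert not worktree_is_newer(root_file, worktree_file)
```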

QA Exploration

  • [x] QA exploration completed (or N/A for non-UI tasks)
  • N/A: This is a backend/infra task with no UI changes. All behavior is covered by unit tests.

Review Feedback

  • [ ] Review cleared