Dataface Tasks

Fix qa-explore delegated run stall before summary handoff

IDINFRA_TOOLING-FIX_QA_EXPLORE_DELEGATED_RUN_STALL_BEFORE_SUMMARY_HANDOFF
Statuscompleted
Priorityp1
Milestonem1-ft-analytics-analyst-pilot
Ownersr-engineer-architect
Completed bydave
Completed2026-03-22

Problem

Reproduce why scripts/qa-explore launches the inner delegated browser worker and captures artifacts but often stalls before writing summary files or returning a final result, then fix the wrapper or contract so delegated QA completes reliably.

Context

  • scripts/qa-explore already launches a delegated local claude -p browser worker with an isolated Playwright MCP config and dangerous mode by default.
  • In real runs on March 18, 2026, the delegated worker consistently launched Chrome, browsed the app, and produced screenshots, traces, console logs, and session.md, but it often failed to ever write summary.md / summary.json or exit.
  • A previous accidental scripts/qa-explore --help invocation exposed an additional bug: --help was parsed as the focus argument and launched a real delegated QA run instead of showing usage.
  • The main operational problem is not browser startup. The worker can stay busy inside Playwright for a long time with no wrapper-level stop condition, leaving the parent shell hanging indefinitely and orphaning Claude / MCP / Chrome processes when interrupted.

Possible Solutions

  • Add wrapper-side runtime bounds, artifact validation, and child-process cleanup around the delegated claude -p worker.
  • Recommended: this does not depend on changing Claude internals, prevents indefinite hangs immediately, and gives operators deterministic failure behavior plus preserved artifacts.
  • Tighten the delegated prompt with explicit stop conditions once enough evidence has been captured.
  • Useful, but insufficient alone because the worker may still ignore or delay the transition to report-writing.
  • Try to infer completion from Playwright session activity and kill on idle.
  • Possible future improvement, but more brittle than a simple hard runtime bound and not needed for the first safety fix.

Plan

  1. Reproduce the stall on latest main in a fresh worktree and confirm whether the worker is blocked or simply over-exploring.
  2. Patch scripts/qa-explore to: - parse --help correctly - support a bounded runtime - kill delegated Claude / MCP / Chrome children on timeout - fail if required summary artifacts are missing
  3. Add focused script tests for help handling, timeout behavior, and artifact-contract enforcement.
  4. Run focused script tests and a real delegated QA repro to confirm the new wrapper behavior.

Implementation Progress

  • Reproduced the issue on latest main (dcdc9d3e) in a fresh task worktree. The delegated worker launched successfully and produced browser traces, but continued exploring instead of writing summary.md / summary.json.
  • Confirmed this is primarily an under-constrained delegated-run problem, not a permissions problem:
  • dangerous mode is already the default on main
  • Playwright MCP launches successfully
  • browser artifacts are captured normally
  • Fixed scripts/qa-explore so --help no longer launches a real delegated run.
  • Added --max-seconds with a default runtime cap and wrapped claude -p in a Python watchdog that:
  • captures delegated output to claude-output.log
  • terminates the entire delegated process group on timeout
  • exits with code 124 on timeout
  • Added post-run artifact validation so a delegated worker that exits cleanly without summary.md / summary.json now fails fast instead of silently appearing successful.
  • Strengthened the delegated prompt with explicit stop conditions and an instruction to exit immediately once the artifact bundle is complete.
  • Added focused regression coverage in tests/scripts/test_dispatch_scripts.py for:
  • --help behavior
  • required summary artifact enforcement
  • timeout reporting / cleanup
  • summary recovery from saved browser evidence
  • deterministic local fallback summary generation when both Claude phases stall
  • Refined scripts/qa-explore into a three-step recovery model:
  • bounded browser evidence capture
  • bounded text-only Claude recovery pass
  • deterministic local summary synthesis from saved artifacts as the final fallback
  • Focused validation passed: uv run pytest tests/scripts/test_dispatch_scripts.py -q (21 passed).

QA Exploration

  • Used the delegated QA path directly via scripts/qa-explore during diagnosis.
  • Diagnosis runs confirmed:
  • the delegated worker browses successfully and captures Playwright artifacts
  • the problematic behavior is prolonged browser exploration without transitioning to summary-writing
  • Intermediate verification showed the wrapper timed out cleanly, preserved artifacts, and left no delegated Claude / Playwright processes running afterward.
  • Final verification run against http://127.0.0.1:8100 produced a complete recovered artifact bundle under .qa-explorer/runs/recovered-general-feedback-v2/:
  • the browser worker timed out after 60s
  • the text-only Claude recovery pass also timed out after 45s
  • the wrapper then wrote summary.md and summary.json locally from the saved evidence and exited successfully
  • the recovered summary captured the remaining favicon and dashboard-route 404s as structured findings

  • [x] QA exploration completed (or N/A for non-UI tasks)

Review Feedback

  • [ ] Review cleared