Fix qa-explore delegated run stall before summary handoff

ID	INFRA_TOOLING-FIX_QA_EXPLORE_DELEGATED_RUN_STALL_BEFORE_SUMMARY_HANDOFF
Status	completed
Priority	p1
Milestone	m1-ft-analytics-analyst-pilot
Owner	sr-engineer-architect
Completed by	dave
Completed	2026-03-22

Problem

Reproduce why scripts/qa-explore launches the inner delegated browser worker and captures artifacts but often stalls before writing summary files or returning a final result, then fix the wrapper or contract so delegated QA completes reliably.

Context

scripts/qa-explore already launches a delegated local claude -p browser worker with an isolated Playwright MCP config and dangerous mode by default.
In real runs on March 18, 2026, the delegated worker consistently launched Chrome, browsed the app, and produced screenshots, traces, console logs, and session.md, but it often failed to ever write summary.md / summary.json or exit.
A previous accidental scripts/qa-explore --help invocation exposed an additional bug: --help was parsed as the focus argument and launched a real delegated QA run instead of showing usage.
The main operational problem is not browser startup. The worker can stay busy inside Playwright for a long time with no wrapper-level stop condition, leaving the parent shell hanging indefinitely and orphaning Claude / MCP / Chrome processes when interrupted.

Possible Solutions

Add wrapper-side runtime bounds, artifact validation, and child-process cleanup around the delegated claude -p worker.
Recommended: this does not depend on changing Claude internals, prevents indefinite hangs immediately, and gives operators deterministic failure behavior plus preserved artifacts.
Tighten the delegated prompt with explicit stop conditions once enough evidence has been captured.
Useful, but insufficient alone because the worker may still ignore or delay the transition to report-writing.
Try to infer completion from Playwright session activity and kill on idle.
Possible future improvement, but more brittle than a simple hard runtime bound and not needed for the first safety fix.

Plan

Reproduce the stall on latest main in a fresh worktree and confirm whether the worker is blocked or simply over-exploring.
Patch scripts/qa-explore to: - parse --help correctly - support a bounded runtime - kill delegated Claude / MCP / Chrome children on timeout - fail if required summary artifacts are missing
Add focused script tests for help handling, timeout behavior, and artifact-contract enforcement.
Run focused script tests and a real delegated QA repro to confirm the new wrapper behavior.

Implementation Progress

Reproduced the issue on latest main (dcdc9d3e) in a fresh task worktree. The delegated worker launched successfully and produced browser traces, but continued exploring instead of writing summary.md / summary.json.
Confirmed this is primarily an under-constrained delegated-run problem, not a permissions problem:
dangerous mode is already the default on main
Playwright MCP launches successfully
browser artifacts are captured normally
Fixed scripts/qa-explore so --help no longer launches a real delegated run.
Added --max-seconds with a default runtime cap and wrapped claude -p in a Python watchdog that:
captures delegated output to claude-output.log
terminates the entire delegated process group on timeout
exits with code 124 on timeout
Added post-run artifact validation so a delegated worker that exits cleanly without summary.md / summary.json now fails fast instead of silently appearing successful.
Strengthened the delegated prompt with explicit stop conditions and an instruction to exit immediately once the artifact bundle is complete.
Added focused regression coverage in tests/scripts/test_dispatch_scripts.py for:
--help behavior
required summary artifact enforcement
timeout reporting / cleanup
summary recovery from saved browser evidence
deterministic local fallback summary generation when both Claude phases stall
Refined scripts/qa-explore into a three-step recovery model:
bounded browser evidence capture
bounded text-only Claude recovery pass
deterministic local summary synthesis from saved artifacts as the final fallback
Focused validation passed: uv run pytest tests/scripts/test_dispatch_scripts.py -q (21 passed).

QA Exploration

Used the delegated QA path directly via scripts/qa-explore during diagnosis.
Diagnosis runs confirmed:
the delegated worker browses successfully and captures Playwright artifacts
the problematic behavior is prolonged browser exploration without transitioning to summary-writing
Intermediate verification showed the wrapper timed out cleanly, preserved artifacts, and left no delegated Claude / Playwright processes running afterward.
Final verification run against http://127.0.0.1:8100 produced a complete recovered artifact bundle under .qa-explorer/runs/recovered-general-feedback-v2/:
the browser worker timed out after 60s
the text-only Claude recovery pass also timed out after 45s
the wrapper then wrote summary.md and summary.json locally from the saved evidence and exited successfully
the recovered summary captured the remaining favicon and dashboard-route 404s as structured findings
[x] QA exploration completed (or N/A for non-UI tasks)

Review Feedback

[ ] Review cleared