Fix qa-explore delegated run stall before summary handoff
Problem
Reproduce why scripts/qa-explore launches the inner delegated browser worker and captures artifacts but often stalls before writing summary files or returning a final result, then fix the wrapper or contract so delegated QA completes reliably.
Context
scripts/qa-explorealready launches a delegated localclaude -pbrowser worker with an isolated Playwright MCP config and dangerous mode by default.- In real runs on March 18, 2026, the delegated worker consistently launched Chrome, browsed the app, and produced screenshots, traces, console logs, and
session.md, but it often failed to ever writesummary.md/summary.jsonor exit. - A previous accidental
scripts/qa-explore --helpinvocation exposed an additional bug:--helpwas parsed as thefocusargument and launched a real delegated QA run instead of showing usage. - The main operational problem is not browser startup. The worker can stay busy inside Playwright for a long time with no wrapper-level stop condition, leaving the parent shell hanging indefinitely and orphaning Claude / MCP / Chrome processes when interrupted.
Possible Solutions
- Add wrapper-side runtime bounds, artifact validation, and child-process cleanup around the delegated
claude -pworker. - Recommended: this does not depend on changing Claude internals, prevents indefinite hangs immediately, and gives operators deterministic failure behavior plus preserved artifacts.
- Tighten the delegated prompt with explicit stop conditions once enough evidence has been captured.
- Useful, but insufficient alone because the worker may still ignore or delay the transition to report-writing.
- Try to infer completion from Playwright session activity and kill on idle.
- Possible future improvement, but more brittle than a simple hard runtime bound and not needed for the first safety fix.
Plan
- Reproduce the stall on latest
mainin a fresh worktree and confirm whether the worker is blocked or simply over-exploring. - Patch
scripts/qa-exploreto: - parse--helpcorrectly - support a bounded runtime - kill delegated Claude / MCP / Chrome children on timeout - fail if required summary artifacts are missing - Add focused script tests for help handling, timeout behavior, and artifact-contract enforcement.
- Run focused script tests and a real delegated QA repro to confirm the new wrapper behavior.
Implementation Progress
- Reproduced the issue on latest
main(dcdc9d3e) in a fresh task worktree. The delegated worker launched successfully and produced browser traces, but continued exploring instead of writingsummary.md/summary.json. - Confirmed this is primarily an under-constrained delegated-run problem, not a permissions problem:
- dangerous mode is already the default on
main - Playwright MCP launches successfully
- browser artifacts are captured normally
- Fixed
scripts/qa-exploreso--helpno longer launches a real delegated run. - Added
--max-secondswith a default runtime cap and wrappedclaude -pin a Python watchdog that: - captures delegated output to
claude-output.log - terminates the entire delegated process group on timeout
- exits with code
124on timeout - Added post-run artifact validation so a delegated worker that exits cleanly without
summary.md/summary.jsonnow fails fast instead of silently appearing successful. - Strengthened the delegated prompt with explicit stop conditions and an instruction to exit immediately once the artifact bundle is complete.
- Added focused regression coverage in
tests/scripts/test_dispatch_scripts.pyfor: --helpbehavior- required summary artifact enforcement
- timeout reporting / cleanup
- summary recovery from saved browser evidence
- deterministic local fallback summary generation when both Claude phases stall
- Refined
scripts/qa-exploreinto a three-step recovery model: - bounded browser evidence capture
- bounded text-only Claude recovery pass
- deterministic local summary synthesis from saved artifacts as the final fallback
- Focused validation passed:
uv run pytest tests/scripts/test_dispatch_scripts.py -q(21 passed).
QA Exploration
- Used the delegated QA path directly via
scripts/qa-exploreduring diagnosis. - Diagnosis runs confirmed:
- the delegated worker browses successfully and captures Playwright artifacts
- the problematic behavior is prolonged browser exploration without transitioning to summary-writing
- Intermediate verification showed the wrapper timed out cleanly, preserved artifacts, and left no delegated Claude / Playwright processes running afterward.
- Final verification run against
http://127.0.0.1:8100produced a complete recovered artifact bundle under.qa-explorer/runs/recovered-general-feedback-v2/: - the browser worker timed out after
60s - the text-only Claude recovery pass also timed out after
45s - the wrapper then wrote
summary.mdandsummary.jsonlocally from the saved evidence and exited successfully -
the recovered summary captured the remaining favicon and dashboard-route 404s as structured findings
-
[x] QA exploration completed (or N/A for non-UI tasks)
Review Feedback
- [ ] Review cleared