Add lightweight qa-explorer verification artifacts and trace capture
Problem
Upgrade qa-explorer local visual verification with lightweight, ephemeral, gitignored per-run artifacts instead of repo-committed evidence. Add a stable artifact bundle contract, structured markdown/json QA summaries, console and network error summaries, and optional Playwright trace/session capture when it stays lightweight. Do not auto-link evidence into PRs or task files beyond normal human-written summaries, and skip before/after visual comparison work for now.
Context
The qa-explorer skill (scripts/qa-explore) already creates per-run artifact
directories under .qa-explorer/runs/<run_id>/ with browser profiles, Playwright
output, and trace/session capture enabled. However, there was no stable contract
for what a run produces, no structured summary output, and no explicit diagnostics
capture for console errors or failed network requests.
Key files:
- .codex/skills/qa-explorer/SKILL.md — skill documentation and worker instructions
- scripts/qa-explore — bash wrapper that launches the claude worker
- tests/scripts/test_dispatch_scripts.py — integration tests for the script
Possible Solutions
Recommended: Prompt + SKILL.md contract approach. Define the artifact bundle
contract in SKILL.md (which the worker reads), and add explicit instructions in the
prompt to write summary.md and summary.json. This is lightweight — no new
dependencies, no new scripts, just documentation and a prompt update.
Alternative: Post-processing script. Add a script that parses claude worker output and generates summaries. Rejected: over-engineered for the current need, and the worker already has full context to write the summaries itself.
Plan
- Add "Artifact Bundle Contract" section to SKILL.md defining the per-run layout, summary.md format, summary.json schema, diagnostics capture requirements, and trace/session documentation.
- Update scripts/qa-explore prompt to instruct the worker to write both summary files and capture console/network diagnostics.
- TDD: write failing test first, then implement.
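As a rough illustration of what the two summary files might carry, the sketch below folds console messages and failed requests into a diagnostics section and writes summary.json and summary.md into a run directory. The helper names, the event dictionaries, and the field names (findings, diagnostics, suggested_tests, console_errors, failed_requests) are assumptions made for this example; the real contract is whatever SKILL.md defines, and in the actual flow the worker writes these files itself rather than calling a script.

```python
import json
import tempfile
from pathlib import Path


def summarize_diagnostics(console_messages, failed_requests):
    """Reduce raw browser events to a compact diagnostics summary.

    Both inputs are lists of plain dicts (a hypothetical shape), so the
    sketch does not depend on any browser automation library.
    """
    errors = [m["text"] for m in console_messages if m.get("type") == "error"]
    failures = [
        {"url": r["url"], "failure": r.get("failure", "unknown")}
        for r in failed_requests
    ]
    return {"console_errors": errors, "failed_requests": failures}


def write_summaries(run_dir, findings, diagnostics, suggested_tests):
    """Write summary.json and summary.md into run_dir and return both paths."""
    run_dir = Path(run_dir)
    run_dir.mkdir(parents=True, exist_ok=True)

    # Machine-readable summary (field names are assumed, not the real schema).
    payload = {
        "findings": findings,
        "diagnostics": diagnostics,
        "suggested_tests": suggested_tests,
    }
    json_path = run_dir / "summary.json"
    json_path.write_text(json.dumps(payload, indent=2) + "\n", encoding="utf-8")

    # Human-readable summary built from the same data.
    lines = ["# QA summary", "", "## Findings"]
    lines += [f"- {item}" for item in findings] or ["- none"]
    lines += ["", "## Console errors"]
    lines += [f"- {err}" for err in diagnostics["console_errors"]] or ["- none"]
    lines += ["", "## Failed network requests"]
    lines += [
        f"- {r['url']} ({r['failure']})" for r in diagnostics["failed_requests"]
    ] or ["- none"]
    md_path = run_dir / "summary.md"
    md_path.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return json_path, md_path


if __name__ == "__main__":
    # Demo in a throwaway directory, mirroring the ephemeral per-run layout.
    with tempfile.TemporaryDirectory() as tmp:
        diag = summarize_diagnostics(
            [{"type": "error", "text": "Uncaught TypeError"}],
            [{"url": "http://localhost:3000/api/items", "failure": "net::ERR_FAILED"}],
        )
        _, md_path = write_summaries(
            Path(tmp) / "runs" / "example", ["Save button unresponsive"], diag, []
        )
        print(md_path.read_text(encoding="utf-8"))
```

Deriving both files from one payload keeps the markdown and JSON views from drifting apart, which is one plausible reason to specify them together in a single contract.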
Implementation Progress
Changes made
- .codex/skills/qa-explorer/SKILL.md — Added "Artifact Bundle Contract" section with:
  - Per-run directory layout diagram
  - summary.md — human-readable report persisted to disk
  - summary.json — machine-readable schema with findings, diagnostics, and suggested tests
  - Diagnostics capture instructions (console errors, failed network requests)
  - Playwright trace/session documentation
- scripts/qa-explore — Extended the prompt injected into the claude worker with explicit instructions to write summary.md and summary.json, and to capture console errors and failed network requests as diagnostics.
- tests/scripts/test_dispatch_scripts.py — Added test_qa_explore_prompt_instructs_structured_artifact_output, which verifies that the prompt includes references to summary.md, summary.json, and console error capture.
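The kind of check the new test performs can be pictured roughly as follows. This is a self-contained sketch, not the actual test: build_prompt is a hypothetical stand-in for however the real test obtains the prompt from scripts/qa-explore, and the prompt wording is invented for illustration.

```python
def build_prompt(run_dir):
    """Hypothetical stand-in for the prompt that scripts/qa-explore injects."""
    return (
        f"Write artifacts under {run_dir}.\n"
        "Write a human-readable report to summary.md.\n"
        "Write a machine-readable report to summary.json.\n"
        "Capture console errors and failed network requests as diagnostics.\n"
    )


def test_prompt_mentions_structured_artifacts():
    """Fail with a readable message if any required reference is absent."""
    prompt = build_prompt(".qa-explorer/runs/example")
    required = ["summary.md", "summary.json", "console errors"]
    missing = [needle for needle in required if needle not in prompt]
    assert not missing, f"prompt is missing: {missing}"
```

Collecting the missing substrings into one list keeps the assertion short and names every absent reference in a single failure message.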
What was intentionally excluded (per task scope)
- No auto-linking of evidence into PR bodies or task files
- No before/after visual comparison or visual diff baselines
- Artifacts remain ephemeral and gitignored under .qa-explorer/
QA Exploration
N/A — this is infrastructure/skill tooling, not a UI change.
- [x] QA exploration completed (or N/A for non-UI tasks)
Review Feedback
- just review verdict: APPROVED. Review called out only a minor style note about long assertion lines in the new test, with no material correctness or security issues.
- scripts/pr-validate pre passed after a clean rebase onto origin/main and a full local CI run.
- [x] Review cleared