cbox review has two execution paths whose behavior was conflated. The system already has heartbeats and timeout guards, but in sandbox-driven flows the operator signal is still ambiguous enough that "silent" can be mistaken for "hung".
Primary paths:
- Inside container/sandbox: cbox review ... -> _run_review_in_tmux(...)
- Host: cbox review ... -> ephemeral container + _run_subprocess_with_heartbeat(...)
Current guardrails already present:
- Default review timeout: 1200s (DEFAULT_REVIEW_TIMEOUT)
- Heartbeats every 15s (REVIEW_PROGRESS_HEARTBEAT_SECONDS)
- Startup timeout waiting for Claude prompt in tmux: 60s
- Stall warnings if output file size does not change while waiting
- cbox output --check-stall for interactive prompt blockers
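The heartbeat-plus-stall-warning guardrail can be sketched as follows. This is illustrative only: the helper name mirrors `_run_subprocess_with_heartbeat` but the real implementation is not shown here, and the `heartbeat` parameter is an assumption added for testability.

```python
import os
import subprocess
import time

HEARTBEAT_SECONDS = 15          # REVIEW_PROGRESS_HEARTBEAT_SECONDS
DEFAULT_REVIEW_TIMEOUT = 1200   # seconds

def run_with_heartbeat(cmd, output_path,
                       timeout=DEFAULT_REVIEW_TIMEOUT,
                       heartbeat=HEARTBEAT_SECONDS):
    """Run cmd, printing a heartbeat every `heartbeat` seconds and
    warning when the output file stops growing. A sketch, not the
    real _run_subprocess_with_heartbeat."""
    proc = subprocess.Popen(cmd)
    start = time.monotonic()
    last_size = -1
    while proc.poll() is None:
        if time.monotonic() - start > timeout:
            proc.kill()
            raise TimeoutError(f"review exceeded {timeout}s")
        time.sleep(heartbeat)
        elapsed = int(time.monotonic() - start)
        size = os.path.getsize(output_path) if os.path.exists(output_path) else 0
        stalled = " (output not growing)" if size == last_size else ""
        print(f"[heartbeat] {elapsed}s elapsed{stalled}")
        last_size = size
    return proc.returncode
```

The stall warning is advisory: an unchanged file size only becomes a hang signal when combined with the freshness window below.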
Not a hang:
- Review duration <= configured timeout (--review-timeout, default 1200s)
- Any of these progress signals observed:
- heartbeat lines
- tmux pane activity
- .cbox/reviews/*.md created or growing
Suspected hang:
- No pane activity and no review artifact growth for >= 120s
Confirmed hang:
- Exceeds configured timeout without verdict, OR
- Session dead/invalid with no deterministic error surfaced
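These criteria reduce to a small classifier. A sketch under the thresholds above (function and parameter names are illustrative, not from the codebase):

```python
NO_PROGRESS_WINDOW = 120        # seconds without any progress signal
DEFAULT_REVIEW_TIMEOUT = 1200   # seconds

def classify_hang(elapsed, last_progress_age, session_alive,
                  timeout=DEFAULT_REVIEW_TIMEOUT):
    """elapsed: seconds since review start.
    last_progress_age: seconds since the newest progress signal
    (heartbeat line, tmux pane activity, or review artifact growth)."""
    if not session_alive:
        return "confirmed_hang"   # dead/invalid session, no error surfaced
    if elapsed > timeout:
        return "confirmed_hang"   # exceeded budget without verdict
    if last_progress_age >= NO_PROGRESS_WINDOW:
        return "suspected_hang"   # quiet, but still inside the budget
    return "not_hang"
```

Note that elapsed time alone never produces "suspected_hang": only the absence of progress signals does, which is exactly why "> 10 minutes" by itself proves nothing.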
So no, "> 10 minutes" alone is not a hang. 10-15 minutes can be normal.
Visibility gap in delegated/sandbox orchestration
- Review may be running, but the caller sees sparse output and assumes stall.
- cbox output snapshots can miss in-between progress.
Review output contract is file-based (## Verdict sentinel)
- In-container path waits for review file + verdict text.
- If model doesn't reach/write final verdict section, flow waits until timeout.
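The file-based contract can be sketched as a poll for the sentinel heading (assuming the sentinel is the literal `## Verdict` markdown heading; polling interval and helper name are assumptions):

```python
import glob
import time

def wait_for_verdict(review_dir=".cbox/reviews", timeout=1200, poll=5):
    """Poll review markdown files until one contains the '## Verdict'
    sentinel; return its path, or None on timeout (the case that
    currently manifests as a silent wait)."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        for path in glob.glob(f"{review_dir}/*.md"):
            with open(path, encoding="utf-8") as f:
                if "## Verdict" in f.read():
                    return path
        time.sleep(poll)
    return None
```

The failure mode in the bullet above is visible here: if the model writes a partial review without the sentinel section, the loop consumes the full timeout with no intermediate signal to the operator.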
Interactive prompt blockers in tmux
- Workspace trust/effort prompts are auto-dismissed; other prompts may still block and need intervention.
- These can look hung unless the stall check is run.
Environment drift in sandbox worktrees
- Bootstrap warnings / dependency issues can degrade command reliability and confuse diagnosis.
For sandbox <name>:
just cbox output --check-stall <name>
tmux capture-pane -pt cbox-sandbox-<name> | tail -n 60
ls -lt .worktrees/<name>/.cbox/reviews
If active progress exists: keep waiting until the timeout budget is exhausted.
If no progress for >=120s:
uv run cbox send --interrupt <name> "retry: run cbox review --watch changes and report REVIEW_EXIT"
If retry still stalls: stop session and recreate sandbox from clean state.
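The >=120s artifact-freshness check in the steps above can be done from mtimes before deciding to interrupt. A sketch (the directory parameter generalizes the `.worktrees/<name>/.cbox/reviews` path; the helper name is hypothetical):

```python
import glob
import os
import time

def artifact_fresh(reviews_dir, window=120):
    """True if any review artifact (e.g. under
    .worktrees/<name>/.cbox/reviews) changed within `window` seconds."""
    now = time.time()
    return any(now - os.path.getmtime(p) < window
               for p in glob.glob(os.path.join(reviews_dir, "*.md")))
```

Running this alongside cbox output --check-stall gives two independent progress signals before any interrupt is sent.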
Escalate only after:
- one clean retry attempt failed
- timeout exceeded OR repeated no-progress window
I treated sparse/no stdout as a hang too early in several runs, instead of strictly applying the timeout + freshness criteria. That caused unnecessary manager takeover and noisy diagnosis.
State file: .cbox/reviews/.state-<run_id>.json
States: starting, prompt_ready, prompt_sent, artifact_seen, verdict_seen, timeout, error
Success criteria:
- Operator can distinguish active vs stuck from one file without tmux attach.
- Fewer manual interruptions/restarts for transient stalls.
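A sketch of the proposed state-file writer, using an atomic rename so a reader never sees a torn JSON snapshot (field names and the `write_state` helper are assumptions, not existing code):

```python
import json
import os
import time

STATES = ("starting", "prompt_ready", "prompt_sent",
          "artifact_seen", "verdict_seen", "timeout", "error")

def write_state(run_id, state, reviews_dir=".cbox/reviews", **extra):
    """Record the current review state in .state-<run_id>.json via
    write-to-temp + os.replace, so readers always see a complete
    snapshot (old or new, never partial)."""
    assert state in STATES, f"unknown state {state!r}"
    os.makedirs(reviews_dir, exist_ok=True)
    path = os.path.join(reviews_dir, f".state-{run_id}.json")
    tmp = path + ".tmp"
    payload = {"run_id": run_id, "state": state,
               "updated_at": time.time(), **extra}
    with open(tmp, "w", encoding="utf-8") as f:
        json.dump(payload, f)
    os.replace(tmp, path)   # atomic on POSIX
    return path
```

The `updated_at` field doubles as the freshness signal: a stale timestamp on a non-terminal state is itself a stall indicator.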
Status classes: RUNNING, SUSPECT_STALL, TIMEOUT, FAILED, COMPLETE
cbox output should show the latest status class and elapsed/timeout ratio.
Axes:
- Execution path: in-container tmux vs host ephemeral container
- Output mode: --watch on/off
- Runner mode: interactive terminal vs delegated tool call
- Failure injection:
- no verdict ever
- artifact grows then freezes
- interactive prompt blocker
- session drop
Pass bar:
- 0 ambiguous "silent" outcomes without status classification
- deterministic terminal state within timeout budget
- auto-retry succeeds or exits with actionable diagnostics
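One way to derive the five status classes from the state file plus timing, so every outcome lands in a deterministic bucket (a sketch; the mapping is an assumption consistent with the criteria earlier in this note):

```python
def status_class(state, elapsed, timeout, progress_age,
                 stall_window=120):
    """Map a state-file entry plus timing into one status class.
    progress_age: seconds since the last progress signal."""
    if state == "verdict_seen":
        return "COMPLETE"
    if state == "timeout" or elapsed > timeout:
        return "TIMEOUT"
    if state == "error":
        return "FAILED"
    if progress_age >= stall_window:
        return "SUSPECT_STALL"
    return "RUNNING"
```

Because every input maps to exactly one class, the "0 ambiguous silent outcomes" bar is checkable mechanically: any observation without a class is a test failure.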
Default check before any interrupt/restart: cbox output --check-stall plus an artifact freshness check.