CBox sandbox session liveness drop detection and recovery
Problem
CBox sandbox sessions can silently disappear ("No session found") while the underlying worktree and branch remain intact. When this happens, cbox send and cbox output return a generic error with no diagnostic information — the operator cannot determine whether the container exited, the tmux session was killed, or the entire sandbox was cleaned up. Without structured cause reporting or recovery guidance, the manager agent resorts to manual inspection (attaching to tmux, checking Docker state, listing worktrees) to figure out what happened and how to resume. This lack of session-drop diagnostics is a major gap in cbox observability for long-running sandbox workflows.
Context
Possible Solutions
Plan
Implementation Progress
Implementation summary
Problem
When a sandbox session disappeared ("No session found"), cbox printed only a generic error with no diagnostics. Operators had to attach to tmux and inspect Docker and worktree state by hand to understand why the session dropped and how to recover.
Solution
Added a `SessionDropDiagnostics` dataclass and `_diagnose_session_drop()` to collect structured diagnostics when a session is expected but missing:
- Worktree state: checks whether `.worktrees/<name>/` still exists
- Port allocation: checks the registry for stale port entries
- Container state: inspects whether the Docker container is running or has exited
- Cause classification: `container_exited`, `session_killed`, `fully_cleaned`
- Recovery guidance: context-aware hints (resume vs. fresh start vs. restart)
- Runtime logging: `session_drop` events logged to `.cbox/logs/runtime.log`
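The diagnostic flow above can be sketched as follows. This is a minimal, hypothetical sketch, not the code in `health.py` or `cli.py`: the field names, the plain `dict` used as the port registry, the container name, the mapping from Docker state to cause, the wording of the hints, and the one-JSON-object-per-line log format are all assumptions. `diagnose_session_drop` and `suggest_recovery` are illustrative stand-ins. Container state comes from `docker inspect --format '{{.State.Status}}'`, which exits non-zero when the container no longer exists.

```python
# Hypothetical sketch of session-drop diagnostics; not the actual cbox implementation.
import json
import subprocess
import time
from dataclasses import asdict, dataclass, field
from pathlib import Path
from typing import Callable, Optional


@dataclass
class SessionDropDiagnostics:
    # Illustrative fields; the real dataclass in libs/cbox/cbox/health.py may differ.
    name: str
    worktree_exists: bool
    stale_port: Optional[int]       # port still held in the registry, if any
    container_state: Optional[str]  # Docker State.Status, or None if no container
    cause: str = ""
    recovery_hints: list = field(default_factory=list)


def inspect_container_state(container: str) -> Optional[str]:
    """Return Docker's State.Status ("running", "exited", ...) or None if absent."""
    try:
        result = subprocess.run(
            ["docker", "inspect", "--format", "{{.State.Status}}", container],
            capture_output=True, text=True, check=False,
        )
    except FileNotFoundError:  # docker CLI not installed on this host
        return None
    if result.returncode != 0:  # e.g. "No such object": the container was removed
        return None
    return result.stdout.strip() or None


def classify_cause(container_state: Optional[str]) -> str:
    # Assumed mapping from container state to the three documented causes.
    if container_state is None:
        return "fully_cleaned"     # the sandbox container is gone entirely
    if container_state == "running":
        return "session_killed"    # container alive, but its tmux session is missing
    return "container_exited"      # any other state, e.g. exited or dead


def suggest_recovery(diag: SessionDropDiagnostics) -> list:
    """Pick restart / resume / fresh-start advice from the cause and worktree state."""
    if diag.cause == "container_exited":
        hints = ["Restart: the container exited; restart it on the existing worktree."]
    elif diag.cause == "session_killed":
        hints = ["Restart: the container is running; start a new session inside it."]
    elif diag.worktree_exists:
        hints = ["Resume: the sandbox is gone but the worktree and branch survive."]
    else:
        hints = ["Fresh start: neither the sandbox nor the worktree remains."]
    if diag.stale_port is not None:
        hints.append(f"Port {diag.stale_port} is still registered and may be stale.")
    return hints


def diagnose_session_drop(
    name: str,
    repo_root: Path,
    port_registry: dict,
    container: str,
    inspect: Callable[[str], Optional[str]] = inspect_container_state,
) -> SessionDropDiagnostics:
    """Hypothetical stand-in for cbox's _diagnose_session_drop()."""
    state = inspect(container)
    diag = SessionDropDiagnostics(
        name=name,
        worktree_exists=(repo_root / ".worktrees" / name).is_dir(),
        stale_port=port_registry.get(name),
        container_state=state,
        cause=classify_cause(state),
    )
    diag.recovery_hints = suggest_recovery(diag)
    return diag


def log_session_drop(diag: SessionDropDiagnostics,
                     log_path: Path = Path(".cbox/logs/runtime.log")) -> None:
    """Append one JSON line per session_drop event (log format is assumed)."""
    log_path.parent.mkdir(parents=True, exist_ok=True)
    event = {"event": "session_drop", "ts": time.time(), **asdict(diag)}
    with log_path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(event) + "\n")
```

Passing the container probe in as `inspect` keeps cause classification and hint selection testable without a Docker daemon, which is one way a suite like `test_session_drop_diagnostics.py` could exercise them.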
Files changed
- `libs/cbox/cbox/health.py`: added the `SessionDropDiagnostics` dataclass
- `libs/cbox/cbox/cli.py`: added `_diagnose_session_drop()`; enhanced the `send` and `output` commands with diagnostic panels
- `libs/cbox/test_session_drop_diagnostics.py`: 18 tests covering the dataclass, the diagnostics function, CLI integration, and logging
User-visible behavior
`cbox send <name>` and `cbox output <name>` now show a "Session Drop Diagnostics" panel when the session is missing, listing the cause, worktree status, container state, and recovery commands.
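The panel could be approximated in plain text roughly as below. This is a hypothetical sketch: the write-up does not describe the real layout or which rendering library the CLI uses, so `render_drop_panel`, its row labels, and the box style are illustrative assumptions. It only shows how the fields listed above fit together in one view.

```python
from typing import List, Optional


def render_drop_panel(name: str, cause: str, worktree_exists: bool,
                      container_state: Optional[str], hints: List[str]) -> str:
    """Render a boxed plain-text 'Session Drop Diagnostics' panel (layout is hypothetical)."""
    rows = [
        f"Session:   {name}",
        f"Cause:     {cause}",
        f"Worktree:  {'present' if worktree_exists else 'missing'}",
        f"Container: {container_state or 'not found'}",
        "Recovery:",
        *(f"  - {hint}" for hint in hints),
    ]
    title = " Session Drop Diagnostics "
    # Inner width: widest of title and rows, plus one space of padding per side.
    width = max(len(title), *(len(row) for row in rows)) + 2
    top = "+" + title.center(width, "-") + "+"
    body = [f"| {row.ljust(width - 2)} |" for row in rows]
    bottom = "+" + "-" * width + "+"
    return "\n".join([top, *body, bottom])


print(render_drop_panel("feat", "container_exited", True, "exited",
                        ["Restart: the container exited; restart it on the existing worktree."]))
```

Every line of the box has the same width, so the panel stays aligned however long the recovery hints are.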
- No cross-workstream dependencies; builds on the existing diagnostics infrastructure from `cbox-sandbox-bootstrap-and-auth-health-checks` and `cbox-session-registry-stale-after-sandbox-kill`.
- Follow-up: consider periodic liveness probes for long-running sandboxes (not needed for M1; on-demand diagnostics are sufficient).
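If the periodic-probe follow-up is picked up, one possible shape is a polling loop that fires a callback the first time the session goes missing. This is a hypothetical sketch outside the M1 scope: `watch_liveness` and `tmux_session_alive` are illustrative names, and it assumes the sandbox's tmux session is visible to the host tmux server under the sandbox name, which may not hold if tmux runs inside the container. `tmux has-session -t <name>` exits with status 0 only when the session exists.

```python
import subprocess
import time
from typing import Callable, Optional


def tmux_session_alive(name: str) -> bool:
    """True if `tmux has-session -t NAME` succeeds (exit status 0)."""
    try:
        result = subprocess.run(["tmux", "has-session", "-t", name],
                                capture_output=True, check=False)
    except FileNotFoundError:  # tmux not installed on this host
        return False
    return result.returncode == 0


def watch_liveness(name: str, on_drop: Callable[[str], None], *,
                   is_alive: Callable[[str], bool] = tmux_session_alive,
                   interval_s: float = 30.0,
                   max_checks: Optional[int] = None) -> bool:
    """Poll until the session disappears, then call on_drop once and return True.

    Returns False if `max_checks` consecutive probes all find the session alive.
    """
    checks = 0
    while max_checks is None or checks < max_checks:
        if not is_alive(name):
            on_drop(name)  # e.g. run the on-demand diagnostics and log the drop
            return True
        checks += 1
        time.sleep(interval_s)
    return False
```

Injecting `is_alive` lets the same loop probe a container-side session (for example through `docker exec`) instead of the host tmux server, without changing the polling logic.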
Review Feedback
- [ ] Review cleared