Dataface Tasks

CBox sandbox startup-timeout diagnostics

ID: INFRA_TOOLING-CBOX_SANDBOX_STARTUP_TIMEOUT_DIAGNOSTICS
Status: done
Priority: p1
Milestone: m1-ft-analytics-analyst-pilot
Owner: head-of-engineering

Problem

When wait_for_prompt times out during sandbox or review startup, cbox prints only "Timeout waiting for Claude to start" with no additional context. The operator has no visibility into whether the container failed to launch, Claude crashed during initialization, or the prompt simply took longer than expected. Diagnosing the failure requires manually attaching to the tmux session and inspecting Docker container state — a multi-step process that is impractical for automated manager workflows. This opaque timeout message is the most common first symptom of sandbox startup failures and provides zero signal for automated recovery.

Context

Possible Solutions

Plan

Implementation Progress

  • StartupTimeoutDiagnostics dataclass in cbox/health.py with pane_tail, container_running, and summary() formatter.
  • _collect_startup_timeout_diagnostics() function in cbox/cli.py that captures tmux pane content and container running state via docker inspect.
  • All three wait_for_prompt timeout sites (cbox start, _run_review_in_session, cbox new) display a "Startup Timeout Diagnostics" panel on failure.
  • 9 deterministic unit/integration tests covering collector logic, summary formatting, and CLI integration.
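As a rough illustration of the dataclass described above, the sketch below shows one way it might look. Only the names StartupTimeoutDiagnostics, pane_tail, container_running, and summary() come from the task notes; the field types, the use of None for an undetermined state, and the exact summary wording are assumptions, not the code in cbox/health.py.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StartupTimeoutDiagnostics:
    """Illustrative shape only; types and wording are assumed."""

    pane_tail: str  # last lines captured from the tmux pane (may be empty)
    container_running: Optional[bool]  # None = state could not be determined

    def summary(self) -> str:
        # Map the tri-state flag to a human-readable label.
        if self.container_running is None:
            state = "unknown"
        elif self.container_running:
            state = "running"
        else:
            state = "not running"
        # Fall back to a placeholder when nothing was captured from the pane.
        tail = self.pane_tail.strip() or "(no pane output captured)"
        return f"Container: {state}\nLast pane output:\n{tail}"
```

Keeping the formatter on the dataclass means all three timeout sites can render the same panel text from one object.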

  • On startup timeout, operators see last pane output and container state without manually attaching.

  • Diagnostics degrade gracefully (docker inspect timeout -> "unknown", no container -> "unknown").
  • Existing cbox tests continue to pass.
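The graceful-degradation behavior listed above could be sketched roughly as follows. The function name container_state, the injectable run parameter, and the 5-second default are hypothetical and chosen for illustration; the actual collector in cbox/cli.py may be structured differently. The docker inspect --format '{{.State.Running}}' invocation is standard Docker CLI usage.

```python
import subprocess
from typing import Callable, Optional


def container_state(
    name: Optional[str],
    run: Callable[..., subprocess.CompletedProcess] = subprocess.run,
    timeout: float = 5.0,  # assumed default, not taken from cbox
) -> str:
    """Return "running", "not running", or "unknown"; never raise."""
    if not name:
        # No container to inspect.
        return "unknown"
    cmd = ["docker", "inspect", "--format", "{{.State.Running}}", name]
    try:
        result = run(cmd, capture_output=True, text=True, timeout=timeout)
    except (subprocess.TimeoutExpired, OSError):
        # docker hung, or the docker binary is missing or not executable.
        return "unknown"
    if result.returncode != 0:
        # Typically means the container does not exist.
        return "unknown"
    value = result.stdout.strip()
    if value == "true":
        return "running"
    if value == "false":
        return "not running"
    return "unknown"
```

Passing the runner as a parameter is one way to keep tests deterministic: a test can supply a fake that raises subprocess.TimeoutExpired or returns a canned result, so neither Docker nor tmux has to be present.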

Review Feedback

  • [ ] Review cleared