cbox send false-positive delivery when sandbox TUI ignores input
Problem
Investigate and fix cases where cbox send reports success after tmux send-keys, but the target sandbox Claude TUI does not consume or act on the message. Add end-to-end delivery verification and diagnostics so cbox send fails loudly instead of claiming success when the pane remains idle.
Context
- Observed on Friday, March 13, 2026 while managing sandbox worktree2. Running uv run cbox send worktree2 "<message>" reported success, but the target sandbox pane stayed at an idle Claude prompt with no visible message delivery or response.
- Direct tmux send-keys into the same pane did inject text, so the failure mode is not a simple session-name mismatch.
- Repeated sandbox sessions also showed unstable interactive behavior: attached tmux clients sometimes could not type, and the sandbox process later exited unexpectedly mid-task.
- Relevant implementation paths:
  - libs/cbox/cbox/cli.py send() command
  - libs/cbox/cbox/tmux.py send_message() and prompt detection helpers
  - libs/cbox/cbox/container.py interactive container launch path
- Recent investigation already fixed one likely contributor: interactive containers now use TERM=screen-256color instead of forwarding an incorrect host TERM.
- Remaining gap: cbox send treats tmux key injection as success without verifying that Claude actually consumed the message or that the pane state changed.
Possible Solutions
A. End-to-end delivery verification in cbox send and tmux.send_message() — Recommended
Capture pane state before send, inject the message, then verify an expected post-send signal within a short timeout. Expected signals could include pane content changes, prompt transition to a busy state, or echoed input in the pane. If verification fails, return a distinct error instead of claiming success.
- Pros: Directly fixes the false-positive contract, gives operators actionable failures, and stays close to the current architecture.
- Cons: Requires careful heuristics to avoid false negatives when Claude is slow to redraw.
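A minimal sketch of approach A, assuming pane change is the only delivery signal. The function names, the polling interval, and the injectable capture/inject/sleep/clock parameters are hypothetical and are not taken from libs/cbox/cbox/tmux.py. Only the tmux capture-pane and send-keys invocations and the "sent" / "undelivered" status strings follow this document.

```python
import subprocess
import time
from typing import Callable


def capture_pane(target: str) -> str:
    """Return the visible contents of a tmux pane as text."""
    result = subprocess.run(
        ["tmux", "capture-pane", "-p", "-t", target],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def inject_keys(target: str, message: str) -> None:
    """Type the message literally (-l), then press Enter."""
    subprocess.run(["tmux", "send-keys", "-t", target, "-l", message], check=True)
    subprocess.run(["tmux", "send-keys", "-t", target, "Enter"], check=True)


def send_with_verification(
    target: str,
    message: str,
    timeout: float = 3.0,      # mirrors the 3s timeout mentioned in this task
    interval: float = 0.2,     # assumed polling interval
    capture: Callable[[str], str] = capture_pane,
    inject: Callable[[str, str], None] = inject_keys,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Return "sent" if the pane changes after injection, else "undelivered"."""
    before = capture(target)
    inject(target, message)
    deadline = clock() + timeout
    while True:
        if capture(target) != before:
            return "sent"
        if clock() >= deadline:
            return "undelivered"
        sleep(interval)
```

Passing capture and inject as parameters keeps the loop testable without a live tmux server. A slow redraw can still produce a false negative, which is the trade-off listed under Cons.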
B. Add a stronger transport abstraction for interactive Claude sessions
Replace blind tmux send-keys with a more explicit interaction layer that understands prompt state, idle/busy transitions, and delivery acknowledgement.
- Pros: Better long-term reliability and easier future debugging.
- Cons: Larger change, higher implementation risk, and probably more than this task needs.
C. Narrow fix to cbox send UX only
Keep the current send path, but print a warning that delivery is not guaranteed and advise users to inspect the pane manually.
- Pros: Very small change.
- Cons: Does not solve the operational bug and preserves misleading success output.
Plan
Use approach A.
- Reproduce the current false-positive behavior with focused tests around send() and tmux.send_message().
- Define a post-send verification contract for interactive sessions:
  - what counts as confirmed delivery,
  - how long to wait,
  - and which failure mode to surface.
- Implement verification in the tmux send path and propagate a distinct status back through cbox send.
- Improve diagnostics so failures capture pane-tail evidence instead of silently succeeding.
- Run focused CBox tests and update docs if CLI behavior changes.
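The verification contract in the second step could be written down as plain data, which makes the three open questions (signal, wait time, failure mode) explicit. This is a sketch only: DeliveryStatus, DeliveryContract, DeliveryResult, the poll interval, the tail length, and the exit-code mapping are all hypothetical. The "sent" and "undelivered" strings and the 3-second timeout are the only values taken from this task.

```python
from dataclasses import dataclass
from enum import Enum


class DeliveryStatus(str, Enum):
    """Statuses a send can report; string values match those used in this task."""
    SENT = "sent"
    UNDELIVERED = "undelivered"


@dataclass(frozen=True)
class DeliveryContract:
    """How long to wait and how much evidence to keep (assumed defaults)."""
    timeout_s: float = 3.0        # how long to wait for a pane change
    poll_interval_s: float = 0.2  # assumed; not specified in this task
    tail_lines: int = 10          # assumed size of pane-tail evidence


@dataclass(frozen=True)
class DeliveryResult:
    """Outcome of one send, with pane-tail evidence for failures."""
    status: DeliveryStatus
    pane_tail: str = ""

    @property
    def exit_code(self) -> int:
        # Assumption: any non-"sent" status maps to a non-zero exit code.
        return 0 if self.status is DeliveryStatus.SENT else 1
```

Because DeliveryStatus subclasses str, existing call sites that compare against the literal "sent" would keep working under this design.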
Implementation Progress
- Task created from a live incident where cbox send acknowledged delivery to sandbox worktree2, but the pane remained idle until text was injected manually with tmux send-keys.
- Initial hypothesis: the send success criterion is too weak. It confirms tmux accepted keys, not that the Claude TUI processed them.
- Related investigation already produced a TERM fix for interactive containers, but that did not eliminate the need for end-to-end delivery verification in cbox send.
- [x] Added _normalize_pane_content() helper to strip trailing whitespace for stable comparison across tmux redraws.
- [x] Rewrote send_message() in tmux.py to capture pane content before injecting keys, then poll for a pane change within DELIVERY_VERIFY_TIMEOUT (3s). Returns "undelivered" if the pane stays identical.
- [x] Verification only runs when check_claude=True (the default for interactive sessions); blind sends with check_claude=False skip verification for non-interactive paths.
- [x] Updated cbox send CLI handler to surface "undelivered" with actionable diagnostics: pane-tail evidence and an attach suggestion.
- [x] Two other send_message call sites in cli.py (cbox new initial command) already handled non-"sent" generically, so "undelivered" falls through correctly.
- [x] TDD: wrote 4 new unit tests for delivery verification (sent on pane change, undelivered on unchanged pane, skip when check_claude=False, pane tail evidence capture).
- [x] Added 1 CLI-level test confirming cbox send exit code and error output for "undelivered".
- [x] All tests pass, ruff lint and format clean, no regressions in pre-existing test suite.
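To illustrate the normalization and diagnostics items above, here is a rough sketch of how pane content might be normalized and how pane-tail evidence could be rendered. It is not the code from tmux.py or cli.py. The function names, the per-line stripping strategy, the tail length, the message wording, and the use of the sandbox name as the tmux session name are all assumptions.

```python
def normalize_pane_content(raw: str) -> str:
    """Strip trailing whitespace per line and drop trailing blank lines.

    Assumption: this is one plausible reading of "strip trailing whitespace";
    the real helper may normalize differently.
    """
    lines = [line.rstrip() for line in raw.splitlines()]
    while lines and not lines[-1]:
        lines.pop()
    return "\n".join(lines)


def pane_tail(raw: str, max_lines: int = 5) -> str:
    """Return the last few non-trailing-blank lines of the pane as evidence."""
    lines = normalize_pane_content(raw).splitlines()
    return "\n".join(lines[-max_lines:])


def format_undelivered(sandbox: str, raw_pane: str) -> str:
    """Build an error message with pane-tail evidence and an attach hint."""
    tail = pane_tail(raw_pane)
    return (
        f"error: message to '{sandbox}' was undelivered (pane did not change)\n"
        f"--- pane tail ---\n{tail}\n"
        # Assumption: the tmux session is named after the sandbox.
        f"hint: inspect the pane with: tmux attach -t {sandbox}"
    )
```

Comparing normalized content rather than raw captures avoids treating a redraw that only changes padding as a delivery signal.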
Review Feedback
- [ ] Review cleared