cbox send false-positive delivery when sandbox TUI ignores input

ID	INFRA_TOOLING-CBOX_SEND_FALSE_POSITIVE_DELIVERY_WHEN_SANDBOX_TUI_IGNORES_INPUT
Status	completed
Priority	p1
Milestone	m1-ft-analytics-analyst-pilot
Owner	sr-engineer-architect

Problem

Investigate and fix cases where cbox send reports success after tmux send-keys, but the target sandbox Claude TUI does not consume or act on the message. Add end-to-end delivery verification and diagnostics so cbox send fails loudly instead of claiming success when the pane remains idle.

Context

Observed on Friday, March 13, 2026 while managing sandbox worktree2.
uv run cbox send worktree2 "<message>" reported success, but the target sandbox pane stayed at an idle Claude prompt with no visible message delivery or response.
Direct tmux send-keys into the same pane did inject text, so the failure mode is not a simple session-name mismatch.
Repeated sandbox sessions also showed unstable interactive behavior: users attached through a tmux client sometimes could not type, and the sandbox process later exited unexpectedly mid-task.
Relevant implementation paths:
libs/cbox/cbox/cli.py send() command
libs/cbox/cbox/tmux.py send_message() and prompt detection helpers
libs/cbox/cbox/container.py interactive container launch path
Recent investigation already fixed one likely contributor: interactive containers now use TERM=screen-256color instead of forwarding an incorrect host TERM.
Remaining gap: cbox send treats tmux key injection as success without verifying that Claude actually consumed the message or that the pane state changed.

Possible Solutions

A. End-to-end delivery verification in `cbox send` and `tmux.send_message()` — Recommended

Capture pane state before send, inject the message, then verify an expected post-send signal within a short timeout. Expected signals could include pane content changes, prompt transition to a busy state, or echoed input in the pane. If verification fails, return a distinct error instead of claiming success.

Pros: Directly fixes the false-positive contract, gives operators actionable failures, and stays close to the current architecture.
Cons: Requires careful heuristics to avoid false negatives when Claude is slow to redraw.
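
The flow in option A can be sketched roughly as follows. This is an illustration only, not the code in libs/cbox/cbox/tmux.py: verify_delivery, its parameters, and the injected capture/inject callables are hypothetical names, and the only signal checked is a change in pane content.

```python
import time
from typing import Callable


def verify_delivery(
    capture: Callable[[], str],
    inject: Callable[[], None],
    timeout: float = 3.0,
    poll_interval: float = 0.1,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Return "sent" if the pane changes after injection, else "undelivered".

    Hypothetical sketch: `capture` returns the current pane text and
    `inject` sends the keys. Both are supplied by the caller.
    """
    before = capture()  # snapshot taken before any keys are sent
    inject()            # in real use, something like a tmux send-keys call
    deadline = clock() + timeout
    while True:
        if capture() != before:
            return "sent"         # pane changed: treat as confirmed delivery
        if clock() >= deadline:
            return "undelivered"  # pane stayed identical for the whole window
        sleep(poll_interval)
```

In practice capture would likely wrap `tmux capture-pane -p -t <target>` and inject would wrap `tmux send-keys -t <target> <message> Enter`. Passing them in as callables keeps the polling logic testable without a live tmux server. The timeout is the main knob for the false-negative risk noted under Cons: too short and a slow redraw is reported as "undelivered".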

B. Add a stronger transport abstraction for interactive Claude sessions

Replace blind tmux send-keys with a more explicit interaction layer that understands prompt state, idle/busy transitions, and delivery acknowledgement.

Pros: Better long-term reliability and easier future debugging.
Cons: Larger change, higher implementation risk, and probably more than this task needs.

C. Narrow fix to `cbox send` UX only

Keep the current send path, but print a warning that delivery is not guaranteed and advise users to inspect the pane manually.

Pros: Very small change.
Cons: Does not solve the operational bug and preserves misleading success output.

Plan

Use approach A.

Reproduce the current false-positive behavior with focused tests around send() and tmux.send_message().
Define a post-send verification contract for interactive sessions: what counts as confirmed delivery, how long to wait, and which failure mode to surface.
Implement verification in the tmux send path and propagate a distinct status back through cbox send.
Improve diagnostics so failures capture pane-tail evidence instead of silently succeeding.
Run focused CBox tests and update docs if CLI behavior changes.
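
The first plan step, reproducing the false positive in a focused test, might look like the sketch below. It does not use the real cbox code: blind_send is a hypothetical stand-in for the pre-fix behavior, capture_pane and fake_tmux are illustrative helpers, and "worktree2" is used as a placeholder tmux target.

```python
import subprocess
from unittest import mock


def blind_send(target: str, message: str) -> str:
    """Hypothetical stand-in for the pre-fix path: success == tmux accepted keys."""
    subprocess.run(["tmux", "send-keys", "-t", target, message, "Enter"], check=True)
    return "sent"


def capture_pane(target: str) -> str:
    """Return the visible pane text via tmux capture-pane."""
    result = subprocess.run(
        ["tmux", "capture-pane", "-p", "-t", target],
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout


def fake_tmux(cmd, **kwargs):
    """Simulate a pane that accepts keys but never redraws."""
    stdout = "> \n" if cmd[1] == "capture-pane" else ""
    return subprocess.CompletedProcess(cmd, 0, stdout=stdout, stderr="")


def test_blind_send_reports_success_on_idle_pane():
    with mock.patch("subprocess.run", side_effect=fake_tmux):
        before = capture_pane("worktree2")
        status = blind_send("worktree2", "hello")
        after = capture_pane("worktree2")
    # The false positive: "sent" is reported although the pane never changed.
    assert status == "sent"
    assert before == after
```

A test of this shape documents the weak success criterion, and can be inverted to expect "undelivered" once verification is in place.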

Implementation Progress

Task created from a live incident where cbox send acknowledged delivery to sandbox worktree2, but the pane remained idle until text was injected manually with tmux send-keys.
Initial hypothesis: the send success criterion is too weak. It confirms tmux accepted keys, not that the Claude TUI processed them.
Related investigation already produced a TERM fix for interactive containers, but that did not eliminate the need for end-to-end delivery verification in cbox send.
[x] Added _normalize_pane_content() helper to strip trailing whitespace for stable comparison across tmux redraws.
[x] Rewrote send_message() in tmux.py to capture pane content before injecting keys, then poll for a pane change within DELIVERY_VERIFY_TIMEOUT (3s). Returns "undelivered" if the pane stays identical.
[x] Verification runs only when check_claude=True (the default for interactive sessions); non-interactive paths that pass check_claude=False send blind and skip it.
[x] Updated cbox send CLI handler to surface "undelivered" with actionable diagnostics: pane-tail evidence and an attach suggestion.
[x] Two other send_message call sites in cli.py (cbox new initial command) already handle any non-"sent" status generically, so "undelivered" falls through to that handling correctly.
[x] TDD: wrote 4 new unit tests for delivery verification (sent on pane change, undelivered on unchanged pane, skip when check_claude=False, pane tail evidence capture).
[x] Added 1 CLI-level test confirming cbox send exit code and error output for "undelivered".
[x] All tests pass, ruff lint and format clean, no regressions in pre-existing test suite.
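
For reference, the normalization and diagnostic pieces described above could be approximated as below. This is a sketch under assumptions, not the merged code: the function names, the per-line stripping rule, the 10-line tail, and the message wording are all hypothetical.

```python
def normalize_pane_content(raw: str) -> str:
    """Strip trailing whitespace so redraw padding does not count as a change.

    Assumption: trailing whitespace is removed per line, and trailing blank
    lines are dropped. The real helper may normalize differently.
    """
    lines = [line.rstrip() for line in raw.splitlines()]
    while lines and not lines[-1]:
        lines.pop()
    return "\n".join(lines)


def pane_tail(raw: str, max_lines: int = 10) -> str:
    """Return the last few non-padding lines of the pane as evidence."""
    return "\n".join(normalize_pane_content(raw).splitlines()[-max_lines:])


def format_undelivered(sandbox: str, raw_pane: str) -> str:
    """Build an error message for the "undelivered" status (wording is illustrative)."""
    return (
        f"Message to '{sandbox}' was injected but the pane did not change.\n"
        f"Last pane lines:\n{pane_tail(raw_pane)}\n"
        "Attach to the sandbox to inspect the pane."
    )
```

Comparing normalized snapshots rather than raw captures avoids reporting "sent" when tmux only repainted padding, which is the comparison stability the helper is meant to provide.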

Review Feedback

[ ] Review cleared