Reduce cbox sandbox startup latency by parallelizing health checks
Problem
Sandbox bootstrap health checks (python, pre-commit, uv, git_auth) ran sequentially via list comprehension, making worst-case wall time 4×15s=60s. Replaced with ThreadPoolExecutor to run all probes concurrently; worst case now ~15s.
_run_container_health_checks() in cbox/cli.py executed 4 independent docker exec probes one after another. Each has a 15s timeout. When multiple probes are slow (common in cold-start sandboxes), startup delay compounds.
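The worst-case figures above follow from simple arithmetic: sequential probes add their timeouts, while concurrent probes are bounded by the slowest one. A minimal sketch, assuming all four probes hit the 15 s timeout (the variable names are illustrative, not taken from cli.py):

```python
# Worst-case wall time for the four probes, using the 15 s per-probe timeout
# stated in the problem description. All names here are illustrative.
PROBE_TIMEOUT_S = 15
probes = ["python", "pre-commit", "uv", "git_auth"]

# Sequential: each probe waits for the previous one, so timeouts add up.
sequential_worst_case = sum(PROBE_TIMEOUT_S for _ in probes)

# Concurrent: probes overlap, so the slowest single probe sets the bound.
concurrent_worst_case = max(PROBE_TIMEOUT_S for _ in probes)

print(sequential_worst_case)  # 60
print(concurrent_worst_case)  # 15
```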
Context
- Primary file: libs/cbox/cbox/cli.py, _run_container_health_checks() (lines 1255-1339)
- Types: libs/cbox/cbox/health.py, HealthCheckResult dataclass
- Tests: libs/cbox/test_sandbox_start.py, 7 health-check unit tests
- Constraint: _run_probe is a pure function (no shared mutable state), making it inherently thread-safe.
- Import: concurrent.futures.ThreadPoolExecutor at module level (line 16 of cli.py).
Possible Solutions
- ThreadPoolExecutor: standard library, minimal change; iterating the list of futures yields results in submission order regardless of completion order. ✓
- asyncio.gather: would require converting the entire call chain to async. Overkill.
- multiprocessing.Pool: unnecessary process overhead for I/O-bound subprocess calls.
Plan
Option 1: concurrent.futures.ThreadPoolExecutor with max_workers=len(checks). Submit all probes, collect results via f.result() in definition order. Three-line change in _run_container_health_checks().
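The plan can be sketched roughly as follows. This is not the actual code from cli.py: `_run_probe` here is a hypothetical stand-in that returns a `(name, ok)` tuple rather than a HealthCheckResult, `run_checks_concurrently` is an invented name, and the probe commands use the local Python interpreter in place of docker exec.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def _run_probe(name: str, command: list[str], timeout: float = 15.0) -> tuple[str, bool]:
    """Hypothetical stand-in: run one probe command and report whether it succeeded."""
    try:
        completed = subprocess.run(command, capture_output=True, timeout=timeout)
    except (subprocess.TimeoutExpired, OSError):
        return name, False
    return name, completed.returncode == 0


def run_checks_concurrently(checks: list[tuple[str, list[str]]]) -> list[tuple[str, bool]]:
    """Run every probe in its own thread and return results in definition order."""
    if not checks:
        return []  # ThreadPoolExecutor raises ValueError for max_workers=0
    with ThreadPoolExecutor(max_workers=len(checks)) as pool:
        # Submit everything first so the probes overlap.
        futures = [pool.submit(_run_probe, name, cmd) for name, cmd in checks]
        # Iterating the futures list (not as_completed) keeps definition order.
        return [f.result() for f in futures]


# Example: two trivial probes that use the current interpreter instead of docker exec.
example_checks = [
    ("ok", [sys.executable, "-c", "pass"]),
    ("fail", [sys.executable, "-c", "import sys; sys.exit(3)"]),
]
print(run_checks_concurrently(example_checks))
```

Collecting with `f.result()` over the original list, rather than `as_completed`, is what keeps the output deterministic: each call blocks until that specific future finishes, so a slow first probe simply delays the collection without reordering it.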
Implementation Progress
- Run health checks (python, pre-commit, uv, git_auth) concurrently.
- Preserve deterministic output ordering in CLI rendering.
- Keep per-check timeout and remediation behavior unchanged.
- Add/adjust tests to validate concurrency-safe behavior.
- Worst-case health check wall-clock time is near single-check timeout, not sum of all check timeouts.
- Existing warning panel output remains stable and actionable.
- cbox tests pass.
- Source issue: repeated 6-12 minute reviews plus additional startup delay from sequential checks.
- Changed the return line from [_run_probe(...) for ... in checks] to ThreadPoolExecutor.submit() + [f.result() for f in futures].
- Order preserved because futures list matches checks list order.
- Per-check 15s timeout unchanged (each thread still calls subprocess.run(timeout=15)).
- _run_probe was already a pure function with no shared mutable state, so it is thread-safe by design.
- Added two tests: test_container_health_checks_run_concurrently (timing) and test_container_health_checks_preserve_order (deterministic output).
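A timing test of the kind described above might look like the sketch below. It is not the test from test_sandbox_start.py: `slow_probe`, `run_concurrently`, and the 0.2 s delays are assumptions chosen to keep the example fast, and the probe sleeps instead of calling docker exec.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def slow_probe(name: str, delay_s: float) -> str:
    """Hypothetical probe stand-in that sleeps instead of calling docker exec."""
    time.sleep(delay_s)
    return name


def run_concurrently(checks):
    """Submit every probe at once and collect results in definition order."""
    with ThreadPoolExecutor(max_workers=len(checks)) as pool:
        futures = [pool.submit(slow_probe, name, delay) for name, delay in checks]
        return [f.result() for f in futures]


def test_checks_run_concurrently():
    checks = [("python", 0.2), ("pre-commit", 0.2), ("uv", 0.2), ("git_auth", 0.2)]
    start = time.monotonic()
    results = run_concurrently(checks)
    elapsed = time.monotonic() - start
    assert results == ["python", "pre-commit", "uv", "git_auth"]
    # Sequential execution would need about 0.8 s; overlapping probes should
    # finish close to the single-probe delay, so 0.6 s leaves generous slack.
    assert elapsed < 0.6


test_checks_run_concurrently()
```

Timing assertions like this are sensitive to loaded CI machines, so the threshold is deliberately set between the concurrent and sequential totals rather than tight against either.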
Review Feedback
Round 1 — cbox review (opus), 2026-03-08
- Verdict: APPROVED
- MEDIUM-1: Move concurrent.futures import to module level → Fixed.
- MEDIUM-2: Order test doesn't exercise out-of-order completion → Fixed (added staggered delays).
- [x] Review cleared
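The staggered-delay fix for MEDIUM-2 can be illustrated with a sketch like the one below. It is an assumption about the shape of such a test, not the code that was merged: `delayed_probe` and the delay values are hypothetical, chosen so that the first check submitted is the last to finish.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed


def delayed_probe(name: str, delay_s: float) -> str:
    """Hypothetical probe stand-in whose runtime is controlled by delay_s."""
    time.sleep(delay_s)
    return name


def test_order_survives_out_of_order_completion():
    # Staggered delays: the first check submitted finishes last.
    checks = [("python", 0.3), ("pre-commit", 0.2), ("uv", 0.1), ("git_auth", 0.0)]
    with ThreadPoolExecutor(max_workers=len(checks)) as pool:
        futures = [pool.submit(delayed_probe, name, delay) for name, delay in checks]
        # Order in which the probes actually finished.
        completion_order = [f.result() for f in as_completed(futures)]
        # Order obtained by walking the futures list, as the change does.
        ordered = [f.result() for f in futures]
    assert ordered == [name for name, _ in checks]
    # Confirms the test really exercised out-of-order completion.
    assert completion_order != ordered


test_order_survives_out_of_order_completion()
```

The second assertion guards against the test passing trivially: if every probe finished in submission order, the ordering guarantee would never be exercised.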