Launch operations and reliability readiness
Problem
The IDE extension integrates YAML editing, live preview, diagnostics, and agent-assisted workflows into VS Code and Cursor, but has no operational infrastructure for a public release. There is no telemetry on extension activation failures, preview rendering errors, or diagnostic latency, so degraded experiences go unreported. If a VS Code update or a change in the language server breaks YAML validation or the live preview panel, there is no alerting to catch it. Support ownership for extension marketplace issues is unassigned, and there is no incident playbook for diagnosing crashes, compatibility regressions, or agent integration failures across editor versions.
Context
- Public launch for analyst authoring in VS Code/Cursor with preview, diagnostics, and assist needs more than feature completeness; it also needs clear ownership, monitoring, support routing, and a practiced response to failures.
- Without explicit launch operations, the team will discover gaps in alerts, escalation, rollback, or user communication during the most visible part of the release.
- Expected touchpoints include
apps/ide/vscode-extension/, preview/inspector runtime code, and extension docs/tests, runbooks, monitoring or review surfaces, and any launch-day coordination artifacts.
Possible Solutions
- A - Handle launch ops informally through the people closest to the code: workable for small releases, but too fragile for public launch.
- B - Recommended: define an explicit launch operations package: owners, dashboards/checks, escalation paths, rollback steps, and user/support communication rules.
- C - Delay launch until a broader platform-operations program exists: safest, but likely more process than this specific launch needs.
Plan
- List the launch-day risks for analyst authoring in VS Code/Cursor with preview, diagnostics, and assist, including failure modes, ownership gaps, and dependencies on adjacent teams or systems.
- Write the required runbooks and operating checklists covering monitoring, escalation, rollback, and communication.
- Confirm the launch support model with named owners and the minimal dashboards, logs, or review artifacts they need to do the job.
- Run a tabletop or rehearsal pass and update the plan anywhere the team still relies on tribal knowledge instead of written procedure.
Implementation Progress
Review Feedback
- [ ] Review cleared