Launch operations and reliability readiness
Problem
The dft-core engine — the YAML compiler, execution adapters, and rendering pipeline — has no operational readiness for production versioning or schema migrations. If a YAML contract change introduces a breaking migration, there is no telemetry to measure how many dashboards are affected, no alerting when compilation error rates spike, and no versioning strategy to safely roll back. Support ownership for "my dashboard broke after an upgrade" is undefined, and there is no incident playbook for a bad core release. A regression in the compiler or execution layer at launch would silently break every dashboard that depends on the affected contract version.
Context
- Public launch for the YAML contract, compiler/normalizer, execution adapters, and release/versioning needs more than feature completeness; it also needs clear ownership, monitoring, support routing, and a practiced response to failures.
- Without explicit launch operations, the team will discover gaps in alerts, escalation, rollback, or user communication during the most visible part of the release.
- Expected touchpoints include
dataface/core/, schema/compiled types, docs, and core test suites, runbooks, monitoring or review surfaces, and any launch-day coordination artifacts.
Possible Solutions
- A - Handle launch ops informally through the people closest to the code: workable for small releases, but too fragile for public launch.
- B - Recommended: define an explicit launch operations package: owners, dashboards/checks, escalation paths, rollback steps, and user/support communication rules.
- C - Delay launch until a broader platform-operations program exists: safest, but likely more process than this specific launch needs.
Plan
- List the launch-day risks for the YAML contract, compiler/normalizer, execution adapters, and release/versioning, including failure modes, ownership gaps, and dependencies on adjacent teams or systems.
- Write the required runbooks and operating checklists covering monitoring, escalation, rollback, and communication.
- Confirm the launch support model with named owners and the minimal dashboards, logs, or review artifacts they need to do the job.
- Run a tabletop or rehearsal pass and update the plan anywhere the team still relies on tribal knowledge instead of written procedure.
Implementation Progress
Review Feedback
- [ ] Review cleared