Dataface Tasks

Launch operations and reliability readiness

IDM3_PUBLIC_LAUNCH-MCP_ANALYST_AGENT-03
Status: not_started
Priority: p0
Milestone: m3-public-launch
Owner: data-ai-engineer-architect

Problem

The MCP analyst agent exposes Dataface capabilities to AI agents via an MCP server and includes an evaluation framework for measuring agent quality, but neither system is operationally ready for production. There is no telemetry on MCP tool call latency, error rates, or guardrail violation frequency. If the agent generates incorrect SQL, exceeds token budgets, or the MCP server silently drops requests, nothing detects the degradation. Support ownership for agent quality regressions is undefined, and there is no incident playbook for scenarios like a guardrail bypass producing dangerous queries or an eval regression shipping undetected. At launch, AI agent failures would be invisible until users report wrong answers.
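The missing telemetry layer can be sketched as a thin wrapper around each MCP tool handler. Everything below is a hypothetical illustration, not an existing Dataface or MCP API: `ToolCallMetrics`, `GuardrailViolation`, and `instrumented_call` are placeholder names for whatever the real instrumentation ends up being.

```python
import time

class GuardrailViolation(Exception):
    """Raised when a tool result fails a safety check (hypothetical type)."""

class ToolCallMetrics:
    """Per-tool counters for latency, errors, and guardrail violations."""
    def __init__(self):
        self.calls = 0
        self.errors = 0
        self.guardrail_violations = 0
        self.latencies_ms = []

    def error_rate(self):
        return self.errors / self.calls if self.calls else 0.0

def instrumented_call(metrics, tool_fn, *args, **kwargs):
    """Wrap an MCP tool handler so every call records telemetry."""
    start = time.monotonic()
    metrics.calls += 1
    try:
        return tool_fn(*args, **kwargs)
    except GuardrailViolation:
        metrics.guardrail_violations += 1
        raise
    except Exception:
        metrics.errors += 1
        raise
    finally:
        # Latency is recorded even on failure, so slow errors are visible.
        metrics.latencies_ms.append((time.monotonic() - start) * 1000)
```

With counters like these feeding a dashboard, a silently dropped request surfaces as a missing call count or a latency spike rather than as a user bug report.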

Context

  • Public launch for AI agent tool interfaces, execution workflows, and eval-driven behavior tuning needs more than feature completeness; it also needs clear ownership, monitoring, support routing, and a practiced response to failures.
  • Without explicit launch operations, the team will discover gaps in alerts, escalation, rollback, or user communication during the most visible part of the release.
  • Expected touchpoints include dataface/ai/, MCP/tool contracts, cloud chat surfaces, eval runners, prompt artifacts, runbooks, monitoring or review surfaces, and any launch-day coordination artifacts.

Possible Solutions

  • A - Handle launch ops informally through the people closest to the code: workable for small releases, but too fragile for public launch.
  • B - Recommended: define an explicit launch operations package: owners, dashboards/checks, escalation paths, rollback steps, and user/support communication rules.
  • C - Delay launch until a broader platform-operations program exists: safest, but likely more process than this specific launch needs.

Plan

  1. List the launch-day risks for AI agent tool interfaces, execution workflows, and eval-driven behavior tuning, including failure modes, ownership gaps, and dependencies on adjacent teams or systems.
  2. Write the required runbooks and operating checklists covering monitoring, escalation, rollback, and communication.
  3. Confirm the launch support model with named owners and the minimal dashboards, logs, or review artifacts they need to do the job.
  4. Run a tabletop or rehearsal pass and update the plan anywhere the team still relies on tribal knowledge instead of written procedure.
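The monitoring checks from step 2 can be made concrete as a small threshold gate that the dashboards or a cron probe evaluate. The metric names and limits below are placeholder assumptions, to be replaced by values from the risk review in step 1.

```python
# Placeholder thresholds -- real limits come out of the launch risk review.
THRESHOLDS = {
    "p95_latency_ms": 2000,            # MCP tool call latency
    "error_rate": 0.02,                # fraction of failed tool calls
    "guardrail_violations_per_hour": 1,
}

def fired_alerts(snapshot, thresholds=THRESHOLDS):
    """Return the metric names whose current value exceeds its limit.

    `snapshot` maps metric name -> current value. A missing metric is
    treated as an alert, since silence is itself a failure mode.
    """
    fired = []
    for name, limit in thresholds.items():
        value = snapshot.get(name)
        if value is None or value > limit:
            fired.append(name)
    return fired
```

Treating absent metrics as alerts matches the problem statement: the failure mode to guard against is degradation that nothing detects.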

Implementation Progress

Review Feedback

  • [ ] Review cleared