Sustainable operating model
Problem
The MCP server and agent evaluation framework are maintained through ad-hoc heroics rather than a defined operating model. There is no documented on-call rotation for MCP tool failures, no release cadence for tool schema or prompt updates, no triage process for incoming eval regressions, and no runbook for common operational tasks (deploying a new tool, deprecating an old one, rotating cached profiles). As the system grows beyond a single maintainer, this lack of operational structure will lead to dropped incidents, inconsistent releases, and contributor confusion about ownership.
Context
- A launch can succeed briefly even with fuzzy ownership, but AI agent tool interfaces, execution workflows, and eval-driven behavior tuning will drift quickly without a clear model for maintenance, triage, and decision-making.
- This task is about defining who owns backlog hygiene, review standards, incidents, documentation, and the cadence for future improvements.
- Expected touchpoints include dataface/ai/, MCP/tool contracts, cloud chat surfaces, eval runners, prompt artifacts, runbooks, planning docs, and team processes that currently rely too heavily on shared memory.
Possible Solutions
- A - Let the current contributors coordinate informally: low overhead, but it becomes brittle as scope and contributors grow.
- B - Recommended: define a lightweight operating model with named owners and cadences, so that maintenance, incident response, prioritization, and release decisions are explicit.
- C - Centralize all ownership in one person or team indefinitely: clearer in the short term, but usually unsustainable and a bottleneck.
Plan
- Map the recurring operational decisions around AI agent tool interfaces, execution workflows, and eval-driven behavior tuning, and identify where ownership, handoffs, or cadence are currently unclear.
- Document the operating model: owners, review loops, incident or support handling, documentation upkeep, and backlog-management expectations.
- Align the model with the actual command/docs/test surfaces that people use day to day so it is operational rather than aspirational.
- Publish the model in the relevant planning/runbook surfaces and refine it after one real cycle of use.
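
One way to keep the model operational rather than aspirational is to store it as data and check it for gaps. The sketch below is illustrative only: the `Responsibility` fields, the `find_gaps` helper, the owner names, the cadences, and the runbook path are all hypothetical placeholders, not existing project code or agreed decisions. The three areas are taken from the gaps named in the Problem section.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

# Fields every operational area is assumed to need before it counts as "owned".
REQUIRED_FIELDS = ("owner", "backup", "cadence", "runbook")


@dataclass(frozen=True)
class Responsibility:
    """One recurring operational area and who handles it (hypothetical schema)."""
    area: str
    owner: Optional[str] = None
    backup: Optional[str] = None
    cadence: Optional[str] = None
    runbook: Optional[str] = None


def find_gaps(model: List[Responsibility]) -> Dict[str, List[str]]:
    """Return, per area, the required fields that are still unset."""
    gaps: Dict[str, List[str]] = {}
    for item in model:
        missing = [name for name in REQUIRED_FIELDS if not getattr(item, name)]
        if missing:
            gaps[item.area] = missing
    return gaps


# Placeholder entries; names, cadences, and paths are assumptions for illustration.
OPERATING_MODEL = [
    Responsibility(
        area="MCP tool failures",
        owner="owner-a",
        backup="owner-b",
        cadence="weekly on-call rotation",
        runbook="runbooks/mcp-tool-failures.md",
    ),
    Responsibility(
        area="tool schema and prompt releases",
        owner="owner-a",
        cadence="every two weeks",
    ),
    Responsibility(area="eval regression triage"),
]

if __name__ == "__main__":
    for area, missing in find_gaps(OPERATING_MODEL).items():
        print(f"{area}: missing {', '.join(missing)}")
```

A check like this could run during the first review cycle to show which areas still lack an owner, backup, cadence, or runbook before the model is published.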
Implementation Progress
Review Feedback
- [ ] Review cleared