Regression prevention and quality gates
Problem
Changes to MCP tool implementations, prompt templates, or eval scoring can silently degrade agent behavior because there are no automated regression gates in CI. A tool schema change that removes a field, a prompt edit that alters response formatting, or an eval threshold adjustment can ship without any check that previously passing agent workflows still succeed. Manual testing catches some regressions but is slow and incomplete. Without automated gates — contract tests for tool schemas, eval-suite runs on PRs, and prompt output diff checks — release quality will erode as development velocity increases.
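The tool-schema contract test mentioned above can be sketched as a check of the currently exposed schema against a frozen golden copy. Everything here is illustrative: the `search_docs` tool name, the schema shape, and the `current_schema` stand-in are assumptions, not the project's real MCP surface; a real gate would load the golden snapshot from a checked-in fixture.

```python
# Hypothetical golden snapshot of a tool's input schema; in practice this
# would live in a versioned fixture file.
GOLDEN_SCHEMA = {
    "name": "search_docs",
    "required": ["query", "limit"],
    "properties": {"query": "string", "limit": "integer"},
}

def current_schema():
    # Stand-in for whatever the MCP server actually exposes at runtime.
    return {
        "name": "search_docs",
        "required": ["query", "limit"],
        "properties": {"query": "string", "limit": "integer"},
    }

def check_contract(golden, current):
    """Return a list of contract violations: removed required fields,
    and properties that were removed or retyped."""
    errors = []
    missing = set(golden["required"]) - set(current.get("required", []))
    if missing:
        errors.append(f"required fields removed: {sorted(missing)}")
    for field, ftype in golden["properties"].items():
        if current.get("properties", {}).get(field) != ftype:
            errors.append(f"property changed or removed: {field}")
    return errors

violations = check_contract(GOLDEN_SCHEMA, current_schema())
assert not violations, violations
print("tool contract OK")
```

The point of the golden-copy design is that removing a field fails loudly in CI, while intentional schema changes require an explicit snapshot update that reviewers can see in the diff.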
Context
- Manual review is not enough to protect AI agent tool interfaces, execution workflows, and eval-driven behavior tuning once the change rate increases; regressions will keep shipping unless the highest-value checks become automatic.
- This task should identify what needs gating in CI or structured review and what evidence is sufficient to block a risky change before it reaches users.
- Expected touchpoints include dataface/ai/, MCP/tool contracts, cloud chat surfaces, eval runners, prompt artifacts, automated tests, eval/QA checks, and any release or review scripts that can enforce the new gates.
Possible Solutions
- A - Add only a few narrow tests around current bugs: easy to land, but it rarely protects the broader behavior contract.
- B - Recommended: define a regression-gate bundle around the core behavior contract, combining focused tests, snapshots/evals, and required review evidence for risky changes.
- C - Depend on manual smoke testing before each release: better than nothing, but too inconsistent to serve as a durable gate.
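One piece of the bundle in option B, the prompt output diff check, can be sketched as a snapshot comparison: render the prompt and fail if it differs from the stored copy. The snapshot text and `render_prompt` stand-in are hypothetical; a real gate would read the snapshot from a fixtures directory under version control.

```python
import difflib

# Hypothetical stored snapshot of a rendered prompt.
SNAPSHOT = "You are a helpful agent.\nAnswer in JSON.\n"

def render_prompt():
    # Stand-in for the project's actual template rendering.
    return "You are a helpful agent.\nAnswer in JSON.\n"

def diff_against_snapshot(snapshot, rendered):
    """Return a unified diff; an empty result means the gate passes."""
    return list(difflib.unified_diff(
        snapshot.splitlines(keepends=True),
        rendered.splitlines(keepends=True),
        fromfile="snapshot", tofile="rendered",
    ))

delta = diff_against_snapshot(SNAPSHOT, render_prompt())
assert not delta, "".join(delta)
print("prompt snapshot OK")
```

As with the schema snapshot, a deliberate prompt edit then ships together with its updated snapshot, so formatting changes are visible in review rather than silent.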
Plan
- Identify the highest-risk behavior contracts across AI agent tool interfaces, execution workflows, and eval-driven behavior tuning, and the types of changes that should be blocked when they regress.
- Choose the smallest practical set of automated checks and required review evidence that covers those contracts well enough to matter.
- Wire the new gates into the relevant test, review, or release surfaces and document when exceptions are allowed.
- Trial the gates on a few representative changes and tighten the signal-to-noise ratio before expanding the coverage further.
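The eval-suite gate from the plan above can be sketched as a pass-rate threshold check. The 0.95 threshold, the `run_eval_suite` stand-in, and the hard-coded case outcomes are all assumptions for illustration; a real runner would execute each eval case against the agent and the threshold would be tuned during the trial step.

```python
# Hypothetical minimum pass rate below which a change is blocked.
PASS_RATE_THRESHOLD = 0.95

def run_eval_suite():
    """Stand-in eval runner: returns the fraction of cases that passed."""
    results = [True] * 19 + [False]  # hypothetical per-case outcomes
    return sum(results) / len(results)

def gate(pass_rate, threshold=PASS_RATE_THRESHOLD):
    """Block the change when the suite regresses below the threshold."""
    return pass_rate >= threshold

rate = run_eval_suite()
if not gate(rate):
    raise SystemExit(f"eval gate failed: pass rate {rate:.2f} < {PASS_RATE_THRESHOLD}")
print(f"eval gate passed: pass rate {rate:.2f}")
```

Tuning the threshold during the trial period is what keeps the signal-to-noise ratio workable: too strict and flaky evals block every PR, too loose and real regressions pass.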
Implementation Progress
Review Feedback
- [ ] Review cleared