MCP tooling contract for extension + Copilot dashboard/query generation
Problem
The MCP tool schemas (render_dashboard, execute_query, catalog, list_sources) lack formalized input/output contracts, making integrations with IDE extensions and GitHub Copilot fragile. Tool responses vary in structure between success and error cases, required vs. optional parameters are not enforced consistently, and there are no contract tests to catch breaking changes. When an IDE extension or Copilot agent constructs a tool call, minor schema drift can silently produce wrong results or cryptic errors. Without hardened contracts and documented recipes, every new integration is a one-off debugging exercise.
Context
Key files:
- dataface/ai/tool_schemas.py — canonical input schemas (single source of truth for all surfaces)
- dataface/ai/mcp/tools.py — tool implementations (render_dashboard, execute_query, catalog, list_sources, list_dashboards, get_dashboard, get_schema)
- dataface/ai/mcp/search.py — search_dashboards implementation
- dataface/ai/mcp/review.py — review_dashboard implementation
- dataface/ai/tools.py — OpenAI wrapper + dispatch_tool_call() (Playground/Cloud surface)
- dataface/ai/context_contract.py — existing AI_CONTEXT v1 contract (model for this work)
- tests/ai/test_ai_context_contract.py — contract-locking tests for AI_CONTEXT (pattern to follow)
- tests/core/test_mcp.py — existing MCP tool tests (functional, not contract-locking)
- tests/core/test_ai_tools.py — tool definition + dispatch tests
Response shape inconsistencies found:
1. list_dashboards — no success key; error uses error (string) instead of errors (list)
2. list_sources — no success or error keys at all
3. search_dashboards — no success/error keys, just {results: []}
4. execute_query — uses error (singular string) while render_dashboard/get_dashboard use errors (list)
5. render_dashboard success path omits warnings key entirely (present only on error)
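These inconsistencies can be sketched as Python dict literals. The payload values below are hypothetical illustrations of the shape drift, not actual tool output (the real payloads live in mcp/tools.py and mcp/search.py):

```python
# Hypothetical examples of the divergent response shapes described above.

# 1. list_dashboards error path: singular "error" string, no "success" key.
list_dashboards_error = {"error": "workspace not found"}

# 2. list_sources: bare payload, neither "success" nor "error".
list_sources_ok = {"sources": ["duckdb", "postgres"]}

# 3. search_dashboards: results only, no envelope keys.
search_ok = {"results": []}

# 4. execute_query vs. render_dashboard: singular vs. plural error keys.
execute_query_error = {"success": False, "error": "syntax error near SELECT"}
render_dashboard_error = {"success": False, "errors": ["unknown widget type"], "warnings": []}
```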
Constraints:
- Additive changes only (no removing keys consumers may depend on)
- Follow existing context_contract.py pattern: versioned, validated, tested
- Contract tests must lock shapes so breaking changes are caught by CI
Possible Solutions
A. Pydantic response models per tool
Define typed Pydantic models for each tool response. Strong typing, but heavyweight: tool implementations return plain dicts today, and Pydantic is not used in the AI layer (an intentional design choice).
B. Recommended — Lightweight contract module + validation functions
Mirror context_contract.py: define required keys/types per tool in a tool_contracts.py module with validate_tool_response(). Add contract-locking tests. Normalize the three inconsistent tools (list_dashboards, list_sources, search_dashboards) to include success key. Keep error (singular) on execute_query for backward compat but add errors list alongside.
Why recommended: Matches existing patterns, minimal code, no new dependencies, locks contracts without over-engineering.
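A minimal sketch of what such a contract module could look like. The names TOOL_CONTRACT_VERSION and validate_tool_response() come from this worksheet; the TOOL_CONTRACTS table and the exact key/type sets shown here are illustrative assumptions, not the actual contracts:

```python
# Sketch of a lightweight contract module in the spirit of option B,
# mirroring the context_contract.py pattern. Per-tool key tables below
# are examples only; the real dataface/ai/tool_contracts.py may differ.

TOOL_CONTRACT_VERSION = "1.0"

# Required response keys and their expected types, per tool (illustrative subset).
TOOL_CONTRACTS = {
    "list_dashboards": {"success": bool, "dashboards": list},
    "list_sources": {"success": bool, "sources": list},
    "search_dashboards": {"success": bool, "results": list},
    "execute_query": {"success": bool, "errors": list},
    "render_dashboard": {"success": bool, "errors": list, "warnings": list},
}


def validate_tool_response(tool_name: str, response: dict) -> list[str]:
    """Return a list of contract violations; an empty list means conformant."""
    contract = TOOL_CONTRACTS.get(tool_name)
    if contract is None:
        return [f"no contract registered for tool {tool_name!r}"]
    violations = []
    for key, expected_type in contract.items():
        if key not in response:
            violations.append(f"missing required key {key!r}")
        elif not isinstance(response[key], expected_type):
            violations.append(
                f"key {key!r} expected {expected_type.__name__}, "
                f"got {type(response[key]).__name__}"
            )
    return violations
```

Returning a list of violations (rather than raising) lets callers decide whether a violation is fatal, which suits both CI contract tests and lenient runtime checks.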
C. JSON Schema validation
Define JSON Schema per tool response and validate with jsonschema. Heavier dependency for limited benefit — the contract module approach is simpler and already proven in this codebase.
Plan
- Write failing contract tests (tests/ai/test_tool_contracts.py) that lock the expected response shapes for all 8 MCP tools — TDD per CLAUDE.md.
- Create dataface/ai/tool_contracts.py defining required response keys/types per tool, with a validate_tool_response() function.
- Normalize tool responses in mcp/tools.py, mcp/search.py, and mcp/review.py:
  - Add success: True/False to list_dashboards, list_sources, search_dashboards
  - Add errors list to execute_query alongside existing error string
  - Ensure warnings key is present on render_dashboard success path
- Run contract tests green, then run full just test for regressions.
- Document integration guidance in task worksheet.
- Validate task frontmatter, rebase, PR.
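The contract-locking test step above can be sketched as follows. This is a hypothetical shape in the style of tests/ai/test_ai_context_contract.py; the fake_tool_response() helper stands in for calling the real MCP tool implementations, and the locked key sets are an illustrative subset:

```python
# Sketch of a contract-locking test. A real test would invoke the tool
# implementations in dataface/ai/mcp/tools.py instead of the stub below.

# Keys every tool response must carry; locking them here means CI fails
# if a future change drops one (additive-only constraint).
LOCKED_KEYS = {
    "list_sources": {"success", "sources"},
    "execute_query": {"success", "errors"},
}


def fake_tool_response(tool):
    """Stand-in for invoking the real MCP tool (illustrative payloads)."""
    return {
        "list_sources": {"success": True, "sources": []},
        "execute_query": {"success": False, "errors": ["bad SQL"], "error": "bad SQL"},
    }[tool]


def test_responses_keep_contracted_keys():
    for tool, required in LOCKED_KEYS.items():
        response = fake_tool_response(tool)
        missing = required - response.keys()
        assert not missing, f"{tool} dropped contracted keys: {missing}"
```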
Implementation Progress
Completed
- [x] Created dataface/ai/tool_contracts.py — versioned response contracts for all 9 MCP tools (mirrors context_contract.py pattern)
- [x] Created tests/ai/test_tool_contracts.py — 27 contract-locking tests covering all tool response shapes (TDD: wrote failing tests first)
- [x] Normalized list_dashboards — added success: bool to all response paths
- [x] Normalized list_sources — added success: True to response
- [x] Normalized search_dashboards — added success: True to all response paths
- [x] Normalized execute_query — added errors: list[str] alongside existing error: str|None for consistency
- [x] All 27 contract tests pass; 191 existing MCP/AI tests pass with 0 regressions
Integration Guidance (for extension/Copilot consumers)
Envelope guarantee: every MCP tool response is a dict with a success: bool key (except get_schema, which always succeeds and returns {schema_text, version}).
Error handling recipe:
1. Check response["success"] first.
2. On failure, read response["errors"] (list of strings) for structured error messages. Some tools also provide error (singular string) for backward compatibility.
3. Optional keys error_summary, tips, and warnings may be present on error — use them for richer UX.
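The recipe above can be sketched from the consumer side. The helper name and fallback logic here are illustrative, not part of the shipped API:

```python
# Hypothetical consumer-side helper applying the error-handling recipe.

def handle_tool_response(response: dict) -> list[str]:
    """Return error messages; an empty list means the call succeeded."""
    # Step 1: check success first. get_schema carries no success key and
    # always succeeds, hence the default of True.
    if response.get("success", True):
        return []
    # Step 2: prefer the structured "errors" list, falling back to the
    # legacy singular "error" string kept for backward compatibility.
    errors = list(response.get("errors") or [])
    if not errors and response.get("error"):
        errors.append(response["error"])
    # Step 3: "error_summary", "tips", and "warnings" may also be present
    # on failure; surface them in the UI when available.
    return errors
```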
Tool contract version: TOOL_CONTRACT_VERSION = "1.0" in dataface/ai/tool_contracts.py. Consumers can import validate_tool_response() for runtime validation during development.
Review Feedback
- Self-review: all changes are additive (no removed keys), contract tests lock shapes, 0 regressions in 191 existing tests.
- [x] Review cleared