Extract shared text-to-SQL generation function

ID	MCP_ANALYST_AGENT-EXTRACT_SHARED_TEXT_TO_SQL_GENERATION_FUNCTION
Status	completed
Priority	p1
Milestone	m2-internal-adoption-design-partners
Owner	data-ai-engineer-architect
Initiative	benchmark-driven-text-to-sql-and-discovery-evals
Completed by	dave
Completed	2026-03-18

Problem

Three things need to happen: (1) extract a shared generate_sql() function from the divergent cloud and playground paths, (2) wire both render_dashboard and cloud AIService to share it as their SQL generation core, and (3) expose it as an ask_sql MCP tool so agents can go from natural-language question → SQL + results in one call. Currently there are two separate SQL generation paths with different prompt quality, no standalone text-to-SQL function for evals, and no way for an agent to ask a data question without building an entire dashboard.

Context

Current state: no isolated text-to-SQL path

Two separate SQL generation paths exist today, neither usable standalone:

Cloud AIService.generate_sql() (apps/cloud/apps/ai/service.py:406) — takes a prompt + schema dict, returns SQL. But uses a minimal system prompt with just _format_schema(), missing the rich catalog/profile context that the playground agent gets.
Playground/MCP agent flow (dataface/ai/agent.py, apps/playground/prompts/yaml_generation.md) — the agent generates full YAML dashboards with SQL queries embedded inside. Schema context comes from get_schema_context() which includes profile data, descriptions, etc. But there's no way to ask for just SQL — it always produces a complete dashboard.

The shared function must stay generic

Do not standardize the world around one schema_context blob. Different callers will get schema help in different ways:

Dataface's profiled inspector/catalog
a lighter filtered catalog tool
direct INFORMATION_SCHEMA queries
an MCP tool owned by another system
a checked-in memory file
no schema tool at all

The shared function should therefore accept either a ready-to-use context bundle or a callable/context-provider object that can fetch/assemble context on demand. The contract is "help me answer this question with the context strategy you were given", not "everyone must pass the same schema dict shape".

This matters for both product flexibility and ablation experiments: we want to vary the schema tool and which fields it exposes without rewriting the generation core.

Three deliverables

A. Extract the core function

Create generate_sql(question, context_provider, model) -> str in dataface/ai/ using the rich context pipeline from get_schema_context() for Dataface's built-in path, but without freezing the API around one static metadata structure. This is the function the eval runner calls.

B. Wire render_dashboard to use the same function

The dashboard generation agent currently has its own SQL generation baked into the YAML prompt. The extracted generate_sql should be the shared core that both the standalone eval path and the dashboard agent's SQL generation use. This doesn't mean the dashboard agent calls generate_sql as a separate step — it means the prompt, context assembly, and SQL generation logic come from the same module. When we improve SQL generation (better prompts, memories, etc.), the improvement shows up in both places automatically.

The cloud AIService.generate_sql() should also be replaced with a thin wrapper around the extracted function, so there aren't three divergent SQL generation paths.

C. Expose as an MCP tool: ask_sql

Add a new MCP tool alongside the existing execute_query:

execute_query — takes SQL, runs it, returns data (already exists)
ask_sql — takes a natural-language question, generates SQL using the extracted function, runs it, returns both the SQL and the data

This gives agents a higher-level tool: "answer this question" instead of "run this SQL I wrote." It's the natural-language front door to the database. The existing tool set in dataface/ai/tool_schemas.py already has execute_query, catalog, render_dashboard, etc. — ask_sql fills the gap between "browse the catalog" and "write me a whole dashboard."

The tool should return both the SQL and the results so the agent (or user) can see what query was generated, verify it, and optionally embed it in a dashboard.

Key files

dataface/ai/schema_context.py — get_schema_context() builds the rich context
dataface/ai/agent.py — build_agent_system_prompt() assembles the full prompt
dataface/ai/llm.py — OpenAIClient / AnthropicClient abstractions
dataface/ai/tool_schemas.py — canonical tool definitions (add ask_sql here)
dataface/ai/tools.py — tool dispatch (add ask_sql handler)
dataface/ai/mcp/tools.py — MCP tool implementations
apps/cloud/apps/ai/service.py — cloud's generate_sql() (replace with wrapper)
apps/playground/prompts/yaml_generation.md — playground system prompt

Dependencies

Needed by task 3 (eval runner) — this is the generation backend the runner calls.
Needed before ablation experiments — context-level and schema-tool experiments require swapping the context provider and filtered fields without changing the generation function.
The ask_sql MCP tool benefits from the inspector/catalog having profiled the schema already (richer context = better SQL).

Possible Solutions

Recommended: Standalone generate_sql() module with OpenAI structured output

New dataface/ai/generate_sql.py with generate_sql(question, schema_context, client) and generate_sql_and_execute().
Uses get_schema_context() for the rich context (same as playground agent).
Uses OpenAI JSON schema structured output for reliable SQL extraction.
Cloud AIService.generate_sql() becomes a thin async wrapper.
ask_sql MCP tool calls generate_sql_and_execute() which combines generation + execution.

Trade-off: Currently uses OpenAI's responses API directly rather than the streaming LLMClient protocol. This is intentional — SQL generation doesn't need streaming, and structured output is simpler with the responses API.

Plan

Create dataface/ai/generate_sql.py with generate_sql() and generate_sql_and_execute().
Replace cloud AIService.generate_sql() with thin async wrapper delegating to the shared function.
Add ASK_SQL schema in tool_schemas.py, dispatch in tools.py, contract in tool_contracts.py.
Register ask_sql in MCP server's tool list.
Update agent tool guidance to mention ask_sql.
Tests: test_generate_sql.py (core function), test_ask_sql_tool.py (schema, dispatch, contract).

Implementation Progress

Deliverable A: Extract core function — DONE

Created dataface/ai/generate_sql.py with: - generate_sql(question, schema_context, *, client) -> str — sends the question + schema context to the LLM with a structured JSON output schema, returns the SQL string. - generate_sql_and_execute(question, schema_context, *, client, source, limit, adapter_registry) -> dict — calls generate_sql() then execute_query(), returns the merged result dict with the generated sql key added.

System prompt uses the same rich schema_context from get_schema_context() that the playground agent uses, ensuring prompt quality parity.

Deliverable B: Wire cloud AIService — DONE

apps/cloud/apps/ai/service.py:AIService.generate_sql() is now a thin async wrapper: - Converts its legacy schema: dict argument to a context string via _format_schema(). - Creates a sync OpenAIClient and delegates to generate_sql() via run_in_executor(). - Removed the now-unused SQLResponse Pydantic model from the cloud service.

Deliverable C: `ask_sql` MCP tool — DONE

ASK_SQL schema added to dataface/ai/tool_schemas.py (required: question; optional: source, limit).
Dispatch handler added in dataface/ai/tools.py — resolves schema context via get_schema_context(), creates an LLM client via create_client(), calls generate_sql_and_execute().
_ASK_SQL_KEYS contract added to dataface/ai/tool_contracts.py with keys: success, errors, sql, data, columns, row_count, truncated.
Tool registered in dataface/ai/mcp/server.py tool list.
Agent tool guidance updated in dataface/ai/agent.py.

Tests

tests/ai/test_generate_sql.py — 5 tests covering generate_sql() return value, system prompt contents, user message, and generate_sql_and_execute() success/error paths.
tests/ai/test_ask_sql_tool.py — 6 tests covering schema validation, dispatch wiring (with monkeypatched LLM/schema), and contract validation.
Updated tests/core/test_ai_tools.py — tool count assertions (5→6, 6→7).
Updated tests/ai/test_tool_contracts.py — added ask_sql to expected tools set.

Validation

just fix    — formatting + lint autofix passed
just ci     — all CI checks passed (Black, Ruff, Mypy, Pytest)

QA Exploration

[x] QA exploration completed (or N/A for non-UI tasks)

N/A for browser QA. Validation: call the extracted function with a benchmark question plus one or more context-provider strategies, verify it returns valid SQL, and compare output quality to the playground agent path.

Review Feedback

Review executed via just review/branch review after the final ask_sql safety fixes.
Review verdict: APPROVED with notes; final blocker was multi-statement SQL bypass and it was fixed by rejecting multi-statement generated SQL.
[x] Review cleared