---
type: task
id: MCP_ANALYST_AGENT-EXTRACT_SHARED_TEXT_TO_SQL_GENERATION_FUNCTION
title: Extract shared text-to-SQL generation function
description: >-
  Extract a shared generate_sql(question, context_provider, model) function,
  wire render_dashboard and cloud AIService to use it as their SQL generation
  core, and expose it as an ask_sql MCP tool. The shared core must not
  hard-code one schema metadata shape: callers can provide whatever
  context/tooling strategy they need.
milestone: m2-internal-adoption-design-partners
owner: data-ai-engineer-architect
status: completed
priority: p1
initiative: benchmark-driven-text-to-sql-and-discovery-evals
completed_at: '2026-03-18T00:04:10-07:00'
completed_by: dave
---
Three things need to happen: (1) extract a shared generate_sql() function from the divergent cloud and playground paths, (2) wire both render_dashboard and cloud AIService to share it as their SQL generation core, and (3) expose it as an ask_sql MCP tool so agents can go from natural-language question → SQL + results in one call. Currently there are two separate SQL generation paths with different prompt quality, no standalone text-to-SQL function for evals, and no way for an agent to ask a data question without building an entire dashboard.
Two separate SQL generation paths exist today, neither usable standalone:
- Cloud AIService.generate_sql() (apps/cloud/apps/ai/service.py:406) — takes a prompt + schema dict and returns SQL, but it uses a minimal system prompt built from just _format_schema(), missing the rich catalog/profile context the playground agent gets.
- Playground/MCP agent flow (dataface/ai/agent.py, apps/playground/prompts/yaml_generation.md) — the agent generates full YAML dashboards with SQL queries embedded inside. Schema context comes from get_schema_context(), which includes profile data, descriptions, etc. But there's no way to ask for just SQL — it always produces a complete dashboard.
Do not standardize the world around one schema_context blob. Different callers will get schema help in different ways — Dataface's rich get_schema_context() pipeline is one strategy, raw INFORMATION_SCHEMA queries are another. The shared function should therefore accept either a ready-to-use context bundle or a callable/context-provider object that can fetch/assemble context on demand. The contract is "help me answer this question with the context strategy you were given", not "everyone must pass the same schema dict shape".
This matters for both product flexibility and ablation experiments: we want to vary the schema tool and which fields it exposes without rewriting the generation core.
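A minimal sketch of what that contract could look like. The names here (ContextProvider, resolve_context) are illustrative, not the actual Dataface API: the point is only that a ready-made context string and an on-demand provider resolve to the same thing before generation runs.

```python
# Hypothetical sketch of the context-provider contract: callers pass either
# a ready-to-use context string or a zero-arg callable that assembles one on
# demand (e.g. from get_schema_context() or INFORMATION_SCHEMA queries).
from typing import Callable, Union

ContextProvider = Union[str, Callable[[], str]]

def resolve_context(provider: ContextProvider) -> str:
    """Return one context string regardless of which strategy the caller chose."""
    return provider() if callable(provider) else provider

# Both call styles yield the same kind of context bundle:
static = resolve_context("table users(id INT, email TEXT)")
lazy = resolve_context(lambda: "table users(id INT, email TEXT)")
assert static == lazy
```

This keeps the generation core agnostic: swapping the schema tool for an ablation experiment means swapping the provider, not rewriting the function.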
A. Extract the core function
Create generate_sql(question, context_provider, model) -> str in dataface/ai/ using the rich context pipeline from get_schema_context() for Dataface's built-in path, but without freezing the API around one static metadata structure. This is the function the eval runner calls.
B. Wire render_dashboard to use the same function
The dashboard generation agent currently has its own SQL generation baked into the YAML prompt. The extracted generate_sql should be the shared core that both the standalone eval path and the dashboard agent's SQL generation use. This doesn't mean the dashboard agent calls generate_sql as a separate step — it means the prompt, context assembly, and SQL generation logic come from the same module. When we improve SQL generation (better prompts, memories, etc.), the improvement shows up in both places automatically.
The cloud AIService.generate_sql() should also be replaced with a thin wrapper around the extracted function, so there aren't three divergent SQL generation paths.
C. Expose as an MCP tool: ask_sql
Add a new MCP tool alongside the existing execute_query:
- execute_query — takes SQL, runs it, returns data (already exists)
- ask_sql — takes a natural-language question, generates SQL using the extracted function, runs it, returns both the SQL and the data

This gives agents a higher-level tool: "answer this question" instead of "run this SQL I wrote." It's the natural-language front door to the database. The existing tool set in dataface/ai/tool_schemas.py already has execute_query, catalog, render_dashboard, etc. — ask_sql fills the gap between "browse the catalog" and "write me a whole dashboard."
The tool should return both the SQL and the results so the agent (or user) can see what query was generated, verify it, and optionally embed it in a dashboard.
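A sketch of the result shape, assuming the keys listed in the implementation notes below (success, errors, sql, data, columns, row_count, truncated); the helper name is hypothetical.

```python
# Illustrative ask_sql result: the execute_query-style payload plus a "sql"
# key so the agent can inspect the generated query before trusting or
# embedding it. make_ask_sql_result is a hypothetical helper, not real API.
def make_ask_sql_result(sql: str, rows: list, columns: list) -> dict:
    return {
        "success": True,
        "errors": [],
        "sql": sql,          # the generated query, returned for verification
        "columns": columns,
        "data": rows,
        "row_count": len(rows),
        "truncated": False,
    }
```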
- dataface/ai/schema_context.py — get_schema_context() builds the rich context
- dataface/ai/agent.py — build_agent_system_prompt() assembles the full prompt
- dataface/ai/llm.py — OpenAIClient / AnthropicClient abstractions
- dataface/ai/tool_schemas.py — canonical tool definitions (add ask_sql here)
- dataface/ai/tools.py — tool dispatch (add ask_sql handler)
- dataface/ai/mcp/tools.py — MCP tool implementations
- apps/cloud/apps/ai/service.py — cloud's generate_sql() (replace with wrapper)
- apps/playground/prompts/yaml_generation.md — playground system prompt

The ask_sql MCP tool benefits from the inspector/catalog having profiled the schema already (richer context = better SQL).

Recommended: Standalone generate_sql() module with OpenAI structured output
- dataface/ai/generate_sql.py with generate_sql(question, schema_context, client) and generate_sql_and_execute().
- get_schema_context() for the rich context (same as playground agent).
- AIService.generate_sql() becomes a thin async wrapper.
- ask_sql MCP tool calls generate_sql_and_execute(), which combines generation + execution.

Trade-off: Currently uses OpenAI's responses API directly rather than the streaming LLMClient protocol. This is intentional — SQL generation doesn't need streaming, and structured output is simpler with the responses API.
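The structured-output idea can be illustrated without calling the API. The schema below is an assumed shape, not the module's exact schema: constraining the model to an object with a single sql field means parsing is a plain json.loads plus a key lookup, with no fragile extraction of SQL from free text.

```python
# Illustrative JSON schema for structured SQL output (assumed shape):
# the model must return {"sql": "..."} and nothing else.
import json

SQL_OUTPUT_SCHEMA = {
    "type": "object",
    "properties": {"sql": {"type": "string"}},
    "required": ["sql"],
    "additionalProperties": False,
}

# With a structured response, extracting the SQL is trivial and total:
raw = '{"sql": "SELECT count(*) FROM users"}'
sql = json.loads(raw)["sql"]
assert sql.startswith("SELECT")
```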
- dataface/ai/generate_sql.py with generate_sql() and generate_sql_and_execute().
- AIService.generate_sql() with thin async wrapper delegating to the shared function.
- ASK_SQL schema in tool_schemas.py, dispatch in tools.py, contract in tool_contracts.py.
- ask_sql in MCP server's tool list.
- Tests: test_generate_sql.py (core function), test_ask_sql_tool.py (schema, dispatch, contract).

Created dataface/ai/generate_sql.py with:
- generate_sql(question, schema_context, *, client) -> str — sends the question + schema context to the LLM with a structured JSON output schema, returns the SQL string.
- generate_sql_and_execute(question, schema_context, *, client, source, limit, adapter_registry) -> dict — calls generate_sql() then execute_query(), returns the merged result dict with the generated sql key added.
System prompt uses the same rich schema_context from get_schema_context() that the playground agent uses, ensuring prompt quality parity.
apps/cloud/apps/ai/service.py: AIService.generate_sql() is now a thin async wrapper:
- Converts its legacy schema: dict argument to a context string via _format_schema().
- Creates a sync OpenAIClient and delegates to generate_sql() via run_in_executor().
- Removed the now-unused SQLResponse Pydantic model from the cloud service.
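The sync-to-async bridge described above can be sketched as follows. generate_sql here is a stand-in for the shared synchronous function; the pattern is simply offloading it to the default executor so the async cloud service doesn't block the event loop.

```python
# Sketch of delegating a sync generate_sql to run_in_executor from an
# async service method. generate_sql is a stand-in, not the real function.
import asyncio

def generate_sql(question: str, context: str) -> str:
    return "SELECT 1"  # stand-in for the blocking LLM call

async def generate_sql_async(question: str, context: str) -> str:
    loop = asyncio.get_running_loop()
    # None = default ThreadPoolExecutor; args are passed positionally.
    return await loop.run_in_executor(None, generate_sql, question, context)

assert asyncio.run(generate_sql_async("q", "ctx")) == "SELECT 1"
```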
ask_sql MCP tool — DONE

- ASK_SQL schema added to dataface/ai/tool_schemas.py (required: question; optional: source, limit).
- dataface/ai/tools.py — resolves schema context via get_schema_context(), creates an LLM client via create_client(), calls generate_sql_and_execute().
- _ASK_SQL_KEYS contract added to dataface/ai/tool_contracts.py with keys: success, errors, sql, data, columns, row_count, truncated.
- ask_sql registered in the dataface/ai/mcp/server.py tool list.
- dataface/ai/agent.py.
- tests/ai/test_generate_sql.py — 5 tests covering generate_sql() return value, system prompt contents, user message, and generate_sql_and_execute() success/error paths.
- tests/ai/test_ask_sql_tool.py — 6 tests covering schema validation, dispatch wiring (with monkeypatched LLM/schema), and contract validation.
- tests/core/test_ai_tools.py — tool count assertions (5→6, 6→7).
- tests/ai/test_tool_contracts.py — added ask_sql to expected tools set.

just fix — formatting + lint autofix passed
just ci — all CI checks passed (Black, Ruff, Mypy, Pytest)
N/A for browser QA. Validation: call the extracted function with a benchmark question plus one or more context-provider strategies, verify it returns valid SQL, and compare output quality to the playground agent path.
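A hedged sketch of that validation loop, with the extracted function faked out: run one benchmark question under multiple context strategies and sanity-check each result. looks_like_sql is a deliberately cheap stand-in for a real SQL parser.

```python
# Validation sketch: same question, several context strategies, cheap check
# that each output is at least SQL-shaped. fake_generate_sql stands in for
# the extracted function; a real run would also execute and compare results.
def looks_like_sql(text: str) -> bool:
    # Stand-in for real parsing (e.g. a proper SQL parser).
    return text.strip().upper().startswith(("SELECT", "WITH"))

def fake_generate_sql(question: str, provider) -> str:
    ctx = provider() if callable(provider) else provider
    return f"SELECT count(*) FROM users /* ctx: {ctx} */"

strategies = {
    "static bundle": "users(id, email)",
    "lazy provider": lambda: "users(id, email)",
}
for name, provider in strategies.items():
    sql = fake_generate_sql("how many users signed up?", provider)
    assert looks_like_sql(sql), name
```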
<!-- Reviewer comments, what was changed in response, and sign-off. -->
just review / branch review run after the final ask_sql safety fixes.