tasks/workstreams/mcp-analyst-agent/tasks/extract-shared-text-to-sql-generation-function.md


---
type: task
id: MCP_ANALYST_AGENT-EXTRACT_SHARED_TEXT_TO_SQL_GENERATION_FUNCTION
title: Extract shared text-to-SQL generation function
description: "Extract a shared generate_sql(question, context_provider, model) function, wire render_dashboard and cloud AIService to use it as their SQL generation core, and expose it as an ask_sql MCP tool. The shared core must not hard-code one schema metadata shape: callers can provide whatever context/tooling strategy they need."
milestone: m2-internal-adoption-design-partners
owner: data-ai-engineer-architect
status: completed
priority: p1
initiative: benchmark-driven-text-to-sql-and-discovery-evals
completed_at: '2026-03-18T00:04:10-07:00'
completed_by: dave
---


Extract shared text-to-SQL generation function

Problem

Three things need to happen: (1) extract a shared generate_sql() function from the divergent cloud and playground paths, (2) wire both render_dashboard and cloud AIService to share it as their SQL generation core, and (3) expose it as an ask_sql MCP tool so agents can go from natural-language question → SQL + results in one call. Currently there are two separate SQL generation paths with different prompt quality, no standalone text-to-SQL function for evals, and no way for an agent to ask a data question without building an entire dashboard.

Context

Current state: no isolated text-to-SQL path

Two separate SQL generation paths exist today, neither usable standalone:

  1. Cloud AIService.generate_sql() (apps/cloud/apps/ai/service.py:406) — takes a prompt + schema dict and returns SQL, but it uses a minimal system prompt with just _format_schema(), missing the rich catalog/profile context that the playground agent gets.

  2. Playground/MCP agent flow (dataface/ai/agent.py, apps/playground/prompts/yaml_generation.md) — the agent generates full YAML dashboards with SQL queries embedded inside. Schema context comes from get_schema_context() which includes profile data, descriptions, etc. But there's no way to ask for just SQL — it always produces a complete dashboard.

The shared function must stay generic

Do not standardize the world around one schema_context blob. Different callers will get schema help in different ways:

The shared function should therefore accept either a ready-to-use context bundle or a callable/context-provider object that can fetch/assemble context on demand. The contract is "help me answer this question with the context strategy you were given", not "everyone must pass the same schema dict shape".

This matters for both product flexibility and ablation experiments: we want to vary the schema tool and which fields it exposes without rewriting the generation core.
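One way to express that contract is a context-provider type that accepts either a prebuilt context string or a zero-argument callable that assembles context on demand. This is a hedged sketch, not the actual API — the names `ContextProvider` and `resolve_context` are hypothetical:

```python
from typing import Callable, Union

# Hypothetical sketch: a caller may hand over a ready-made context string,
# or a callable that fetches/assembles context lazily (e.g. from a catalog
# tool). The generation core stays agnostic about where context comes from.
ContextProvider = Union[str, Callable[[], str]]

def resolve_context(provider: ContextProvider) -> str:
    """Normalize either form into the context string the prompt needs."""
    return provider() if callable(provider) else provider
```

A caller with static metadata passes a string; an ablation harness can pass `lambda: build_context(fields=...)` to vary the schema tool without touching the core.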

Three deliverables

A. Extract the core function

Create generate_sql(question, context_provider, model) -> str in dataface/ai/ using the rich context pipeline from get_schema_context() for Dataface's built-in path, but without freezing the API around one static metadata structure. This is the function the eval runner calls.

B. Wire render_dashboard to use the same function

The dashboard generation agent currently has its own SQL generation baked into the YAML prompt. The extracted generate_sql should be the shared core that both the standalone eval path and the dashboard agent's SQL generation use. This doesn't mean the dashboard agent calls generate_sql as a separate step — it means the prompt, context assembly, and SQL generation logic come from the same module. When we improve SQL generation (better prompts, memories, etc.), the improvement shows up in both places automatically.

The cloud AIService.generate_sql() should also be replaced with a thin wrapper around the extracted function, so there aren't three divergent SQL generation paths.

C. Expose as an MCP tool: ask_sql

Add a new MCP tool, ask_sql, alongside the existing execute_query.
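As a rough illustration of what the tool declaration in tool_schemas.py might look like — the parameter names beyond `ask_sql` itself are assumptions, not the shipped schema:

```python
# Hypothetical sketch of an ask_sql tool schema; only the tool name comes
# from the task description, the parameters are illustrative.
ASK_SQL = {
    "name": "ask_sql",
    "description": (
        "Answer a natural-language data question: generate SQL, "
        "execute it, and return both the query and its results."
    ),
    "parameters": {
        "type": "object",
        "properties": {
            "question": {"type": "string", "description": "Natural-language question"},
            "source": {"type": "string", "description": "Data source to query"},
            "limit": {"type": "integer", "description": "Maximum rows to return"},
        },
        "required": ["question"],
    },
}
```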

This gives agents a higher-level tool: "answer this question" instead of "run this SQL I wrote." It's the natural-language front door to the database. The existing tool set in dataface/ai/tool_schemas.py already has execute_query, catalog, render_dashboard, etc. — ask_sql fills the gap between "browse the catalog" and "write me a whole dashboard."

The tool should return both the SQL and the results so the agent (or user) can see what query was generated, verify it, and optionally embed it in a dashboard.
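A minimal sketch of that result shape — the field names here are assumptions chosen to show the SQL-plus-results idea, not the actual contract in tool_contracts.py:

```python
# Hypothetical result builder: the generated SQL travels alongside the
# executed results so the agent can inspect the query, verify it, or
# embed it in a dashboard later.
def make_ask_sql_result(sql: str, rows: list[dict], columns: list[str]) -> dict:
    return {
        "sql": sql,          # the generated query, for inspection/reuse
        "columns": columns,  # result column names
        "rows": rows,        # result rows as dicts
        "row_count": len(rows),
    }
```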

Key files

Dependencies

Possible Solutions

Recommended: Standalone generate_sql() module with OpenAI structured output

Trade-off: this currently uses OpenAI's responses API directly rather than the streaming LLMClient protocol. This is intentional — SQL generation doesn't need streaming, and structured output is simpler with the responses API.
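The structured-output half of that trade-off boils down to constraining the model to a tiny JSON object and parsing one field out of it. This is a sketch under assumptions — the schema constant and parser below are illustrative, and the exact responses-API call shape may differ from the real implementation:

```python
import json

# Hypothetical JSON schema handed to the structured-output mechanism:
# the model must return {"sql": "..."} and nothing else.
SQL_RESPONSE_SCHEMA = {
    "type": "json_schema",
    "name": "sql_response",
    "schema": {
        "type": "object",
        "properties": {"sql": {"type": "string"}},
        "required": ["sql"],
        "additionalProperties": False,
    },
}

def parse_sql_response(raw: str) -> str:
    """Extract the SQL string from the model's structured JSON output."""
    return json.loads(raw)["sql"]
```

Because the output is schema-constrained, the caller never has to scrape SQL out of markdown fences or prose.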

Plan

  1. Create dataface/ai/generate_sql.py with generate_sql() and generate_sql_and_execute().
  2. Replace cloud AIService.generate_sql() with thin async wrapper delegating to the shared function.
  3. Add ASK_SQL schema in tool_schemas.py, dispatch in tools.py, contract in tool_contracts.py.
  4. Register ask_sql in MCP server's tool list.
  5. Update agent tool guidance to mention ask_sql.
  6. Tests: test_generate_sql.py (core function), test_ask_sql_tool.py (schema, dispatch, contract).
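The core-function test in step 6 can stub the LLM client so no network is involved. This sketch assumes a client with a single `complete(prompt) -> str` method — the real client interface and the real generate_sql body may differ:

```python
# Hypothetical test sketch for test_generate_sql.py: a stub client returns
# canned SQL, and the test only checks the plumbing around it.
class StubClient:
    def complete(self, prompt: str) -> str:
        return "SELECT count(*) FROM orders"

def generate_sql(question: str, schema_context: str, *, client) -> str:
    # Assumption: the prompt is schema context followed by the question.
    prompt = f"{schema_context}\n\nQuestion: {question}"
    return client.complete(prompt)

def test_generate_sql_returns_sql() -> None:
    sql = generate_sql(
        "How many orders are there?",
        "table orders(id int)",
        client=StubClient(),
    )
    assert sql.upper().startswith("SELECT")
```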

Implementation Progress

Deliverable A: Extract core function — DONE

Created dataface/ai/generate_sql.py with:

  - generate_sql(question, schema_context, *, client) -> str — sends the question + schema context to the LLM with a structured JSON output schema, returns the SQL string.
  - generate_sql_and_execute(question, schema_context, *, client, source, limit, adapter_registry) -> dict — calls generate_sql() then execute_query(), returns the merged result dict with the generated sql key added.

System prompt uses the same rich schema_context from get_schema_context() that the playground agent uses, ensuring prompt quality parity.

Deliverable B: Wire cloud AIService — DONE

apps/cloud/apps/ai/service.py:AIService.generate_sql() is now a thin async wrapper:

  - Converts its legacy schema: dict argument to a context string via _format_schema().
  - Creates a sync OpenAIClient and delegates to generate_sql() via run_in_executor().
  - Removed the now-unused SQLResponse Pydantic model from the cloud service.

Deliverable C: ask_sql MCP tool — DONE

Tests

Validation

just fix    — formatting + lint autofix passed
just ci     — all CI checks passed (Black, Ruff, Mypy, Pytest)

QA Exploration

N/A for browser QA. Validation: call the extracted function with a benchmark question plus one or more context-provider strategies, verify it returns valid SQL, and compare output quality to the playground agent path.

Review Feedback

<!-- Reviewer comments, what was changed in response, and sign-off. -->