Extract shared text-to-SQL generation function
Problem
Three things need to happen: (1) extract a shared generate_sql() function from the divergent cloud and playground paths, (2) wire both render_dashboard and cloud AIService to share it as their SQL generation core, and (3) expose it as an ask_sql MCP tool so agents can go from natural-language question → SQL + results in one call. Currently there are two separate SQL generation paths with different prompt quality, no standalone text-to-SQL function for evals, and no way for an agent to ask a data question without building an entire dashboard.
Context
Current state: no isolated text-to-SQL path
Two separate SQL generation paths exist today, neither usable standalone:
- Cloud: `AIService.generate_sql()` (apps/cloud/apps/ai/service.py:406) — takes a prompt + schema dict, returns SQL. But it uses a minimal system prompt with just `_format_schema()`, missing the rich catalog/profile context that the playground agent gets.
- Playground/MCP agent flow (`dataface/ai/agent.py`, `apps/playground/prompts/yaml_generation.md`) — the agent generates full YAML dashboards with SQL queries embedded inside. Schema context comes from `get_schema_context()`, which includes profile data, descriptions, etc. But there's no way to ask for just SQL — it always produces a complete dashboard.
The shared function must stay generic
Do not standardize the world around one schema_context blob. Different callers will get schema help in different ways:
- Dataface's profiled inspector/catalog
- a lighter filtered catalog tool
- direct `INFORMATION_SCHEMA` queries
- an MCP tool owned by another system
- a checked-in memory file
- no schema tool at all
The shared function should therefore accept either a ready-to-use context bundle or a callable/context-provider object that can fetch/assemble context on demand. The contract is "help me answer this question with the context strategy you were given", not "everyone must pass the same schema dict shape".
This matters for both product flexibility and ablation experiments: we want to vary the schema tool and which fields it exposes without rewriting the generation core.
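A minimal sketch of that contract, assuming the shared function accepts either a ready-made context string or a zero-argument provider callable (the names `ContextProvider` and `resolve_context` are illustrative, not the actual API):

```python
from typing import Callable, Union

# Hypothetical type: a caller passes either an assembled context string
# or a callable that fetches/assembles one on demand (catalog tool,
# INFORMATION_SCHEMA query, memory file, etc.).
ContextProvider = Union[str, Callable[[], str]]

def resolve_context(context: ContextProvider) -> str:
    """Return the schema context, invoking the provider if it is callable."""
    return context() if callable(context) else context

# Both calling conventions produce the same thing:
static_ctx = resolve_context("orders(id, total, created_at)")
lazy_ctx = resolve_context(lambda: "orders(id, total, created_at)")
```

Swapping the provider is then a one-argument change, which is exactly what the ablation experiments need.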
Three deliverables
A. Extract the core function
Create generate_sql(question, context_provider, model) -> str in dataface/ai/ using the rich context pipeline from get_schema_context() for Dataface's built-in path, but without freezing the API around one static metadata structure. This is the function the eval runner calls.
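A hedged sketch of the function's shape, with a stand-in `LLMClient` protocol and an `EchoClient` stub (both hypothetical; the real client abstractions live in dataface/ai/llm.py):

```python
from typing import Protocol

class LLMClient(Protocol):
    """Illustrative client interface; the real one is in dataface/ai/llm.py."""
    def complete(self, system: str, user: str) -> str: ...

def generate_sql(question, context_provider, client) -> str:
    """Assemble schema context, prompt the model, return a SQL string."""
    context = context_provider() if callable(context_provider) else context_provider
    system = (
        "You translate natural-language questions into SQL.\n\n"
        f"Schema context:\n{context}"
    )
    return client.complete(system, question).strip()

class EchoClient:
    """Stub client used here so the sketch runs without an API key."""
    def complete(self, system: str, user: str) -> str:
        return "SELECT count(*) FROM orders "

sql = generate_sql("how many orders?", "orders(id, total)", EchoClient())
```

The eval runner would call this with a real client and a benchmark question; nothing in the signature ties it to one schema-dict shape.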
B. Wire render_dashboard to use the same function
The dashboard generation agent currently has its own SQL generation baked into the YAML prompt. The extracted generate_sql should be the shared core that both the standalone eval path and the dashboard agent's SQL generation use. This doesn't mean the dashboard agent calls generate_sql as a separate step — it means the prompt, context assembly, and SQL generation logic come from the same module. When we improve SQL generation (better prompts, memories, etc.), the improvement shows up in both places automatically.
The cloud AIService.generate_sql() should also be replaced with a thin wrapper around the extracted function, so there aren't three divergent SQL generation paths.
C. Expose as an MCP tool: ask_sql
Add a new MCP tool alongside the existing `execute_query`:
- `execute_query` — takes SQL, runs it, returns data (already exists)
- `ask_sql` — takes a natural-language question, generates SQL using the extracted function, runs it, returns both the SQL and the data
This gives agents a higher-level tool: "answer this question" instead of "run this SQL I wrote." It's the natural-language front door to the database. The existing tool set in dataface/ai/tool_schemas.py already has execute_query, catalog, render_dashboard, etc. — ask_sql fills the gap between "browse the catalog" and "write me a whole dashboard."
The tool should return both the SQL and the results so the agent (or user) can see what query was generated, verify it, and optionally embed it in a dashboard.
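A sketch of that return shape, assuming a flat result dict (the helper `ask_sql_result` and its exact keys are illustrative; the canonical keys live in the tool contract):

```python
def ask_sql_result(sql: str, rows: list) -> dict:
    """Bundle the generated SQL with the executed results in one payload."""
    columns = list(rows[0].keys()) if rows else []
    return {
        "success": True,
        "sql": sql,          # the query the model produced, for verification
        "data": rows,        # the executed results
        "columns": columns,
        "row_count": len(rows),
    }

result = ask_sql_result(
    "SELECT region, SUM(total) AS revenue FROM orders GROUP BY region",
    [{"region": "EU", "revenue": 1200}, {"region": "US", "revenue": 3400}],
)
```

Returning the SQL alongside the data is what lets an agent verify the query or embed it in a dashboard later.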
Key files
- `dataface/ai/schema_context.py` — `get_schema_context()` builds the rich context
- `dataface/ai/agent.py` — `build_agent_system_prompt()` assembles the full prompt
- `dataface/ai/llm.py` — `OpenAIClient`/`AnthropicClient` abstractions
- `dataface/ai/tool_schemas.py` — canonical tool definitions (add `ask_sql` here)
- `dataface/ai/tools.py` — tool dispatch (add `ask_sql` handler)
- `dataface/ai/mcp/tools.py` — MCP tool implementations
- `apps/cloud/apps/ai/service.py` — cloud's `generate_sql()` (replace with wrapper)
- `apps/playground/prompts/yaml_generation.md` — playground system prompt
Dependencies
- Needed by task 3 (eval runner) — this is the generation backend the runner calls.
- Needed before ablation experiments — context-level and schema-tool experiments require swapping the context provider and filtered fields without changing the generation function.
- The `ask_sql` MCP tool benefits from the inspector/catalog having profiled the schema already (richer context = better SQL).
Possible Solutions
Recommended: Standalone generate_sql() module with OpenAI structured output
- New `dataface/ai/generate_sql.py` with `generate_sql(question, schema_context, client)` and `generate_sql_and_execute()`.
- Uses `get_schema_context()` for the rich context (same as the playground agent).
- Uses OpenAI JSON schema structured output for reliable SQL extraction.
- Cloud `AIService.generate_sql()` becomes a thin async wrapper.
- `ask_sql` MCP tool calls `generate_sql_and_execute()`, which combines generation + execution.
Trade-off: Currently uses OpenAI's responses API directly rather than the streaming LLMClient protocol. This is intentional — SQL generation doesn't need streaming, and structured output is simpler with the responses API.
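A hedged sketch of the structured-output idea: a JSON-schema response format shaped like OpenAI's structured outputs, plus the trivial extraction step. The payload field names below are illustrative, not the exact request the codebase sends:

```python
import json

# Illustrative response_format for OpenAI structured output: the model is
# constrained to emit a JSON object with a single required "sql" string.
SQL_RESPONSE_FORMAT = {
    "type": "json_schema",
    "name": "sql_response",
    "schema": {
        "type": "object",
        "properties": {"sql": {"type": "string"}},
        "required": ["sql"],
        "additionalProperties": False,
    },
}

def extract_sql(raw_model_output: str) -> str:
    """Parse the model's JSON payload and pull out the SQL string."""
    return json.loads(raw_model_output)["sql"]
```

With the schema enforced, extraction is a single `json.loads` rather than regex-scraping SQL out of free text, which is the reliability win the trade-off paragraph is pointing at.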
Plan
- Create `dataface/ai/generate_sql.py` with `generate_sql()` and `generate_sql_and_execute()`.
- Replace cloud `AIService.generate_sql()` with a thin async wrapper delegating to the shared function.
- Add the `ASK_SQL` schema in `tool_schemas.py`, dispatch in `tools.py`, and contract in `tool_contracts.py`.
- Register `ask_sql` in the MCP server's tool list.
- Update agent tool guidance to mention `ask_sql`.
- Tests: `test_generate_sql.py` (core function), `test_ask_sql_tool.py` (schema, dispatch, contract).
Implementation Progress
Deliverable A: Extract core function — DONE
Created dataface/ai/generate_sql.py with:
- generate_sql(question, schema_context, *, client) -> str — sends the question + schema context to the LLM with a structured JSON output schema, returns the SQL string.
- generate_sql_and_execute(question, schema_context, *, client, source, limit, adapter_registry) -> dict — calls generate_sql() then execute_query(), returns the merged result dict with the generated sql key added.
System prompt uses the same rich schema_context from get_schema_context() that the playground agent uses, ensuring prompt quality parity.
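The generate-then-execute composition can be sketched as follows, with the generation and execution steps injected as callables so the sketch runs standalone (the injected `generate`/`execute` parameters are illustrative; the real function takes a client and adapter registry):

```python
def generate_sql_and_execute(question, schema_context, *, generate, execute):
    """Generate SQL for the question, run it, and merge the SQL into the result."""
    sql = generate(question, schema_context)
    result = execute(sql)           # e.g. {"success": True, "data": [...], ...}
    return {**result, "sql": sql}   # execution result plus the generated sql key

merged = generate_sql_and_execute(
    "total revenue?",
    "orders(id, total)",
    generate=lambda q, ctx: "SELECT SUM(total) FROM orders",
    execute=lambda sql: {"success": True, "data": [{"sum": 4600}], "row_count": 1},
)
```

The merged dict is what the `ask_sql` tool ultimately returns: the execution payload with the generated query attached.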
Deliverable B: Wire cloud AIService — DONE
apps/cloud/apps/ai/service.py:AIService.generate_sql() is now a thin async wrapper:
- Converts its legacy schema: dict argument to a context string via _format_schema().
- Creates a sync OpenAIClient and delegates to generate_sql() via run_in_executor().
- Removed the now-unused SQLResponse Pydantic model from the cloud service.
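The wrapper pattern can be sketched like this, with a stub standing in for the shared synchronous `generate_sql()` (the function names and the schema-to-string conversion below are illustrative):

```python
import asyncio

def _generate_sql_sync(question: str, context: str) -> str:
    # Stand-in for the shared synchronous generate_sql(); the real wrapper
    # would create an OpenAIClient and delegate to the extracted function.
    return f"SELECT 1 -- {question}"

async def generate_sql_async(question: str, schema: dict) -> str:
    """Thin async wrapper: convert the legacy schema dict, then run the
    sync generation function on the default executor so the event loop
    isn't blocked by the LLM call."""
    context = "\n".join(f"{t}: {', '.join(cols)}" for t, cols in schema.items())
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(None, _generate_sql_sync, question, context)

sql = asyncio.run(generate_sql_async("count orders", {"orders": ["id", "total"]}))
```

`run_in_executor` keeps the cloud service's async interface intact while the shared core stays a plain sync function.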
Deliverable C: ask_sql MCP tool — DONE
- `ASK_SQL` schema added to `dataface/ai/tool_schemas.py` (required: `question`; optional: `source`, `limit`).
- Dispatch handler added in `dataface/ai/tools.py` — resolves schema context via `get_schema_context()`, creates an LLM client via `create_client()`, calls `generate_sql_and_execute()`.
- `_ASK_SQL_KEYS` contract added to `dataface/ai/tool_contracts.py` with keys: `success`, `errors`, `sql`, `data`, `columns`, `row_count`, `truncated`.
- Tool registered in the `dataface/ai/mcp/server.py` tool list.
- Agent tool guidance updated in `dataface/ai/agent.py`.
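A sketch of what such a tool schema might look like, following the JSON-Schema shape typical of MCP tool definitions (the exact dict in tool_schemas.py may differ; the description string here is an assumption):

```python
# Illustrative ASK_SQL tool definition: `question` is required,
# `source` and `limit` are optional.
ASK_SQL = {
    "name": "ask_sql",
    "description": (
        "Answer a natural-language question by generating SQL, "
        "running it, and returning both the SQL and the results."
    ),
    "input_schema": {
        "type": "object",
        "properties": {
            "question": {"type": "string"},
            "source": {"type": "string"},
            "limit": {"type": "integer"},
        },
        "required": ["question"],
    },
}
```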
Tests
- `tests/ai/test_generate_sql.py` — 5 tests covering the `generate_sql()` return value, system prompt contents, user message, and `generate_sql_and_execute()` success/error paths.
- `tests/ai/test_ask_sql_tool.py` — 6 tests covering schema validation, dispatch wiring (with a monkeypatched LLM/schema), and contract validation.
- Updated `tests/core/test_ai_tools.py` — tool count assertions (5→6, 6→7).
- Updated `tests/ai/test_tool_contracts.py` — added `ask_sql` to the expected tools set.
Validation
- `just fix` — formatting + lint autofix passed
- `just ci` — all CI checks passed (Black, Ruff, Mypy, Pytest)
QA Exploration
- [x] QA exploration completed (or N/A for non-UI tasks)
N/A for browser QA. Validation: call the extracted function with a benchmark question plus one or more context-provider strategies, verify it returns valid SQL, and compare output quality to the playground agent path.
Review Feedback
- Review executed via `just review`/branch review after the final `ask_sql` safety fixes.
- Review verdict: APPROVED with notes; the final blocker was a multi-statement SQL bypass, fixed by rejecting multi-statement generated SQL.
- [x] Review cleared
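The multi-statement guard can be sketched as below. This is a naive semicolon split for illustration only; it would misfire on semicolons inside string literals, so the real fix may rely on a proper SQL tokenizer:

```python
def reject_multi_statement(sql: str) -> str:
    """Raise if the generated SQL contains more than one statement.

    Naive sketch: strips a trailing semicolon, then splits on ";".
    A production guard should tokenize rather than split on raw text.
    """
    statements = [s for s in sql.strip().rstrip(";").split(";") if s.strip()]
    if len(statements) > 1:
        raise ValueError("multi-statement SQL is not allowed")
    return sql

reject_multi_statement("SELECT 1;")  # single statement passes through
```

Rejecting rather than truncating keeps the tool's behavior predictable: the agent sees an explicit error instead of silently losing part of the generated query.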