Task M2: schema-aware query planning for cloud chat questions
Problem
The home-page/org chat can answer some data questions, but it sometimes generates SQL against columns that do not exist in the scoped dataset, then surfaces raw DuckDB binder errors to the user. Add a planning layer between user intent and SQL generation that grounds candidate metrics/dimensions in the available schema/catalog context, chooses the correct dataset/table before writing SQL, and falls back gracefully when the requested concept is not present. This should reduce invalid SQL for prompts like users/documents/revenue/tickets trend summaries and make failures explain missing fields instead of exposing raw backend errors.
Context
This is not primarily a “schema tools missing” problem. The visible symptom is that the chat has access to catalog/list-source context and can answer many prompts, but the generation path still sometimes composes SQL with metric names that do not exist in the actual scoped table/view it selected.
Concrete QA example from the org-home chat:
- the user asked for the latest 30-day changes across `users`, `documents`, `revenue`, and `tickets`
- the agent generated SQL that referenced columns `users`, `documents`, `mrr`, and `open_tickets`
- DuckDB returned a binder error because the actual available columns were things like `sessions`, `tickets_resolved`, `active_subscriptions_started`, and `avg_session_minutes`
- the UI showed the raw backend error before eventually recovering with a later query
So the gap appears to be between:
- schema/catalog retrieval
- intent-to-metric mapping
- final SQL generation
The likely issue is that the model can see some schema context, but there is no explicit planning or validation step that says “for this question, these are the actual available columns/metrics in the chosen dataset; only generate SQL from this vetted set.” This task is about that middle layer.
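That missing middle layer can be sketched as a grounding step that resolves each requested business concept against the real columns of the chosen dataset, and only passes vetted fields on to SQL synthesis. This is a minimal illustration with hypothetical names (`CONCEPT_ALIASES`, `ground_concepts`), not the actual implementation:

```python
# Hypothetical sketch: ground requested business concepts in the scoped
# dataset's real columns before any SQL is written. All names illustrative.

# How business concepts might map to candidate physical column names.
CONCEPT_ALIASES = {
    "users": ["users", "new_users", "active_users"],
    "revenue": ["revenue", "mrr", "active_subscriptions_started"],
    "tickets": ["tickets", "open_tickets", "tickets_resolved"],
    "documents": ["documents", "documents_created"],
}

def ground_concepts(requested, available_columns):
    """Return (resolved, unresolved): resolved maps concept -> real column."""
    available = set(available_columns)
    resolved, unresolved = {}, []
    for concept in requested:
        match = next((c for c in CONCEPT_ALIASES.get(concept, [concept])
                      if c in available), None)
        if match is not None:
            resolved[concept] = match
        else:
            unresolved.append(concept)
    return resolved, unresolved

# Example mirroring the QA failure: the scoped table has none of the
# columns the model guessed, and "users" has no valid mapping at all.
resolved, unresolved = ground_concepts(
    ["users", "revenue"],
    ["sessions", "tickets_resolved", "active_subscriptions_started",
     "avg_session_minutes"],
)
```

Under this contract, SQL generation only ever sees `resolved`, and anything in `unresolved` becomes a user-facing explanation instead of an invalid query.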
Possible Solutions
- Recommended: add a schema-aware planning pass before SQL generation. The planner should resolve requested business concepts to actual datasets and columns, produce a vetted query spec, and only then allow SQL synthesis. If required concepts are missing, it should return a graceful explanation instead of invalid SQL.
- Rely on prompt-only improvements to encourage the model to inspect schema more carefully before querying. Trade-off: cheap, but fragile. It will reduce some failures without creating a reliable contract.
- Generate SQL freely, catch binder errors, and retry automatically with error feedback. Trade-off: useful as a fallback, but still wastes cycles and still risks showing the user backend-shaped failures.
- Restrict the chat to prebuilt dashboard metrics only. Trade-off: safer, but too limiting for exploratory analysis and not aligned with the product direction.
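The retry-on-binder-error option above can be sketched as a bounded loop that feeds the execution error back into generation. `generate_sql` and `run_sql` are hypothetical stand-ins for the real model call and DuckDB execution path:

```python
# Hypothetical fallback loop: retry SQL generation with error feedback,
# and never surface the raw backend error to the user.

def run_with_retry(question, generate_sql, run_sql, max_attempts=3):
    error = None
    for _ in range(max_attempts):
        sql = generate_sql(question, previous_error=error)
        try:
            return {"ok": True, "rows": run_sql(sql)}
        except Exception as exc:  # e.g. a DuckDB binder error
            error = str(exc)
    return {"ok": False,
            "message": ("I couldn't build a valid query for this dataset. "
                        "It may not contain the fields you asked about.")}

# Toy demo: the first attempt fails on a bad column, the second succeeds.
attempts = []
def fake_generate(question, previous_error=None):
    attempts.append(previous_error)
    return "SELECT mrr" if previous_error is None else "SELECT sessions"
def fake_run(sql):
    if "mrr" in sql:
        raise RuntimeError('Binder Error: column "mrr" not found')
    return [("sessions", 42)]

result = run_with_retry("30-day trend", fake_generate, fake_run)
```

This illustrates why retry alone is a weak primary strategy: each recovery costs an extra generation round trip, and after `max_attempts` the user still gets a generic failure rather than an explanation of which field was missing.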
The recommended approach is to make schema grounding explicit and testable, with binder-error retry as a secondary safety net rather than the main strategy.
Plan
- Trace the org-home chat path from prompt -> tool/context retrieval -> SQL generation -> execution.
- Identify where business concepts are currently inferred without validating the chosen dataset/columns.
- Add a planning/spec layer that:
  - chooses the dataset/table/view to analyze
  - maps requested concepts to actual available fields
  - records unresolved concepts before SQL is generated
- Prevent SQL generation from using unresolved fields.
- Convert unresolved or invalid requests into user-facing explanations such as "this dataset has `new_users` and `documents_created`, but not `mrr`."
- Add regression coverage with prompts similar to the failing `users`/`documents`/`revenue`/`tickets` trend question.
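The spec step in the plan could emit either a vetted query spec or the graceful explanation described above. A sketch under hypothetical names (`build_query_spec`, `concept_to_column`):

```python
# Hypothetical spec builder: either a vetted spec that SQL synthesis may
# consume, or a user-facing explanation of which concepts are missing.

def build_query_spec(dataset, requested, concept_to_column):
    """Resolve requested concepts against the dataset's vetted mapping."""
    resolved = {c: concept_to_column[c] for c in requested
                if c in concept_to_column}
    missing = [c for c in requested if c not in concept_to_column]
    if missing:
        have = " and ".join(sorted(concept_to_column.values()))
        lack = " and ".join(missing)
        return {"error": f"this dataset has {have}, but not {lack}"}
    return {"dataset": dataset, "columns": list(resolved.values())}

# Example: "revenue" has no mapping in this dataset, so the user gets an
# explanation instead of SQL that would hit a binder error.
spec = build_query_spec(
    "org_activity",
    ["users", "documents", "revenue"],
    {"users": "new_users", "documents": "documents_created"},
)
```

Regression tests for the failing trend prompts can then assert on the spec (did every concept resolve, or did we produce an explanation?) without ever executing SQL.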
Implementation Progress
QA Exploration
- [ ] QA exploration completed (or N/A for non-UI tasks)
Review Feedback
- [ ] Review cleared