Task M2: schema-aware query planning for cloud chat questions
Problem
The home-page/org chat can answer some data questions, but it sometimes generates SQL against columns that do not exist in the scoped dataset, then surfaces raw DuckDB binder errors to the user. Add a planning layer between user intent and SQL generation that grounds candidate metrics/dimensions in the available schema/catalog context, chooses the correct dataset/table before writing SQL, and falls back gracefully when the requested concept is not present. This should reduce invalid SQL for prompts like users/documents/revenue/tickets trend summaries and make failures explain missing fields instead of exposing raw backend errors.
Context
This is not primarily a “schema tools missing” problem. The visible symptom is that the chat has access to catalog/list-source context and can answer many prompts, but the generation path still sometimes composes SQL with metric names that do not exist in the actual scoped table/view it selected.
Concrete QA example from the org-home chat:
- the user asked for the latest 30-day changes across `users`, `documents`, `revenue`, and `tickets`
- the agent generated SQL that referenced columns `users`, `documents`, `mrr`, and `open_tickets`
- DuckDB returned a binder error because the actual available columns were things like `sessions`, `tickets_resolved`, `active_subscriptions_started`, and `avg_session_minutes`
- the UI showed the raw backend error before eventually recovering with a later query
So the gap appears to be between:
- schema/catalog retrieval
- intent-to-metric mapping
- final SQL generation
The likely issue is that the model can see some schema context, but there is no explicit planning or validation step that says “for this question, these are the actual available columns/metrics in the chosen dataset; only generate SQL from this vetted set.” This task is about that middle layer.
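That missing middle layer can be sketched as a grounding step that resolves each requested business concept against the real columns of the chosen dataset, and only passes vetted fields on to SQL synthesis. This is a minimal illustration with hypothetical names (`CONCEPT_ALIASES`, `ground_concepts`), not the actual implementation:

```python
# Hypothetical sketch: ground requested business concepts in the scoped
# dataset's real columns before any SQL is written. All names illustrative.

# How business concepts might map to candidate physical column names.
CONCEPT_ALIASES = {
    "users": ["users", "new_users", "active_users"],
    "revenue": ["revenue", "mrr", "active_subscriptions_started"],
    "tickets": ["tickets", "open_tickets", "tickets_resolved"],
    "documents": ["documents", "documents_created"],
}

def ground_concepts(requested, available_columns):
    """Return (resolved, unresolved): resolved maps concept -> real column."""
    available = set(available_columns)
    resolved, unresolved = {}, []
    for concept in requested:
        match = next((c for c in CONCEPT_ALIASES.get(concept, [concept])
                      if c in available), None)
        if match is not None:
            resolved[concept] = match
        else:
            unresolved.append(concept)
    return resolved, unresolved

# Example mirroring the QA failure: the scoped table has none of the
# columns the model guessed, and "users" has no valid mapping at all.
resolved, unresolved = ground_concepts(
    ["users", "revenue"],
    ["sessions", "tickets_resolved", "active_subscriptions_started",
     "avg_session_minutes"],
)
```

Under this contract, SQL generation only ever sees `resolved`, and anything in `unresolved` becomes a user-facing explanation instead of an invalid query.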
Possible Solutions
- Recommended: add a schema-aware planning pass before SQL generation. The planner should resolve requested business concepts to actual datasets and columns, produce a vetted query spec, and only then allow SQL synthesis. If required concepts are missing, it should return a graceful explanation instead of invalid SQL.
- Rely on prompt-only improvements to encourage the model to inspect schema more carefully before querying. Trade-off: cheap, but fragile. It will reduce some failures without creating a reliable contract.
- Generate SQL freely, catch binder errors, and retry automatically with error feedback. Trade-off: useful as a fallback, but still wastes cycles and still risks showing the user backend-shaped failures.
- Restrict the chat to prebuilt dashboard metrics only. Trade-off: safer, but too limiting for exploratory analysis and not aligned with the product direction.
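The retry-on-binder-error option above can be sketched as a bounded loop that feeds the execution error back into generation. `generate_sql` and `run_sql` are hypothetical stand-ins for the real model call and DuckDB execution path:

```python
# Hypothetical fallback loop: retry SQL generation with error feedback,
# and never surface the raw backend error to the user.

def run_with_retry(question, generate_sql, run_sql, max_attempts=3):
    error = None
    for _ in range(max_attempts):
        sql = generate_sql(question, previous_error=error)
        try:
            return {"ok": True, "rows": run_sql(sql)}
        except Exception as exc:  # e.g. a DuckDB binder error
            error = str(exc)
    return {"ok": False,
            "message": ("I couldn't build a valid query for this dataset. "
                        "It may not contain the fields you asked about.")}

# Toy demo: the first attempt fails on a bad column, the second succeeds.
attempts = []
def fake_generate(question, previous_error=None):
    attempts.append(previous_error)
    return "SELECT mrr" if previous_error is None else "SELECT sessions"
def fake_run(sql):
    if "mrr" in sql:
        raise RuntimeError('Binder Error: column "mrr" not found')
    return [("sessions", 42)]

result = run_with_retry("30-day trend", fake_generate, fake_run)
```

This illustrates why retry alone is a weak primary strategy: each recovery costs an extra generation round trip, and after `max_attempts` the user still gets a generic failure rather than an explanation of which field was missing.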
The recommended approach is to make schema grounding explicit and testable, with binder-error retry as a secondary safety net rather than the main strategy.
Plan
- Trace the org-home chat path from prompt -> tool/context retrieval -> SQL generation -> execution.
- Identify where business concepts are currently inferred without validating the chosen dataset/columns.
- Add a planning/spec layer that:
  - chooses the dataset/table/view to analyze
  - maps requested concepts to actual available fields
  - records unresolved concepts before SQL is generated
- Prevent SQL generation from using unresolved fields.
- Convert unresolved or invalid requests into user-facing explanations such as "this dataset has `new_users` and `documents_created`, but not `mrr`."
- Add regression coverage with prompts similar to the failing `users`/`documents`/`revenue`/`tickets` trend question.
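The spec step in the plan could emit either a vetted query spec or the graceful explanation described above. A sketch under hypothetical names (`build_query_spec`, `concept_to_column`):

```python
# Hypothetical spec builder: either a vetted spec that SQL synthesis may
# consume, or a user-facing explanation of which concepts are missing.

def build_query_spec(dataset, requested, concept_to_column):
    """Resolve requested concepts against the dataset's vetted mapping."""
    resolved = {c: concept_to_column[c] for c in requested
                if c in concept_to_column}
    missing = [c for c in requested if c not in concept_to_column]
    if missing:
        have = " and ".join(sorted(concept_to_column.values()))
        lack = " and ".join(missing)
        return {"error": f"this dataset has {have}, but not {lack}"}
    return {"dataset": dataset, "columns": list(resolved.values())}

# Example: "revenue" has no mapping in this dataset, so the user gets an
# explanation instead of SQL that would hit a binder error.
spec = build_query_spec(
    "org_activity",
    ["users", "documents", "revenue"],
    {"users": "new_users", "documents": "documents_created"},
)
```

Regression tests for the failing trend prompts can then assert on the spec (did every concept resolve, or did we produce an explanation?) without ever executing SQL.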
Implementation Progress
QA Exploration
- [ ] QA exploration completed (or N/A for non-UI tasks)
Review Feedback
- [ ] Review cleared