QUERY_VALIDATOR foundation and first integrations
Problem
Build the first query validator path, using SQLGlot plus schema, profiler-derived grain, and relationship context to produce query-review diagnostics.
Context
- The repo already has SQLGlot-based query analysis, profiler-derived grain and relationship metadata, and schema-context work that could support deterministic query review diagnostics.
- The first validator should focus on high-signal checks such as missing join predicates, likely fanout risk, grain mismatches, and ambiguous aggregation patterns.
- For fanout specifically, the current compile-warning path is too catalog-first: it starts from risky relationship edges and warns when both tables appear in SQL. The validator should invert that and start from query structure.
- The best first fanout detector is: joined query + aggregation + aggregate expressions owned by columns from 2+ tables. Profile and relationship context should refine severity and repair guidance, not be the first detector.
- This also reduces pressure on `inspect.json` to carry precomputed `fanout_risk` as the primary query-review mechanism.
- This needs a narrow integration path first, likely CLI or eval-facing, rather than trying to wire every product surface at once.
Possible Solutions
- A - Build a freeform AI-only reviewer that emits warnings from prompt reasoning: flexible, but not deterministic enough for a validator foundation.
- B - Recommended: build a deterministic validator core on top of SQLGlot AST plus inspect/context metadata, then expose it through one or two focused entry points first.
- C - Add validator logic independently inside each consumer surface: fast locally, but it guarantees duplicated rules and drift.
Plan
- Define the validator contract: inputs, output schema, and the first diagnostic classes to support.
- Implement a core validation pass using SQLGlot and schema qualification first, then enrich with grain/relationship/fanout metadata from inspect/context artifacts.
- Integrate it into an initial surface such as a CLI path or eval/review workflow and record clear diagnostics.
- Add focused fixtures and tests covering safe joins, risky joins, aggregation mistakes, and unsupported edge cases.
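As a starting point for the contract step in the plan above, the output schema might look something like this sketch. All names here (`DiagnosticClass`, `Severity`, `Diagnostic`, `ValidationResult`) are assumptions for illustration, not existing repo types; the diagnostic classes mirror the high-signal checks listed under Context, plus an explicit bucket for unsupported edge cases.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Any, Dict, List, Optional


class DiagnosticClass(Enum):
    # First diagnostic classes, taken from the checks named in Context.
    MISSING_JOIN_PREDICATE = "missing_join_predicate"
    FANOUT_RISK = "fanout_risk"
    GRAIN_MISMATCH = "grain_mismatch"
    AMBIGUOUS_AGGREGATION = "ambiguous_aggregation"
    UNSUPPORTED = "unsupported"  # query shape the validator cannot analyze


class Severity(Enum):
    INFO = "info"
    WARNING = "warning"
    ERROR = "error"


@dataclass
class Diagnostic:
    code: DiagnosticClass
    severity: Severity
    message: str
    tables: List[str] = field(default_factory=list)
    repair_hint: Optional[str] = None

    def to_dict(self) -> Dict[str, Any]:
        # Serialize enums by value so the output is plain JSON-compatible data.
        return {
            "code": self.code.value,
            "severity": self.severity.value,
            "message": self.message,
            "tables": list(self.tables),
            "repair_hint": self.repair_hint,
        }


@dataclass
class ValidationResult:
    sql: str
    diagnostics: List[Diagnostic] = field(default_factory=list)

    @property
    def ok(self) -> bool:
        # In this sketch a result is "ok" when no diagnostic is an error.
        return all(d.severity is not Severity.ERROR for d in self.diagnostics)

    def to_dict(self) -> Dict[str, Any]:
        return {
            "sql": self.sql,
            "ok": self.ok,
            "diagnostics": [d.to_dict() for d in self.diagnostics],
        }
```

Keeping the result a plain, serializable structure would let a CLI path and an eval/review workflow consume the same output without each surface re-implementing rules, which is the drift that option C risks.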
Implementation Progress
- 2026-03-26 design refinement: treat structural fanout detection as the validator baseline, not edge-level `fanout_risk` lookup. The validator should detect aggregate ownership across joined tables directly from SQL, then use PK/grain/relationship metadata to calibrate severity and suggest pre-aggregation.
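The calibration half of that refinement might be sketched as below. It assumes the structural check has already produced the set of tables that own aggregated columns, and that relationship metadata can be reduced to a cardinality label per table pair. That metadata shape, the labels, the severity strings, and the function name `calibrate_fanout` are all assumptions for illustration; the actual inspect/context artifacts may differ.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Iterable, Optional, Tuple

# Hypothetical severity ordering for this sketch.
_RANK = {"info": 0, "warning": 1, "error": 2}


def calibrate_fanout(
    owner_tables: Iterable[str],
    cardinalities: Dict[FrozenSet[str], str],
) -> Tuple[str, Optional[str]]:
    """Return (severity, repair_hint) for a structurally detected fanout.

    `cardinalities` maps an unordered table pair to an assumed label:
    "one_to_one", "one_to_many", or "many_to_many".
    """
    worst = "info"
    worst_pair: Optional[Tuple[str, str]] = None

    for a, b in combinations(sorted(set(owner_tables)), 2):
        label = cardinalities.get(frozenset((a, b)))
        if label is None:
            level = "warning"  # no metadata: keep the structural warning
        elif label == "one_to_one":
            level = "info"     # join cannot multiply rows
        else:
            level = "error"    # metadata confirms row multiplication
        if _RANK[level] > _RANK[worst]:
            worst, worst_pair = level, (a, b)

    if worst_pair is None:
        return worst, None

    a, b = worst_pair
    hint = (
        f"Pre-aggregate {a} and {b} to the join key in separate "
        "subqueries, then join the aggregated results."
    )
    return worst, hint
```

The design point is that missing metadata degrades to a warning rather than silence: the structural detector stays the baseline, and metadata only raises or lowers the severity.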
QA Exploration
- [ ] QA exploration completed (or N/A for non-UI tasks)
Review Feedback
- [ ] Review cleared