Dataface Tasks

QUERY_VALIDATOR foundation and first integrations

ID: CONTEXT_CATALOG_NIMBLE-QUERY_VALIDATOR_FOUNDATION_AND_FIRST_INTEGRATIONS
Status: not_started
Priority: p1
Milestone: m3-public-launch
Owner: data-ai-engineer-architect

Problem

Build the first query validator path, using SQLGlot plus schema, profile, grain, and relationship context to produce query review diagnostics.

Context

  • The repo already has SQLGlot-based query analysis, profiler-derived grain and relationship metadata, and schema-context work that could support deterministic query review diagnostics.
  • The first validator should focus on high-signal checks such as missing join predicates, likely fanout risk, grain mismatches, and ambiguous aggregation patterns.
  • For fanout specifically, the current compile-warning path is too catalog-first: it starts from risky relationship edges and warns when both tables appear in SQL. The validator should invert that and start from query structure.
  • The best first fanout detector is structural: a joined query, with aggregation, whose aggregate expressions reference columns owned by 2+ tables. Profile and relationship context should refine severity and repair guidance, not be the first detector.
  • This also reduces pressure on inspect.json to carry precomputed fanout_risk as the primary query-review mechanism.
  • This needs a narrow integration path first, likely CLI or eval-facing, rather than trying to wire every product surface at once.

Possible Solutions

  • A - Build a freeform AI-only reviewer that emits warnings from prompt reasoning: flexible, but not deterministic enough for a validator foundation.
  • B - Recommended: build a deterministic validator core on top of SQLGlot AST plus inspect/context metadata, then expose it through one or two focused entry points first.
  • C - Add validator logic independently inside each consumer surface: fast locally, but it guarantees duplicated rules and drift.

Plan

  1. Define the validator contract: inputs, output schema, and the first diagnostic classes to support.
  2. Implement a core validation pass using SQLGlot and schema qualification first, then enrich with grain/relationship/fanout metadata from inspect/context artifacts.
  3. Integrate it into an initial surface such as a CLI path or eval/review workflow and record clear diagnostics.
  4. Add focused fixtures and tests covering safe joins, risky joins, aggregation mistakes, and unsupported edge cases.
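
As a starting point for step 1, the validator contract could look roughly like the sketch below. Every name here (`Severity`, `Diagnostic`, `ValidationResult`, and the diagnostic codes in the comments) is a hypothetical placeholder rather than an agreed schema; the intent is only to show inputs, outputs, and diagnostic classes as plain serializable data.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List, Tuple


class Severity(str, Enum):
    INFO = "info"
    WARNING = "warning"
    ERROR = "error"


@dataclass(frozen=True)
class Diagnostic:
    # Hypothetical codes: "missing_join_predicate", "structural_fanout",
    # "grain_mismatch", "ambiguous_aggregation".
    code: str
    severity: Severity
    message: str
    tables: Tuple[str, ...] = ()
    suggestion: str = ""


@dataclass
class ValidationResult:
    sql: str
    diagnostics: List[Diagnostic] = field(default_factory=list)

    @property
    def ok(self) -> bool:
        # Only error-level diagnostics fail validation in this sketch.
        return not any(d.severity is Severity.ERROR for d in self.diagnostics)

    def to_dict(self) -> Dict:
        # Plain-dict form so a CLI or eval harness can emit JSON directly.
        return {
            "sql": self.sql,
            "ok": self.ok,
            "diagnostics": [
                {
                    "code": d.code,
                    "severity": d.severity.value,
                    "message": d.message,
                    "tables": list(d.tables),
                    "suggestion": d.suggestion,
                }
                for d in self.diagnostics
            ],
        }
```

Keeping the result a plain data structure means the first CLI or eval integration only has to serialize it, and later surfaces can consume the same shape without re-implementing rules.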

Implementation Progress

  • 2026-03-26 design refinement: treat structural fanout detection as the validator baseline, not edge-level fanout_risk lookup. The validator should detect aggregate ownership across joined tables directly from SQL, then use PK/grain/relationship metadata to calibrate severity and suggest pre-aggregation.
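
The calibration step in this refinement might be sketched as follows. The metadata shape is an assumption: `join_key_unique` maps a table name to whether its join key is unique according to PK/grain profiling, and the three severity levels and their thresholds are illustrative choices, not settled policy.

```python
from typing import Dict, List


def calibrate_fanout(tables: List[str], join_key_unique: Dict[str, bool]) -> Dict:
    """Turn a structural fanout finding into a severity plus repair hint."""
    # Tables whose join key is known to repeat: these multiply joined rows.
    many_side = [t for t in tables if join_key_unique.get(t) is False]
    # Tables with no grain metadata at all.
    unknown = [t for t in tables if t not in join_key_unique]

    if many_side:
        return {
            "severity": "error",
            "many_side": many_side,
            "suggestion": "Pre-aggregate "
            + ", ".join(many_side)
            + " to the join key before joining.",
        }
    if unknown:
        # The structural signal stands, but metadata cannot confirm it.
        return {
            "severity": "warning",
            "many_side": [],
            "suggestion": "Grain metadata missing for "
            + ", ".join(unknown)
            + "; verify join cardinality.",
        }
    # Every aggregated table is unique on its join key: no row multiplication.
    return {"severity": "info", "many_side": [], "suggestion": ""}
```

For example, `calibrate_fanout(["orders", "payments"], {"orders": False, "payments": False})` yields an error-level result that suggests pre-aggregating both tables, while an empty metadata dict downgrades the same structural finding to a warning.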

QA Exploration

  • [ ] QA exploration completed (or N/A for non-UI tasks)

Review Feedback

  • [ ] Review cleared