Build lightweight value hints retrieval from inspect artifacts
Problem
A lot of text-to-SQL mistakes are filter mistakes rather than table-choice mistakes. The model may find the right table but still not know whether the right filter value looks like a region code, a status enum, a boolean-like category, or a date range. Today we mostly give schema structure, but not enough static value signal to disambiguate those questions.
We need a cheap value-hints layer that improves retrieval and bundle usefulness without turning M3 into a live exploration project.
Context
The existing retrieval initiative already works from local artifacts such as inspect.json and dbt metadata. Those inspect artifacts often contain enough static hints to improve filter grounding:
- min/max date ranges
- low-cardinality categorical values
- enum-like columns
- null-heavy or sparse columns that should be treated carefully
This task should preserve the project philosophy:
- no live warehouse exploration
- no expensive indexing
- no open-ended value search agent
The goal is to surface only cheap, static hints that can travel inside a question-scoped bundle.
Possible Solutions
- Recommended: derive static value hints from inspect artifacts and inject them selectively into bundles. Add a narrow extraction step that identifies high-signal columns and emits compact hints such as representative enum values or observed date ranges.
Why this is recommended:
- builds directly on existing local artifacts
- improves filter grounding without live queries
- stays compatible with the current simple retrieval architecture
- Add live distinct-value exploration queries.
Trade-off: richer, but much more expensive and operationally riskier. That is a later step, not this one.
- Do nothing and rely on column names only.
Trade-off: cheapest, but filter ambiguity will remain a major failure source.
Plan
- Identify which inspect fields are reliable enough to expose as value hints.
- Define a compact hint format for bundles, for example: - representative categorical members - date min/max - low-cardinality signal - null-rate warning
- Add retrieval logic that includes value hints only for columns relevant to the question and selected tables.
- Compare bundle usefulness with and without value hints in downstream evals.
- Keep the output intentionally compact so hints do not re-expand the prompt into a full profile dump.
Files likely involved
- retrieval/build/bundle code under
context-catalog-nimble - inspect artifact readers
- bundle formatting code used by eval consumers
Success criteria
- question-scoped bundles can include static value hints where helpful
- filter-oriented cases improve or become easier to diagnose
- the retrieval path remains file-native and simple
Implementation Progress
Not started.
QA Exploration
- [x] QA exploration completed (or N/A for non-UI tasks)
N/A - retrieval and bundle task.
Review Feedback
No review feedback yet.
- [ ] Review cleared