Build lightweight value hints retrieval from inspect artifacts

ID	CONTEXT_CATALOG_NIMBLE-BUILD_LIGHTWEIGHT_VALUE_HINTS_RETRIEVAL_FROM_INSPECT_ARTIFACTS
Status	not_started
Priority	p1
Milestone	m3-public-launch
Owner	data-ai-engineer-architect

Problem

A lot of text-to-SQL mistakes are filter mistakes rather than table-choice mistakes. The model may find the right table but still not know whether the right filter value looks like a region code, a status enum, a boolean-like category, or a date range. Today we mostly give schema structure, but not enough static value signal to disambiguate those questions.

We need a cheap value-hints layer that improves retrieval and bundle usefulness without turning M3 into a live exploration project.

Context

The existing retrieval initiative already works from local artifacts such as inspect.json and dbt metadata. Those inspect artifacts often contain enough static hints to improve filter grounding:

min/max date ranges
low-cardinality categorical values
enum-like columns
null-heavy or sparse columns that should be treated carefully

This task should preserve the project philosophy:

no live warehouse exploration
no expensive indexing
no open-ended value search agent

The goal is to surface only cheap, static hints that can travel inside a question-scoped bundle.

Possible Solutions

Recommended: derive static value hints from inspect artifacts and inject them selectively into bundles. Add a narrow extraction step that identifies high-signal columns and emits compact hints such as representative enum values or observed date ranges.

Why this is recommended:

builds directly on existing local artifacts
improves filter grounding without live queries
stays compatible with the current simple retrieval architecture

Add live distinct-value exploration queries.

Trade-off: richer, but much more expensive and operationally riskier. That is a later step, not this one.

Do nothing and rely on column names only.

Trade-off: cheapest, but filter ambiguity will remain a major failure source.

Plan

Identify which inspect fields are reliable enough to expose as value hints.
Define a compact hint format for bundles, for example: - representative categorical members - date min/max - low-cardinality signal - null-rate warning
Add retrieval logic that includes value hints only for columns relevant to the question and selected tables.
Compare bundle usefulness with and without value hints in downstream evals.
Keep the output intentionally compact so hints do not re-expand the prompt into a full profile dump.

Files likely involved

retrieval/build/bundle code under context-catalog-nimble
inspect artifact readers
bundle formatting code used by eval consumers

Success criteria

question-scoped bundles can include static value hints where helpful
filter-oriented cases improve or become easier to diagnose
the retrieval path remains file-native and simple

Implementation Progress

Not started.

QA Exploration

[x] QA exploration completed (or N/A for non-UI tasks)

N/A - retrieval and bundle task.

Review Feedback

No review feedback yet.

[ ] Review cleared