Dataface Tasks

Build lightweight value hints retrieval from inspect artifacts

IDCONTEXT_CATALOG_NIMBLE-BUILD_LIGHTWEIGHT_VALUE_HINTS_RETRIEVAL_FROM_INSPECT_ARTIFACTS
Statusnot_started
Priorityp1
Milestonem3-public-launch
Ownerdata-ai-engineer-architect

Problem

A lot of text-to-SQL mistakes are filter mistakes rather than table-choice mistakes. The model may find the right table but still not know whether the right filter value looks like a region code, a status enum, a boolean-like category, or a date range. Today we mostly give schema structure, but not enough static value signal to disambiguate those questions.

We need a cheap value-hints layer that improves retrieval and bundle usefulness without turning M3 into a live exploration project.

Context

The existing retrieval initiative already works from local artifacts such as inspect.json and dbt metadata. Those inspect artifacts often contain enough static hints to improve filter grounding:

  • min/max date ranges
  • low-cardinality categorical values
  • enum-like columns
  • null-heavy or sparse columns that should be treated carefully

This task should preserve the project philosophy:

  • no live warehouse exploration
  • no expensive indexing
  • no open-ended value search agent

The goal is to surface only cheap, static hints that can travel inside a question-scoped bundle.

Possible Solutions

  1. Recommended: derive static value hints from inspect artifacts and inject them selectively into bundles. Add a narrow extraction step that identifies high-signal columns and emits compact hints such as representative enum values or observed date ranges.

Why this is recommended:

  • builds directly on existing local artifacts
  • improves filter grounding without live queries
  • stays compatible with the current simple retrieval architecture
  1. Add live distinct-value exploration queries.

Trade-off: richer, but much more expensive and operationally riskier. That is a later step, not this one.

  1. Do nothing and rely on column names only.

Trade-off: cheapest, but filter ambiguity will remain a major failure source.

Plan

  1. Identify which inspect fields are reliable enough to expose as value hints.
  2. Define a compact hint format for bundles, for example: - representative categorical members - date min/max - low-cardinality signal - null-rate warning
  3. Add retrieval logic that includes value hints only for columns relevant to the question and selected tables.
  4. Compare bundle usefulness with and without value hints in downstream evals.
  5. Keep the output intentionally compact so hints do not re-expand the prompt into a full profile dump.

Files likely involved

  • retrieval/build/bundle code under context-catalog-nimble
  • inspect artifact readers
  • bundle formatting code used by eval consumers

Success criteria

  • question-scoped bundles can include static value hints where helpful
  • filter-oriented cases improve or become easier to diagnose
  • the retrieval path remains file-native and simple

Implementation Progress

Not started.

QA Exploration

  • [x] QA exploration completed (or N/A for non-UI tasks)

N/A - retrieval and bundle task.

Review Feedback

No review feedback yet.

  • [ ] Review cleared