tasks/workstreams/context-catalog-nimble/initiatives/question-aware-schema-retrieval-and-narrowing/index.md


type: initiative
slug: question-aware-schema-retrieval-and-narrowing
title: Question-aware schema retrieval and narrowing
workstream: context-catalog-nimble
owner: data-ai-engineer-architect
status: planned
milestone: m2-internal-adoption-design-partners


Question-aware schema retrieval and narrowing

Objective

Build a simple file-and-CLI-first retrieval layer over inspect.json and dbt metadata that can search, rank, and isolate only the schema context needed for a question. Keep M2 focused on local artifacts and commands, but design the retrieval engine so it can later back a runtime search_context tool for agents without changing the core data model.

This initiative explicitly prioritizes narrowing quality and implementation simplicity over speed, indexing sophistication, or retrieval perfection. If a plain Python search function over JSON artifacts does the job, that is good enough for M2.
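As a rough illustration of what "a plain Python search function over JSON artifacts" could mean, the sketch below ranks tables by token overlap with the question. The payload shape (a `tables` list with `name`, `description`, and `columns`) and the names `tokenize` and `search_tables` are assumptions made for this example; the real inspect.json layout may differ.

```python
import json
import re


def tokenize(text):
    """Lowercase and split on non-alphanumerics, so `total_amount` -> {"total", "amount"}."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def search_tables(question, inspect_doc, top_k=3):
    """Rank tables by how many question tokens appear in their name, description, or columns.

    ASSUMPTION: inspect_doc looks like
    {"tables": [{"name": ..., "description": ..., "columns": [{"name": ...}, ...]}]}.
    """
    q_tokens = tokenize(question)
    scored = []
    for table in inspect_doc.get("tables", []):
        text = " ".join(
            [table.get("name", ""), table.get("description", "")]
            + [col.get("name", "") for col in table.get("columns", [])]
        )
        score = len(q_tokens & tokenize(text))
        if score > 0:
            scored.append((score, table["name"]))
    # Highest score first; ties broken alphabetically for stable output.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for _, name in scored[:top_k]]


# Hypothetical artifact content, inlined so the sketch runs without any files.
SAMPLE = json.loads("""
{"tables": [
  {"name": "orders", "description": "customer orders",
   "columns": [{"name": "order_id"}, {"name": "customer_id"}, {"name": "total_amount"}]},
  {"name": "customers", "description": "customer master data",
   "columns": [{"name": "customer_id"}, {"name": "region"}]},
  {"name": "web_events", "description": "clickstream",
   "columns": [{"name": "event_id"}, {"name": "url"}]}
]}
""")

result = search_tables("total amount per customer region", SAMPLE)
print(result)
```

A linear scan like this has no index and no embeddings, which matches the stated M2 priorities. Whether token overlap narrows well enough is exactly what the initiative would need to prove.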

Why this is M2

M1 got us the raw context substrate:

What is still missing is the question-aware layer between "all available metadata" and "the exact working set the SQL generator should see." That is an internal-adoption and design-partner problem, so M2 is the right milestone:

Scope

In scope for M2

Out of scope for M2

Deliverables

Tasks

Design Thesis

The important split is:

  1. build the corpus
  2. retrieve broadly enough for recall
  3. isolate aggressively enough for prompt usability
  4. generate from the isolated bundle

That means this initiative is not "make schema prompts better." It is "stop making the generator do implicit retrieval inside one giant prompt."
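The four-stage split above can be sketched as four small functions. This is only an illustration under stated assumptions: the function names, the per-column corpus records, and the `max_columns` budget are hypothetical, and the scoring is deliberately naive word overlap.

```python
def build_corpus(tables):
    """Stage 1: flatten table metadata into one searchable record per column."""
    return [
        {"table": t["name"], "column": c, "text": f"{t['name']} {c}".replace("_", " ").lower()}
        for t in tables
        for c in t["columns"]
    ]


def retrieve(question, corpus):
    """Stage 2 (recall): keep every record sharing at least one word with the question."""
    words = set(question.lower().split())
    hits = []
    for record in corpus:
        overlap = len(words & set(record["text"].split()))
        if overlap:
            hits.append((overlap, record))
    return hits


def isolate(hits, max_columns):
    """Stage 3 (prompt usability): keep only the best-scoring records within a hard budget."""
    ranked = sorted(hits, key=lambda h: (-h[0], h[1]["table"], h[1]["column"]))
    bundle = {}
    for _, record in ranked[:max_columns]:
        bundle.setdefault(record["table"], []).append(record["column"])
    return bundle


def render_bundle(bundle):
    """Stage 4 input: the only schema text the generator would see."""
    return "\n".join(f"{table}({', '.join(cols)})" for table, cols in sorted(bundle.items()))


# Hypothetical metadata standing in for the real corpus.
tables = [
    {"name": "orders", "columns": ["order_id", "customer_id", "total_amount"]},
    {"name": "customers", "columns": ["customer_id", "region"]},
    {"name": "web_events", "columns": ["event_id", "url"]},
]
corpus = build_corpus(tables)
hits = retrieve("total amount by customer region", corpus)
bundle = isolate(hits, max_columns=3)
prompt_context = render_bundle(bundle)
print(prompt_context)
```

Keeping `retrieve` and `isolate` as separate steps makes the trade-off explicit: the first can stay loose for recall, while the second enforces the budget. In this toy run the budget drops `orders.customer_id`, the join key, which shows why isolation quality matters more here than retrieval speed.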

For M2, the retriever can be intentionally boring:

Do not over-rotate on performance or elegance before proving that narrowing helps the agent.

Relationship to other work