tasks/workstreams/context-catalog-nimble/initiatives/question-aware-schema-retrieval-and-narrowing/spec.md

Spec

Design Goals

Non-Goals (M2)

M2 System Shape

1. Corpus build

Build a derived corpus from local artifacts:

Suggested output:

The corpus is a derived artifact, not a source of truth.

2. Search

Search takes a natural-language question and returns ranked matches from the corpus.

Search responsibilities:

M2 implementation guidance:

Search does not produce prompt-ready context directly.

3. Isolation

Isolation turns ranked hits into a small question-scoped bundle.
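
As a rough illustration of the step from ranked hits to a small working set, the sketch below groups hits by table and keeps only the strongest few. The hit shape (`id`, `table`, `score`), the `max_tables` cap, and the function name `isolate` are assumptions made for this example and are not defined by this spec.

```python
def isolate(hits, max_tables=3):
    """Group ranked hits by table and keep the highest-scoring tables.

    `hits` is assumed to be a list of dicts with hypothetical keys
    `id`, `table`, and `score`.
    """
    per_table = {}
    for hit in hits:
        entry = per_table.setdefault(
            hit["table"], {"table": hit["table"], "score": 0.0, "hit_ids": []}
        )
        entry["score"] += hit["score"]      # accumulate evidence per table
        entry["hit_ids"].append(hit["id"])  # keep provenance for debugging

    # Sort by score descending, then by name so ties are reproducible.
    ordered = sorted(per_table.values(), key=lambda e: (-e["score"], e["table"]))
    return ordered[:max_tables]
```

Keeping `hit_ids` on each selected table is one way to let a reader trace why a table made it into the bundle.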

Isolation responsibilities:

4. Generation consumer

Generation should receive either:

The generator should not own retrieval logic.

Corpus Record Model

Required fields

Every corpus record should include:

M2 record kinds

Table payload

Recommended fields:

Column payload

Recommended fields:

Relationship payload

Recommended fields:

Doc payload

Recommended fields:

Ranking Model (M2)

M2 ranking should be deterministic and field-weighted.

Recommended score inputs:

Avoid opaque ranking for M2. Result order should be inspectable and debuggable.
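
A minimal sketch of what deterministic, field-weighted scoring could look like. The field names (`name`, `columns`, `description`), the weight values, and the helpers `tokenize`, `score_record`, and `rank` are assumptions for illustration only; the spec does not fix them. Each result carries a per-field breakdown so the order can be inspected, and ties break on record id so repeated runs give the same order.

```python
import re

# Hypothetical weights; the spec does not define fields or values.
FIELD_WEIGHTS = {"name": 3.0, "columns": 2.0, "description": 1.0}


def tokenize(text):
    """Lowercase and split on any non-alphanumeric character."""
    return [t for t in re.split(r"[^a-z0-9]+", text.lower()) if t]


def score_record(question, record):
    """Return (total, breakdown), where breakdown maps field -> contribution."""
    terms = set(tokenize(question))
    breakdown = {}
    for field, weight in FIELD_WEIGHTS.items():
        field_tokens = set(tokenize(str(record.get(field, ""))))
        hits = len(terms & field_tokens)
        if hits:
            breakdown[field] = weight * hits
    return sum(breakdown.values()), breakdown


def rank(question, records):
    """Sort by score descending, then by id so ties are stable across runs."""
    scored = []
    for record in records:
        total, breakdown = score_record(question, record)
        if total > 0:
            scored.append({"id": record["id"], "score": total, "why": breakdown})
    return sorted(scored, key=lambda r: (-r["score"], r["id"]))
```

The `why` field is the debugging hook: it shows which fields contributed to a hit's score without rerunning the search.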

Isolation Contract

Bundle output

Suggested path:

Bundle shape

Required top-level fields:

selected_tables

Each selected table should include:

bundle_text

The bundle should also provide a compact text representation that can be passed directly to the current generator, so the generator does not need to understand a new schema format.

This lets M2 narrow the context without rewriting every generation consumer immediately.
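
One possible way to produce such a text form is sketched below. The per-table keys (`name`, `columns`, and each column's `name` and `type`), the `TABLE ...` line layout, and the function name `render_bundle_text` are hypothetical; the actual format should match whatever the current generator already accepts.

```python
def render_bundle_text(selected_tables):
    """Render selected tables as one compact line per table.

    Assumes each table is a dict with hypothetical keys `name` and
    `columns`, where each column has `name` and `type`.
    """
    lines = []
    for table in selected_tables:
        cols = ", ".join(f"{c['name']} {c['type']}" for c in table["columns"])
        lines.append(f"TABLE {table['name']} ({cols})")
    return "\n".join(lines)
```

Because the output is plain text, a generator that currently takes a schema dump as a string could take this string in its place without code changes.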

CLI Surface

dft context build

Build or refresh the local derived corpus from available artifacts.

dft context search "<question>"

Return ranked corpus hits in text or JSON.

dft context show <id>

Return one record in detail.

dft context bundle "<question>"

Return or persist the isolated working set intended for generation.
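
The four commands above could be wired up roughly as follows. This is only a sketch using Python's standard `argparse`; how `dft` actually registers subcommands is not stated in this spec, and `build_parser` is a hypothetical name. Only the command names and positional arguments listed above are modeled.

```python
import argparse


def build_parser():
    """Build a parser for `dft context {build,search,show,bundle}`."""
    parser = argparse.ArgumentParser(prog="dft")
    groups = parser.add_subparsers(dest="group", required=True)

    context = groups.add_parser("context")
    commands = context.add_subparsers(dest="command", required=True)

    # build: no positional arguments
    commands.add_parser("build")

    # search "<question>"
    search = commands.add_parser("search")
    search.add_argument("question")

    # show <id>
    show = commands.add_parser("show")
    show.add_argument("id")

    # bundle "<question>"
    bundle = commands.add_parser("bundle")
    bundle.add_argument("question")

    return parser
```

For example, `build_parser().parse_args(["context", "search", "orders by region"])` yields a namespace with `command` set to `search` and `question` set to the quoted string.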

Output modes

All commands should support:

Integration Rules

M2 default

The retrieval engine is first consumed by eval and local generation flows, not by every agent surface immediately.

Fallback

If the corpus is unavailable or the command is not configured, existing full-schema behavior remains valid.
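
The fallback rule can be sketched as a single decision point. Everything named here is an assumption for illustration: `resolve_context`, the `corpus_path` argument, and the `bundle_fn` callable standing in for the search and isolation library.

```python
from pathlib import Path


def resolve_context(question, corpus_path, full_schema_text, bundle_fn=None):
    """Return (mode, text) for the generator.

    Falls back to the existing full-schema text when no bundle function
    is configured or the corpus file does not exist.
    """
    if bundle_fn is None or not Path(corpus_path).exists():
        return "full_schema", full_schema_text
    return "bundle", bundle_fn(question, corpus_path)
```

Returning the mode alongside the text lets eval runs record which path was taken, which is useful when comparing narrowed and full-schema results.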

Future tool compatibility

Future MCP tool use should call the same core search/isolation library and return the same records or bundle schema.

Validation and Measurement

The initiative is successful when:

Acceptance Criteria