Wire question-scoped context bundles into text-to-SQL eval backends

ID	CONTEXT_CATALOG_NIMBLE-WIRE_QUESTION_SCOPED_CONTEXT_BUNDLES_INTO_TEXT_TO_SQL_EVAL_BACKENDS
Status	not_started
Priority	p1
Milestone	m2-internal-adoption-design-partners
Owner	data-ai-engineer-architect
Initiative	question-aware-schema-retrieval-and-narrowing

Problem

Teach the shared SQL generation and eval backend layer to consume question-scoped context bundles from the retrieval CLI. Local text-to-SQL runs can then compare full-schema prompting against retrieved-and-isolated context without building a separate retrieval system inside the model prompt.

Context

The retrieval initiative only matters if generation can actually consume the narrowed context.

Relevant current paths:

dataface/ai/generate_sql.py accepts a schema-context string
apps/evals/sql/backends.py resolves built-in generation backends
apps/evals/sql/context.py already has pluggable context-provider patterns
the eval runner can compare backend configurations and metadata across runs

So this task should not build a second retrieval system inside the backend. It should teach the generation/eval layer to consume the bundle artifact produced by the CLI/search layer and compare it honestly with the current full-schema baseline.

This task also should not turn into retrieval optimization work. It should consume whatever simple bundle generator M2 gives us, even if that bundle came from a very naive Python search implementation.

Possible Solutions

Recommended: consume question-scoped bundles through the existing backend/context-provider seam

Add a bundle-aware context provider or backend mode that loads the isolated bundle text/JSON for each question and feeds only that narrowed context to the shared generator.

Why this is recommended:

reuses the eval backend architecture already in place
keeps retrieval and generation loosely coupled
makes A/B comparison against full-context prompting straightforward
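
A minimal sketch of what the recommended option could look like. Every name here is hypothetical: the `ContextProvider` protocol, both provider classes, and the assumed bundle layout (`<bundle_dir>/<question_id>.json` with a `context` field) are illustrations, not the existing interfaces in apps/evals/sql/context.py or the CLI's actual bundle format.

```python
import json
from dataclasses import dataclass
from pathlib import Path
from typing import Protocol


class ContextProvider(Protocol):
    """Hypothetical seam: returns the schema-context string for one question."""

    def context_for(self, question_id: str) -> str: ...


@dataclass
class FullSchemaContextProvider:
    """Baseline mode: every question gets the same full-schema text."""

    full_schema: str

    def context_for(self, question_id: str) -> str:
        return self.full_schema


@dataclass
class BundleContextProvider:
    """Bundle mode: loads a pre-built bundle per question; performs no retrieval."""

    bundle_dir: Path

    def context_for(self, question_id: str) -> str:
        # Assumed layout: <bundle_dir>/<question_id>.json with a "context" field.
        path = self.bundle_dir / f"{question_id}.json"
        if not path.exists():
            # Fail loudly instead of silently falling back to the full schema,
            # which would contaminate the A/B comparison.
            raise FileNotFoundError(f"no bundle for question {question_id!r}: {path}")
        bundle = json.loads(path.read_text(encoding="utf-8"))
        return bundle["context"]
```

Because both providers return a plain string, the generator can stay unaware of which mode produced its context. Raising on a missing bundle is one possible design choice; it keeps a bundle run from quietly mixing in baseline prompts.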

Reimplement retrieval logic directly inside each backend.

Trade-off: duplicates logic and guarantees drift between the CLI retriever and eval paths.

Change generate_sql() to own retrieval itself.

Trade-off: collapses retrieval and generation back together, which is the exact architecture problem this initiative is trying to fix.

Plan

Define how a question maps to a saved or on-demand bundle artifact.
Add a bundle-backed context-provider mode for eval backends.
Preserve the current full-schema context mode as the baseline.
Ensure backend metadata records:
  - bundle mode
  - bundle path or strategy
  - prompt context size or reduction metadata when available
Run canary eval comparisons between:
  - full schema context
  - question-scoped bundle context
Add focused tests that prove the backend consumes bundle context without duplicating retrieval logic.
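
The metadata fields named in the plan could be captured with something like the sketch below. The function name, the dictionary keys, and the choice of character counts as the size measure are all assumptions made for illustration; a token count would work just as well if the runner already has a tokenizer available.

```python
from typing import Optional


def context_metadata(
    mode: str,
    context: str,
    full_schema: str,
    bundle_path: Optional[str] = None,
) -> dict:
    """Build a hypothetical per-question metadata record for an eval run."""
    full_chars = len(full_schema)
    context_chars = len(context)
    return {
        "context_mode": mode,  # e.g. "full_schema" or "bundle"
        "bundle_path": bundle_path,  # None in the baseline mode
        "context_chars": context_chars,
        "full_schema_chars": full_chars,
        # Fraction of the full schema removed; None if there is no baseline text.
        "context_reduction": (
            round(1 - context_chars / full_chars, 4) if full_chars else None
        ),
    }
```

Recording the same keys for both modes means the baseline and bundle runs can be compared row by row without special-casing either one.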

Likely files

apps/evals/sql/backends.py
apps/evals/sql/context.py
dataface/ai/generate_sql.py
eval tests under tests/evals/sql/
possibly the shared text-to-SQL task surfaces that currently call full schema context directly
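
A focused test under tests/evals/sql/ might look roughly like the sketch below. It uses stand-ins only: `run_question`, `load_bundle_context`, and the `{"context": ...}` bundle shape are hypothetical and do not reflect the real backend or `generate_sql()` signatures. The point is the pattern: a fake generator records the context it receives, so the test shows the bundle text arrives unchanged and that no retrieval happens in the backend path.

```python
import json
import tempfile
from pathlib import Path
from typing import Callable


def run_question(question: str, context: str, generate: Callable[[str, str], str]) -> str:
    """Hypothetical backend entry point: passes the given context straight through."""
    return generate(question, context)


def load_bundle_context(bundle_path: Path) -> str:
    """Assumed bundle shape: a JSON object with a "context" string."""
    return json.loads(bundle_path.read_text(encoding="utf-8"))["context"]


def test_backend_passes_bundle_context_unchanged() -> None:
    seen = []

    def fake_generate(question: str, schema_context: str) -> str:
        # Record what the generator was given instead of calling a model.
        seen.append(schema_context)
        return "SELECT 1"

    with tempfile.TemporaryDirectory() as tmp:
        bundle_path = Path(tmp) / "q1.json"
        bundle_path.write_text(
            json.dumps({"context": "TABLE orders(id, total)"}), encoding="utf-8"
        )
        sql = run_question(
            "total revenue?", load_bundle_context(bundle_path), fake_generate
        )

    assert sql == "SELECT 1"
    assert seen == ["TABLE orders(id, total)"]
```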

Explicit anti-goals for this task

no backend-local search/index implementation
no search-speed optimization
no tight coupling between eval backend logic and retrieval internals beyond the bundle contract

Implementation Progress

Not started.

QA Exploration

[x] QA exploration completed (or N/A for non-UI tasks)

N/A - eval/backend integration task.

Review Feedback

[ ] Review cleared