Dataface Tasks

Iterate on question-aware retrieval with interface and result experiments

ID: CONTEXT_CATALOG_NIMBLE-ITERATE_ON_QUESTION_AWARE_RETRIEVAL_WITH_INTERFACE_AND_RESULT_EXPERIMENTS
Status: not_started
Priority: p2
Milestone: m2-internal-adoption-design-partners
Owner: data-ai-engineer-architect
Initiative: question-aware-schema-retrieval-and-narrowing

Problem

After the initial A/B eval comparing retrieval versus full-context prompting, run a small set of follow-up experiments to improve the question-aware retrieval path. Focus on interface and result quality experiments such as different search outputs, bundle shapes, ranking heuristics, and isolation policies rather than speed or indexing optimization.

Context

This task should only start after the direct A/B comparison exists. The goal is not to optimize blindly; it is to use the first comparison to identify where the simple retrieval path is weak and then run a few targeted experiments.

The most likely iteration axes are:

  • search interface: what form of output best helps the agent decide what to use
  • bundle shape: what the generator should actually see after isolation
  • ranking heuristics: table-first vs column-first vs relationship-aware boosts
  • isolation policy: how aggressively to trim tables, columns, and descriptions
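As a sketch of how these four axes could be parameterized so variants are easy to enumerate and compare, a minimal config object (all field names and values here are hypothetical, not an existing interface):

```python
from dataclasses import dataclass

# Hypothetical sketch: each iteration axis from the list above becomes one
# field of an experiment-variant config, so a variant is a single record.
@dataclass(frozen=True)
class RetrievalVariant:
    search_output: str   # e.g. "ids_only" vs "ids_with_why_matched"
    bundle_shape: str    # e.g. "summaries_only" vs "columns_and_relationships"
    ranking: str         # e.g. "table_first", "column_first", "relationship_boosted"
    isolation: str       # e.g. "narrow" vs "medium"

# The simple retrieval path currently in place would be one such variant.
baseline = RetrievalVariant(
    search_output="ids_only",
    bundle_shape="summaries_only",
    ranking="table_first",
    isolation="narrow",
)
print(baseline.ranking)  # → table_first
```

Freezing the dataclass keeps variants hashable, so they can key a results table when comparing eval runs.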

This task should stay aligned with the M2 philosophy:

  • no performance tuning
  • no fancy indexing work
  • no embedding/vector detour unless the simple approach clearly fails
  • small experiments that improve usefulness for the agent

Possible Solutions

  1. Recommended: run a small post-A/B experiment matrix over interface and result-shaping. Use the baseline comparison results to choose a few targeted retrieval experiments, then compare them on the same eval slice. Focus on things the agent actually experiences: ranked result format, bundle composition, and simple heuristic changes.

Why this is recommended:

  • keeps experimentation tied to observed failures
  • improves the usefulness of the retriever without changing the basic architecture
  • avoids premature optimization
  2. Keep changing the retriever ad hoc without running structured comparisons.

Trade-off: faster in the moment, but it becomes hard to learn which changes actually help.

  3. Jump immediately to a new retrieval architecture such as embeddings or a dedicated service.

Trade-off: too big for a follow-up iteration task and not aligned with the current M2 simplicity constraint.

Plan

  1. Review the first A/B comparison and identify the dominant failure modes.
  2. Pick a small experiment matrix, for example:
     - result list shape: terse vs richer explanations
     - bundle composition: table summaries only vs selected columns + relationships
     - ranking heuristic: exact-name heavy vs description/role boosted
     - isolation policy: narrow vs medium bundle sizes
  3. Run the same eval slice across those variants.
  4. Compare downstream SQL quality and note which retrieval presentation/shape helps most.
  5. Keep the winning variant if it materially helps; otherwise keep the simpler version.
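The experiment matrix in step 2 can be enumerated mechanically before pruning it down to a handful of runs. A minimal sketch, assuming the axis names and values from the plan (the full cross product is only illustrative; in practice only a few cells tied to observed failures would be run):

```python
from itertools import product

# Hypothetical sketch: axes mirror step 2 of the plan. The actual values
# chosen should come from the failure modes seen in the first A/B run.
axes = {
    "result_list_shape": ["terse", "richer_explanations"],
    "bundle_composition": ["table_summaries_only", "columns_plus_relationships"],
    "ranking_heuristic": ["exact_name_heavy", "description_boosted"],
    "isolation_policy": ["narrow", "medium"],
}

# Cross product of all axis values, each variant as a named dict.
variants = [dict(zip(axes, values)) for values in product(*axes.values())]
print(len(variants))  # → 16; a real matrix would keep only a few cells
```

Enumerating first and then pruning makes it explicit which cells were skipped, which keeps the experiment set tied to the A/B evidence rather than ad hoc tinkering.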

Example experiment ideas

  • Search output includes only ranked IDs vs IDs plus short "why matched" text
  • Bundle keeps top 2 tables vs top 4 tables
  • Bundle keeps only top columns per table vs all columns from retained tables
  • Relationship edges included vs omitted
  • Table-name-heavy ranking vs description-heavy ranking
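Several of the ideas above ("top 2 vs top 4 tables", "top columns vs all columns") are just parameters of a bundle-trimming step. A minimal sketch of such an isolation helper, assuming a bundle is a ranked list of (table, ranked_columns) pairs (the structure and function name are assumptions, not the existing code):

```python
# Hypothetical sketch of the "top N tables / top M columns" isolation policy.
# ranked_tables: list of (table_name, columns) pairs, best match first.
def trim_bundle(ranked_tables, max_tables=2, max_columns_per_table=None):
    trimmed = []
    for table, columns in ranked_tables[:max_tables]:
        if max_columns_per_table is not None:
            # Columns are assumed to be pre-ranked by relevance as well.
            columns = columns[:max_columns_per_table]
        trimmed.append((table, columns))
    return trimmed

ranked = [
    ("orders", ["id", "user_id", "total", "created_at"]),
    ("users", ["id", "email", "name"]),
    ("payments", ["id", "order_id"]),
]
print(trim_bundle(ranked, max_tables=2, max_columns_per_table=2))
# → [('orders', ['id', 'user_id']), ('users', ['id', 'email'])]
```

Expressing the policy as two knobs means the "narrow vs medium" experiment is a parameter sweep rather than a code change.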

Explicit anti-goals

  • no speed benchmarking
  • no index optimization task
  • no large experiment wave disconnected from the first A/B evidence
  • no replacing the simple retrieval architecture during M2

Implementation Progress

Not started.

QA Exploration

  • [x] QA exploration completed (or N/A for non-UI tasks)

N/A - retrieval experiment planning/eval task.

Review Feedback

  • [ ] Review cleared