Dataface Tasks

Add schema-linking supervision to text-to-SQL benchmark cases

ID: MCP_ANALYST_AGENT-ADD_SCHEMA_LINKING_SUPERVISION_TO_TEXT_TO_SQL_BENCHMARK_CASES
Status: not_started
Priority: p2
Milestone: m5-v1-2-launch
Owner: data-ai-engineer-architect

Problem

Retrieval-side gold labels tell us whether the right tables and columns were present overall, but they still do not show how the question maps onto those schema elements. For planning and retrieval diagnostics, we eventually need a finer-grained view of which parts of the question should connect to which tables, columns, metrics, and filters.

Without schema-linking supervision, it remains difficult to evaluate whether a planner or retriever understood the user's question structure or merely happened to surface the right objects.

Context

This task builds on related benchmark-enrichment work, such as retrieval-side gold labels. It is a deeper annotation layer that is most useful once the team already has:

  • benchmark cases with stable semantic targets
  • planning outputs worth comparing
  • retrieval/bundle metrics that need more diagnostic power

The supervision does not need to be perfect or exhaustive for every token. It needs to be useful enough to evaluate schema-linking quality on representative cases.

Possible Solutions

  1. Recommended: add lightweight schema-linking annotations for a representative benchmark slice. Map key question spans to their intended tables, columns, metrics, and filters so that planning and retrieval components can be scored more directly.

Why this is recommended:

  • creates a clear supervision target for schema linking
  • helps explain where planning or retrieval misunderstood the question
  • can be introduced incrementally on a high-value slice
  2. Infer schema linking only from final SQL.

Trade-off: cheaper, but too indirect for diagnosing retrieval and planning behavior.

  3. Try to annotate the entire benchmark exhaustively before using it.

Trade-off: too expensive. A representative slice is a better starting point.
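As a sketch of what a lightweight annotation for the recommended option might look like (the field names, span labels, and schema identifiers below are illustrative assumptions, not a fixed contract):

```python
# Illustrative schema-linking annotation for one benchmark case.
# All field names and schema identifiers here are hypothetical.
example_case = {
    "question": "What was total revenue by region in 2023?",
    "schema_links": [
        {"span": "total revenue", "kind": "metric", "target": "orders.revenue"},
        {"span": "by region",     "kind": "column", "target": "customers.region"},
        {"span": "in 2023",       "kind": "filter", "target": "orders.order_date"},
    ],
}

# Each link ties a question span to the schema element it should resolve to,
# which is the supervision signal planning/retrieval diagnostics would consume.
for link in example_case["schema_links"]:
    print(f'{link["span"]} -> {link["target"]} ({link["kind"]})')
```

A format like this stays cheap to produce by hand while still letting a scorer check, per question span, whether the planner or retriever found the intended table, column, metric, or filter.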

Plan

  1. Define a minimal schema-linking annotation contract for benchmark cases.
  2. Pick a representative slice where schema-linking quality matters most.
  3. Annotate key spans for:
     • tables
     • columns
     • metric concepts
     • filter values or dimensions
  4. Add helper scoring utilities that compare planner/retriever outputs to those annotations.
  5. Use the slice for deeper diagnostics on retrieval and planning experiments.
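The scoring utilities in step 4 could start as simple set comparisons between gold and predicted schema links. A minimal sketch, assuming links are flattened to `table.column`-style target strings (the helper name and signature are illustrative):

```python
# Hypothetical helper: set-based precision/recall of predicted schema links
# against gold annotations. Targets are assumed to be strings like "orders.revenue".
def link_scores(gold: set, predicted: set) -> dict:
    tp = len(gold & predicted)  # links both annotated and predicted
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    return {"precision": precision, "recall": recall}

gold = {"orders.revenue", "customers.region", "orders.order_date"}
predicted = {"orders.revenue", "customers.region", "orders.ship_date"}
scores = link_scores(gold, predicted)
# 2 of 3 predictions match gold, and 2 of 3 gold links are recovered.
```

Per-kind breakdowns (tables vs. columns vs. metrics vs. filters) can be layered on later by scoring each kind's link set separately; the point is only that the annotations make this kind of direct comparison possible at all.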

Success criteria

  • at least one benchmark slice contains usable schema-linking supervision
  • retrieval and planning components can be scored more directly than final SQL alone allows
  • later semantic and planner work has a clearer supervision target

Implementation Progress

Not started.

QA Exploration

  • [x] QA exploration completed (or N/A for non-UI tasks)

N/A - benchmark annotation task.

Review Feedback

No review feedback yet.

  • [ ] Review cleared