Dataface Tasks

Experiment: Schema tool strategy profiled vs filtered vs INFORMATION_SCHEMA vs none

ID: MCP_ANALYST_AGENT-EXPERIMENT_SCHEMA_TOOL_STRATEGY_PROFILED_VS_FILTERED_VS_INFORMATION_SCHEMA_VS_NONE
Status: not_started
Priority: p2
Milestone: m2-internal-adoption-design-partners
Owner: data-ai-engineer-architect
Initiative: ai-quality-experimentation-and-context-optimization

Hypothesis

The profiled catalog strategy should be strongest overall, with filtered fields close behind; INFORMATION_SCHEMA and no tool should trail unless schema freshness matters more than schema richness. This experiment should tell us whether the extra profiling cost is justified by the quality gain.

Method

Use the winning model and winning context level from the earlier waves, then vary only the schema acquisition path: profiled catalog, filtered catalog, live INFORMATION_SCHEMA, or no tool. Hold the prompt and benchmark subset fixed so the result isolates how schema access strategy affects SQL generation.
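The run matrix can be sketched as below. This is a minimal illustration assuming a config-dict harness; every value (model name, context level, seed, temperature) is a placeholder, not a real experiment setting.

```python
# Build the four run configs: everything fixed except the schema path.
# All values here are illustrative placeholders, not real experiment settings.
BASE_CONFIG = {
    "model": "<winning-model>",          # winner of the earlier model wave
    "context_level": "<winning-level>",  # winner of the earlier context wave
    "prompt": "<fixed-prompt>",
    "benchmark": "canary-subset",
    "seed": 42,
    "temperature": 0.0,
}

SCHEMA_PATHS = ["profiled", "filtered", "information_schema", "none"]

def build_runs(base, schema_paths):
    """Return one run config per schema acquisition path, identical otherwise."""
    return [{**base, "schema_path": path} for path in schema_paths]
```

Keeping the base config in one dict makes it hard to accidentally vary anything other than the independent variable.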

Variables

  • Independent: schema acquisition path (profiled, filtered, information_schema, none)
  • Dependent: pass rate, grounding failures, tool-call rate, latency, token cost
  • Controlled: model, prompt, canary subset, context level, scorer, seed, temperature
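The controlled variables are load-bearing: if any of them drift between runs, the metric deltas stop being attributable to the schema path. A minimal guard, assuming run configs are plain dicts with illustrative key names:

```python
# Guard: controlled variables must be identical across all conditions so any
# metric delta is attributable to the schema path alone. Keys are illustrative.
CONTROLLED = ("model", "prompt", "context_level", "seed", "temperature")

def check_controls(runs, controlled=CONTROLLED):
    """True iff every controlled key matches the first run's value."""
    first = runs[0]
    return all(r.get(key) == first.get(key) for r in runs for key in controlled)
```

Running this check before launching the four conditions is cheap insurance against a silently inconsistent config.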

Execution Log

Run 1: profiled catalog

  • Command: <exact command>
  • Config: winning model, chosen context level, profiled catalog path
  • Output: <path to results JSONL>
  • Started: <timestamp>
  • Duration: <time>
  • Notes: <anything unexpected>

Run 2: filtered catalog

  • Command: <exact command>
  • Config: winning model, chosen context level, filtered catalog path
  • Output: <path to results JSONL>
  • Started: <timestamp>
  • Duration: <time>
  • Notes: <anything unexpected>

Run 3: INFORMATION_SCHEMA

  • Command: <exact command>
  • Config: winning model, chosen context level, live INFORMATION_SCHEMA
  • Output: <path to results JSONL>
  • Started: <timestamp>
  • Duration: <time>
  • Notes: <anything unexpected>

Run 4: no tool

  • Command: <exact command>
  • Config: winning model, chosen context level, no schema tool
  • Output: <path to results JSONL>
  • Started: <timestamp>
  • Duration: <time>
  • Notes: <anything unexpected>

Results

| Condition          | Pass Rate | Avg Score | Parse | Grounding | Intent | Latency | Tokens |
|--------------------|-----------|-----------|-------|-----------|--------|---------|--------|
| profiled           |           |           |       |           |        |         |        |
| filtered           |           |           |       |           |        |         |        |
| information_schema |           |           |       |           |        |         |        |
| none               |           |           |       |           |        |         |        |
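Each row of the table can be filled from that condition's results JSONL. A sketch, assuming one JSON object per line with hypothetical field names (`passed`, `score`, `latency_ms`, `tokens`) that should be mapped to whatever the eval harness actually emits:

```python
import json
from statistics import mean

def summarize(jsonl_path):
    """Aggregate one condition's results JSONL into one results-table row.
    Field names (passed, score, latency_ms, tokens) are assumptions; map
    them to the actual harness output before use."""
    with open(jsonl_path) as f:
        rows = [json.loads(line) for line in f if line.strip()]
    return {
        "pass_rate": sum(r["passed"] for r in rows) / len(rows),
        "avg_score": mean(r["score"] for r in rows),
        "avg_latency_ms": mean(r["latency_ms"] for r in rows),
        "avg_tokens": mean(r["tokens"] for r in rows),
    }
```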

Breakdowns

Analysis

What do the results tell you? Was the hypothesis confirmed?
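One way to make the profiling-cost question concrete is a simple decision rule. This is an illustrative sketch, not the team's agreed criterion: the threshold and field names are placeholders to be replaced with real values.

```python
def profiling_justified(profiled, filtered, min_pass_gain=0.02):
    """Illustrative decision rule: the extra profiling cost is justified
    only if the profiled catalog beats the filtered catalog's pass rate
    by at least min_pass_gain (2 points here; pick a real threshold)."""
    return profiled["pass_rate"] - filtered["pass_rate"] >= min_pass_gain
```

A rule like this keeps the Analysis-to-Conclusion step mechanical instead of a judgment call made after seeing the numbers.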

Conclusion

What's the decision? What changes, if any, should go to production?

Follow-up experiments