Experiment: Schema tool strategy (profiled vs filtered vs INFORMATION_SCHEMA vs none)
Hypothesis
The profiled catalog strategy should be strongest overall, with the filtered catalog close behind. Live INFORMATION_SCHEMA and the no-tool baseline should trail unless freshness matters more than richness. This experiment should tell us whether the extra profiling cost is justified.
Method
Use the winning model and the winning context level from the earlier waves, then vary only the schema acquisition path across profiled catalog, filtered catalog, live INFORMATION_SCHEMA, and no tool. Keep the prompt and benchmark subset fixed so the result measures how schema access strategy affects SQL generation.
Variables
| Variable | Values |
|---|---|
| Independent | schema acquisition path (profiled, filtered, information_schema, none) |
| Dependent | pass rate, grounding failures, tool-call rate, latency, token cost |
| Controlled | model, prompt, canary subset, context level, scorer, seed, temperature |
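The design above (one independent variable, everything else held fixed) can be sketched as a small run matrix. This is only an illustration: `RunConfig`, `build_run_matrix`, `varying_fields`, and all field names are hypothetical and are not taken from the actual harness. The check at the end is a cheap guard that only the schema strategy differs between runs.

```python
from dataclasses import asdict, dataclass, replace
from typing import List, Set

# The four values of the independent variable, as listed in the Variables table.
SCHEMA_STRATEGIES = ("profiled", "filtered", "information_schema", "none")


@dataclass(frozen=True)
class RunConfig:
    """Hypothetical run configuration; field names are assumptions."""
    model: str
    context_level: str
    prompt_id: str
    subset: str
    scorer: str
    seed: int
    temperature: float
    schema_strategy: str


def build_run_matrix(base: RunConfig) -> List[RunConfig]:
    """Copy the base config once per strategy, changing only schema_strategy."""
    return [replace(base, schema_strategy=s) for s in SCHEMA_STRATEGIES]


def varying_fields(configs: List[RunConfig]) -> Set[str]:
    """Return the names of fields whose values differ across the configs."""
    rows = [asdict(c) for c in configs]
    return {key for key in rows[0] if len({row[key] for row in rows}) > 1}
```

Before launching, asserting `varying_fields(runs) == {"schema_strategy"}` would catch an accidental change to a controlled variable such as seed or temperature.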
Execution Log
Run 1: profiled catalog
- Command: <exact command>
- Config: winning model, chosen context level, profiled catalog path
- Output: <path to results JSONL>
- Started: <timestamp>
- Duration: <time>
- Notes: <anything unexpected>
Run 2: filtered catalog
- Command: <exact command>
- Config: winning model, chosen context level, filtered catalog path
- Output: <path to results JSONL>
- Started: <timestamp>
- Duration: <time>
- Notes: <anything unexpected>
Run 3: INFORMATION_SCHEMA
- Command: <exact command>
- Config: winning model, chosen context level, live INFORMATION_SCHEMA
- Output: <path to results JSONL>
- Started: <timestamp>
- Duration: <time>
- Notes: <anything unexpected>
Run 4: no tool
- Command: <exact command>
- Config: winning model, chosen context level, no schema tool
- Output: <path to results JSONL>
- Started: <timestamp>
- Duration: <time>
- Notes: <anything unexpected>
Results
| Condition | Pass Rate | Avg Score | Parse | Grounding | Intent | Latency | Tokens |
|---|---|---|---|---|---|---|---|
| profiled | | | | | | | |
| filtered | | | | | | | |
| information_schema | | | | | | | |
| none | | | | | | | |
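A minimal sketch of how the per-run results JSONL could be rolled up into rows for the table above. The record layout is an assumption: the keys `condition`, `passed`, `latency_s`, and `tokens` are hypothetical and should be replaced with whatever the scorer actually writes. It covers only pass rate, latency, and tokens; the other columns would follow the same pattern.

```python
import json
from collections import defaultdict
from typing import Dict, Iterable


def summarize(lines: Iterable[str]) -> Dict[str, Dict[str, float]]:
    """Aggregate JSONL records into per-condition averages.

    Assumes each record has the hypothetical keys
    'condition', 'passed', 'latency_s', and 'tokens'.
    """
    buckets = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between records
        record = json.loads(line)
        buckets[record["condition"]].append(record)

    summary = {}
    for condition, records in buckets.items():
        n = len(records)
        summary[condition] = {
            "n": n,
            "pass_rate": sum(1 for r in records if r["passed"]) / n,
            "avg_latency_s": sum(r["latency_s"] for r in records) / n,
            "avg_tokens": sum(r["tokens"] for r in records) / n,
        }
    return summary
```

An open file handle can be passed directly, for example `summarize(open(path))`, since iterating a text file yields one line at a time.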
Breakdowns
Analysis
What do the results tell you? Was the hypothesis confirmed?
Conclusion
What's the decision? What changes, if any, should go to production?