Experiment: Layer scope all vs gold-only vs gold+silver

ID	MCP_ANALYST_AGENT-EXPERIMENT_LAYER_SCOPE_ALL_VS_GOLD_ONLY_VS_GOLD_SILVER
Status	not_started
Priority	p2
Milestone	m2-internal-adoption-design-partners
Owner	data-ai-engineer-architect
Initiative	ai-quality-experimentation-and-context-optimization

Hypothesis

Reducing scope from all tables to gold-only or gold-plus-silver should reduce noise and hallucinated table usage. Gold-only may be the cleanest setting overall, but gold-plus-silver might preserve enough breadth for edge cases.

Method

Use the winning model and the chosen schema/tool configuration from the earlier waves, then vary only the exposed table scope across all, gold-only, and gold+silver. Run this together with the schema curation task so the results can directly inform which tables should be surfaced in the benchmark and catalog.

Variables

Variable	Values
Independent	table scope (`all`, `gold-only`, `gold+silver`)
Dependent	pass rate, grounding failures, hallucinated tables, latency, token cost
Controlled	model, prompt, canary subset, schema/tool strategy, scorer, seed, temperature

Execution Log

Run 1: all tables

Command: <exact command>
Config: fixed model, chosen schema/tool strategy, all tables visible
Output: <path to results JSONL>
Started: <timestamp>
Duration: <time>
Notes: <anything unexpected>

Run 2: gold-only

Command: <exact command>
Config: fixed model, chosen schema/tool strategy, gold-only tables visible
Output: <path to results JSONL>
Started: <timestamp>
Duration: <time>
Notes: <anything unexpected>

Run 3: gold+silver

Command: <exact command>
Config: fixed model, chosen schema/tool strategy, gold+silver tables visible
Output: <path to results JSONL>
Started: <timestamp>
Duration: <time>
Notes: <anything unexpected>

Results

Condition	Pass Rate	Avg Score	Parse	Grounding	Intent	Latency	Tokens
all
gold-only
gold+silver