# Experiment: Catalog Tool Access (Enabled vs. Disabled)
## Hypothesis
Live catalog access will help the model on ambiguous or underspecified questions, but the gains may be smaller than the cost in latency and tool complexity once the preloaded context is already strong.
## Method
Hold the model, prompt, benchmark subset, and preloaded schema payload fixed. Compare runs with the catalog tool enabled versus disabled so the experiment isolates tool availability rather than the schema fields themselves. Use the same scorer and logging format as the other experiments.
## Variables
| Variable | Values |
|---|---|
| Independent | catalog tool access (enabled vs disabled) |
| Dependent | pass rate, grounding failures, tool-call count, latency, token cost |
| Controlled | model, prompt, canary subset, schema payload, scorer, seed, temperature |
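A minimal sketch of how the controlled variables and the single independent variable can be pinned down in the run configs. All field names and values here are hypothetical illustrations, not the harness's real schema:

```python
# Hypothetical run configs; every name and value is illustrative only.
BASE = {
    "model": "fixed-model",
    "prompt": "fixed-prompt",
    "schema_payload": "fixed-schema.json",
    "canary_subset": "fixed-subset",
    "scorer": "fixed-scorer",
    "seed": 0,
    "temperature": 0.0,
}

run_enabled = {**BASE, "catalog_tool": True}    # Run 1
run_disabled = {**BASE, "catalog_tool": False}  # Run 2

# Sanity check: the two conditions differ in exactly one field.
diff = {k for k in run_enabled if run_enabled[k] != run_disabled[k]}
assert diff == {"catalog_tool"}
```

Keeping the delta to one field is what lets the experiment attribute any metric shift to tool availability rather than to the schema payload or sampling settings.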
## Execution Log

### Run 1: catalog tool enabled

- Command: `<exact command>`
- Config: fixed model, fixed schema payload, catalog tool enabled
- Output: `<path to results JSONL>`
- Started: `<timestamp>`
- Duration: `<time>`
- Notes: `<anything unexpected>`
### Run 2: catalog tool disabled

- Command: `<exact command>`
- Config: fixed model, fixed schema payload, catalog tool disabled
- Output: `<path to results JSONL>`
- Started: `<timestamp>`
- Duration: `<time>`
- Notes: `<anything unexpected>`
## Results

| Condition | Pass Rate | Avg Score | Parse Fail | Grounding Fail | Intent Fail | Tool Calls | Latency | Tokens |
|---|---|---|---|---|---|---|---|---|
| enabled | | | | | | | | |
| disabled | | | | | | | | |
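Once both runs have produced results, the per-condition aggregates for the table above can be computed with a short script. The record field names used here (`pass`, `latency_ms`, `failure`) are assumptions about what the shared logging format emits, not confirmed names:

```python
import json

def summarize(rows):
    """Aggregate parsed result records into table-ready metrics.
    Field names ('pass', 'latency_ms', 'failure') are assumed, not confirmed."""
    n = len(rows)
    return {
        "pass_rate": sum(bool(r["pass"]) for r in rows) / n,
        "avg_latency_ms": sum(r["latency_ms"] for r in rows) / n,
        "grounding_failures": sum(r.get("failure") == "grounding" for r in rows),
    }

def load_jsonl(path):
    """Read one record per non-empty line from a results JSONL file."""
    with open(path) as f:
        return [json.loads(line) for line in f if line.strip()]

# Tiny in-memory example (fabricated records, for illustration only):
demo = [
    {"pass": True, "latency_ms": 900, "failure": None},
    {"pass": False, "latency_ms": 1200, "failure": "grounding"},
]
print(summarize(demo))
# {'pass_rate': 0.5, 'avg_latency_ms': 1050.0, 'grounding_failures': 1}
```

Running `summarize(load_jsonl(path))` on each run's output file yields one table row per condition.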
## Breakdowns
## Analysis
What do the results tell you? Was the hypothesis confirmed?
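Because both runs share the same canary subset and seed, the analysis can go beyond aggregate deltas to a per-item paired comparison, which is more sensitive on a small subset. A sketch, assuming each record carries an `id` and a boolean `pass` (both field names are assumptions):

```python
def paired_flips(enabled_rows, disabled_rows):
    """Count items the tool fixed (wins) vs. broke (losses).
    Assumes both runs cover the same items, keyed by an 'id' field."""
    baseline = {r["id"]: bool(r["pass"]) for r in disabled_rows}
    wins = losses = 0
    for r in enabled_rows:
        with_tool, without_tool = bool(r["pass"]), baseline[r["id"]]
        wins += with_tool and not without_tool
        losses += without_tool and not with_tool
    return wins, losses

# Fabricated example: item "b" is fixed by the tool, item "c" regresses.
enabled = [{"id": "a", "pass": True}, {"id": "b", "pass": True}, {"id": "c", "pass": False}]
disabled = [{"id": "a", "pass": True}, {"id": "b", "pass": False}, {"id": "c", "pass": True}]
print(paired_flips(enabled, disabled))  # (1, 1)
```

A large win count concentrated on ambiguous or underspecified questions would support the hypothesis; a near-equal flip count in both directions would suggest noise rather than a real tool effect.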
## Conclusion
What's the decision? What changes, if any, should go to production?