Dataface Tasks

Experiment: Catalog tool access with vs without tool

ID: MCP_ANALYST_AGENT-EXPERIMENT_CATALOG_TOOL_ACCESS_WITH_VS_WITHOUT_TOOL
Status: not_started
Priority: p2
Milestone: m2-internal-adoption-design-partners
Owner: data-ai-engineer-architect
Initiative: ai-quality-experimentation-and-context-optimization

Hypothesis

Live catalog access will help the model on ambiguous or underspecified questions, but the gains may be smaller than the cost in latency and tool complexity once the preloaded context is already strong.

Method

Hold the model, prompt, benchmark subset, and preloaded schema payload fixed. Compare runs with the catalog tool enabled versus disabled so the experiment isolates tool availability rather than the schema fields themselves. Use the same scorer and logging format as the other experiments.
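The single-variable setup above can be sketched as a pair of run configs that differ in exactly one field. This is a minimal illustration, not the real harness: the `RunConfig` fields and placeholder values are assumptions standing in for whatever the actual runner accepts.

```python
from dataclasses import dataclass, replace

# Hypothetical run config; field names are assumptions, not the real harness API.
@dataclass(frozen=True)
class RunConfig:
    model: str
    prompt: str
    schema_payload: str
    seed: int
    temperature: float
    catalog_tool_enabled: bool

base = RunConfig(
    model="fixed-model",
    prompt="fixed-prompt",
    schema_payload="preloaded-schema.json",
    seed=42,
    temperature=0.0,
    catalog_tool_enabled=True,
)

# The two conditions differ only in tool availability; every control stays fixed.
enabled = base
disabled = replace(base, catalog_tool_enabled=False)
```

Freezing the dataclass and deriving the second condition with `replace` makes it hard to accidentally vary a control between runs.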

Variables

| Variable | Values |
| --- | --- |
| Independent | catalog tool access (enabled vs disabled) |
| Dependent | pass rate, grounding failures, tool-call count, latency, token cost |
| Controlled | model, prompt, canary subset, schema payload, scorer, seed, temperature |

Execution Log

Run 1: catalog tool enabled

  • Command: <exact command>
  • Config: fixed model, fixed schema payload, catalog tool enabled
  • Output: <path to results JSONL>
  • Started: <timestamp>
  • Duration: <time>
  • Notes: <anything unexpected>

Run 2: catalog tool disabled

  • Command: <exact command>
  • Config: fixed model, fixed schema payload, catalog tool disabled
  • Output: <path to results JSONL>
  • Started: <timestamp>
  • Duration: <time>
  • Notes: <anything unexpected>

Results

| Condition | Pass Rate | Avg Score | Parse | Grounding | Intent | Latency | Tokens |
| --- | --- | --- | --- | --- | --- | --- | --- |
| enabled | | | | | | | |
| disabled | | | | | | | |
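Filling the table from the two results JSONL files could look like the sketch below. The per-record field names (`passed`, `grounding_failure`, `tool_calls`, `latency_s`, `tokens`) are assumptions about the logging format, not a confirmed schema.

```python
import json

def load_jsonl(path):
    """Read one result record per line from a results JSONL file."""
    with open(path) as f:
        return [json.loads(line) for line in f]

def summarize(records):
    """Aggregate per-record fields into the row metrics for one condition.

    Field names here are assumptions about the shared logging format.
    """
    n = len(records)
    return {
        "pass_rate": sum(r["passed"] for r in records) / n,
        "grounding_failures": sum(r["grounding_failure"] for r in records),
        "avg_tool_calls": sum(r["tool_calls"] for r in records) / n,
        "avg_latency_s": sum(r["latency_s"] for r in records) / n,
        "avg_tokens": sum(r["tokens"] for r in records) / n,
    }
```

Usage would be one `summarize(load_jsonl(path))` call per condition, then a row-by-row diff of the two dicts.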

Breakdowns

Analysis

What do the results tell you? Was the hypothesis confirmed?
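Since both conditions run on the same canary subset, per-item outcomes are paired, so one reasonable way to check whether a pass-rate gap is real rather than noise is an exact McNemar test on the discordant pairs. This is a sketch under that assumption; `b` and `c` would come from joining the two results files on item ID.

```python
from math import comb

def mcnemar_exact(b, c):
    """Two-sided exact McNemar p-value for paired pass/fail outcomes.

    b: items that pass only with the catalog tool enabled
    c: items that pass only with it disabled
    Under the null, discordant items split 50/50 between b and c.
    """
    n = b + c
    if n == 0:
        return 1.0  # no discordant pairs: no evidence either way
    k = min(b, c)
    # Binomial(n, 0.5) tail probability of an imbalance at least this extreme.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)
```

A small p-value would mean the tool changes which items pass, not just the aggregate rate; with a small canary subset, a non-significant result may simply reflect low power.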

Conclusion

What's the decision? What changes, if any, should go to production?

Follow-up experiments