Add Spider 2.0-Lite adapter and runner support

ID	MCP_ANALYST_AGENT-ADD_SPIDER_2_LITE_ADAPTER_AND_RUNNER_SUPPORT
Status	not_started
Priority	p1
Milestone	m2-internal-adoption-design-partners
Owner	data-ai-engineer-architect
Initiative	external-text-to-sql-benchmarks-and-sota-calibration

Problem

Integrate Spider 2.0-Lite into apps/evals with benchmark-aware loading, environment assumptions, and reproducible baseline runs.

Spider 2.0-Lite is a good enterprise-style calibration target because it stresses long context, larger schemas, and more realistic operational complexity than classic Spider.
It is also meaningfully harder to integrate than BIRD because benchmark settings and environment assumptions vary more.
We should treat this as the first "harder" external adapter after the shared contract is defined, not as the first benchmark integration.

Recommended: integrate Spider 2.0-Lite as a benchmark-aware adapter that fits the shared external contract but preserves its benchmark-specific settings in metadata. This keeps the runner shared while making it explicit where local assumptions differ from official or hosted evaluation settings.
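
One way to picture the recommendation is a shared case record with a separate metadata slot for benchmark-specific settings. The sketch below is illustrative only: `ExternalCase`, `make_spider2_lite_case`, and every field and metadata key are hypothetical names, since the shared external contract is defined elsewhere and may look different.

```python
from dataclasses import asdict, dataclass, field
from typing import Any


@dataclass(frozen=True)
class ExternalCase:
    """Hypothetical shared external case contract (field names are assumptions)."""

    case_id: str
    benchmark: str
    question: str
    database: str
    # Benchmark-specific settings stay in metadata so the shared runner
    # does not need to branch on which benchmark produced the case.
    benchmark_metadata: dict[str, Any] = field(default_factory=dict)


def make_spider2_lite_case(
    case_id: str,
    question: str,
    database: str,
    *,
    dialect: str,
    evaluation_setting: str = "local",
) -> ExternalCase:
    """Build a shared-contract case while keeping Spider-specific settings visible."""
    return ExternalCase(
        case_id=case_id,
        benchmark="spider2-lite",
        question=question,
        database=database,
        benchmark_metadata={
            "variant": "lite",
            "dialect": dialect,
            # Records whether the run used local assumptions or a hosted setting.
            "evaluation_setting": evaluation_setting,
        },
    )


case = make_spider2_lite_case(
    "example-001",
    "How many orders shipped late?",
    "example_db",
    dialect="sqlite",
)
record = asdict(case)  # plain dict, ready to serialize into an artifact
```

Keeping `evaluation_setting` in metadata rather than in the runner is what lets a reader of the artifact tell a local run apart from an official or hosted one.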

Why this is recommended:

Trade-off: more ambitious, but it blocks useful progress on environment and provenance work.

Trade-off: easy, but it hides exactly the benchmark semantics this task is meant to preserve.

Add a Spider 2.0-Lite loader that emits the shared external case contract plus Spider-specific metadata.
Define any benchmark-specific execution/scoring caveats that affect comparability.
Add a reproducible smoke baseline and durable artifacts for the benchmark.
Add tests that verify:
- benchmark cases load cleanly
- provenance survives into output artifacts
- dashboard queries can slice by Spider benchmark metadata
Document why this adapter covers Lite first and defers Snow, DBT, and more hosted variants.
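
The loader and test items above could start from something like the following minimal sketch. It assumes the benchmark cases arrive as JSON Lines with `instance_id`, `db`, `question`, and `external_knowledge` keys; those key names should be checked against the actual Spider 2.0-Lite release. The function names, the `provenance` shape, and the sample records are hypothetical.

```python
import json
from collections import Counter
from typing import Any, Iterable


def load_spider2_lite_cases(lines: Iterable[str], source: str) -> list[dict[str, Any]]:
    """Turn JSON Lines records into shared-contract cases with provenance."""
    cases = []
    for line_number, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # tolerate blank lines in the input file
        raw = json.loads(line)
        cases.append(
            {
                "case_id": raw["instance_id"],
                "question": raw["question"],
                "database": raw["db"],
                "benchmark": "spider2-lite",
                "benchmark_metadata": {
                    "variant": "lite",
                    "has_external_knowledge": bool(raw.get("external_knowledge")),
                },
                # Provenance points back to the exact input line of each case.
                "provenance": {"source": source, "line": line_number},
            }
        )
    return cases


def count_by_metadata(cases: list[dict[str, Any]], key: str) -> Counter:
    """Group cases by one metadata key, as a dashboard slice would."""
    return Counter(str(case["benchmark_metadata"].get(key)) for case in cases)


# Made-up sample records, not real benchmark data.
SAMPLE_LINES = [
    '{"instance_id": "example-001", "db": "example_db", '
    '"question": "How many orders shipped late?", "external_knowledge": "late_orders.md"}',
    "",
    '{"instance_id": "example-002", "db": "example_db", '
    '"question": "Which region had the most returns?", "external_knowledge": null}',
]

cases = load_spider2_lite_cases(SAMPLE_LINES, source="sample.jsonl")
counts = count_by_metadata(cases, "has_external_knowledge")
```

The same three checks listed above map onto this sketch directly: the case count confirms clean loading, the `provenance` entry confirms traceability, and `count_by_metadata` stands in for a dashboard slice.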

N/A - this is an implementation task, not a browser-flow task.