Add LiveSQLBench adapter and release-tracking workflow
Problem
Add a second-wave integration for LiveSQLBench with explicit handling for release versions, hidden-vs-open splits, and evolving benchmark context.
Context
- LiveSQLBench is appealing because it is newer, more dynamic, and more contamination-aware than older static benchmarks.
- That same dynamism adds versioning and release-management complexity that makes it a poor first integration target.
- This task exists so the second-wave work is already scoped and ordered instead of becoming a vague "maybe later" benchmark bucket.
Possible Solutions
- Recommended: add LiveSQLBench only after the shared contract and first adapters exist, with explicit release/version tracking. Treat it as a release-aware benchmark family where artifact provenance records exactly which release and split were used.
Why this is recommended:
- fits the strengths of LiveSQLBench
- avoids mixing dynamic benchmark semantics into the first adapter wave
- gives a clean path to evolve with new releases
- Fold LiveSQLBench into the first implementation wave.
Trade-off: maximizes ambition, but adds too much benchmark-management complexity before the shared contract is proven.
- Snapshot one release and treat it as static forever.
Trade-off: simplest operationally, but loses much of the benchmark's value.
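To make the recommended option concrete, the provenance record could look roughly like the sketch below. It is illustrative only: `BenchmarkProvenance`, `ALLOWED_SPLITS`, the field names, and the release string `example-release-1` are hypothetical and are not taken from the shared contract or from LiveSQLBench itself.

```python
from dataclasses import asdict, dataclass

# Assumption: the two split kinds named in the Problem section.
ALLOWED_SPLITS = ("open", "hidden")


@dataclass(frozen=True)
class BenchmarkProvenance:
    """Hypothetical record of exactly which release and split a run used."""

    benchmark: str
    release: str
    split: str

    def __post_init__(self) -> None:
        # Refuse empty release identifiers so a run can never be "unversioned".
        if not self.release:
            raise ValueError("release must be pinned explicitly")
        if self.split not in ALLOWED_SPLITS:
            raise ValueError(f"unknown split: {self.split!r}")

    def as_metadata(self) -> dict:
        # Plain dict form, suitable for attaching to a run artifact.
        return asdict(self)
```

Keeping the record frozen means a run's provenance cannot be edited after the fact, which is the property release-to-release comparisons depend on.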
Plan
- Define how benchmark release/version identifiers appear in normalized case metadata and run provenance.
- Decide which LiveSQLBench slice is practical for local development and what should remain deferred.
- Add a loader and run mode that pins a specific release instead of silently drifting.
- Add dashboard slices that distinguish release-to-release movement from model movement.
- Document how new releases should be introduced without contaminating prior comparisons.
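A minimal sketch of the pinning and dashboard-slicing steps above, under stated assumptions: `load_cases`, `movement_kind`, `UnpinnedReleaseError`, and the `release`/`model` keys are hypothetical names chosen for illustration, and the in-memory `available` mapping stands in for whatever storage the real loader would read from.

```python
from typing import Dict, List, Optional


class UnpinnedReleaseError(ValueError):
    """Raised when a run does not name the release it wants."""


def load_cases(available: Dict[str, List[dict]], release: Optional[str]) -> List[dict]:
    """Return cases for exactly the requested release, with no 'latest' fallback."""
    if release is None:
        raise UnpinnedReleaseError("a release must be pinned explicitly")
    if release not in available:
        raise KeyError(f"release {release!r} is not available locally")
    # Stamp each case so the release travels with the normalized metadata.
    return [{**case, "release": release} for case in available[release]]


def movement_kind(run_a: dict, run_b: dict) -> str:
    """Classify a pair of runs so dashboards can separate the two kinds of change."""
    same_release = run_a["release"] == run_b["release"]
    same_model = run_a["model"] == run_b["model"]
    if same_release and same_model:
        return "rerun"
    if same_release:
        return "model_movement"
    if same_model:
        return "release_movement"
    return "not_comparable"
```

Failing loudly on a missing release, rather than falling back to the newest one, is what keeps earlier comparisons from being silently contaminated when a new release lands.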
Implementation Progress
QA Exploration
- [x] QA exploration completed (or N/A for non-UI tasks)
N/A - implementation task, but not a browser-flow task.
Review Feedback
- [ ] Review cleared