Evals
Browse eval dashboards and raw run artifacts through the tasks server.
Top-line eval dashboard
Compare SQL runs
Raw run files and directories
Latest runs
| Run | Backend | Model | Provider | Context | Cases | Pass rate | Artifacts |
|---|---|---|---|---|---|---|---|
| sql/20260325_043540_smoke_local | | | | | | | |
Placement
Put generated eval run outputs under apps/evals/runs/<family>/<run_id>/. The tasks server picks them up automatically, lists them on this landing page, and serves their raw files under /evals/artifacts/. Dataface eval faces are served under /evals/faces/.
Current assumptions: the eval dashboards are the existing Dataface project in apps/evals/, and most raw artifacts are JSON or JSONL files, plus any static files generated inside each run directory.
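The Latest runs table reports Cases and Pass rate per run, but this page does not say how they are derived. The following is a minimal sketch of summarizing a per-case JSONL artifact. It assumes each line is one JSON object with a boolean "passed" field. Both the field name and the function name are hypothetical.

```python
import json
from pathlib import Path


def summarize_jsonl(path: Path, key: str = "passed") -> tuple[int, float]:
    """Return (cases, pass_rate) for a JSONL file of per-case results.

    Assumes each non-blank line is a JSON object; a missing `key` counts
    as a failure. The default field name "passed" is hypothetical.
    """
    cases = 0
    passed = 0
    with path.open(encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # tolerate blank lines, e.g. a trailing newline
            record = json.loads(line)
            cases += 1
            passed += bool(record.get(key, False))
    # Avoid dividing by zero for an empty results file.
    rate = passed / cases if cases else 0.0
    return cases, rate
```

Streaming line by line keeps memory flat for large runs. Treating a missing field as a failure keeps the pass rate conservative.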