Chart evaluation ownership and improvement
Problem
The chart library already has evaluation tooling available, but no one yet owns the chart-specific questions of how that tooling should be used, how its results should be interpreted, and how it should be improved. If that ownership stays undefined, the tooling will remain peripheral instead of becoming part of the chart-library workflow. This work should focus on adopting the existing evaluation tooling, learning how well it serves chart review, and making the first targeted improvements needed for chart-library quality review and regression detection.
Context
- This workstream already has access to AI evaluation tooling, which is well suited to comparative review of chart output.
- This work belongs after the basic chart batch exists: evaluation becomes materially more useful once there is a stable chart corpus to judge, and more useful still once that fixed corpus is evaluated in both style packages.
Possible Solutions
- Manual-only review. Lowest setup cost, but brittle and hard to compare over time.
- Treat the existing tooling as someone else's system and defer ownership. Easy in the short term, but it keeps evaluation outside the real chart-library workflow.
- Recommended: adopt the existing evaluation tooling for the chart corpus, learn where it works well or poorly for chart review, and make the first improvements that bring it into regular chart-library use.
Plan
- Define the artifact set to evaluate: core chart batch × style packages × representative examples.
- Run the existing evaluation tooling on that corpus as a normal part of chart work.
- Capture what signals are useful now, what gaps make chart review awkward, and what improvements would have the highest leverage.
- Make the first improvements needed to support chart-library quality review and regression detection.
- Use the findings to guide later style-package and design-assertion work rather than treating evaluation as a separate silo.
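The artifact set and the regression-detection goal in the plan can be sketched as a small harness. This is a minimal sketch, not the existing tooling's interface. It assumes the tooling can return one scalar score per artifact, where higher is better. That assumption may not hold for purely comparative review, in which case the scores would come from pairwise judgments. `Artifact`, `build_corpus`, `find_regressions`, the `tolerance` threshold, and the chart, style, and example names are all hypothetical.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Artifact:
    """One cell of the evaluation matrix (hypothetical structure)."""
    chart: str
    style: str
    example: str

    @property
    def key(self) -> str:
        # Stable identifier, so scores from different runs over the
        # same fixed corpus can be lined up against each other.
        return f"{self.chart}/{self.style}/{self.example}"


def build_corpus(charts, styles, examples):
    """Enumerate core chart batch x style packages x representative examples.

    Inputs are sorted so the corpus order is deterministic across runs.
    """
    return [
        Artifact(chart, style, example)
        for chart, style, example in product(sorted(charts), sorted(styles), sorted(examples))
    ]


def find_regressions(baseline, candidate, tolerance=0.05):
    """Compare two runs keyed by Artifact.key.

    ASSUMPTION: each run maps artifact keys to a scalar score, higher is better.
    Returns {key: score_delta} for scores that dropped by more than `tolerance`,
    and {key: "missing"} for artifacts absent from the candidate run.
    Artifacts that appear only in the candidate run are ignored here.
    """
    regressions = {}
    for key, old_score in baseline.items():
        new_score = candidate.get(key)
        if new_score is None:
            regressions[key] = "missing"
        elif old_score - new_score > tolerance:
            regressions[key] = round(new_score - old_score, 3)
    return regressions


if __name__ == "__main__":
    # Hypothetical chart, style-package, and example names.
    corpus = build_corpus(["bar", "line"], ["style-a", "style-b"], ["basic", "dense"])
    print([artifact.key for artifact in corpus])
```

Flagging missing artifacts as regressions keeps a drop in coverage from passing silently. Keying scores on a stable artifact identifier only gives meaningful comparisons when the corpus is held fixed between runs.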
Implementation Progress
- Evaluation should begin after the chart batch and dual style packages exist in usable form.
- The goal is not to invent new evaluation infrastructure from scratch; it is to adopt, own, and improve what already exists for chart-library use.
QA Exploration
- [ ] QA exploration completed (or N/A for non-UI tasks)
Review Feedback
- [ ] Review cleared