# Add dashboard review-and-revise workflow
## Problem
Define and pilot a second-pass dashboard review workflow that inspects rendered dashboards with real data, captures review heuristics in a markdown playbook, and feeds concrete revisions back into dashboard generation/update steps.
## Context
- Existing tools: `render_dashboard`, `execute_query`, `catalog`, `list_sources`, `search_dashboards` — all defined canonically in `dataface/ai/tool_schemas.py`, implemented in `dataface/ai/mcp/tools.py`, and exposed via MCP (`mcp/server.py`) and OpenAI function calling (`tools.py`).
- Design heuristics exist in `dataface/ai/skills/dataface-dashboard-design/SKILL.md` — quality checklist, common mistakes, chart selection guide. These are human-readable but not machine-checkable today.
- The A Lie eval pipeline (`apps/a_lie/run_evals.py`, `review_evals.py`) does vision-based scoring against `eval_rubric.md` but has no revision loop — it evaluates but never feeds back.
- A Lie generation (`apps/a_lie/ai_service.py`) uses the OpenAI Responses API with the `render_dashboard` tool. No review or revision step exists.
- Prompt loading: `dataface/ai/prompts.py` loads shared skills via `_SKILL_NAME_MAP` → `SKILLS_DIR/<dir>/SKILL.md`.
- No `review_dashboard` tool exists anywhere in the codebase.
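For orientation, the "canonical schema" pattern described above can be sketched as follows. The tool and module names come from this document; the exact dict shape is an assumption modeled on common function-calling schemas, not the real `tool_schemas.py` contents.

```python
# Hypothetical sketch of a canonical tool schema as dataface/ai/tool_schemas.py
# might define it. Field layout is an assumption; only the names are from the doc.
RENDER_DASHBOARD = {
    "name": "render_dashboard",
    "description": "Render a dashboard definition and return the compiled document.",
    "parameters": {
        "type": "object",
        "properties": {
            "yaml": {"type": "string", "description": "Dashboard YAML source."},
        },
        "required": ["yaml"],
    },
}

# Each surface (MCP server, OpenAI function calling) adapts the same canonical dicts.
ALL_TOOLS = [RENDER_DASHBOARD]
```

Keeping one canonical dict per tool and adapting it per surface is what lets a new tool be added in one place and wired everywhere else.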
## Possible Solutions
### Option A: Vision-based review only (extend eval pipeline)
Use the existing screenshot + vision LLM approach from `review_evals.py` as the review tool. Trade-offs: requires screenshots, is slow (30s+ per review), depends on an external LLM, and is unavailable in the MCP/tool-calling context.
### Option B (recommended): Shared structural `review_dashboard` tool + A Lie orchestration
Add a first-class `review_dashboard` tool to the shared AI layer that performs structural/semantic analysis of compiled dashboard documents against the design heuristics. It returns machine-readable findings (issues, suggestions) and is exposed via both the MCP and OpenAI tool surfaces. Revision orchestration stays in A Lie client logic so the critique remains inspectable and testable; a convenience `revise_dashboard` wrapper can come later.
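A minimal sketch of the structural review Option B proposes. The `{success, findings, summary}` return shape follows the document; `parse_charts` and the `- chart:` YAML convention are illustrative assumptions standing in for the real compiler.

```python
def parse_charts(yaml_source: str) -> list[str]:
    # Naive stand-in for compiling the document and extracting its charts
    # (assumption: one "- chart:" line per chart).
    return [ln for ln in yaml_source.splitlines() if ln.strip().startswith("- chart:")]

def review_dashboard(yaml_source: str) -> dict:
    """Structurally review a dashboard; returns findings, not a score."""
    findings = []
    charts = parse_charts(yaml_source)
    if len(charts) > 8:  # heuristic from the plan: warn and suggest a split
        findings.append({
            "check": "chart_count",
            "severity": "warning",
            "message": f"{len(charts)} charts; consider splitting the dashboard.",
        })
    return {"success": True, "findings": findings, "summary": f"{len(findings)} finding(s)"}
```

Because the result is plain data rather than prose, the orchestrating client can branch on severity and the findings can be asserted on directly in tests, which is the inspectability argument for Option B.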
### Option C: Prompt-only approach
Add review instructions to the system prompt and hope the LLM self-reviews. Trade-off: not inspectable, not testable, no structured output.
## Plan
Selected: Option B — shared structural `review_dashboard` primitive.
Files to modify/create:
- `dataface/ai/mcp/tools.py` — add `review_dashboard()` implementation
- `dataface/ai/tool_schemas.py` — add `REVIEW_DASHBOARD` schema, add to `ALL_TOOLS`
- `dataface/ai/mcp/server.py` — wire into MCP tool handlers
- `dataface/ai/tools.py` — wire into OpenAI tool surface
- `dataface/ai/prompts.py` — add `_SKILL_NAME_MAP` entry for `"dashboard_review"`
- `dataface/ai/skills/reviewing-dataface-dashboards/SKILL.md` — shared review skill
- `apps/a_lie/ai_service.py` — wire review+revise orchestration after initial render
- `tests/core/test_mcp.py` — add `TestReviewDashboard` tests
Implementation steps:
- Write failing tests for the `review_dashboard` tool
- Implement `review_dashboard()` in `mcp/tools.py` — structural analysis of the compiled Face
- Add canonical schema in `tool_schemas.py`
- Wire into MCP server + OpenAI tools
- Create shared review skill
- Wire A Lie to do generate → render → review → optionally revise once → re-render
- Run tests, validate task file
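The generate → render → review → revise-once → re-render step above can be sketched as a small loop. The callables are injected stand-ins; only the control flow (a single optional revision pass triggered by warning-level findings) comes from the plan.

```python
# Sketch of the review-and-revise orchestration. All four callables are
# placeholders for the real generation/render/review/revision machinery.
def generate_with_review(prompt, generate, render, review, revise):
    yaml_source = generate(prompt)
    rendered = render(yaml_source)
    findings = review(yaml_source)["findings"]
    warnings = [f for f in findings if f.get("severity") == "warning"]
    if warnings:  # at most one revision pass, per the plan
        yaml_source = revise(yaml_source, warnings)
        rendered = render(yaml_source)
    return rendered
```

Capping the loop at one revision keeps latency bounded and avoids the oscillation risk of letting the model revise indefinitely.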
Review heuristics (structural checks):
- Chart count (warn if >8, suggest splitting)
- KPI row present and positioned first
- Pie/donut charts with >3 categories
- Missing descriptions on queries/charts/variables
- Missing format specs on numeric columns
- Generic titles
- Layout depth/complexity
- Every chart references a valid query
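One heuristic from the list above, sketched over an assumed compiled-chart dict shape (`type`, `title`, `categories`); the real compiled Face structure may differ, and the threshold of 3 is taken from the list.

```python
def check_pie_categories(charts: list[dict], max_categories: int = 3) -> list[dict]:
    """Flag pie/donut charts whose category count exceeds the threshold."""
    findings = []
    for chart in charts:
        is_pie = chart.get("type") in {"pie", "donut"}
        if is_pie and len(chart.get("categories", [])) > max_categories:
            findings.append({
                "check": "pie_donut_categories",
                "severity": "warning",
                "chart": chart.get("title"),
                "message": "Pie/donut with more than %d categories." % max_categories,
            })
    return findings
```

Each heuristic returning a list of finding dicts in the same shape makes the checks composable: the review tool can simply concatenate the results of all checks.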
## Implementation Progress
Files changed (7 modified, 2 new):
Modified:
1. dataface/ai/mcp/tools.py + dataface/ai/mcp/review.py — added review_dashboard() and extracted the heuristic checks into a dedicated review module. Compiles YAML, checks 5 heuristics (chart_count, missing_description, pie_donut_categories, kpi_position, generic_title), and returns structured {success, findings, summary, chart_count, errors}.
2. dataface/ai/tool_schemas.py — added REVIEW_DASHBOARD canonical schema dict, appended to ALL_TOOLS.
3. dataface/ai/mcp/server.py — imported REVIEW_DASHBOARD schema + review_dashboard impl; added to MCP tool list and call-handler switch.
4. dataface/ai/tools.py — added TOOL_REVIEW_DASHBOARD OpenAI wrapper, added to get_tools(), added dispatch in handle_tool_call.
5. dataface/ai/prompts.py — added "dashboard_review": "reviewing-dataface-dashboards" to _SKILL_NAME_MAP.
6. apps/a_lie/ai_service.py — added _review_and_revise() generator method. After initial render yields YAML, calls review_dashboard locally; if warnings found, sends findings to LLM for one revision pass with lower temperature (0.7).
7. tasks/.../add-dashboard-review-and-revise-workflow.md — filled Context, Possible Solutions, Plan, Implementation Progress.
New:
8. dataface/ai/skills/reviewing-dataface-dashboards/SKILL.md — shared review skill with workflow, heuristic table, revision patterns, rationalizations-to-resist.
9. tests/core/test_review_dashboard.py — 15 tests across 5 classes: basics (validation), chart_count, missing_descriptions, pie_donut_categories, kpi_position, tool schema registration, OpenAI dispatch.
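The `_review_and_revise()` generator described in item 6 above can be sketched as follows. The document states only that it yields the initial YAML, reviews locally, and performs one lower-temperature (0.7) revision pass when warnings are found; the names, signature, and injected callables here are assumptions.

```python
# Hedged sketch of a review-and-revise generator; review and revise_with_llm
# stand in for the local review_dashboard call and the LLM revision request.
def review_and_revise(yaml_source, review, revise_with_llm):
    yield yaml_source  # initial render's YAML
    result = review(yaml_source)
    warnings = [f for f in result["findings"] if f["severity"] == "warning"]
    if warnings:
        # single revision pass at lower temperature, per the progress notes
        yield revise_with_llm(yaml_source, warnings, temperature=0.7)
```

A generator fits the streaming shape of the service: callers receive the first YAML immediately and the revision (if any) as a second yield, so info-only findings cost nothing extra.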
Test results:
- `tests/core/test_review_dashboard.py` — 18/18 passed (basics, chart_count, missing_descriptions, pie_donut, kpi_position, generic_title, helpers, schema registration)
- `tests/core/test_ai_tools.py` — 11/11 passed (tool definitions, get_tools, descriptions — includes review_dashboard)
- `apps/a_lie/tests/test_ai_service.py` — 7/7 passed (prompt includes review guide, warning findings trigger revision, info-only skipped, error propagation, revision content merged, tool-call round-trip, malformed args)
- `tests/core/test_mcp.py` — 26/26 passed (no regressions)
- Black, Ruff — clean on all changed files.
## Review Feedback
- [ ] Review cleared