Dataface Tasks

Add dashboard review-and-revise workflow

ID: DASHBOARD_FACTORY-ADD_DASHBOARD_REVIEW_AND_REVISE_WORKFLOW
Status: completed
Priority: p1
Milestone: m1-ft-analytics-analyst-pilot
Owner: data-analysis-evangelist-ai-training

Problem

Define and pilot a second-pass dashboard review workflow that inspects rendered dashboards with real data, captures review heuristics in a markdown playbook, and feeds concrete revisions back into dashboard generation/update steps.

Context

  • Existing tools: render_dashboard, execute_query, catalog, list_sources, search_dashboards — all defined canonically in dataface/ai/tool_schemas.py, implemented in dataface/ai/mcp/tools.py, exposed via MCP (mcp/server.py) and OpenAI function-calling (tools.py).
  • Design heuristics exist in dataface/ai/skills/dataface-dashboard-design/SKILL.md — quality checklist, common mistakes, chart selection guide. These are human-readable but not machine-checkable today.
  • A Lie eval pipeline (apps/a_lie/run_evals.py, review_evals.py) does vision-based scoring against eval_rubric.md but has no revision loop — it evaluates but never feeds back.
  • A Lie generation (apps/a_lie/ai_service.py) uses OpenAI Responses API with render_dashboard tool. No review or revision step exists.
  • Prompt loading: dataface/ai/prompts.py loads shared skills via _SKILL_NAME_MAP, resolving each entry to SKILLS_DIR/<dir>/SKILL.md.
  • No review_dashboard tool exists anywhere in the codebase.
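The skill-loading convention above can be sketched as follows; the map contents, constant names, and directory layout here are assumptions modeled on the description, not the actual code in prompts.py:

```python
from pathlib import Path

# Assumed layout: SKILLS_DIR/<dir>/SKILL.md, keyed by a short skill name.
SKILLS_DIR = Path("dataface/ai/skills")
_SKILL_NAME_MAP = {
    "dashboard_design": "dataface-dashboard-design",
}

def skill_path(name: str) -> Path:
    """Resolve a short skill name to its SKILL.md file."""
    return SKILLS_DIR / _SKILL_NAME_MAP[name] / "SKILL.md"

def load_skill(name: str) -> str:
    """Read the skill markdown (raises KeyError for unknown names)."""
    return skill_path(name).read_text()
```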

Possible Solutions

Option A: Vision-based review only (extend eval pipeline)

Use the existing screenshot + vision-LLM approach from review_evals.py as the review tool. Trade-offs: requires screenshots, is slow (30s+ per review), depends on an external LLM, and is unavailable in the MCP/tool-calling context.

Option B: Shared structural review_dashboard tool (selected)

Add a first-class review_dashboard tool to the shared AI layer that performs structural/semantic analysis of compiled dashboard documents against design heuristics. Returns machine-readable findings (issues, suggestions). Expose via MCP + OpenAI tool surfaces. Keep revision orchestration in A Lie client logic so critique stays inspectable/testable. A convenience revise_dashboard wrapper can come later.
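A plausible shape for the machine-readable result is sketched below. The top-level keys mirror the {success, findings, summary, chart_count, errors} shape noted in Implementation Progress; the per-finding fields and the example contents are illustrative assumptions:

```python
def make_finding(check: str, severity: str, message: str) -> dict:
    """One structured review finding; severity is 'warning' or 'info'."""
    return {"check": check, "severity": severity, "message": message}

# Example result; finding contents are made up for illustration.
example_result = {
    "success": True,
    "findings": [
        make_finding("chart_count", "warning",
                     "11 charts; consider splitting the dashboard."),
        make_finding("missing_description", "info",
                     "Query 'orders' has no description."),
    ],
    "summary": "1 warning, 1 info",
    "chart_count": 11,
    "errors": [],
}
```

Structured findings like these let the caller decide programmatically whether a revision pass is warranted, rather than parsing free-text critique.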

Option C: Prompt-only approach

Add review instructions to the system prompt and hope the LLM self-reviews. Trade-off: not inspectable, not testable, no structured output.

Plan

Selected: Option B — shared structural review_dashboard primitive.

Files to modify/create:

  1. dataface/ai/mcp/tools.py — add review_dashboard() implementation
  2. dataface/ai/tool_schemas.py — add REVIEW_DASHBOARD schema, add to ALL_TOOLS
  3. dataface/ai/mcp/server.py — wire into MCP tool handlers
  4. dataface/ai/tools.py — wire into OpenAI tool surface
  5. dataface/ai/prompts.py — add _SKILL_NAME_MAP entry for "dashboard_review"
  6. dataface/ai/skills/reviewing-dataface-dashboards/SKILL.md — shared review skill
  7. apps/a_lie/ai_service.py — wire review+revise orchestration after initial render
  8. tests/core/test_mcp.py — add TestReviewDashboard tests

Implementation steps:

  1. Write failing tests for review_dashboard tool
  2. Implement review_dashboard() in mcp/tools.py — structural analysis of compiled Face
  3. Add canonical schema in tool_schemas.py
  4. Wire into MCP server + OpenAI tools
  5. Create shared review skill
  6. Wire A Lie to do generate → render → review → optionally revise once → re-render
  7. Run tests, validate task file
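Step 6's orchestration can be sketched as a small bounded loop; the callables here are stand-ins for the real A Lie client methods and tools, not the actual ai_service.py API:

```python
def generate_with_review(generate, render, review, revise):
    """Generate -> render -> review -> at most one revise pass -> re-render."""
    doc = generate()
    render(doc)
    report = review(doc)
    warnings = [f for f in report["findings"] if f["severity"] == "warning"]
    if warnings:  # info-level findings alone do not trigger a revision
        doc = revise(doc, warnings)
        render(doc)
    return doc
```

The single-revision cap keeps the loop bounded and leaves the critique inspectable between passes.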

Review heuristics (structural checks):

  • Chart count (warn if >8, suggest split)
  • KPI row present and positioned first
  • Pie/donut with >3 categories
  • Missing descriptions on queries/charts/variables
  • Missing format specs on numeric columns
  • Generic titles
  • Layout depth/complexity
  • Every chart references a valid query
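Two of these checks might look like the following; the compiled-dashboard shape (a dict with a "charts" list) is an assumption for illustration, not the actual compiled Face structure:

```python
MAX_CHARTS = 8       # warn-if-over threshold from the checklist above
MAX_PIE_SLICES = 3   # pie/donut category limit

def check_chart_count(dashboard: dict) -> list[dict]:
    """Warn when a dashboard packs in too many charts."""
    n = len(dashboard.get("charts", []))
    if n > MAX_CHARTS:
        return [{"check": "chart_count", "severity": "warning",
                 "message": f"{n} charts; consider splitting into two dashboards."}]
    return []

def check_pie_categories(dashboard: dict) -> list[dict]:
    """Flag pie/donut charts with more categories than read well."""
    findings = []
    for chart in dashboard.get("charts", []):
        cats = chart.get("category_count", 0)
        if chart.get("type") in ("pie", "donut") and cats > MAX_PIE_SLICES:
            findings.append({"check": "pie_donut_categories", "severity": "warning",
                             "message": f"{chart.get('title', 'chart')}: {cats} slices; "
                                        f"prefer a bar chart."})
    return findings
```

Each check returns a (possibly empty) list of findings, so the reviewer can concatenate results from all checks into one report.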

Implementation Progress

Files changed (7 modified, 2 new):

Modified:

  1. dataface/ai/mcp/tools.py + dataface/ai/mcp/review.py — added review_dashboard() and extracted the heuristic checks into a dedicated review module. Compiles YAML, checks 5 heuristics (chart_count, missing_description, pie_donut_categories, kpi_position, generic_title), and returns structured {success, findings, summary, chart_count, errors}.
  2. dataface/ai/tool_schemas.py — added REVIEW_DASHBOARD canonical schema dict, appended to ALL_TOOLS.
  3. dataface/ai/mcp/server.py — imported REVIEW_DASHBOARD schema + review_dashboard impl; added to MCP tool list and call-handler switch.
  4. dataface/ai/tools.py — added TOOL_REVIEW_DASHBOARD OpenAI wrapper, added to get_tools(), added dispatch in handle_tool_call.
  5. dataface/ai/prompts.py — added "dashboard_review": "reviewing-dataface-dashboards" to _SKILL_NAME_MAP.
  6. apps/a_lie/ai_service.py — added _review_and_revise() generator method. After the initial render yields YAML, calls review_dashboard locally; if warnings are found, sends the findings to the LLM for one revision pass at lower temperature (0.7).
  7. tasks/.../add-dashboard-review-and-revise-workflow.md — filled Context, Possible Solutions, Plan, Implementation Progress.
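The canonical schema entry might look roughly like this; the parameter names and description text are assumptions, and the real dict lives in tool_schemas.py:

```python
# Hypothetical shape of the REVIEW_DASHBOARD canonical schema.
REVIEW_DASHBOARD = {
    "name": "review_dashboard",
    "description": "Structurally review a compiled dashboard against design heuristics.",
    "parameters": {
        "type": "object",
        "properties": {
            "dashboard_yaml": {
                "type": "string",
                "description": "Dashboard YAML document to compile and review.",
            },
        },
        "required": ["dashboard_yaml"],
    },
}

# Registration mirrors the pattern described above: append to the shared list.
ALL_TOOLS = []  # stand-in for the real registry in tool_schemas.py
ALL_TOOLS.append(REVIEW_DASHBOARD)
```

Keeping one canonical dict and deriving the MCP and OpenAI surfaces from it avoids the two tool definitions drifting apart.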

New:

  8. dataface/ai/skills/reviewing-dataface-dashboards/SKILL.md — shared review skill with workflow, heuristic table, revision patterns, rationalizations-to-resist.
  9. tests/core/test_review_dashboard.py — 15 tests across 5 classes: basics (validation), chart_count, missing_descriptions, pie_donut_categories, kpi_position, tool schema registration, OpenAI dispatch.

Test results:

  • tests/core/test_review_dashboard.py — 18/18 passed (basics, chart_count, missing_descriptions, pie_donut, kpi_position, generic_title, helpers, schema registration)
  • tests/core/test_ai_tools.py — 11/11 passed (tool definitions, get_tools, descriptions — includes review_dashboard)
  • apps/a_lie/tests/test_ai_service.py — 7/7 passed (prompt includes review guide, warning findings trigger revision, info-only skipped, error propagation, revision content merged, tool-call round-trip, malformed args)
  • tests/core/test_mcp.py — 26/26 passed (no regressions)
  • Black, Ruff — clean on all changed files.

Review Feedback

  • [ ] Review cleared