Dataface Tasks

Schema-derived AI prompts from compiled types

ID: MCP_ANALYST_AGENT-SCHEMA_DERIVED_AI_PROMPTS_FROM_COMPILED_TYPES
Status: not_started
Priority: p2
Milestone: m3-public-launch
Owner: data-ai-engineer-architect

Problem

AI prompts describing the Dataface YAML spec are hand-maintained markdown strings in schema.py. When the schema evolves (new chart types, new layout options, new variable features), the prompt templates must be manually updated — and they drift. generate_face_schema_summary() is a 70-line hand-written markdown string that's already incomplete (missing settings, geo charts, advanced layout options). json-render solved this with catalog.prompt() which auto-generates a complete system prompt from the Zod-typed catalog, so adding a component automatically makes it available to AI.

Context

  • Research origin: ai_notes/research/json-render-deep-dive.md — Priority 1 section.
  • Current hand-maintained prompts: dataface/core/compile/schema.py — get_schema_for_prompt(), generate_face_schema_summary(), generate_variable_schema()
  • Pydantic input types: dataface/core/compile/types.py — Face, Chart, Variable, QueryDefinition, etc.
  • Compiled types: dataface/core/compile/compiled_types.py — CompiledFace, CompiledChart, etc.
  • Chart type enum: ChartType in types.py
  • AI integration: dataface/ai/ — MCP server and tools that consume schema prompts
  • Dependency: declarative-schema-definition-outside-python-code — a declarative schema makes auto-generation trivial; without it, we'd introspect Pydantic models directly (possible but messier)
  • Enables: extensible-schema-with-custom-elements-and-chart-types — when users register custom elements, they automatically appear in AI prompts

Possible Solutions

A. Runtime Pydantic introspection

Walk the Pydantic model tree (Face → Chart → ChartType enum, etc.) at runtime; extract field names, types, defaults, docstrings, and enum values. Generate the markdown prompt from the live model structure.

Pros: No new schema format needed. Works today. Always in sync with code by definition. Cons: Pydantic docstrings aren't optimized for AI context. Can't express "when to use this" hints (like json-render's description field on components). May produce verbose/noisy prompts.
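A minimal sketch of the introspection walk. Stdlib dataclasses stand in for the real Pydantic models here (Face, Chart, ChartType are simplified stand-ins, and with Pydantic v2 you would read Model.model_fields instead of dataclasses.fields()), but the traversal has the same shape: fields, types, defaults, docstrings, and enum values become a markdown section.

```python
import enum
import dataclasses
from typing import get_type_hints

class ChartType(enum.Enum):
    LINE = "line"
    BAR = "bar"

@dataclasses.dataclass
class Chart:
    """A single chart in a face."""
    type: ChartType
    title: str = ""

@dataclasses.dataclass
class Face:
    """Top-level Dataface spec."""
    name: str
    charts: list  # list[Chart] in the real models

def model_to_markdown(model) -> str:
    """Render one model's fields as a markdown section for the AI prompt."""
    lines = [f"## {model.__name__}", (model.__doc__ or "").strip(), ""]
    hints = get_type_hints(model)
    for f in dataclasses.fields(model):
        t = hints.get(f.name, f.type)
        if isinstance(t, type) and issubclass(t, enum.Enum):
            desc = "one of: " + ", ".join(repr(m.value) for m in t)
        else:
            desc = getattr(t, "__name__", str(t))
        default = "" if f.default is dataclasses.MISSING else f" (default: {f.default!r})"
        lines.append(f"- `{f.name}`: {desc}{default}")
    return "\n".join(lines)

def generate_prompt(models) -> str:
    return "\n\n".join(model_to_markdown(m) for m in models)

print(generate_prompt([Face, Chart]))
```

Because the prompt is rendered from the live classes, a new ChartType member or Chart field shows up in the output with no manual edit, which is the core of option A.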

B. Declarative schema with AI annotations

Once the declarative schema exists (see dependency), add AI-specific metadata: component descriptions, usage hints, common mistakes, examples. Generate prompts from this enriched schema.

Pros: Rich, curated AI context. Descriptions tuned for LLM consumption. Same schema powers validation, editor tooling, and AI prompts. Cons: Blocked on declarative schema work. Requires maintaining description quality.
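One possible shape for the AI-annotation layer option B adds on top of the declarative schema. All names here (ComponentDoc, usage_hint, common_mistakes, the example registry entry) are illustrative, not an existing Dataface API; the point is that curated, LLM-oriented metadata lives next to each component and the prompt is rendered from it.

```python
from dataclasses import dataclass, field

@dataclass
class ComponentDoc:
    name: str
    description: str                 # tuned for LLM consumption
    usage_hint: str = ""             # "when to use this" guidance
    common_mistakes: list = field(default_factory=list)
    example: str = ""                # minimal YAML snippet

def render_component(doc: ComponentDoc) -> str:
    parts = [f"### {doc.name}", doc.description]
    if doc.usage_hint:
        parts.append(f"Use when: {doc.usage_hint}")
    if doc.common_mistakes:
        parts.append("Common mistakes: " + "; ".join(doc.common_mistakes))
    if doc.example:
        parts.append("Example:\n" + doc.example)
    return "\n".join(parts)

# Hypothetical registry entry; in practice these would come from the
# declarative schema files.
REGISTRY = [
    ComponentDoc(
        name="line_chart",
        description="Time-series line chart over a query result.",
        usage_hint="continuous metrics over time",
        common_mistakes=["using a categorical x-axis"],
        example="type: line\nquery: daily_active_users",
    ),
]

def schema_prompt() -> str:
    return "\n\n".join(render_component(d) for d in REGISTRY)
```

Registering a new component (including the user-defined ones from the "extensible schema" task) then automatically makes it visible to the AI via schema_prompt().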

C. Hybrid: Pydantic introspection now, migrate to declarative later

Start with (A) to eliminate hand-maintenance immediately. When the declarative schema lands, switch the prompt generator to read from it instead. The public API (get_schema_for_prompt()) stays the same.

Pros: Immediate improvement. Clean migration path. No throwaway work. Cons: Two rounds of implementation.

Plan

  1. Identify which parts of the current AI prompt can be generated directly from compiled types and which parts still need curated explanation.
  2. Implement a derivation path that produces the structural schema guidance in a stable prompt-friendly format.
  3. Integrate the derived prompt content into the existing prompt assembly flow with tests that catch schema/prompt drift.
  4. Review the resulting prompt quality on representative agent tasks and refine the layering between generated and hand-authored guidance.
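The drift test from plan step 3 can be as simple as asserting that every chart type the compiler knows about is mentioned in the generated prompt, so adding an enum member without updating the prompt fails CI. Both ChartType and get_schema_for_prompt() below are stand-ins for the real symbols in types.py and schema.py.

```python
import enum

class ChartType(enum.Enum):
    LINE = "line"
    BAR = "bar"
    GEO = "geo"

def get_schema_for_prompt() -> str:
    # Stand-in for the real prompt generator.
    return "Charts may be one of: line, bar, geo."

def test_prompt_covers_all_chart_types():
    prompt = get_schema_for_prompt()
    missing = [t.value for t in ChartType if t.value not in prompt]
    assert not missing, f"prompt missing chart types: {missing}"
```

With full auto-generation this test becomes a tautology; its value is during the transition, while curated prose and generated content coexist.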

Implementation Progress

Review Feedback

  • [ ] Review cleared