Dataface Tasks

Generate full Vega-Lite Pydantic chart layer from vendored schema

ID: DFT_CORE-GENERATE_EXHAUSTIVE_VEGA_LITE_PASSTHROUGH_CONTRACT_FROM_VENDORED_SCHEMA
Status: completed
Priority: p1
Milestone: m1-ft-analytics-analyst-pilot
Owner: sr-engineer-architect
Completed by: dave
Completed: 2026-03-27

Problem

Generate the full Vega-Lite chart contract as a fully defined Pydantic compatibility layer from the vendored schema, then derive schema/catalog artifacts and connect it to the Dataface profile mapping layer.

Context

Background research:

  • ai_notes/refactors/DATAFACE_SPEC_PROFILE_AND_VEGA_LITE_STRATEGY.md
  • ai_notes/research/DATAFACE_AS_A_DASHBOARD_SPEC_FROM_VEGA_LITE_OUTWARD.md
  • ai_notes/refactors/CHART_DEFAULTS_AND_PRECEDENCE_AUDIT.md
  • ai_notes/refactors/COMPILED_PRESENTATION_DEFAULTS_ARCHITECTURE.md

Current architecture stance:

  • Dataface should have a fully defined Pydantic language layer
  • the chart half of that layer should start by mirroring the full Vega-Lite chart surface
  • Dataface product opinion should be layered on top through:
    • shorthand
    • defaults/theme/structure policy
    • a profile-mapping layer for renames, wraps, hiding, and divergence logic
  • the generated Vega-Lite compatibility layer is not the whole product language; it is the exhaustive chart-grammar substrate that Dataface compiles through

This task should stop framing the work as vague passthrough. The goal is stronger:

  • build the full typed Vega-Lite chart contract
  • make it available to Python validation, schema tooling, docs inputs, and AI context
  • derive supporting artifacts from the same generated contract

Repeated AI-assisted attempts tend to hand-map just a few Vega-Lite surfaces instead of solving the general problem. This task exists to make the chart mirror exhaustive and mechanical.

Possible Solutions

  1. Continue hand-adding small Vega-Lite model slices as needed. Easy short term, but it guarantees endless incompleteness.
  2. Recommended: generate the full Vega-Lite chart contract into committed Pydantic model files from the vendored upstream schema, then derive JSON Schema/catalog artifacts from that generated layer. This makes the chart surface fully typed while keeping maintenance mechanical.
  3. Mirror the entire upstream schema by hand in Pydantic. Too much maintenance and too error-prone.
  4. Leave Vega-Lite surfaces intentionally partial forever. Not acceptable for the comprehensive spec Dataface wants to become.

Plan

  1. Audit the vendored Vega-Lite schema and define the exact chart-spec families Dataface will mirror.
  2. Generate committed Pydantic model files for the full Vega-Lite chart contract.
  3. Derive JSON Schema and a structural path/catalog artifact from the same generated contract.
  4. Connect the generated compatibility layer to the Dataface chart profile/mapping layer rather than letting renderer code consume raw ad hoc dicts.
  5. Keep Dataface-owned dashboard concepts (variables, queries, tables, layout, defaults policy) outside the generated Vega-Lite layer.
  6. Add parity tests so vendored-schema changes or stale generated artifacts fail loudly.
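
Step 1 could begin with a quick census of the vendored schema before deciding which chart-spec families to mirror. The sketch below is illustrative only: `load_definitions`, `classify`, and `audit` are hypothetical helpers, and it assumes the schema keeps its types under a top-level `definitions` key.

```python
import gzip
import json
from collections import Counter
from pathlib import Path


def load_definitions(path: Path) -> dict:
    """Load the `definitions` map from a gzipped JSON Schema file."""
    with gzip.open(path, "rt", encoding="utf-8") as fh:
        return json.load(fh).get("definitions", {})


def classify(defn: dict) -> str:
    """Bucket one definition by the JSON Schema construct it uses."""
    if "$ref" in defn:
        return "ref"
    if "enum" in defn:
        return "enum"
    if "anyOf" in defn or "oneOf" in defn:
        return "union"
    if defn.get("type") == "object" or "properties" in defn:
        return "object"
    return "other"


def audit(definitions: dict) -> Counter:
    """Count how many definitions fall into each bucket."""
    return Counter(classify(d) for d in definitions.values())
```

Calling `audit(load_definitions(Path(...)))` on the vendored `vega-lite-v5.schema.json.gz` (wherever it lives in the repo) gives a rough split between object, enum, union, and ref definitions.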

Implementation Progress

  • Task scope updated after architecture clarification:

  • target is a fully defined Pydantic chart layer
  • generated Vega-Lite Pydantic models are the implementation strategy for the mirrored chart surface
  • supporting schema/catalog artifacts should be derived from that generated layer where practical
  • Dataface policy/divergence stays in the profile-mapping layer and defaults system, not in handwritten partial Vega-Lite mirrors

  • Generator script: scripts/generate_vega_lite_models.py

  • Reads vendored vega-lite-v5.schema.json.gz (456 definitions)
  • Converts JSON Schema definitions to Pydantic models mechanically
  • Handles: object types → BaseModel, string enums → Literal, $ref → type alias, anyOf/oneOf → Union
  • Topological sort with cycle-breaking ensures valid definition order
  • Sanitizes names (angle brackets, Python keywords, import collisions like Field/Type/Dict)
  • Emits: enums → classes → complex aliases (classes use deferred annotations, aliases execute eagerly)
  • Deterministic output (md5-based disambiguation, sorted topo ties)
  • All models use extra="allow" for forward-compatible Vega-Lite passthrough
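
Two of the generator steps listed above can be sketched in a few lines: mapping a schema fragment to a Python annotation, and ordering definitions with deterministic tie-breaking and cycle-breaking. This is not the code in `scripts/generate_vega_lite_models.py`; `type_expr` and `topo_order` are hypothetical names, and the sketch skips name sanitization and the object → BaseModel class emission.

```python
from typing import Any

PRIMITIVES = {"string": "str", "number": "float", "integer": "int",
              "boolean": "bool", "null": "None"}


def type_expr(schema: dict[str, Any]) -> str:
    """Return a Python annotation string for a JSON Schema fragment."""
    if "$ref" in schema:
        # "#/definitions/Axis" -> "Axis" (a real generator would sanitize the name)
        return schema["$ref"].rsplit("/", 1)[-1]
    if "enum" in schema:
        return "Literal[" + ", ".join(repr(v) for v in schema["enum"]) + "]"
    for key in ("anyOf", "oneOf"):
        if key in schema:
            return "Union[" + ", ".join(type_expr(s) for s in schema[key]) + "]"
    kind = schema.get("type")
    if kind == "array":
        return "list[" + type_expr(schema.get("items", {})) + "]"
    if isinstance(kind, str) and kind in PRIMITIVES:
        return PRIMITIVES[kind]
    return "Any"  # inline objects, multi-type lists, and anything unrecognized


def topo_order(deps: dict[str, set[str]]) -> list[str]:
    """Order names so dependencies come first; ties are broken alphabetically."""
    # Ignore self-references and references to names outside the map.
    remaining = {name: {d for d in ds if d in deps and d != name}
                 for name, ds in deps.items()}
    order: list[str] = []
    while remaining:
        ready = sorted(n for n, ds in remaining.items() if not ds)
        if not ready:
            # Cycle: force out the alphabetically first name. Its unresolved
            # references would need deferred (string) annotations when emitted.
            ready = [min(remaining)]
        for name in ready:
            del remaining[name]
            order.append(name)
        for ds in remaining.values():
            ds.difference_update(ready)
    return order
```

Sorting the ready set on every pass is what keeps the output stable across runs, which the staleness test depends on.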

  • Generated output: dataface/core/compile/vega_lite/_generated.py (~168KB)

  • 247 Pydantic model classes (MarkDef, Encoding, Config, Scale, Axis, Legend, etc.)
  • 197 type aliases (enums, unions, refs)
  • 456 total definitions covering the full Vega-Lite v5 chart surface
  • VegaLiteBaseModel base class with extra="allow" + populate_by_name=True
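
To illustrate what `extra="allow"` and `populate_by_name=True` provide, here is a minimal sketch assuming Pydantic v2. `LookupSketch` is a hypothetical two-field stand-in, not a generated class, and the `from_` alias is only a guess at how a sanitized Python keyword might be exposed.

```python
from typing import Any, Optional

from pydantic import BaseModel, ConfigDict, Field


class VegaLiteBaseModel(BaseModel):
    # Unknown keys are kept rather than rejected; fields can be set by
    # either their Python name or their Vega-Lite alias.
    model_config = ConfigDict(extra="allow", populate_by_name=True)


class LookupSketch(VegaLiteBaseModel):
    """Hypothetical stand-in for a generated model with a keyword-named field."""

    lookup: str
    from_: Optional[dict[str, Any]] = Field(default=None, alias="from")


# Validate a raw spec fragment, including a key the model does not declare.
lookup = LookupSketch.model_validate(
    {"lookup": "id", "from": {"key": "id"}, "futureProp": True}
)

# populate_by_name lets Python callers use the sanitized field name.
by_name = LookupSketch(lookup="id", from_={"key": "id"})

# Dumping by alias restores the Vega-Lite key names, extras included.
dumped = lookup.model_dump(by_alias=True, exclude_none=True)
```

The undeclared `futureProp` key survives the round trip, which is the forward-compatibility property the passthrough contract relies on.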

  • Package: dataface/core/compile/vega_lite/

  • __init__.py: curated re-exports of key chart-grammar types
  • _generated.py: full generated surface
  • catalog.py: structural catalog derivation (list_models, list_encoding_channels, list_mark_properties, list_config_sections, get_model_json_schema)
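
One way to picture a structural path catalog is a walk over a definitions map that emits dotted property paths. The sketch below is an assumption about the shape of such an artifact, not the implementation in `catalog.py`; `iter_paths` is a hypothetical helper that follows only direct `$ref` properties.

```python
from typing import Any, Iterator


def iter_paths(
    definitions: dict[str, Any],
    root: str,
    prefix: str = "",
    seen: frozenset = frozenset(),
) -> Iterator[str]:
    """Yield dotted property paths reachable from definitions[root]."""
    if root in seen:
        return  # stop on recursive references
    seen = seen | {root}
    props = definitions.get(root, {}).get("properties", {})
    for prop, sub in sorted(props.items()):
        path = f"{prefix}{prop}"
        yield path
        ref = sub.get("$ref")
        if ref:
            # "#/definitions/MarkDef" -> "MarkDef"
            target = ref.rsplit("/", 1)[-1]
            yield from iter_paths(definitions, target, path + ".", seen)
```

Paths in this form (for example `mark.color`) are easy to diff between schema versions and to feed into docs or AI context.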

  • Parity tests: tests/core/test_vega_lite_generated.py (12 tests)

  • Staleness: re-generating must produce identical output
  • Structure: definition count matches, all models have extra="allow", key chart types exist
  • Property coverage: MarkDef, Encoding, Config spot-checked against expected fields
  • Usability: construction, extra-field passthrough, dict roundtrip, JSON Schema derivation
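
The staleness check can be thought of as "regenerate in memory, compare with the committed text, and fail with a diff". A minimal sketch follows; `assert_fresh` is a hypothetical helper, not a function from `tests/core/test_vega_lite_generated.py`.

```python
import difflib


def assert_fresh(regenerated: str, committed: str, name: str = "_generated.py") -> None:
    """Raise AssertionError with a short diff if the committed text is stale."""
    if regenerated == committed:
        return
    diff = "".join(
        difflib.unified_diff(
            committed.splitlines(keepends=True),
            regenerated.splitlines(keepends=True),
            fromfile=f"{name} (committed)",
            tofile=f"{name} (regenerated)",
            n=1,
        )
    )
    # Truncate so a fully stale file does not flood the test log.
    raise AssertionError("generated file is stale; re-run the generator\n" + diff[:2000])
```

A test would pass the generator's in-memory output as `regenerated` and the contents of the committed file as `committed`, so any vendored-schema change that is not followed by regeneration fails loudly.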

  • Connection to existing layer:

  • chart_settings_types.py docstring updated to reference generated Axis/Scale contracts
  • Axis/scale fields remain dict[str, Any] for backward compat (profile layer does dict manipulation)
  • Generated types available for author-time validation and incremental migration

QA Exploration

  • [x] N/A — non-UI task (generated Pydantic models and schema tooling, no browser interaction)

Review Feedback

  • [ ] Review cleared