Generate full Vega-Lite Pydantic chart layer from vendored schema
Problem
Generate the full Vega-Lite chart contract as a fully defined Pydantic compatibility layer from the vendored schema, then derive schema/catalog artifacts and connect it to the Dataface profile mapping layer.
Context
Background research:
- ai_notes/refactors/DATAFACE_SPEC_PROFILE_AND_VEGA_LITE_STRATEGY.md
- ai_notes/research/DATAFACE_AS_A_DASHBOARD_SPEC_FROM_VEGA_LITE_OUTWARD.md
- ai_notes/refactors/CHART_DEFAULTS_AND_PRECEDENCE_AUDIT.md
- ai_notes/refactors/COMPILED_PRESENTATION_DEFAULTS_ARCHITECTURE.md
Current architecture stance:
- Dataface should have a fully defined Pydantic language layer
- the chart half of that layer should start by mirroring the full Vega-Lite chart surface
- Dataface product opinion should be layered on top through:
  - shorthand
  - defaults/theme/structure policy
  - a profile-mapping layer for renames, wraps, hiding, and divergence logic
- the generated Vega-Lite compatibility layer is not the whole product language; it is the exhaustive chart-grammar substrate that Dataface compiles through

This task should stop framing the work as vague passthrough. The goal is stronger:
- build the full typed Vega-Lite chart contract
- make it available to Python validation, schema tooling, docs inputs, and AI context
- derive supporting artifacts from the same generated contract
Repeated AI-assisted attempts tend to hand-map just a few Vega-Lite surfaces instead of solving the general problem. This task exists to make the chart mirror exhaustive and mechanical.
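The layering described in the architecture stance above can be pictured with a small sketch. Everything in it is hypothetical: the `PROFILE_RENAMES` and `PROFILE_HIDDEN` tables, the `to_vega_lite` helper, and the Dataface-side names `x_label` and `y_label` are invented for illustration and are not taken from the Dataface codebase. It only shows the idea that product-level names are translated into Vega-Lite paths before the generated contract sees them; wraps and divergence logic are left out.

```python
import copy
from typing import Any

# Hypothetical profile table: Dataface-facing name -> path inside the Vega-Lite spec.
PROFILE_RENAMES: dict[str, tuple[str, ...]] = {
    "x_label": ("encoding", "x", "title"),
    "y_label": ("encoding", "y", "title"),
}

# Hypothetical set of Vega-Lite top-level keys that the profile hides from authors.
PROFILE_HIDDEN: frozenset[str] = frozenset({"datasets", "usermeta"})


def to_vega_lite(chart: dict[str, Any]) -> dict[str, Any]:
    """Apply renames and hiding; every other key passes through unchanged."""
    spec: dict[str, Any] = {}
    for key, value in chart.items():
        if key in PROFILE_HIDDEN:
            raise ValueError(f"{key!r} is not part of the Dataface chart profile")
        if key in PROFILE_RENAMES:
            continue  # written into its Vega-Lite path in the second loop
        spec[key] = copy.deepcopy(value)  # do not mutate the author's input
    for key, path in PROFILE_RENAMES.items():
        if key not in chart:
            continue
        node = spec
        for part in path[:-1]:
            node = node.setdefault(part, {})
        node[path[-1]] = chart[key]
    return spec
```

With these assumed tables, `to_vega_lite({"mark": "bar", "x_label": "Month", "encoding": {"x": {"field": "month"}}})` leaves `mark` untouched and moves the label to `encoding.x.title`. The resulting dict is what the generated Vega-Lite models would then validate.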
Possible Solutions
- Continue hand-adding small Vega-Lite model slices as needed. Easy short term, but it guarantees endless incompleteness.
- Recommended: generate the full Vega-Lite chart contract into committed Pydantic model files from the vendored upstream schema, then derive JSON Schema/catalog artifacts from that generated layer. This makes the chart surface fully typed while keeping maintenance mechanical.
- Mirror the entire upstream schema by hand in Pydantic. Too much maintenance and too error-prone.
- Leave Vega-Lite surfaces intentionally partial forever. Not acceptable for the comprehensive spec Dataface wants to become.
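To make the recommended option concrete, here is the rough shape a generated model might take. This is a hand-written sketch assuming Pydantic v2, not output from the generator: `MarkDefSketch`, `LookupSketch`, and their shortened field lists are illustrative stand-ins. The `extra="allow"` and `populate_by_name=True` settings are the ones noted for the base class under Implementation Progress.

```python
from typing import Any, Literal, Optional

from pydantic import BaseModel, ConfigDict, Field


class VegaLiteBaseModel(BaseModel):
    # Keep unknown keys instead of rejecting them, and accept either the
    # Python field name or its alias when constructing a model.
    model_config = ConfigDict(extra="allow", populate_by_name=True)


# A string enum in the schema becomes a Literal alias.
Orientation = Literal["horizontal", "vertical"]


class MarkDefSketch(VegaLiteBaseModel):
    type: Literal["bar", "line", "point"]  # shortened list, illustration only
    orient: Optional[Orientation] = None
    opacity: Optional[float] = None


class LookupSketch(VegaLiteBaseModel):
    lookup: str
    # "from" is a Python keyword, so the field is renamed and aliased.
    from_: Optional[dict[str, Any]] = Field(default=None, alias="from")


# An unmodeled property (cornerRadius) survives validation and dumping.
mark = MarkDefSketch.model_validate({"type": "bar", "cornerRadius": 4})
assert mark.model_dump(exclude_none=True) == {"type": "bar", "cornerRadius": 4}

# Construction by Python name, serialization by the schema's own key.
lookup = LookupSketch(lookup="id", from_={"key": "id"})
assert lookup.model_dump(by_alias=True, exclude_none=True) == {
    "lookup": "id",
    "from": {"key": "id"},
}
```

Typed fields give validation where the schema is known, while `extra="allow"` keeps properties the models do not list, which is what makes a mechanically generated layer safe to adopt incrementally.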
Plan
- Audit the vendored Vega-Lite schema and define the exact chart-spec families Dataface will mirror.
- Generate committed Pydantic model files for the full Vega-Lite chart contract.
- Derive JSON Schema and a structural path/catalog artifact from the same generated contract.
- Connect the generated compatibility layer to the Dataface chart profile/mapping layer rather than letting renderer code consume raw ad hoc dicts.
- Keep Dataface-owned dashboard concepts (variables, queries, tables, layout, defaults policy) outside the generated Vega-Lite layer.
- Add parity tests so vendored-schema changes or stale generated artifacts fail loudly.
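The last plan item, failing loudly on stale artifacts, could look something like the helper below. It is a minimal sketch: `assert_not_stale` and its `regenerate` callback are hypothetical names, and the actual parity tests may be structured differently. The idea is simply to compare the committed file with freshly generated text and show a diff when they disagree.

```python
import difflib
from pathlib import Path
from typing import Callable


def assert_not_stale(committed_path: Path, regenerate: Callable[[], str]) -> None:
    """Raise AssertionError with a unified diff if the committed file is out of date."""
    committed = committed_path.read_text(encoding="utf-8")
    fresh = regenerate()
    if committed == fresh:
        return
    diff = "".join(
        difflib.unified_diff(
            committed.splitlines(keepends=True),
            fresh.splitlines(keepends=True),
            fromfile=str(committed_path),
            tofile="regenerated",
            n=2,  # two lines of context around each change
        )
    )
    raise AssertionError(
        f"generated artifact is stale; re-run the generator\n{diff}"
    )
```

A test would call this with the path of the committed generated module and a function that renders the module source in memory. This only works if the generator's output is deterministic, which is why stable ordering matters.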
Implementation Progress
- Task scope updated after architecture clarification:
  - target is a fully defined Pydantic chart layer
  - generated Vega-Lite Pydantic models are the implementation strategy for the mirrored chart surface
  - supporting schema/catalog artifacts should be derived from that generated layer where practical
  - Dataface policy/divergence stays in the profile-mapping layer and defaults system, not in handwritten partial Vega-Lite mirrors
- Generator script: scripts/generate_vega_lite_models.py
  - Reads vendored vega-lite-v5.schema.json.gz (456 definitions)
  - Converts JSON Schema definitions to Pydantic models mechanically
  - Handles: object types → BaseModel, string enums → Literal, $ref → type alias, anyOf/oneOf → Union
  - Topological sort with cycle-breaking ensures valid definition order
  - Sanitizes names (angle brackets, Python keywords, import collisions like Field/Type/Dict)
  - Emits: enums → classes → complex aliases (classes use deferred annotations, aliases execute eagerly)
  - Deterministic output (md5-based disambiguation, sorted topo ties)
  - All models use extra="allow" for forward-compatible Vega-Lite passthrough
- Generated output: dataface/core/compile/vega_lite/_generated.py (~168KB)
  - 247 Pydantic model classes (MarkDef, Encoding, Config, Scale, Axis, Legend, etc.)
  - 197 type aliases (enums, unions, refs)
  - 456 total definitions covering the full Vega-Lite v5 chart surface
  - VegaLiteBaseModel base class with extra="allow" + populate_by_name=True
- Package: dataface/core/compile/vega_lite/
  - __init__.py: curated re-exports of key chart-grammar types
  - _generated.py: full generated surface
  - catalog.py: structural catalog derivation (list_models, list_encoding_channels, list_mark_properties, list_config_sections, get_model_json_schema)
- Parity tests: tests/core/test_vega_lite_generated.py (12 tests)
  - Staleness: re-generating must produce identical output
  - Structure: definition count matches, all models have extra="allow", key chart types exist
  - Property coverage: MarkDef, Encoding, Config spot-checked against expected fields
  - Usability: construction, extra-field passthrough, dict roundtrip, JSON Schema derivation
- Connection to existing layer:
  - chart_settings_types.py docstring updated to reference generated Axis/Scale contracts
  - Axis/scale fields remain dict[str, Any] for backward compat (profile layer does dict manipulation)
  - Generated types available for author-time validation and incremental migration
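The generator steps listed above (type mapping, dependency ordering with cycle-breaking, name sanitization) can be sketched roughly as follows. This is not the code in scripts/generate_vega_lite_models.py. The function names `type_expr`, `refs_in`, `topo_order`, and `safe_name`, and the exact rules they apply, are assumptions made for illustration, and class emission for object types is omitted.

```python
import hashlib
import keyword
import re
from typing import Any

PRIMITIVES = {
    "string": "str", "number": "float", "integer": "int",
    "boolean": "bool", "null": "None", "object": "dict[str, Any]",
}


def ref_name(ref: str) -> str:
    """'#/definitions/Axis' -> 'Axis'."""
    return ref.rsplit("/", 1)[-1]


def type_expr(schema: dict[str, Any]) -> str:
    """Map a (sub)schema to a Python type expression, returned as source text."""
    if "$ref" in schema:
        # A fuller version would look up the sanitized name here.
        return ref_name(schema["$ref"])
    if "enum" in schema:
        return "Literal[" + ", ".join(repr(v) for v in schema["enum"]) + "]"
    for key in ("anyOf", "oneOf"):
        if key in schema:
            return "Union[" + ", ".join(type_expr(s) for s in schema[key]) + "]"
    kind = schema.get("type")
    if isinstance(kind, list):  # e.g. ["string", "null"]
        return "Union[" + ", ".join(PRIMITIVES.get(k, "Any") for k in kind) + "]"
    if kind == "array":
        items = schema.get("items")
        return f"list[{type_expr(items)}]" if isinstance(items, dict) else "list[Any]"
    return PRIMITIVES.get(kind, "Any")


def refs_in(schema: Any) -> set[str]:
    """Collect every definition name referenced anywhere inside a schema."""
    found: set[str] = set()
    if isinstance(schema, dict):
        if isinstance(schema.get("$ref"), str):
            found.add(ref_name(schema["$ref"]))
        for value in schema.values():
            found |= refs_in(value)
    elif isinstance(schema, list):
        for item in schema:
            found |= refs_in(item)
    return found


def topo_order(definitions: dict[str, Any]) -> list[str]:
    """Dependencies first; names are visited in sorted order for stable output.

    A reference back to a definition that is still being visited is a cycle.
    It is skipped here, on the assumption that the emitted code resolves it
    later through a deferred annotation.
    """
    order: list[str] = []
    done: set[str] = set()
    visiting: set[str] = set()

    def visit(name: str) -> None:
        if name in done or name in visiting or name not in definitions:
            return
        visiting.add(name)
        for dep in sorted(refs_in(definitions[name])):
            visit(dep)
        visiting.discard(name)
        done.add(name)
        order.append(name)

    for name in sorted(definitions):
        visit(name)
    return order


def safe_name(raw: str, taken: set[str]) -> str:
    """Turn a schema definition name into a unique Python identifier."""
    name = re.sub(r"\W+", "_", raw).strip("_")
    if keyword.iskeyword(name):
        name += "_"
    if not name or name[0].isdigit():
        name = "_" + name
    if name in taken:
        # Suffix derived from the original name, so reruns produce the same result.
        name += "_" + hashlib.md5(raw.encode("utf-8")).hexdigest()[:6]
    taken.add(name)
    return name
```

Sorting at every step where iteration order could vary, and deriving collision suffixes from a hash of the original name rather than a counter, is one way to get the byte-identical regeneration that the staleness test relies on.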
QA Exploration
- [x] N/A — non-UI task (generated Pydantic models and schema tooling, no browser interaction)
Review Feedback
- [ ] Review cleared