Chart Rendering — Design Philosophy

Core Principle: Extended Vega-Lite Grammar

Dataface charts use an extended Vega-Lite grammar. For any chart type that Vega-Lite supports natively (bar, line, area, arc, point, etc.), our YAML should be a thin wrapper around Vega-Lite's own concepts — not a parallel chart language that happens to compile down to Vega-Lite.

Concretely:

x, y, color, size, shape, theta map directly to Vega-Lite encoding channels.
type: bar means Vega-Lite mark: bar, not a Dataface-invented bar chart.
Axis, scale, and legend configuration should pass through to Vega-Lite's native axis, scale, and legend properties — not be reinvented under Dataface-specific names.

The translation layer should be mechanical: read YAML fields, build a Vega-Lite spec dict, inject the query data, apply theme/config defaults. No hidden mutations, no data-driven rewrites, no surprise layers.
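
As a minimal sketch of what "mechanical" means here, the translation can be written as a pure function over the authored fields. The `build_spec` name, the `CHANNELS` tuple, and the explicit `types` argument are illustrative assumptions, not the actual Dataface API.

```python
from typing import Any

# Channels that pass straight through to Vega-Lite encoding channels.
CHANNELS = ("x", "y", "color", "size", "shape", "theta")


def build_spec(
    chart: dict[str, Any],
    rows: list[dict[str, Any]],
    vega_config: dict[str, Any],
    types: dict[str, str],
) -> dict[str, Any]:
    """Translate a flat chart dict into a Vega-Lite spec without inspecting rows."""
    encoding = {
        channel: {"field": chart[channel], "type": types[chart[channel]]}
        for channel in CHANNELS
        if channel in chart
    }
    return {
        "mark": chart["type"],     # type: bar -> mark: bar
        "encoding": encoding,
        "data": {"values": rows},  # query result injected as-is
        "config": vega_config,     # structure/theme defaults come from config
    }


# Hypothetical authored chart and query result.
chart = {"type": "bar", "x": "region", "y": "revenue"}
rows = [{"region": "EMEA", "revenue": 120}, {"region": "APAC", "revenue": 95}]
spec = build_spec(
    chart,
    rows,
    vega_config={"axisY": {"orient": "right"}},
    types={"region": "nominal", "revenue": "quantitative"},
)
```

Nothing in the function branches on the contents of `rows`, which is the property the rest of this document tries to protect.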

Data Belongs to Queries, Not Charts

The query layer owns dataset meaning and grain. The chart layer owns visual encoding and presentation.

That means chart rendering should consume already-shaped data:

Do not aggregate, regroup, bucket, or derive new semantic datasets in the viz layer.
Do not perform chart-local reordering that changes the analytical meaning of the result set.
If a chart needs a different grain or ordering, change the query or upstream model instead.

This keeps authored charts predictable. A user reading chart YAML should be able to assume the chart is a presentation over query results, not a second transformation pipeline hidden inside render code.

Native-Option-First Fixes

When fixing a Vega-Lite chart bug, the default move is to expose or pass through a native Vega-Lite option, not to add renderer-side policy code.

Fix rendering issues by wiring through Vega-Lite encoding, axis, scale, legend, mark, or top-level spec properties whenever possible.
If Dataface has a chart-level abstraction such as sort, map it mechanically onto the correct Vega-Lite property as close to spec construction as possible.
Do not add _apply_* helpers that inspect the dataset to invent presentational values for Vega-Lite-native charts unless the chart type is auto or the behavior is explicitly opt-in.
If a fix truly cannot be expressed with Vega-Lite options, document the exact Vega-Lite limitation in code and in the PR description.

Before adding renderer logic for a Vega-Lite-native chart, answer:

Which Vega-Lite property already expresses this behavior?
Can we expose that property through existing chart settings or config?
If not, what exact Vega-Lite gap forces Python-side logic?
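
For example, a chart-level sort can usually be mapped onto Vega-Lite's own `sort` property without looking at the data. The sketch below assumes a hypothetical authored shape (`channel`, `by`, `order`) and a hypothetical `map_sort` helper; it relies only on Vega-Lite's sort-by-encoding shorthand (`"y"` / `"-y"`).

```python
from typing import Any


def map_sort(encoding: dict[str, Any], sort: dict[str, str]) -> dict[str, Any]:
    """Map a chart-level sort onto Vega-Lite's `sort` property for one channel.

    `sort` uses a hypothetical shape:
    {"channel": "x", "by": "y", "order": "descending"}.
    """
    prefix = "-" if sort["order"] == "descending" else ""
    # Copy each channel definition so the authored encoding is not mutated.
    updated = {name: dict(defn) for name, defn in encoding.items()}
    updated[sort["channel"]]["sort"] = f"{prefix}{sort['by']}"
    return updated


encoding = {
    "x": {"field": "region", "type": "nominal"},
    "y": {"field": "revenue", "type": "quantitative"},
}
sorted_encoding = map_sort(
    encoding, {"channel": "x", "by": "y", "order": "descending"}
)
```

The ordering itself is left to Vega-Lite; the Python side only names the property.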

Config Is the Single Source of Truth — No Fallbacks in Code

All defaults live in YAML. Code must never hardcode fallbacks.

The config system works like this:

default_config.yml + chart_defaults.yml + built-in structures/themes + user dataface.yml overlays  →  selected structure reapplied  →  get_config() returns complete DotDict

By the time any rendering code runs, get_config() is guaranteed to return a complete config with every value populated. This means:

Never hardcode a default value in Python. If you need axis_label_limit, read get_config().chart.axis_label_limit — don't write 200 in the code.
Never write fallback patterns like config.get("x", 200) or x if x else 200. The config is always complete.
Never duplicate config values. If vega.config.axisY.orient is "right" in the active structure, don't also hardcode "orient": "right" in the Python encoder. Read it from config.
If a default is missing from YAML, add it there — don't paper over it with a hardcoded value in code.
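
The layering and the no-fallback rule can be sketched as follows. `deep_merge` and `read` are hypothetical helpers written for this illustration, and the two dicts stand in for `default_config.yml` and a user `dataface.yml` overlay; the real `get_config()` may be implemented differently.

```python
from typing import Any


def deep_merge(base: dict[str, Any], overlay: dict[str, Any]) -> dict[str, Any]:
    """Return a new dict where overlay values win and nested dicts merge recursively."""
    merged = dict(base)
    for key, value in overlay.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged


def read(config: dict[str, Any], path: str) -> Any:
    """Read a dotted path and fail loudly if a default is missing from YAML."""
    node: Any = config
    for part in path.split("."):
        node = node[part]  # KeyError on a gap: fix the YAML, do not add a fallback
    return node


# Stand-ins for default_config.yml and a user dataface.yml overlay.
defaults = {"chart": {"height": 300, "axis_label_limit": 200}}
user_overlay = {"chart": {"height": 420}}
config = deep_merge(defaults, user_overlay)
```

A missing key raises instead of silently resolving to a literal, so a gap in YAML shows up as an error rather than as drift between code and config.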

The Vega-Lite config object in chart_defaults.yml (under vega.config) is applied to every spec at render time, after the active structure has been merged into the runtime config. Structural defaults like y-axis orientation, legend placement, title anchoring, and whether grids/domains exist belong in built-in structures. Visual defaults like grid colors, font sizes, and mark styles live in chart/theme config. The Python translation layer should NOT re-specify these — they come from config.

That includes structure-owned heuristics we previously hardcoded in Python, such as the default bar-chart categorical y-axis side and the x-axis label posture choices used by the smart-axis helper.

A useful metaphor:

Theme is like CSS: the painting of the scaffold - font, color, width, size, stroke, fill, and other visual styling
Structure is like HTML: the chart's scaffold - how it is arranged and whether structural elements exist at all

Optional explicit structure presets and theme-paired structure overlays merge before the final theme layer. This keeps mechanics like spacing/orientation separate from brand styling while preserving one mechanical config pipeline.

Semantic State vs Presentation Defaults

Not every unresolved chart field should be filled from config.

Semantic fields can be unresolved and use auto/null states. Examples: chart type, inferred format, inferred zero, chosen x/y.
Presentation defaults are static and always come from config. Examples: axis label limits, tooltip number format, mark styling, chart height.

Dataface keeps these responsibilities separate:

chart_enrichment.* config controls inference policy
vega.config, chart.*, and chart_types.* control rendering defaults

This prevents semantic chart meaning from being silently filled by presentation config.
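
A rough sketch of the split, using a hypothetical `SketchIntent` dataclass and `resolve_semantics` function (neither is the real pipeline model): semantic fields may start unresolved and are filled by inference, while presentation values are read from config and never carry an auto state.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class SketchIntent:
    """Semantic fields only; "auto" or None means unresolved."""

    type: str = "auto"
    x: Optional[str] = None
    y: Optional[str] = None


def resolve_semantics(intent: SketchIntent, inferred: dict[str, str]) -> SketchIntent:
    """Fill unresolved semantic fields from inference, never from presentation config."""
    return replace(
        intent,
        type=inferred["type"] if intent.type == "auto" else intent.type,
        x=intent.x if intent.x is not None else inferred["x"],
        y=intent.y if intent.y is not None else inferred["y"],
    )


# Presentation defaults come from config as-is (stand-in for chart.height).
presentation = {"chart": {"height": 300}}
height = presentation["chart"]["height"]

authored = SketchIntent(type="auto", x="date")
resolved = resolve_semantics(authored, {"type": "line", "x": "month", "y": "revenue"})
```

The authored `x: date` survives resolution; only the fields left unresolved are taken from inference.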

Why This Matters

AI agents and new contributors instinctively add defensive defaults:

# BAD: hardcoded default that duplicates/contradicts config
y_axis_config = {"orient": "right", "labelLimit": 200}

# GOOD: config is guaranteed complete, just read it
config = get_config()
# orient comes from the active structure via vega.config.axisY, so it is not set here
# labelLimit comes from chart.axis_label_limit
y_axis_config = {"labelLimit": config.chart.axis_label_limit}
Every hardcoded default is a bug waiting to happen: it can drift from the config value, it can't be overridden by user config, and it makes the code harder to reason about.

What Goes Where

Default type	Where it lives	Example
Structural chart defaults	`structures.<name>.` in `chart_structures/.yml`	`vega.config.axisY.orient: "right"`
Vega-Lite axis/legend/mark config	`vega.config` in `chart_defaults.yml`	`axis.gridColor: "#DFE1E5"`
Chart dimension defaults	`chart.*` in `default_config.yml`	`chart.height: 300`
Chart-type-specific settings	`chart_types.<type>.*` in `default_config.yml`	`chart_types.line.stroke_width: 2`
Layout defaults	`layout.*` in `default_config.yml`	`layout.rows.gap: 20`
Style defaults	`style.*` in `default_config.yml`	`style.font_family: "Liberation Sans"`

Structures can inherit from other structures so we can keep a small set of canonical scaffolding presets instead of copying near-identical defaults into many theme-specific files. Legacy built-in themes can then point at one of those shared structures via theme_structures without re-bundling structure values inside the theme itself.
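
Structure inheritance could be resolved along these lines. The `extends` and `values` keys, the structure names, and the `resolve_structure` helper are assumptions made for the sketch; only `theme_structures` is a name taken from this document.

```python
from typing import Any


def resolve_structure(
    name: str,
    structures: dict[str, dict[str, Any]],
    _seen: tuple[str, ...] = (),
) -> dict[str, Any]:
    """Flatten a structure by applying its parent chain first, then its own values."""
    if name in _seen:
        raise ValueError(f"structure inheritance cycle at {name!r}")
    definition = structures[name]
    resolved: dict[str, Any] = {}
    if "extends" in definition:  # hypothetical key name
        parent = definition["extends"]
        resolved.update(resolve_structure(parent, structures, _seen + (name,)))
    resolved.update(definition["values"])
    return resolved


# Flat dotted keys keep the sketch short; real structures are nested YAML.
structures = {
    "base": {"values": {"axisY.orient": "left", "axis.grid": True}},
    "right_axis": {"extends": "base", "values": {"axisY.orient": "right"}},
}
theme_structures = {"legacy_theme": "right_axis"}
resolved = resolve_structure(theme_structures["legacy_theme"], structures)
```

The theme only names a structure; the structure values themselves stay in one place.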

Practical Rule

Use this test:

If the question is what exists, where it sits, or how the scaffold behaves by default, it is structure
If the question is how that scaffold is painted, it is theme

Examples of values that belong in structure:

y-axis on the left or right
x-axis orientation
whether grid lines exist
whether an axis domain exists
legend placement
title anchoring
default axis label angle / baseline / alignment posture
axis title alignment / angle / x-y offsets
whether band grids exist

Examples of values that belong in theme:

font family
colors
stroke and fill
stroke width
symbol size
font size
palette choices
table and variable styling
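
One way to keep the two lists honest is to check that a structure and a theme never set the same leaf key. The sketch below is a hypothetical lint-style check, not an existing Dataface utility; the example values use real Vega-Lite config property names.

```python
from typing import Any, Iterator


def leaf_paths(node: dict[str, Any], prefix: str = "") -> Iterator[str]:
    """Yield dotted paths for every non-dict value in a nested config dict."""
    for key, value in node.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            yield from leaf_paths(value, path)
        else:
            yield path


def overlapping_keys(structure: dict[str, Any], theme: dict[str, Any]) -> set[str]:
    """Return leaf keys that both layers try to own."""
    return set(leaf_paths(structure)) & set(leaf_paths(theme))


# Structure: what exists and where it sits.
structure = {"axisY": {"orient": "right", "grid": True}, "legend": {"orient": "bottom"}}
# Theme: how that scaffold is painted.
theme = {"axisY": {"gridColor": "#DFE1E5", "labelFontSize": 11}}
overlap = overlapping_keys(structure, theme)
```

An empty overlap means each value has exactly one owner, even though both layers touch `axisY`.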

Internal Framing: Data, Encoding, Structure, Theme

For internal design discussions, it helps to separate chart concerns into four concepts without changing authored YAML:

Data: the already-shaped query result a chart renders
Encoding: the chart-authored field mappings and per-chart intent (x, y, color, title, tooltip, sort, etc.)
Structure: reusable chart scaffold defaults and existence decisions (axis side, whether grid/domain exists, legend placement, title anchoring, label posture defaults)
Theme: reusable visual styling - the painting of that scaffold (font, color, width, size, surfaces, table/input styling)

Another way to say this:

Theme = CSS / painting
Structure = HTML / scaffold

Authored chart YAML remains flat. This vocabulary is for internal architecture and config organization, not for introducing new required YAML nesting.

Chart Reasoning Frameworks

Dataface uses a small set of internal spectra and framings to reason about chart choice, chart sophistication, and what kinds of enrichment are justified in a given situation.

These frameworks are meant to help with:

chart recommendation
AI-assisted chart authoring
deciding how much statistical or narrative enrichment to add
keeping reasoning explicit instead of burying it in renderer heuristics

They complement the data / encoding / structure / theme framing:

data / encoding / structure / theme explains chart responsibilities
chart reasoning frameworks explain how we think about representation, available context, and sophistication level

Current Canonical Spectra

Text-to-Mark Spectrum

Another useful internal framing is the discrete text-to-mark spectrum:

table -> colored table -> spark table -> graphic table -> chart -> mini-chart

This spectrum describes gradual changes in how statistical meaning is carried: first mainly by text and tabular structure, then increasingly by graphical marks.

For structure design, this spectrum is a taxonomy above any individual preset. It helps us choose and name structure families, especially when deciding how much tabular scaffolding to preserve versus how much of the canvas should be given over to marks.

Context Spectrum

Another useful internal framing is the context spectrum: how much context we have available when deciding what chart to make and how much meaning it can responsibly carry.

data type -> field semantics -> observed values -> comparative context -> human context

The levels are:

Data type: only primitive shape information such as numeric, categorical, temporal, boolean, or geographic. At this level we can choose safe generic representations such as a table, histogram, scatterplot, or simple count bar chart, but we know little about analytical intent.
Field semantics: column names, semantic types, units, and role hints such as revenue, date, country, or conversion_rate. At this level we can make plausible chart-type guesses because we understand what the fields mean, not just their storage types.
Observed values: the actual distribution and contents of the bound dataset. At this level we can add data-driven annotation or formatting such as averages, thresholds, outlier highlighting, ranked ordering, or range emphasis.
Comparative context: additional datasets or windows beyond the minimum needed for the basic chart, such as prior periods, baselines, targets, benchmarks, cohorts, or long-run history. At this level we can express change versus prior state, seasonality, deviation from target, and other comparative interpretations.
Human context: audience, workflow moment, decision to support, and what deserves emphasis. At this level we can decide what to foreground, what to de-emphasize, how much annotation to include, and whether the right output is a dashboard chart, report chart, or table.

This spectrum is useful for chart recommendation, AI-assisted design, and internal reasoning about progressive enrichment.

Safe-Enrichment Rule

Moving rightward on the context spectrum expands what we may reasonably infer, but it does not erase the data-shape boundary.

Data type and field semantics can support chart recommendation and default encoding choice.
Observed values can support presentational enrichment derived from the bound dataset, such as highlighting extrema or showing an average reference line.
Comparative context requires that the extra benchmark, prior-period, or seasonal dataset actually be available in the query result or explicitly supplied context. The renderer should not silently invent those datasets.
Human context can guide emphasis and narrative framing, but it should not override the underlying analytical meaning of the data.

In short: more context allows richer charts, but extra meaning still needs an explicit source.
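
The rule can be sketched as a small gate. The `Context` enum, the enrichment labels, and the assumption that levels are strictly cumulative are simplifications for illustration, not settled policy.

```python
from enum import IntEnum


class Context(IntEnum):
    """Levels of the context spectrum, left to right."""

    DATA_TYPE = 1
    FIELD_SEMANTICS = 2
    OBSERVED_VALUES = 3
    COMPARATIVE = 4
    HUMAN = 5


def allowed_enrichments(level: Context, has_comparison_data: bool) -> list[str]:
    """List the enrichment kinds that have an explicit source at this level."""
    allowed = ["chart recommendation", "default encoding"]
    if level >= Context.OBSERVED_VALUES:
        allowed += ["highlight extrema", "average reference line"]
    # Comparative enrichment needs the extra dataset to actually be supplied.
    if level >= Context.COMPARATIVE and has_comparison_data:
        allowed.append("comparison overlay")
    if level >= Context.HUMAN:
        allowed.append("emphasis and narrative framing")
    return allowed
```

Reaching the comparative level is not enough on its own: without a supplied comparison dataset, the overlay stays unavailable.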

Interpretive-Commitment Spectrum

Another useful internal framing is the interpretive-commitment spectrum: how much analytical opinion the chart expresses through its form, defaults, and added guidance.

least committed -> lightly framed -> analytically guided -> interpretively emphasized -> narratively directed

This is different from the context spectrum.

The context spectrum is about how much information we have available.
The interpretive-commitment spectrum is about how strongly the chart tells the reader what to notice or how to read the data.

Examples:

Least committed: a table or raw scatterplot that exposes the data with minimal framing.
Lightly framed: a simple line, bar, or histogram that chooses a strong representational form but adds little extra interpretation.
Analytically guided: a chart with ranked sorting, average lines, thresholds, or benchmark overlays that help the reader interpret the data.
Interpretively emphasized: a chart that highlights outliers, colors one series as the focal point, or visually suppresses less important marks.
Narratively directed: a chart with explicit callouts, annotations, and textual framing that actively directs attention to a particular conclusion.

The important idea is not that more commitment is always better. Lower commitment can be better for open-ended exploration. Higher commitment can be better for monitoring, explanation, persuasion, or decision support.

This spectrum helps us reason about how far to go from neutral display toward explicit interpretation.

Control-and-Freedom Spectrum

Another useful internal framing is the control-and-freedom spectrum: where chart decisions should remain fixed and dependable, where Dataface should carry an opinion, where behavior may adapt to context, and where authored specificity should take over.

conventional and dependable -> opinionated defaults -> context-sensitive adaptation -> authored specificity

This spectrum is different from both the context spectrum and the interpretive-commitment spectrum.

The context spectrum is about how much information we have available.
The interpretive-commitment spectrum is about how strongly the chart tells the reader what to notice.
The control-and-freedom spectrum is about who should have decision rights: stable convention, product default, contextual logic, or explicit authoring.

The levels are:

Conventional and dependable: behaviors grounded in standard practice or basic practice that should stay stable because predictability is more valuable than novelty. These are the defaults users should be able to rely on without re-learning the system each time. We should prefer terms like standard practice or basic practice here rather than best practice, because common convention is not always the best choice in every context. Examples include mechanical grammar consistency, clear axis behavior, and familiar interaction patterns.
Opinionated defaults: places where Dataface should express a product point of view instead of averaging all general chart advice into a bland middle. These defaults should still be legible and teachable, but they may deliberately prefer a stronger house style, such as a favored chart scaffold, label posture, or typography treatment.
Context-sensitive adaptation: places where the system may justifiably adapt based on available evidence such as field semantics, observed values, comparative context, or display density. This is where we allow flexibility in response to the actual charting problem, while still respecting the data-shape boundary and avoiding hidden semantic rewrites.
Authored specificity: decisions that should belong to the person making the chart once they have a clear purpose or editorial intent. This includes cases where the author wants a particular ranking, emphasis, visual posture, or narrative frame because the chart is meant to carry a specific voice.

This framing is useful because not every chart decision should be optimized for the same kind of freedom.

Some choices should be hard to accidentally disturb because they are part of the product's dependable grammar.
Some choices should be opinionated because the product ought to stand for a recognizable visual and analytical point of view.
Some choices should flex because the right answer genuinely depends on the data and context at hand.
Some choices should stay available for explicit override because authored charts sometimes need to be more specific than any generic default can be.

This spectrum also helps clarify where those decisions should live:

dependable mechanics often live in the base grammar and rendering contract
product opinion often lives in structure and theme
context-sensitive adaptation often lives in controlled enrichment logic
explicit specificity often lives in authored encoding and query shape

Dataface should be understood not as a single chart chooser, but as a multidimensional territory of chart and dashboard potential that a user can navigate.

The system's job is:

choose a strong first chart for the current context
make movement through the design space simple and legible
expose meaningful freedom without overwhelming the user with every possible parameter at once

The first chart should be the lowest-regret starting point given current context. That means the safest high-value representation we can justify with the information currently available, not the most elaborate chart we can imagine.

Navigation then happens through degrees of freedom. A degree of freedom is any handle the user can change while exploring the chart space.

Examples of degrees of freedom:

mark type
orientation
sort order
aggregation choice
comparison window
presence or absence of overlays
annotation level
emphasis target
mark color
structure preset
theme

Internally, Dataface may have many degrees of freedom. Product-wise, however, we should expose them through simple navigation affordances that let the user move along one dimension without accidentally changing several others at once.

In practice, that means:

movement along the text-to-mark spectrum should change representational form
movement along the context spectrum should change what enrichment is available
movement along the interpretive-commitment spectrum should change how much guidance or opinion the chart expresses

The general principle is:

many hidden degrees of freedom
few understandable navigation moves

This is the core reason to think in spectra. They give us a way to group many low-level controls into a smaller number of meaningful traversal paths.

Candidate Spectra To Explore Later

These are promising directions, but they are not yet canonical and should not be treated as settled design language.

Reader-task spectrum: from lookup to comparison to trend-reading to distribution to relationship to explanation. This could help us reason about what analytical task a chart needs to support.
Determinacy spectrum: from known chart choice to constrained recommendation to heuristic auto-chart to exploratory suggestion. This could help us reason about AI confidence and when to present one answer versus several.
Comparison-distance spectrum: from single-series view to baseline comparison to prior-period comparison to benchmark comparison to long-run historical context. This could help us reason about how much comparative scaffolding is present.
Audience-specificity spectrum: from generic chart to role-aware chart to workflow-aware chart. This could help us reason about when audience context materially changes presentation.

Pending Additions

This section is expected to grow. When we add a new spectrum, the bar for inclusion should be that it helps future chart choice, chart enrichment, or chart-authoring policy in a concrete way.

Where Dataface Adds Value

Dataface extends beyond vanilla Vega-Lite in specific, explicit areas:

Feature	Why it exists
KPI charts (`type: kpi`)	Single-number display — not a Vega-Lite chart type
Tables (`type: table`)	Rendered as custom SVG, not Vega-Lite
Spark bars (`type: spark_bar`)	Inline sparkline bars, custom SVG
Geographic maps (`type: map`, `choropleth`, `point_map`)	Dataface provides built-in geo sources, join logic, and projection defaults on top of Vega-Lite's geoshape mark
Auto chart type (`type: auto`)	Detects best chart type from data shape — runs `decisions.py` to pick fields, formats, scales
Theming / Structure	Merges Dataface `structure` and `theme` config into Vega-Lite's `config` object
Data binding	Injects query result data into `data.values`
YAML shorthand	`x: date`, `y: revenue` instead of verbose Vega-Lite encoding objects

What the Translation Layer Should NOT Do

Hardcode defaults that belong in config. If it's a default value, it belongs in YAML config (for example default_config.yml), not in Python code. See "Config Is the Single Source of Truth" above.
Invent parallel names for things Vega-Lite already names (e.g. don't create settings.y_axis when Vega-Lite has encoding.y.axis).
Reshape query results into a new semantic dataset. No chart-layer aggregation, regrouping, bucketing, or semantic resorting. If the chart needs different data, fix the query.
Silently mutate specs based on data after the user authored a chart. If the user wrote type: line, x: date, y: revenue, the output spec should be predictable from the input — not dependent on what the data looks like at render time.
Add Vega-Lite layers the user didn't ask for. No surprise selection params, rule layers, or crosshair overlays. The spec should match what the user wrote.
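
The predictability requirement can be phrased as a check: for the same authored chart, the spec should be identical apart from `data`, whatever the rows look like. `translate` and `without_data` below are hypothetical stand-ins for the real translation layer, and encoding types are omitted for brevity.

```python
from typing import Any


def translate(chart: dict[str, str], rows: list[dict[str, Any]]) -> dict[str, Any]:
    """Hypothetical thin translation: spec shape depends only on authored fields."""
    return {
        "mark": {"type": chart["type"], "tooltip": True},
        "encoding": {
            "x": {"field": chart["x"]},
            "y": {"field": chart["y"]},
        },
        "data": {"values": rows},
    }


def without_data(spec: dict[str, Any]) -> dict[str, Any]:
    """Drop the injected data so two specs can be compared structurally."""
    return {key: value for key, value in spec.items() if key != "data"}


chart = {"type": "line", "x": "date", "y": "revenue"}
small = translate(chart, [{"date": "2024-01-01", "revenue": 10}])
large = translate(
    chart,
    [{"date": f"2024-01-{day:02d}", "revenue": day * 1000} for day in range(1, 29)],
)
```

If `without_data(small)` and `without_data(large)` ever differ, the translation has started reading the data to decide the spec.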

Crosshairs — Removed

Line and area charts previously generated a multi-layer crosshair spec by default (_generate_line_chart_with_crosshair). This has been removed.

Why it existed: Static SVGs (from vl-convert) have no interactivity. The crosshair was a workaround to bake hover behavior into the Vega-Lite spec itself. It produced a layered spec with selection params and rule marks.

Why it was removed:

It turned a simple type: line into a complex multi-layer spec (~140 lines of Python for what should be ~5 lines of Vega-Lite).
It added interactivity that only works in browser contexts, which is useless for PDF, PNG, terminal, or MCP output.
The Cloud app already has its own JS interactivity layer (chart-interactivity.js) that handles tooltips and context menus.
It violated the "thin wrapper" principle by silently adding layers the user didn't ask for.

Line and area charts now render as simple single-mark specs with tooltip: true, like every other chart type.

File Layout

File	Responsibility
`vega_lite.py`	Thin public entrypoints for chart rendering and Vega-Lite spec generation.
`models.py`	Internal chart pipeline models (`ChartIntent`, `EnrichmentPatch`, `ResolvedChart`, `RenderArtifact`).
`pipeline.py`	Normalize authored chart intent, run enrichment, and resolve render-ready chart semantics.
`renderers.py`	Renderer registry and artifact selection for standard, geo, and SVG-family charts.
`profile.py`	Chart profile mapping — single home for all Dataface/Vega-Lite divergence (type renames, channel encoding, orientation transforms, sort mapping, bar axis defaults).
`standard_renderer.py`	Thin mechanical Vega-Lite assembly. Consumes profile-mapped state; no Dataface profile logic.
`presentation.py`	Shared presentation helpers (axis config, tooltip assembly, config-driven presentation defaults).
`serialization.py`	Chart-domain JSON serialization helpers.
`spec_builders.py`	Shared Vega-Lite spec builder helpers (base spec construction, title setting).
`type_inference.py`	Vega-Lite type inference from query result data (temporal, quantitative, nominal).
`vega_lite_types.py`	Spec generators for chart types needing custom logic (arc, histogram, boxplot, layered multi-y).
`geo.py`	Geographic chart specs — built-in geo sources, data joins, projections. Uses Vega-Lite geoshape but adds Dataface product value.
`decisions.py`	Data-aware enrichment heuristics (auto format, scale, field detection) used by the pipeline when semantic fields are unresolved.
`rendering.py`	Orchestration: execute query → resolve chart → render. Not Vega-Lite specific.
`kpi.py`, `table.py`, `spark.py`, `spark_bar.py`	Non-Vega-Lite renderers (custom SVG).
`../converters/chart.py`	Chart artifact conversion for SVG/PNG/PDF/JSON outputs.
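
To make the `type_inference.py` row concrete, here is a minimal sketch of inferring a Vega-Lite type from column values. The `infer_vl_type` function and its rules (numbers are quantitative, dates and ISO date strings are temporal, everything else is nominal) are assumptions for illustration; the real module may use different rules.

```python
from datetime import date, datetime
from numbers import Number
from typing import Any


def _is_iso_date(value: Any) -> bool:
    """True for strings that datetime.fromisoformat can parse."""
    if not isinstance(value, str):
        return False
    try:
        datetime.fromisoformat(value)
    except ValueError:
        return False
    return True


def infer_vl_type(values: list[Any]) -> str:
    """Infer "quantitative", "temporal", or "nominal" from a column's values."""
    present = [v for v in values if v is not None]
    if not present:
        return "nominal"
    # bool is a Number subclass in Python, so exclude it explicitly.
    if all(isinstance(v, Number) and not isinstance(v, bool) for v in present):
        return "quantitative"
    if all(isinstance(v, (date, datetime)) or _is_iso_date(v) for v in present):
        return "temporal"
    return "nominal"
```

This only decides the encoding type; it does not reorder, bucket, or otherwise reshape the rows.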

Guiding Questions for New Code

Before adding chart logic, ask:

Does Vega-Lite already support this? If yes, pass it through — don't wrap it.
Am I inventing a name for something Vega-Lite already names? If yes, use Vega-Lite's name.
Am I hardcoding a default? If yes, put it in default_config.yml instead and read it with get_config().
Am I mutating the spec based on data? If yes, make it opt-in or limit it to type: auto.
Would a user reading the YAML be surprised by the output? If yes, the translation is too thick.

dataface/core/render/chart/DESIGN.md