Dataface Tasks

Add optional markdown-svg raw HTML foreignObject passthrough

ID: DFT_CORE-ADD_OPTIONAL_MARKDOWN_SVG_RAW_HTML_FOREIGNOBJECT_PASSTHROUGH
Status: completed
Priority: p2
Milestone: m5-v1-2-launch
Owner: sr-engineer-architect
Completed by: dave
Completed: 2026-03-24

Problem

Add an opt-in markdown-svg configuration flag, off by default, that allows raw HTML embedded in markdown to be rendered by wrapping it in SVG foreignObject blocks. Document behavior, security constraints, and fallback behavior when disabled.

Context

The mdsvg library (libs/markdown-svg/) converts Markdown AST to SVG. Currently, raw HTML in markdown is not recognized by the parser — it falls through to paragraph text and gets XML-escaped by escape_svg_text(). The codebase already uses <foreignObject> successfully in dataface/core/render/variable_controls.py for interactive HTML form controls inside SVG, proving the pattern works.

Key files:

  • libs/markdown-svg/src/mdsvg/types.py: AST node types
  • libs/markdown-svg/src/mdsvg/parser.py: Markdown parser
  • libs/markdown-svg/src/mdsvg/renderer.py: SVG renderer + convenience functions
  • libs/markdown-svg/src/mdsvg/__init__.py: Public API exports

Possible Solutions

1. Renderer-level flag (chosen)

Add an HTML_BLOCK block type to the parser so it always detects raw HTML. Add allow_raw_html: bool = False to SVGRenderer.__init__() and the convenience functions. When enabled, raw HTML is sanitized (dangerous tags and event handlers stripped) and wrapped in <foreignObject> with the XHTML namespace. When disabled (default), HTML blocks render as escaped text paragraphs.

Trade-offs: Clean separation — parser always parses correctly, renderer controls output. Default-off means zero behavior change for existing users. Sanitization provides defense-in-depth even when enabled.
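
The renderer-side switch can be sketched as follows. This is a minimal illustration, not the library's code: the helper name render_html_block, the x/y/width/height parameters, and the wrapping <div> are assumptions, and sanitization is assumed to have run before the call.

```python
from xml.sax.saxutils import escape

XHTML_NS = "http://www.w3.org/1999/xhtml"


def render_html_block(html: str, *, allow_raw_html: bool = False,
                      x: float = 0, y: float = 0,
                      width: float = 400, height: float = 100) -> str:
    """Hypothetical helper: return an SVG fragment for one raw HTML block."""
    if not allow_raw_html:
        # Default path: the markup is shown as literal, XML-escaped text.
        return f'<text x="{x}" y="{y}">{escape(html)}</text>'
    # Opt-in path: embed the markup in a foreignObject with an XHTML root.
    # (The input is assumed to be sanitized already.)
    return (
        f'<foreignObject x="{x}" y="{y}" width="{width}" height="{height}">'
        f'<div xmlns="{XHTML_NS}">{html}</div>'
        f'</foreignObject>'
    )
```

With the default flag, render_html_block("<b>hi</b>") yields a <text> element containing &lt;b&gt;hi&lt;/b&gt;; with allow_raw_html=True the same input is embedded unchanged inside the foreignObject.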

2. Style-level flag

Put the flag on the Style dataclass. Rejected because Style is about visual styling (fonts, colors, spacing), not security/behavior policy; putting the flag there would mix concerns.

3. Separate sanitizer middleware

Add a post-parse transform step that strips HTML blocks when disabled. Over-engineered for this use case — the renderer already dispatches per block type.

Plan

  1. Types: Add BlockType.HTML_BLOCK enum value and RawHtmlBlock(html: str) dataclass
  2. Parser: Detect HTML blocks (lines starting with <tag) — collect until closing tag for paired tags, single-line for void tags
  3. Renderer: Add allow_raw_html flag. When True, render via <foreignObject> with sanitized HTML. When False, escape and render as text paragraph
  4. Convenience functions: Pass allow_raw_html through render(), render_blocks(), render_content()
  5. Tests: 15 tests covering parsing, disabled rendering, enabled rendering, and security (script/event handler stripping)

Implementation Progress

Changes made

libs/markdown-svg/src/mdsvg/types.py

  • Added BlockType.HTML_BLOCK = "html_block" enum value
  • Added RawHtmlBlock(Block) frozen dataclass with html: str field
  • Added RawHtmlBlock to AnyBlock union type
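
For orientation, the new types might look roughly like this. It is a sketch under assumptions: the Block base class is a stand-in, PARAGRAPH is an assumed existing enum member, and the block_type accessor is hypothetical; only BlockType.HTML_BLOCK = "html_block" and the frozen RawHtmlBlock with an html: str field come from the change list above.

```python
from dataclasses import dataclass, FrozenInstanceError
from enum import Enum


class BlockType(Enum):
    # Only two members shown; PARAGRAPH is an assumed existing member.
    PARAGRAPH = "paragraph"
    HTML_BLOCK = "html_block"


@dataclass(frozen=True)
class Block:
    """Stand-in for the library's base block class (assumed shape)."""


@dataclass(frozen=True)
class RawHtmlBlock(Block):
    """A block holding raw HTML source exactly as written in the markdown."""
    html: str

    @property
    def block_type(self) -> BlockType:  # hypothetical accessor
        return BlockType.HTML_BLOCK
```

Keeping the node frozen means the renderer cannot mutate the parsed HTML in place; sanitization has to produce a new string at render time.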

libs/markdown-svg/src/mdsvg/parser.py

  • Added HTML_BLOCK_START regex pattern (matches <tagname)
  • Added HTML_VOID_TAGS frozenset for void (self-closing) tags
  • Added _parse_html_block() method: handles single-line, void, and multi-line HTML blocks
  • Added HTML block detection to _try_parse_block() (after image blocks, before fallback)
  • Added HTML block detection to _collect_paragraph_lines() so paragraphs stop at HTML
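
The detection logic listed above could be sketched like this. The regex, the contents of the void-tag set, and the standalone function parse_html_block are assumptions for illustration; the real _parse_html_block() is a parser method and may differ. This simplified version also does not track nested tags of the same name.

```python
import re
from typing import List, Optional, Tuple

# Assumed shapes; the real pattern and tag set in parser.py may differ.
HTML_BLOCK_START = re.compile(r"^<([A-Za-z][A-Za-z0-9-]*)")
HTML_VOID_TAGS = frozenset({"br", "hr", "img", "input", "meta", "link"})


def parse_html_block(lines: List[str], start: int) -> Optional[Tuple[str, int]]:
    """Return (html, next_index) if an HTML block starts at lines[start]."""
    match = HTML_BLOCK_START.match(lines[start].strip())
    if match is None:
        return None  # e.g. "a < b" or "<3" is not a tag
    tag = match.group(1).lower()
    closing = f"</{tag}>"
    # Void tags and one-line paired tags are complete on a single line.
    if tag in HTML_VOID_TAGS or closing in lines[start].lower():
        return lines[start], start + 1
    # Otherwise collect lines until the closing tag (or end of input).
    end = start
    while end < len(lines) - 1 and closing not in lines[end].lower():
        end += 1
    return "\n".join(lines[start:end + 1]), end + 1
```

Requiring a letter immediately after "<" is what keeps ordinary text with angle brackets, such as comparisons, in the paragraph path.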

libs/markdown-svg/src/mdsvg/renderer.py

  • Added _sanitize_html() module-level function: strips <script>, <iframe>, <object>, <embed>, <applet>, <form>, <input>, <textarea>, <button> tags and on* event attributes
  • Added allow_raw_html: bool = False parameter to SVGRenderer.__init__()
  • Added _render_raw_html_block() method: wraps sanitized HTML in <foreignObject> with XHTML namespace when enabled, escapes to text paragraph when disabled
  • Added RawHtmlBlock handling to _render_block() dispatch
  • Added allow_raw_html parameter to render(), render_blocks(), render_content() convenience functions

libs/markdown-svg/src/mdsvg/__init__.py

  • Added RawHtmlBlock to imports and __all__

libs/markdown-svg/tests/test_raw_html.py (15 new tests)

  • TestRawHtmlParsing (4 tests): simple block, multiline, mixed, non-HTML angle brackets
  • TestRawHtmlRenderingDisabled (2 tests): default escaping behavior preserved
  • TestRawHtmlRenderingEnabled (6 tests): foreignObject output, dimensions, multiline, mixed, convenience functions
  • TestRawHtmlSecurity (3 tests): script tag stripping, event handler stripping, safe attribute preservation

Security constraints

  • Default off: allow_raw_html=False means zero behavior change for existing users
  • Sanitization: Even when enabled, dangerous tags (script, iframe, object, embed, applet, form, input, textarea, button) and on* event attributes are stripped
  • Safe attributes preserved: style, class, id, etc. remain intact
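
The stripping rules above can be illustrated with a small regex-based sketch. Whether _sanitize_html() actually uses regular expressions is not stated here, so treat the approach, the function name sanitize_html, and the choice to drop the contents of <script> (but only the tags of the other elements) as assumptions. A regex pass like this is also looser than a real HTML parser; for example, it would remove on...= text that appears outside a tag.

```python
import re

# Tag names taken from the constraint list above.
_DANGEROUS_TAGS = ("script", "iframe", "object", "embed", "applet",
                   "form", "input", "textarea", "button")
_TAG_ALT = "|".join(_DANGEROUS_TAGS)

# <script ...> ... </script> is removed together with its content (assumption).
_SCRIPT = re.compile(r"<script\b[^>]*>.*?</script\s*>", re.I | re.S)
# Any remaining opening, closing, or self-closing dangerous tag.
_SINGLE = re.compile(rf"</?(?:{_TAG_ALT})\b[^>]*>", re.I)
# on* attributes with double-quoted, single-quoted, or bare values.
_ON_ATTR = re.compile(r"""\s+on\w+\s*=\s*(?:"[^"]*"|'[^']*'|[^\s>]+)""", re.I)


def sanitize_html(html: str) -> str:
    """Hypothetical sanitizer: strip dangerous tags and on* attributes."""
    html = _SCRIPT.sub("", html)
    html = _SINGLE.sub("", html)
    return _ON_ATTR.sub("", html)
```

Attributes such as style, class, and id pass through untouched because only names beginning with "on" are matched.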

Fallback when disabled

When allow_raw_html=False (default), RawHtmlBlock nodes are rendered as escaped text inside a standard SVG <text> paragraph — the same behavior as before this change (HTML was treated as paragraph text and XML-escaped).

QA Exploration

  • [x] QA exploration completed (or N/A for non-UI tasks)

N/A — this is a library-level change in mdsvg with no UI surface. The rendering is tested via 15 unit tests covering all code paths.

Review Feedback

  • [ ] Review cleared