Add optional markdown-svg raw HTML foreignObject passthrough
Problem
Add an opt-in markdown-svg configuration flag, off by default, that allows raw HTML embedded in markdown to be rendered by wrapping it in SVG foreignObject blocks. Document behavior, security constraints, and fallback behavior when disabled.
Context
The mdsvg library (libs/markdown-svg/) converts Markdown AST to SVG. Currently, raw HTML in markdown is not recognized by the parser — it falls through to paragraph text and gets XML-escaped by escape_svg_text(). The codebase already uses <foreignObject> successfully in dataface/core/render/variable_controls.py for interactive HTML form controls inside SVG, proving the pattern works.
Key files:
- libs/markdown-svg/src/mdsvg/types.py — AST node types
- libs/markdown-svg/src/mdsvg/parser.py — Markdown parser
- libs/markdown-svg/src/mdsvg/renderer.py — SVG renderer + convenience functions
- libs/markdown-svg/src/mdsvg/__init__.py — Public API exports
Possible Solutions
1. Parser + Renderer with allow_raw_html flag (Recommended)
Add an HTML_BLOCK block type to the parser so it always detects raw HTML. Add allow_raw_html: bool = False to SVGRenderer.__init__() and the convenience functions. When enabled, raw HTML is sanitized (dangerous tags and on* event handler attributes stripped) and wrapped in <foreignObject> with the XHTML namespace. When disabled (default), HTML blocks render as escaped text paragraphs.
Trade-offs: Clean separation — parser always parses correctly, renderer controls output. Default-off means zero behavior change for existing users. Sanitization provides defense-in-depth even when enabled.
2. Style-level flag
Put the flag on the Style dataclass. Rejected because Style covers visual styling (fonts, colors, spacing), not security or behavior policy, so the flag would mix concerns.
3. Separate sanitizer middleware
Add a post-parse transform step that strips HTML blocks when disabled. Rejected as over-engineered for this use case: the renderer already dispatches per block type.
Plan
- Types: Add BlockType.HTML_BLOCK enum value and RawHtmlBlock(html: str) dataclass
- Parser: Detect HTML blocks (lines starting with <tag); collect until the closing tag for paired tags, single-line for void tags
- Renderer: Add allow_raw_html flag. When True, render via <foreignObject> with sanitized HTML. When False, escape and render as text paragraph
- Convenience functions: Pass allow_raw_html through render(), render_blocks(), render_content()
- Tests: 15 tests covering parsing, disabled rendering, enabled rendering, and security (script/event handler stripping)
Implementation Progress
Changes made
libs/markdown-svg/src/mdsvg/types.py
- Added BlockType.HTML_BLOCK = "html_block" enum value
- Added RawHtmlBlock(Block) frozen dataclass with html: str field
- Added RawHtmlBlock to AnyBlock union type
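For orientation, the type additions listed above might look roughly like the sketch below. Only BlockType.HTML_BLOCK, RawHtmlBlock, its html field, and AnyBlock come from this change; the Block base class and the Paragraph stand-in are simplified assumptions, and the real definitions in types.py may carry more fields.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Union


class BlockType(Enum):
    PARAGRAPH = "paragraph"    # stand-in for the existing members (assumed)
    HTML_BLOCK = "html_block"  # new member added by this change


@dataclass(frozen=True)
class Block:
    """Hypothetical minimal base class; the real Block may differ."""


@dataclass(frozen=True)
class Paragraph(Block):
    """Stand-in for an existing block type, used only to build the union."""
    text: str


@dataclass(frozen=True)
class RawHtmlBlock(Block):
    """Raw HTML captured verbatim by the parser, unsanitized."""
    html: str


# The union the renderer dispatches on; the real one lists every block type.
AnyBlock = Union[Paragraph, RawHtmlBlock]

block = RawHtmlBlock(html="<div>hi</div>")
```

Keeping the dataclass frozen means a parsed block cannot be mutated between parsing and rendering, so sanitization has to produce a new string rather than edit the node in place.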
libs/markdown-svg/src/mdsvg/parser.py
- Added HTML_BLOCK_START regex pattern (matches <tagname)
- Added HTML_VOID_TAGS frozenset for self-closing tags
- Added _parse_html_block() method — handles single-line, void, and multi-line HTML blocks
- Added HTML block detection to _try_parse_block() (after image blocks, before fallback)
- Added HTML block detection to _collect_paragraph_lines() so paragraphs stop at HTML
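The detection logic described above can be sketched as a standalone function. This is an illustration under assumptions, not the actual _parse_html_block(): the regex, the contents of HTML_VOID_TAGS (here the standard HTML void elements), and the function signature are all guesses, and nested tags of the same name are not handled.

```python
import re
from typing import List, Optional, Tuple

# Assumed pattern: the line opens with "<" followed directly by a tag name.
HTML_BLOCK_START = re.compile(r"^<([A-Za-z][A-Za-z0-9-]*)(?=[\s/>]|$)")

# Assumed contents: the standard HTML void elements.
HTML_VOID_TAGS = frozenset({
    "area", "base", "br", "col", "embed", "hr", "img",
    "input", "link", "meta", "source", "track", "wbr",
})


def parse_html_block(lines: List[str], start: int) -> Optional[Tuple[str, int]]:
    """Return (html, next_index) if lines[start] opens an HTML block, else None."""
    match = HTML_BLOCK_START.match(lines[start])
    if match is None:
        return None  # not HTML, e.g. "a < b" or "<3"
    tag = match.group(1).lower()
    if tag in HTML_VOID_TAGS:
        return lines[start], start + 1  # void tags occupy a single line
    closing = f"</{tag}>"
    collected: List[str] = []
    for index in range(start, len(lines)):
        collected.append(lines[index])
        if closing in lines[index].lower():
            return "\n".join(collected), index + 1
    # No closing tag found: treat the rest of the input as the block.
    return "\n".join(collected), len(lines)
```

Because the same check runs in paragraph collection, a paragraph would end at the first line for which this function returns a result.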
libs/markdown-svg/src/mdsvg/renderer.py
- Added _sanitize_html() module-level function — strips <script>, <iframe>, <object>, <embed>, <applet>, <form>, <input>, <textarea>, <button> tags and on* event attributes
- Added allow_raw_html: bool = False parameter to SVGRenderer.__init__()
- Added _render_raw_html_block() method — wraps sanitized HTML in <foreignObject> with XHTML namespace when enabled, escapes to text paragraph when disabled
- Added RawHtmlBlock handling to _render_block() dispatch
- Added allow_raw_html parameter to render(), render_blocks(), render_content() convenience functions
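The two rendering branches can be illustrated with a small standalone function. It is a sketch only: the function name, the explicit x/y/width/height parameters, and the inner <div> wrapper are assumptions, and the real _render_raw_html_block() presumably derives its geometry from the renderer's layout state. The html argument is assumed to be sanitized already.

```python
from xml.sax.saxutils import escape

XHTML_NS = "http://www.w3.org/1999/xhtml"


def render_raw_html_block(html: str, x: float, y: float, width: float,
                          height: float, allow_raw_html: bool = False) -> str:
    """Render one raw HTML block as an SVG fragment (hypothetical helper)."""
    if not allow_raw_html:
        # Disabled (default): show the markup as escaped text.
        return f'<text x="{x}" y="{y}">{escape(html)}</text>'
    # Enabled: embed the HTML in a foreignObject under the XHTML namespace.
    return (
        f'<foreignObject x="{x}" y="{y}" width="{width}" height="{height}">'
        f'<div xmlns="{XHTML_NS}">{html}</div>'
        "</foreignObject>"
    )


disabled = render_raw_html_block("<b>hi</b>", 0, 20, 200, 50)
enabled = render_raw_html_block("<b>hi</b>", 0, 20, 200, 50, allow_raw_html=True)
```

A foreignObject needs explicit width and height to be visible, so the caller has to supply or estimate a height for the embedded HTML; how the real renderer does this is not shown here.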
libs/markdown-svg/src/mdsvg/__init__.py
- Added RawHtmlBlock to imports and __all__
libs/markdown-svg/tests/test_raw_html.py — 15 new tests:
- TestRawHtmlParsing (4 tests): simple block, multiline, mixed, non-HTML angle brackets
- TestRawHtmlRenderingDisabled (2 tests): default escaping behavior preserved
- TestRawHtmlRenderingEnabled (6 tests): foreignObject output, dimensions, multiline, mixed, convenience functions
- TestRawHtmlSecurity (3 tests): script tag stripping, event handler stripping, safe attribute preservation
Security constraints
- Default off: allow_raw_html=False means zero behavior change for existing users
- Sanitization: Even when enabled, dangerous tags (script, iframe, object, embed, applet, form, input, textarea, button) and on* event attributes are stripped
- Safe attributes preserved: style, class, id, etc. remain intact
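As an illustration of these constraints, a regex-based sanitizer might look like the sketch below. The tag list comes from this document; the regexes, the function name, and the decision to drop paired dangerous elements together with their contents are assumptions about _sanitize_html(), not its actual code. A regex pass like this is not a complete HTML sanitizer.

```python
import re

DANGEROUS_TAGS = ("script", "iframe", "object", "embed", "applet",
                  "form", "input", "textarea", "button")
_TAGS = "|".join(DANGEROUS_TAGS)

# Paired dangerous elements, removed together with their content (assumed).
_PAIRED = re.compile(rf"<({_TAGS})\b[^>]*>.*?</\1\s*>", re.IGNORECASE | re.DOTALL)
# Any leftover opening, closing, or self-closing dangerous tag.
_SINGLE = re.compile(rf"</?(?:{_TAGS})\b[^>]*>", re.IGNORECASE)
# on* attributes with double-quoted, single-quoted, or unquoted values.
_EVENT_ATTR = re.compile(
    r"""\s+on[a-z]+\s*=\s*(?:"[^"]*"|'[^']*'|[^\s>]+)""", re.IGNORECASE
)


def sanitize_html(html: str) -> str:
    """Strip dangerous tags and on* event attributes; keep everything else."""
    html = _PAIRED.sub("", html)
    html = _SINGLE.sub("", html)
    return _EVENT_ATTR.sub("", html)


cleaned = sanitize_html(
    '<div onclick="alert(1)" style="color:red">ok</div><script>alert(2)</script>'
)
```

Here the style attribute survives while the onclick handler and the script element are removed, matching the three bullets above.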
Fallback when disabled
When allow_raw_html=False (default), RawHtmlBlock nodes are rendered as escaped text inside a standard SVG <text> paragraph — the same behavior as before this change (HTML was treated as paragraph text and XML-escaped).
QA Exploration
- [x] QA exploration completed (or N/A for non-UI tasks)
N/A — this is a library-level change in mdsvg with no UI surface. The rendering is tested via 15 unit tests covering all code paths.
Review Feedback
- [ ] Review cleared