Move playground examples to DuckDB and ship pre-built inspect.json
Problem
The playground examples read raw CSV files through CsvAdapter (Python stdlib csv.DictReader). That means no SQL queries, no joins, no dft inspect support, and no AI catalog context, so the playground AI agent will have no rich schema or profile information to work with.
Context
- Example CSVs: `examples/assets/data/` (sales.csv, products.csv, users.csv, us_states.csv, world_countries.csv)
- Example dbt project: `examples/tutorial_dbt/` (DuckDB `:memory:`)
- Playground adapter setup: `apps/playground/routes.py` (DbtAdapter + SqlAdapter + CsvAdapter)
- No `target/inspect.json` exists anywhere in examples today
- DuckDB files are small and git-friendly when the data is small
- Depends on: `dft inspect` catalog builder task, CSV inspect support task
- Enables: Playground MCP tools task (agent needs inspect.json to provide catalog context)
Possible Solutions
A. DuckDB file + pre-built inspect.json checked into repo — Recommended
Create a build script that loads CSVs into a DuckDB file, run dft inspect to generate inspect.json, check both in. Playground examples switch from type: csv to type: duckdb.
- Pros: Works out of the box, playground AI has full context immediately, SQL queries work on example data
- Cons: Binary DuckDB file in git (small — example data is tiny)
B. Keep CSVs, rely on ephemeral DuckDB inspect
Use the CSV inspect task to profile CSVs via ephemeral DuckDB, but don't migrate the execution path.
- Pros: No binary in git
- Cons: Playground still can't do SQL joins on example data, CsvAdapter stays limited
Plan
Approach A.
- Create build script: `examples/build_examples_db.py` (or `just` recipe) that:
  - Reads all CSVs from `examples/assets/data/`
  - Loads into `examples/examples.duckdb` via DuckDB `read_csv_auto()`
  - Includes tutorial_dbt seed data if applicable
- Update example sources: change `examples/_sources.yaml` and `examples/dataface.yml` to reference DuckDB source instead of CSV
- Run `dft inspect`: profile all tables in the DuckDB, generate `examples/target/inspect.json` with full profiles, relationships, descriptions
- Check in artifacts: `examples/examples.duckdb` + `examples/target/inspect.json`
- Update playground adapter setup: `apps/playground/routes.py` points at the DuckDB file, can drop CsvAdapter for examples
- Update example YAML dashboards: any that reference `type: csv` switch to the DuckDB source
- Add `just rebuild-examples` recipe, so the build script is easy to re-run when example data changes
Files to modify:
- New: examples/build_examples_db.py (or Justfile recipe)
- examples/_sources.yaml / examples/dataface.yml
- apps/playground/routes.py — adapter setup
- Example YAML files that reference CSV sources
- Justfile — add rebuild recipe
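The build-script step of the plan could look roughly like the sketch below. It is a minimal illustration, not the checked-in script: the helper names `table_name`, `build_statements`, and `build` are hypothetical, and using each CSV's file stem as its table name is an assumption. Only `read_csv_auto()` and the paths come from the plan itself.

```python
from pathlib import Path
from typing import List


def table_name(csv_path: Path) -> str:
    # Assumption: the file stem doubles as the table name (sales.csv -> sales).
    return csv_path.stem


def build_statements(csv_dir: Path) -> List[str]:
    """Return one CREATE TABLE statement per CSV file, in sorted order."""
    statements = []
    for csv_path in sorted(csv_dir.glob("*.csv")):
        # Double any single quotes so the path is a valid SQL string literal.
        literal = csv_path.as_posix().replace("'", "''")
        statements.append(
            f'CREATE OR REPLACE TABLE "{table_name(csv_path)}" AS '
            f"SELECT * FROM read_csv_auto('{literal}')"
        )
    return statements


def build(csv_dir: Path, db_path: Path) -> None:
    """Load every CSV in csv_dir into the DuckDB file at db_path."""
    import duckdb  # imported lazily; requires the duckdb package

    con = duckdb.connect(str(db_path))
    try:
        for statement in build_statements(csv_dir):
            con.execute(statement)
    finally:
        con.close()
```

Run from the repository root, `build(Path("examples/assets/data"), Path("examples/examples.duckdb"))` would produce the artifact named in the plan. Using `CREATE OR REPLACE` keeps the script safe to re-run, which is what a rebuild recipe needs.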
Implementation Progress
Completed
- [x] Build script: `examples/build_examples_db.py` loads all 5 CSVs into `examples/examples.duckdb` (sales: 152 rows, products: 8, users: 10, us_states: 50, world_countries: 54)
- [x] Source configuration: `examples/_sources.yaml` and `examples/dataface.yml` default to `examples_db` DuckDB source
- [x] Pre-built inspect.json: `examples/target/inspect.json` (61KB) with full column profiles, semantic types, distributions for all 5 tables
- [x] DuckDB file checked in: `examples/examples.duckdb` committed as binary artifact
- [x] Playground adapter setup: `apps/playground/routes.py` uses SqlAdapter with DuckDB connection, CsvAdapter removed
- [x] 17 playground YAML dashboards migrated: all `type: csv` / `file:` queries → `sql: SELECT * FROM <table>` with `source: examples_db`
- [x] Shared queries updated: `_shared_queries.yml` and `_doc_examples.yaml` use SQL with `profile: examples_db` for sales, us_states, world_countries
- [x] Cascading meta.yaml: `examples/playground/meta.yaml` sets default `source: examples_db`
- [x] Justfile recipe: `just rebuild-examples` runs the build script
- [x] All tests pass: 99 integration (9 skipped), 67 doc, 73 playground, 7 visual, 46 inspect server
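To make the dashboard migration concrete, the sketch below expresses the rewrite of a single query as a Python function. It assumes a query parses from YAML into a plain mapping; the key names (`type`, `file`, `sql`, `source`) are taken from the checklist item above, while `migrate_query` itself and the stem-to-table mapping are hypothetical.

```python
from pathlib import PurePosixPath
from typing import Any, Dict


def migrate_query(query: Dict[str, Any], source: str = "examples_db") -> Dict[str, Any]:
    """Rewrite a CSV-backed query mapping into its SQL equivalent."""
    if query.get("type") != "csv" or "file" not in query:
        return dict(query)  # not a CSV query; return an unchanged copy
    # Assumption: the table is named after the CSV file stem (sales.csv -> sales).
    table = PurePosixPath(query["file"]).stem
    # Carry over every key except the CSV-specific ones.
    migrated = {k: v for k, v in query.items() if k not in ("type", "file")}
    migrated["sql"] = f"SELECT * FROM {table}"
    migrated["source"] = source
    return migrated
```

For example, `migrate_query({"type": "csv", "file": "assets/data/sales.csv"})` yields a mapping with `sql` set to `SELECT * FROM sales` and `source` set to `examples_db`.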
Key Decisions
- Used `profile:` (not `source:`) for query-level DuckDB references because SqlAdapter resolves `_sources.yaml` profiles but not string source references
- `compile()` doesn't resolve `meta.yaml` (only `compile_file()` does), so each dashboard YAML needs explicit `source:` when tested via `compile()`
- Kept CSV files in `examples/assets/data/` as the source of truth; DuckDB is a derived artifact rebuilt via `just rebuild-examples`
- Visual test conftest updated to use `AdapterRegistry(project_root=EXAMPLES_DIR)` instead of manual CsvAdapter swap
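Because the DuckDB file is derived from the CSVs, the two can drift apart if someone edits a CSV without rebuilding. One way to catch that is a row-count comparison, sketched below. This is not part of the task as implemented: all three function names are hypothetical, and the check assumes table names match CSV file stems.

```python
import csv
from pathlib import Path
from typing import Dict, Iterable, Optional, Tuple


def csv_row_counts(csv_dir: Path) -> Dict[str, int]:
    """Count data rows (header excluded) in each CSV, keyed by file stem."""
    counts = {}
    for csv_path in sorted(csv_dir.glob("*.csv")):
        with csv_path.open(newline="") as handle:
            counts[csv_path.stem] = sum(1 for _ in csv.DictReader(handle))
    return counts


def duckdb_row_counts(db_path: Path, tables: Iterable[str]) -> Dict[str, int]:
    """Count rows per table in the DuckDB file (opened read-only)."""
    import duckdb  # imported lazily; requires the duckdb package

    con = duckdb.connect(str(db_path), read_only=True)
    try:
        return {
            t: con.execute(f'SELECT count(*) FROM "{t}"').fetchone()[0]
            for t in tables
        }
    finally:
        con.close()


def find_mismatches(
    expected: Dict[str, int], actual: Dict[str, int]
) -> Dict[str, Tuple[int, Optional[int]]]:
    """Return {table: (expected, actual)} where counts differ or the table is missing."""
    return {
        table: (count, actual.get(table))
        for table, count in expected.items()
        if actual.get(table) != count
    }
```

A test could compute `csv_row_counts` for `examples/assets/data/`, pass its keys to `duckdb_row_counts` for `examples/examples.duckdb`, and fail when `find_mismatches` returns a non-empty mapping. Row counts are a cheap signal only; they would not detect edits that leave the number of rows unchanged.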
Review Feedback
- [ ] Review cleared