Dataface Tasks

Move playground examples to DuckDB and ship pre-built inspect.json

ID: CONTEXT_CATALOG_NIMBLE-MOVE_PLAYGROUND_EXAMPLES_TO_DUCKDB_AND_SHIP_PRE_BUILT_INSPECT_JSON
Status: completed
Priority: p1
Milestone: m1-ft-analytics-analyst-pilot
Owner: data-ai-engineer-architect
Completed by: dave
Completed: 2026-03-22

Problem

The playground examples use raw CSV files via CsvAdapter (Python stdlib csv.DictReader). This means: no SQL queries, no joins, no dft inspect support, and no AI catalog context. The playground AI agent will have no rich schema/profile information to work with.

Context

  • Example CSVs: examples/assets/data/ (sales.csv, products.csv, users.csv, us_states.csv, world_countries.csv)
  • Example dbt project: examples/tutorial_dbt/ (DuckDB :memory:)
  • Playground adapter setup: apps/playground/routes.py — DbtAdapter + SqlAdapter + CsvAdapter
  • No target/inspect.json exists anywhere in examples today
  • DuckDB files are small and git-friendly when the data is small
  • Depends on: dft inspect catalog builder task, CSV inspect support task
  • Enables: Playground MCP tools task (agent needs inspect.json to provide catalog context)

Possible Solutions

A. Load CSVs into a DuckDB file and check in a pre-built inspect.json

Create a build script that loads the CSVs into a DuckDB file, run dft inspect to generate inspect.json, and check both in. Playground examples switch from type: csv to type: duckdb.

  • Pros: Works out of the box, playground AI has full context immediately, SQL queries work on example data
  • Cons: Binary DuckDB file in git (small — example data is tiny)

B. Keep CSVs, rely on ephemeral DuckDB inspect

Use the CSV inspect task to profile CSVs via ephemeral DuckDB, but don't migrate the execution path.

  • Pros: No binary in git
  • Cons: Playground still can't do SQL joins on example data, CsvAdapter stays limited
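For comparison, the ephemeral profiling idea in option B could look roughly like the sketch below. It is not the CSV inspect task's actual implementation: the function names are hypothetical, and it assumes DuckDB's SUMMARIZE statement is an acceptable stand-in for the real profiler. The connection is in-memory, so nothing is written to disk.

```python
from pathlib import Path


def summarize_sql(csv_path: Path) -> str:
    """Build a DuckDB SUMMARIZE query over a single CSV file."""
    # Double any single quotes so the path is a valid SQL string literal.
    source = csv_path.as_posix().replace("'", "''")
    return f"SUMMARIZE SELECT * FROM read_csv_auto('{source}')"


def profile_csv(csv_path: Path) -> list[tuple]:
    """Profile one CSV through an ephemeral in-memory DuckDB connection."""
    import duckdb  # imported lazily so summarize_sql() works without DuckDB installed

    con = duckdb.connect(":memory:")  # discarded when the connection closes
    try:
        # One result row per column: name, type, min, max, null percentage, etc.
        return con.execute(summarize_sql(csv_path)).fetchall()
    finally:
        con.close()
```

This profiles the data but leaves query execution on CsvAdapter, which is the limitation listed under Cons.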

Plan

Approach A.

  1. Create build script: examples/build_examples_db.py (or just recipe) that:
       - Reads all CSVs from examples/assets/data/
       - Loads into examples/examples.duckdb via DuckDB read_csv_auto()
       - Includes tutorial_dbt seed data if applicable
  2. Update example sources — change examples/_sources.yaml and examples/dataface.yml to reference DuckDB source instead of CSV
  3. Run dft inspect — profile all tables in the DuckDB, generate examples/target/inspect.json with full profiles, relationships, descriptions
  4. Check in artifacts: examples/examples.duckdb + examples/target/inspect.json
  5. Update playground adapter setup: apps/playground/routes.py points at the DuckDB file, can drop CsvAdapter for examples
  6. Update example YAML dashboards — any that reference type: csv switch to the DuckDB source
  7. Add just rebuild-examples recipe — so the build script is easy to re-run when example data changes

Files to modify:

  • New: examples/build_examples_db.py (or Justfile recipe)
  • examples/_sources.yaml / examples/dataface.yml
  • apps/playground/routes.py (adapter setup)
  • Example YAML files that reference CSV sources
  • Justfile (add rebuild recipe)
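Step 1 of the plan can be sketched as follows. This is a minimal illustration, not the contents of the real examples/build_examples_db.py. The helper names are hypothetical, and it assumes each table is named after its CSV file stem (sales.csv becomes sales). It does not cover the tutorial_dbt seed data.

```python
from pathlib import Path


def table_name(csv_path: Path) -> str:
    # Assumption: table name equals the CSV file stem, e.g. sales.csv -> sales.
    return csv_path.stem


def build_statements(csv_dir: Path) -> list[str]:
    """Return one CREATE TABLE statement per CSV in csv_dir, sorted by file name."""
    statements = []
    for csv_path in sorted(csv_dir.glob("*.csv")):
        # Double any single quotes so the path is a valid SQL string literal.
        source = csv_path.as_posix().replace("'", "''")
        # read_csv_auto() infers column names and types from the file contents.
        statements.append(
            f'CREATE OR REPLACE TABLE "{table_name(csv_path)}" AS '
            f"SELECT * FROM read_csv_auto('{source}')"
        )
    return statements


def build(csv_dir: Path, db_path: Path) -> None:
    """Load every CSV in csv_dir into the DuckDB file at db_path."""
    import duckdb  # imported lazily so the helpers above work without DuckDB installed

    con = duckdb.connect(str(db_path))  # creates the file if it does not exist
    try:
        for statement in build_statements(csv_dir):
            con.execute(statement)
    finally:
        con.close()
```

Calling build(Path("examples/assets/data"), Path("examples/examples.duckdb")) from the repository root would produce the database file. CREATE OR REPLACE keeps the script safe to re-run when example data changes.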

Implementation Progress

Completed

  • [x] Build script: examples/build_examples_db.py loads all 5 CSVs into examples/examples.duckdb (sales: 152 rows, products: 8, users: 10, us_states: 50, world_countries: 54)
  • [x] Source configuration: examples/_sources.yaml and examples/dataface.yml default to examples_db DuckDB source
  • [x] Pre-built inspect.json: examples/target/inspect.json (61KB) with full column profiles, semantic types, distributions for all 5 tables
  • [x] DuckDB file checked in: examples/examples.duckdb committed as binary artifact
  • [x] Playground adapter setup: apps/playground/routes.py uses SqlAdapter with DuckDB connection, CsvAdapter removed
  • [x] 17 playground YAML dashboards migrated — all type: csv/file: queries → sql: SELECT * FROM <table> with source: examples_db
  • [x] Shared queries updated: _shared_queries.yml and _doc_examples.yaml use SQL with profile: examples_db for sales, us_states, world_countries
  • [x] Cascading meta.yaml: examples/playground/meta.yaml sets default source: examples_db
  • [x] Justfile recipe: just rebuild-examples runs the build script
  • [x] All tests pass — 99 integration (9 skipped), 67 doc, 73 playground, 7 visual, 46 inspect server
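The dashboard migration listed above (type: csv/file: queries becoming sql: SELECT * FROM <table> with source: examples_db) amounts to a small mechanical transform. The sketch below shows it on an already-parsed query mapping. The function name and the exact query shape are assumptions made for illustration, and the table name is assumed to equal the CSV file stem. The real dashboard schema may carry other keys.

```python
from pathlib import PurePosixPath


def migrate_query(query: dict, source: str = "examples_db") -> dict:
    """Convert a CSV-backed query mapping into a SQL-backed one.

    Assumed input shape:  {"type": "csv", "file": "assets/data/sales.csv", ...}
    Assumed output shape: {"sql": "SELECT * FROM sales", "source": "examples_db", ...}
    """
    if query.get("type") != "csv":
        return dict(query)  # non-CSV queries are returned as an unchanged copy
    table = PurePosixPath(query["file"]).stem  # sales.csv -> sales
    # Keep every other key (titles, descriptions, ...) and drop the CSV-specific ones.
    migrated = {k: v for k, v in query.items() if k not in ("type", "file")}
    migrated["sql"] = f"SELECT * FROM {table}"
    migrated["source"] = source
    return migrated
```

For example, migrate_query({"type": "csv", "file": "assets/data/sales.csv"}) yields a mapping with sql set to SELECT * FROM sales and source set to examples_db.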

Key Decisions

  • Used profile: (not source:) for query-level DuckDB references because SqlAdapter resolves _sources.yaml profiles but not string source references
  • compile() doesn't resolve meta.yaml (only compile_file() does), so each dashboard YAML needs explicit source: when tested via compile()
  • Kept CSV files in examples/assets/data/ as the source of truth; DuckDB is a derived artifact rebuilt via just rebuild-examples
  • Visual test conftest updated to use AdapterRegistry(project_root=EXAMPLES_DIR) instead of manual CsvAdapter swap
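Because the DuckDB file is derived from the CSVs, it can drift when someone edits a CSV and forgets to run just rebuild-examples. One possible local guard, which is not part of the completed work, is a modification-time comparison. The function name below is hypothetical.

```python
from pathlib import Path


def stale_csvs(csv_dir: Path, db_path: Path) -> list[str]:
    """Return names of CSVs modified after db_path was last written.

    If the database file does not exist, every CSV counts as stale.
    """
    csvs = sorted(csv_dir.glob("*.csv"))
    if not db_path.exists():
        return [p.name for p in csvs]
    built_at = db_path.stat().st_mtime
    # A CSV is stale when its mtime is newer than the derived database file.
    return [p.name for p in csvs if p.stat().st_mtime > built_at]
```

A non-empty result from stale_csvs(Path("examples/assets/data"), Path("examples/examples.duckdb")) suggests a rebuild is due. Git does not preserve modification times on checkout, so this check is only meaningful in a working tree where the files were edited locally.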

Review Feedback

  • [ ] Review cleared