Dataface Tasks

dft inspect native CSV support via ephemeral DuckDB

ID: CONTEXT_CATALOG_NIMBLE-DFT_INSPECT_NATIVE_CSV_SUPPORT_VIA_EPHEMERAL_DUCKDB
Status: completed
Priority: p1
Milestone: m1-ft-analytics-analyst-pilot
Owner: data-ai-engineer-architect
Completed by: dave
Completed: 2026-03-22

Problem

dft inspect only works on SQL databases. Users with CSV sources (common for non-dbt projects, prototyping, and the playground examples) cannot profile their data at all. The CsvAdapter uses Python's csv.DictReader — completely separate from the inspect pipeline.

Context

  • InspectConnection supports: DuckDB, PostgreSQL, SQLite, BigQuery, Snowflake, Databricks, Redshift, MySQL, SQL Server
  • CsvAdapter (dataface/core/execute/adapters/csv_adapter.py) reads CSVs with stdlib, no SQL
  • DuckDB can natively read CSVs: SELECT * FROM 'file.csv' or CREATE TABLE t AS SELECT * FROM read_csv_auto('file.csv')
  • DuckDB is already a dependency (used for dbt projects and as default dialect)
  • CSV sources are defined in dataface.yml or _sources.yaml with type: csv and file: path
  • The playground examples are all CSV-based today
  • Depends on: complements the dft inspect catalog builder task

Possible Solutions

A. Load CSVs into an ephemeral in-memory DuckDB

When dft inspect encounters a CSV source, spin up a :memory: DuckDB connection, load the CSV with read_csv_auto(), and profile via the existing DuckDB inspector path. No persistent DuckDB file is created — the CSV remains the source of truth.

  • Pros: Zero config, reuses existing DuckDB inspector, no new files to manage
  • Cons: Re-reads the CSV on every inspect run (fine — CSVs are small, and the incremental inspect task handles caching)

B. Require users to load CSVs into DuckDB first

  • Pros: Simple
  • Cons: Bad UX, extra manual step, defeats the purpose

Plan

Approach A.

  1. Detect CSV sources — when dft inspect discovers sources from dataface.yml / _sources.yaml, identify type: csv entries
  2. Create ephemeral DuckDB — duckdb.connect(':memory:'), load each CSV via CREATE TABLE {name} AS SELECT * FROM read_csv_auto('{path}')
  3. Profile via existing path — pass the DuckDB connection to TableInspector with dialect='duckdb'
  4. Store in inspect.json — same format as any other table, with source metadata indicating CSV origin
  5. Also support dft inspect table file.csv — direct CSV path as argument, auto-detect as CSV by extension
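
Steps 1 and 5 amount to a small detection-and-routing layer. A sketch using the names from this plan (route_inspect and its string return values are hypothetical stand-ins for the real CLI wiring):

```python
from pathlib import Path


def is_csv_path(table_arg: str) -> bool:
    """Treat a table argument as a direct CSV source when it ends in .csv."""
    return Path(table_arg).suffix.lower() == ".csv"


def route_inspect(table_arg: str) -> str:
    # Hypothetical dispatcher: the real command would build a TableInspector
    # via from_csv() or the normal constructor instead of returning a label.
    if is_csv_path(table_arg):
        if not Path(table_arg).exists():
            raise FileNotFoundError(f"CSV source not found: {table_arg}")
        return "csv"
    return "sql"
```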

Files to modify:

  • dataface/core/inspect/inspector.py — CSV detection + ephemeral DuckDB setup
  • dataface/cli/commands/inspect.py — accept CSV file paths as arguments
  • dataface/core/inspect/connection.py — possibly no changes if DuckDB path is reused

Implementation Progress

M1: Core CSV inspect via ephemeral DuckDB (done)

  • [x] is_csv_path(table_arg) helper in inspector.py — detects .csv extension
  • [x] TableInspector.from_csv(path) classmethod — creates :memory: DuckDB, loads CSV via read_csv_auto(), returns inspector ready for profiling
  • [x] CLI routing in inspect_command() — when table arg is an existing .csv file, uses from_csv instead of normal TableInspector constructor
  • [x] Validation: FileNotFoundError for missing files, ValueError for non-CSV extensions
  • [x] 15 tests in tests/core/test_inspect_csv.py covering factory, full profile pipeline, path detection, CLI routing
  • [x] No changes needed to connection.py — reuses existing DuckDB path entirely

Key decision: from_csv is a classmethod factory (like from_bigquery_client) rather than a modification to __init__. This keeps the constructor clean and the CSV concern isolated.

Files changed:

  • dataface/core/inspect/inspector.py — added is_csv_path() + TableInspector.from_csv()
  • dataface/cli/commands/inspect.py — CSV auto-detection in inspect_command()
  • tests/core/test_inspect_csv.py — new test file (15 tests)

Review Feedback

  • [ ] Review cleared