dft inspect with no args should profile all tables
Problem
Three issues with how the catalog gets built today:
- No batch profiling: you must run `dft inspect table <name>` for each table manually.
- Relationships not stored: the `TableInspection.relationships` field exists but is never populated during profiling. `detect_relationships()` runs on the fly in `catalog()` on every call, even though its results are deterministic given the cached profiles.
- dbt descriptions not baked in: `DbtSchemaParser` re-parses `schema.yml` on every `catalog()` call instead of merging descriptions into `inspect.json` at build time.
The result: `target/inspect.json` is incomplete, and MCP tools must re-derive relationships, re-parse dbt, and hit the live DB on every call. `inspect.json` should be a fully self-contained catalog artifact.
Context
- The `TableInspection` model already has a `relationships` field (`inspector.py:489`) and serializes it (`inspector.py:526-527`); it is just never populated.
- `detect_relationships()` needs two or more `TableInspection` objects, so it works fine once multiple tables are profiled.
- `DbtSchemaParser` scans `models/**/schema.yml`; it could run once at build time.
- `description_merge.py` has a priority stack: `dbt_schema_yml > database_comment > curated > inferred`.
- `InspectionStorage` already supports accumulating multiple tables in a single `target/inspect.json`.
- Table discovery exists in `_list_schema()` (`dataface/ai/mcp/tools.py`) via `InspectConnection`.
- dbt connection auto-detection exists in `_detect_dbt_connection()`.
- Non-dbt users configure sources in `dataface.yml` or `_sources.yaml`.
- Follow-on: the SQLGlot propagation task (M2) adds Phase 4, which extends `dft inspect` to cover derived models via static type propagation, so only base tables need real profiling. See `static-semantic-type-propagation-through-sql-queries-via-sqlglot.md`.
- Table key format: dbt uses `unique_id` as `resource_type.project.model_name` (e.g., `model.dataface_examples.stg_products`). For `inspect.json` keys, follow dbt's convention in dbt projects. For non-dbt projects, use `source.schema.table`, where `source` comes from `dataface.yml`. This prevents collisions when a project has multiple databases or schemas.
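The key convention above could be captured in a single helper. This is a minimal sketch: `make_table_key` and its parameters are hypothetical, and defaulting `resource_type` to `model` is an assumption.

```python
from typing import Optional


def make_table_key(
    table: str,
    *,
    schema: Optional[str] = None,
    dbt_project: Optional[str] = None,
    source: Optional[str] = None,
    resource_type: str = "model",
) -> str:
    """Build an inspect.json key (hypothetical helper, illustrative only)."""
    if dbt_project is not None:
        # dbt projects: mirror dbt's unique_id, resource_type.project.model_name
        return f"{resource_type}.{dbt_project}.{table}"
    if source is None or schema is None:
        raise ValueError("non-dbt tables need a source (from dataface.yml) and a schema")
    # Non-dbt projects: source.schema.table, so two sources that both expose
    # e.g. public.orders still get distinct keys.
    return f"{source}.{schema}.{table}"
```

For example, `make_table_key("stg_products", dbt_project="dataface_examples")` yields `model.dataface_examples.stg_products`, while `make_table_key("orders", schema="public", source="warehouse")` yields `warehouse.public.orders`.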
Possible Solutions
A. Make `dft inspect` the complete catalog build command (Recommended)
`dft inspect` (bare) becomes:
1. Discover all tables from the connected source
2. Profile each table (stats, grain, semantic types, quality flags)
3. Merge dbt `schema.yml` descriptions (for dbt projects) or `dataface.yml` descriptions
4. Run cross-table relationship detection + fanout risk scoring
5. Save everything to `target/inspect.json` as one self-contained artifact
6. Print summary with audit-style readiness scores
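As a rough sketch of how the six steps might be wired together, the class below takes each step as an injected callable. `CatalogBuild` and all of its fields are hypothetical names, not dataface APIs; the point is the ordering: descriptions and relationships are computed once over the full profile set, followed by a single save.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class CatalogBuild:
    """Hypothetical orchestration of a bare `dft inspect` run.

    Every step is injected so the flow can be read and tested without a
    live database. None of these names are real dataface APIs.
    """

    discover: Callable[[], List[str]]                        # 1. list tables
    profile: Callable[[str], dict]                           # 2. profile one table
    merge_descriptions: Callable[[Dict[str, dict]], None]    # 3. dbt / dataface.yml
    detect_relationships: Callable[[Dict[str, dict]], None]  # 4. cross-table pass
    save: Callable[[Dict[str, dict]], None]                  # 5. write inspect.json
    log: List[str] = field(default_factory=list)

    def run(self) -> Dict[str, dict]:
        profiles = {name: self.profile(name) for name in self.discover()}
        self.merge_descriptions(profiles)
        # Relationship detection runs once, after every table is profiled,
        # because it needs at least two profiles to compare.
        self.detect_relationships(profiles)
        self.save(profiles)
        self.log.append(f"{len(profiles)} tables profiled")  # 6. summary
        return profiles
```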
`dft inspect table X` (single table) should also update relationships: load the existing profiles from `inspect.json`, add the new one, re-run relationship detection across all of them, and save back.
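The single-table path might look like the following sketch. It assumes, hypothetically, that `inspect.json` holds a `{"tables": {key: profile}}` mapping and that relationship detection is passed in as a function returning a list of relationships per table; the real layout is owned by `InspectionStorage` and may differ.

```python
import json
import os
import tempfile
from pathlib import Path
from typing import Callable, Dict


def upsert_profile(
    path: Path,
    key: str,
    profile: dict,
    detect: Callable[[Dict[str, dict]], Dict[str, list]],
) -> Dict[str, dict]:
    """Add or replace one table profile, then recompute relationships for all.

    Hypothetical helper; the {"tables": {...}} file layout is an assumption.
    """
    data = json.loads(path.read_text()) if path.exists() else {"tables": {}}
    tables = data.setdefault("tables", {})
    tables[key] = profile

    # Re-run detection across every cached profile, not just the new one:
    # a new table can create relationships that point into older tables.
    rels = detect(tables)
    for name, table in tables.items():
        table["relationships"] = rels.get(name, [])

    # Write to a temp file in the same directory, then atomically swap it in,
    # so a crash mid-write never leaves a truncated inspect.json behind.
    fd, tmp = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    with os.fdopen(fd, "w") as fh:
        json.dump(data, fh, indent=2, sort_keys=True)
    os.replace(tmp, path)
    return tables
```

Writing through a temporary file plus `os.replace` keeps the artifact readable by MCP tools even if profiling is interrupted.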
- Pros: Single command, self-contained artifact, MCP tools become pure readers
- Cons: Slower for large databases; mitigate with `--approximate auto` and `--include`/`--exclude` filters
B. Keep relationships and descriptions as on-the-fly layers
- Pros: No build step
- Cons: Redundant work on every call, no single source of truth, relationships never persisted
Plan
Approach A.
Phase 1: Bake relationships into inspect.json
- After profiling a table via `dft inspect table X`, load all cached profiles from `inspect.json`.
- Run `detect_relationships()` + `enrich_relationships()` + `score_fanout_risk()` across all cached profiles.
- Update the `relationships` field on all affected `TableInspection` entries.
- Save back to `inspect.json`.
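To make the "all affected entries" step concrete, here is a deliberately naive, name-based stand-in for `detect_relationships()`. The `detect_fk_by_name` function and the `columns` list in each profile are assumptions for illustration; the real detector's logic is not described in this plan.

```python
from typing import Dict, List


def detect_fk_by_name(profiles: Dict[str, dict]) -> Dict[str, List[dict]]:
    """Hypothetical stand-in for detect_relationships().

    Matches a `<x>_id` column to a table named `x` or `xs` that has an `id`
    column. Real detection (plus enrich_relationships() and
    score_fanout_risk()) would use profile statistics, not just names.
    """
    found: Dict[str, List[dict]] = {name: [] for name in profiles}
    for child, prof in profiles.items():
        for col in prof.get("columns", []):
            if not col.endswith("_id"):
                continue
            stem = col[: -len("_id")]
            for parent in (stem, stem + "s"):
                parent_cols = profiles.get(parent, {}).get("columns", [])
                if parent != child and "id" in parent_cols:
                    edge = {"from": f"{child}.{col}", "to": f"{parent}.id"}
                    # Record the edge on both tables: both entries are affected.
                    found[child].append(edge)
                    found[parent].append(edge)
                    break
    return found
```

Returning edges for every table, including ones with none, makes it straightforward to overwrite each entry's `relationships` field wholesale rather than patching it.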
Phase 2: Bake dbt descriptions into inspect.json
- During profiling, run `DbtSchemaParser` and `merge_dbt_descriptions()`.
- Store merged descriptions in the profile (use `description_candidates` with provenance).
- `catalog()` reads descriptions from `inspect.json` instead of re-parsing.
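A minimal sketch of applying the `description_merge.py` priority stack while keeping provenance. `merge_description` and the `{"source", "text"}` candidate shape are hypothetical; only the priority order comes from the existing code.

```python
from typing import Dict, List, Optional, Tuple

# Priority order from description_merge.py, highest first.
PRIORITY = ("dbt_schema_yml", "database_comment", "curated", "inferred")


def merge_description(
    candidates: Dict[str, Optional[str]],
) -> Tuple[Optional[str], List[dict]]:
    """Pick the winning description and keep every candidate with provenance.

    Hypothetical shape: `candidates` maps a source name to its text (or None).
    The returned list stands in for `description_candidates`. Sources not in
    PRIORITY are ignored.
    """
    kept = [
        {"source": src, "text": candidates[src]}
        for src in PRIORITY
        if candidates.get(src)  # skip missing, None, and empty strings
    ]
    winner = kept[0]["text"] if kept else None
    return winner, kept
```

Storing every candidate, not just the winner, lets `catalog()` show where a description came from without re-parsing `schema.yml`.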
Phase 3: Batch profiling (bare `dft inspect`)
- Make `table_name` optional in the `inspect_command()` CLI.
- When omitted, discover all tables via schema introspection.
- Profile each table with a progress indicator.
- Run relationship detection across all tables at the end.
- Add `--include`/`--exclude` glob filters for table name patterns.
- Default to `--approximate auto` for batch runs.
- Print a summary: N tables profiled, description coverage %, relationship coverage, warnings.
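The include/exclude filtering and the end-of-run summary could be sketched with the standard library's `fnmatch`. The function names, the rule that `--exclude` wins over `--include`, and the definition of relationship coverage (tables with at least one relationship) are all assumptions.

```python
from fnmatch import fnmatchcase
from typing import Dict, Iterable, List, Sequence


def select_tables(
    tables: Iterable[str],
    include: Sequence[str] = (),
    exclude: Sequence[str] = (),
) -> List[str]:
    """Apply --include/--exclude glob patterns; exclude wins over include.

    An empty include list means "everything". Matching is case-sensitive.
    """
    picked = []
    for name in tables:
        if include and not any(fnmatchcase(name, p) for p in include):
            continue
        if any(fnmatchcase(name, p) for p in exclude):
            continue
        picked.append(name)
    return picked


def summarize(profiles: Dict[str, dict]) -> str:
    """Build the end-of-run summary line from the profiled tables."""
    n = len(profiles)
    described = sum(1 for p in profiles.values() if p.get("description"))
    related = sum(1 for p in profiles.values() if p.get("relationships"))
    warnings = sum(len(p.get("warnings", [])) for p in profiles.values())

    def pct(k: int) -> str:
        return f"{100 * k / n:.0f}%" if n else "n/a"

    return (
        f"{n} tables profiled, description coverage {pct(described)}, "
        f"relationship coverage {pct(related)}, {warnings} warnings"
    )
```

For example, `select_tables(["stg_orders", "stg_tmp_x", "fct_sales"], include=["stg_*"], exclude=["*tmp*"])` keeps only `stg_orders`.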
Phase 4: Simplify MCP catalog()
- `catalog()` reads from `inspect.json` as the primary source; no live DB is needed for profiled tables.
- Fall back to live introspection only for tables NOT in `inspect.json`.
- Remove on-the-fly `DbtSchemaParser` calls and `_detect_catalog_relationships()` re-computation.
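A sketch of the read-through behavior for `catalog()`: cached profiles are served as-is and only misses trigger live introspection. `catalog_lookup`, its parameters, and the `origin` tag are hypothetical.

```python
from typing import Callable, Dict, Iterable


def catalog_lookup(
    requested: Iterable[str],
    cached: Dict[str, dict],
    introspect_live: Callable[[str], dict],
) -> Dict[str, dict]:
    """Serve profiled tables from inspect.json; hit the live DB only for misses.

    `cached` stands in for the loaded inspect.json tables and
    `introspect_live` for a live-introspection call (both hypothetical).
    """
    result = {}
    for name in requested:
        if name in cached:
            result[name] = {**cached[name], "origin": "inspect.json"}
        else:
            # No dbt re-parse or relationship recomputation here: a live hit
            # returns only what introspection provides.
            result[name] = {**introspect_live(name), "origin": "live"}
    return result
```

Tagging each entry with its origin lets callers tell when a table has not been profiled yet and a `dft inspect` run would improve the answer.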
Files to modify:
- dataface/cli/commands/inspect.py — batch mode, relationship/dbt bake-in
- dataface/core/inspect/inspector.py — populate relationships after profiling
- dataface/core/inspect/storage.py — bulk save/update helpers
- dataface/ai/mcp/tools.py — simplify catalog() to read from inspect.json
- dataface/ai/mcp/catalog_enrichment.py — may become unnecessary
Implementation Progress
Review Feedback
- [ ] Review cleared