Dataface Tasks

dft inspect with no args should profile all tables

ID: CONTEXT_CATALOG_NIMBLE-DFT_INSPECT_WITH_NO_ARGS_SHOULD_PROFILE_ALL_TABLES
Status: completed
Priority: p1
Milestone: m1-ft-analytics-analyst-pilot
Owner: data-ai-engineer-architect

Problem

Three issues with how the catalog gets built today:

  1. No batch profiling — must run dft inspect table <name> per table manually
  2. Relationships not stored — TableInspection.relationships field exists but is never populated during profiling. detect_relationships() runs on-the-fly in catalog() every time, even though the results are deterministic from cached profiles.
  3. dbt descriptions not baked in — DbtSchemaParser re-parses schema.yml on every catalog() call instead of merging descriptions into inspect.json at build time.

The result: target/inspect.json is incomplete, and MCP tools must re-derive relationships, re-parse dbt, and hit the live DB on every call. inspect.json should be a fully self-contained catalog artifact.
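To make the "fully self-contained catalog artifact" goal concrete, here is an illustrative sketch of what a baked target/inspect.json entry could hold once relationships and descriptions are merged in. Every field name below is hypothetical — the real schema is the TableInspection model in dataface/core/inspect/inspector.py.

```python
# Illustrative shape of a self-contained inspect.json entry (field names
# are assumptions, not the real TableInspection schema).
import json

inspect_artifact = {
    "model.dataface_examples.stg_products": {
        "columns": {
            "product_id": {
                "semantic_type": "identifier",
                "description": "Primary product key",
                "description_source": "dbt_schema_yml",  # provenance baked in
            },
        },
        "grain": ["product_id"],
        "relationships": [
            {
                "to": "model.dataface_examples.stg_orders",
                "from_column": "product_id",
                "to_column": "product_id",
                "fanout_risk": "low",
            }
        ],
    }
}

# A reader such as the MCP catalog() tool would need no live DB connection:
serialized = json.dumps(inspect_artifact, indent=2)
```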

Context

  • TableInspection model already has relationships field (inspector.py:489) and serializes it (inspector.py:526-527) — just never populated
  • detect_relationships() needs 2+ TableInspection objects — works fine once multiple tables are profiled
  • DbtSchemaParser scans models/**/schema.yml — could run once at build time
  • description_merge.py has priority stack: dbt_schema_yml > database_comment > curated > inferred
  • InspectionStorage already supports multi-table accumulation in a single target/inspect.json
  • Table discovery exists in _list_schema() (dataface/ai/mcp/tools.py) via InspectConnection
  • dbt connection auto-detection exists in _detect_dbt_connection()
  • Non-dbt users configure sources in dataface.yml or _sources.yaml
  • Follow-on: The SQLGlot propagation task (M2) adds Phase 4 that extends dft inspect to cover derived models via static type propagation, so only base tables need real profiling. See static-semantic-type-propagation-through-sql-queries-via-sqlglot.md.
  • Table key format: dbt uses unique_id as resource_type.project.model_name (e.g., model.dataface_examples.stg_products). For inspect.json keys, follow dbt's convention in dbt projects. For non-dbt, use source.schema.table where source comes from dataface.yml. This prevents collisions when a project has multiple databases/schemas.
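The key convention from the last bullet can be sketched as a small helper. The function name and signature are illustrative, not part of the existing codebase:

```python
# Hypothetical helper for the inspect.json key convention: dbt projects reuse
# the dbt unique_id (resource_type.project.model_name); non-dbt projects use
# source.schema.table, with source taken from dataface.yml.
from typing import Optional

def table_key(
    table: str,
    schema: str,
    source: str,
    dbt_unique_id: Optional[str] = None,
) -> str:
    """Return a collision-free key for an inspect.json entry."""
    if dbt_unique_id:
        # dbt already guarantees uniqueness across the project,
        # e.g. "model.dataface_examples.stg_products"
        return dbt_unique_id
    return f"{source}.{schema}.{table}"

# Usage
assert table_key("products", "raw", "shop",
                 dbt_unique_id="model.dataface_examples.stg_products") \
    == "model.dataface_examples.stg_products"
assert table_key("products", "raw", "shop") == "shop.raw.products"
```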

Possible Solutions

A. dft inspect (bare) becomes:

  1. Discover all tables from the connected source
  2. Profile each table (stats, grain, semantic types, quality flags)
  3. Merge dbt schema.yml descriptions (if dbt project) or dataface.yml descriptions
  4. Run cross-table relationship detection + fanout risk scoring
  5. Save everything to target/inspect.json as one self-contained artifact
  6. Print summary with audit-style readiness scores

dft inspect table X (single table) should also update relationships — load existing profiles from inspect.json, add the new one, re-run relationship detection across all, save back.

  • Pros: Single command, self-contained artifact, MCP tools become pure readers
  • Cons: Slower for large databases — mitigate with --approximate auto and --include/--exclude filters

B. Keep relationships and descriptions as on-the-fly layers

  • Pros: No build step
  • Cons: Redundant work on every call, no single source of truth, relationships never persisted

Plan

Approach A.

Phase 1: Bake relationships into inspect.json

  1. After profiling a table via dft inspect table X, load all cached profiles from inspect.json
  2. Run detect_relationships() + enrich_relationships() + score_fanout_risk() across all cached profiles
  3. Update the relationships field on all affected TableInspection entries
  4. Save back to inspect.json
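The Phase 1 steps above can be sketched as a re-detection loop. The real detect_relationships() / enrich_relationships() / score_fanout_risk() signatures live in the inspector; the stand-in below only pairs columns by name to show the load → detect → write-back shape:

```python
# Sketch of the Phase 1 bake-in loop (helper signatures are assumptions).
from typing import Dict, List

def detect_relationships(profiles: Dict[str, dict]) -> List[dict]:
    """Stand-in detector: pair columns with matching names across tables."""
    rels = []
    tables = list(profiles)
    for i, left in enumerate(tables):
        for right in tables[i + 1:]:
            shared = set(profiles[left]["columns"]) & set(profiles[right]["columns"])
            for col in shared:
                rels.append({"left": left, "right": right, "column": col})
    return rels

def bake_relationships(profiles: Dict[str, dict]) -> Dict[str, dict]:
    """Re-run detection across ALL cached profiles and write results back
    onto every affected profile, so inspect.json stays self-contained."""
    for profile in profiles.values():
        profile["relationships"] = []
    for rel in detect_relationships(profiles):
        profiles[rel["left"]]["relationships"].append(rel)
        profiles[rel["right"]]["relationships"].append(rel)
    return profiles

# Usage: two cached profiles sharing a product_id column
profiles = {
    "shop.raw.orders": {"columns": ["order_id", "product_id"]},
    "shop.raw.products": {"columns": ["product_id", "name"]},
}
baked = bake_relationships(profiles)
```

The key point the sketch shows: detection always runs over the full cached set, so profiling one new table can update the relationships field on previously profiled tables too.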

Phase 2: Bake dbt descriptions into inspect.json

  1. During profiling, run DbtSchemaParser and merge_dbt_descriptions()
  2. Store merged descriptions in the profile (use description_candidates with provenance)
  3. catalog() reads descriptions from inspect.json instead of re-parsing
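The merge in step 2 follows the priority stack already noted in Context (dbt_schema_yml > database_comment > curated > inferred). A minimal sketch, assuming candidates are stored as a source-to-text mapping with provenance kept alongside the winner:

```python
# Sketch of the description priority stack from description_merge.py.
# The candidate structure and return shape are assumptions.
from typing import Dict, Optional, Tuple

PRIORITY = ["dbt_schema_yml", "database_comment", "curated", "inferred"]

def merge_description(candidates: Dict[str, str]) -> Optional[Tuple[str, str]]:
    """Pick the highest-priority non-empty candidate, keeping provenance."""
    for source in PRIORITY:
        text = candidates.get(source)
        if text:
            return text, source
    return None

# Usage: a dbt schema.yml description wins over an inferred one
candidates = {
    "inferred": "Looks like a product identifier",
    "dbt_schema_yml": "Primary key for the products model",
}
assert merge_description(candidates) == (
    "Primary key for the products model", "dbt_schema_yml"
)
```

Storing the winning source alongside the text is what lets catalog() report provenance without re-parsing schema.yml.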

Phase 3: Batch profiling (dft inspect bare)

  1. Make table_name optional in inspect_command() CLI
  2. When omitted, discover all tables via schema introspection
  3. Profile each table with progress indicator
  4. Run relationship detection across all at the end
  5. Add --include/--exclude glob filters for table name patterns
  6. Default to --approximate auto for batch
  7. Print summary: N tables profiled, description coverage %, relationship coverage, warnings
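The --include/--exclude filtering in step 5 can use stdlib fnmatch globs. The flag semantics sketched here (apply include first, then exclude) are an assumption:

```python
# Sketch of glob-based table filtering for batch mode.
from fnmatch import fnmatch
from typing import Iterable, List, Optional

def filter_tables(
    tables: Iterable[str],
    include: Optional[List[str]] = None,
    exclude: Optional[List[str]] = None,
) -> List[str]:
    """Keep tables matching any include pattern, then drop exclude matches."""
    selected = []
    for name in tables:
        if include and not any(fnmatch(name, pat) for pat in include):
            continue
        if exclude and any(fnmatch(name, pat) for pat in exclude):
            continue
        selected.append(name)
    return selected

# Usage
tables = ["stg_orders", "stg_products", "tmp_scratch"]
assert filter_tables(tables, include=["stg_*"]) == ["stg_orders", "stg_products"]
assert filter_tables(tables, exclude=["tmp_*"]) == ["stg_orders", "stg_products"]
```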

Phase 4: Simplify MCP catalog()

  1. catalog() reads from inspect.json as primary source — no live DB needed for profiled tables
  2. Fall back to live introspection only for tables NOT in inspect.json
  3. Remove on-the-fly DbtSchemaParser calls and _detect_catalog_relationships() re-computation
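The Phase 4 read path can be sketched as a cache-first lookup. `catalog_entry` and `live_introspect` are hypothetical stand-ins, not existing functions in tools.py:

```python
# Sketch of catalog() reading inspect.json first and falling back to live
# introspection only for unprofiled tables (names are assumptions).
import json
import tempfile
from pathlib import Path
from typing import Callable

def catalog_entry(
    table_key: str,
    inspect_path: Path,
    live_introspect: Callable[[str], dict],
) -> dict:
    cached = {}
    if inspect_path.exists():
        cached = json.loads(inspect_path.read_text())
    if table_key in cached:
        return cached[table_key]       # pure read, no DB round-trip
    return live_introspect(table_key)  # fallback for tables not yet profiled

# Usage with a hypothetical cached artifact
demo = Path(tempfile.mkdtemp()) / "inspect.json"
demo.write_text(json.dumps({"shop.raw.orders": {"grain": ["order_id"]}}))
assert catalog_entry("shop.raw.orders", demo,
                     live_introspect=lambda k: {})["grain"] == ["order_id"]
assert catalog_entry("shop.raw.unknown", demo,
                     live_introspect=lambda k: {"live": True}) == {"live": True}
```

Once every table is profiled, the fallback branch never fires and catalog() becomes a pure reader.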

Files to modify:

  • dataface/cli/commands/inspect.py — batch mode, relationship/dbt bake-in
  • dataface/core/inspect/inspector.py — populate relationships after profiling
  • dataface/core/inspect/storage.py — bulk save/update helpers
  • dataface/ai/mcp/tools.py — simplify catalog() to read from inspect.json
  • dataface/ai/mcp/catalog_enrichment.py — may become unnecessary

Implementation Progress

Review Feedback

  • [ ] Review cleared