Dataface Tasks

dft inspect with no args should profile all tables

ID: CONTEXT_CATALOG_NIMBLE-DFT_INSPECT_WITH_NO_ARGS_SHOULD_PROFILE_ALL_TABLES
Status: completed
Priority: p1
Milestone: m1-ft-analytics-analyst-pilot
Owner: data-ai-engineer-architect

Problem

Three issues with how the catalog gets built today:

  1. No batch profiling — must run dft inspect table <name> per table manually
  2. Relationships not stored — TableInspection.relationships field exists but is never populated during profiling. detect_relationships() runs on-the-fly in catalog() every time, even though the results are deterministic from cached profiles.
  3. dbt descriptions not baked in — DbtSchemaParser re-parses schema.yml on every catalog() call instead of merging descriptions into inspect.json at build time.

The result: target/inspect.json is incomplete, and MCP tools must re-derive relationships, re-parse dbt, and hit the live DB on every call. inspect.json should be a fully self-contained catalog artifact.
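To make the "fully self-contained catalog artifact" goal concrete, here is an illustrative sketch of what a baked target/inspect.json entry could hold once relationships and descriptions are merged in. Every field name below is hypothetical — the real schema is the TableInspection model in dataface/core/inspect/inspector.py.

```python
# Illustrative shape of a self-contained inspect.json entry (field names
# are assumptions, not the real TableInspection schema).
import json

inspect_artifact = {
    "model.dataface_examples.stg_products": {
        "columns": {
            "product_id": {
                "semantic_type": "identifier",
                "description": "Primary product key",
                "description_source": "dbt_schema_yml",  # provenance baked in
            },
        },
        "grain": ["product_id"],
        "relationships": [
            {
                "to": "model.dataface_examples.stg_orders",
                "from_column": "product_id",
                "to_column": "product_id",
                "fanout_risk": "low",
            }
        ],
    }
}

# A reader such as the MCP catalog() tool would need no live DB connection:
serialized = json.dumps(inspect_artifact, indent=2)
```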

Context

  • TableInspection model already has relationships field (inspector.py:489) and serializes it (inspector.py:526-527) — just never populated
  • detect_relationships() needs 2+ TableInspection objects — works fine once multiple tables are profiled
  • DbtSchemaParser scans models/**/schema.yml — could run once at build time
  • description_merge.py has priority stack: dbt_schema_yml > database_comment > curated > inferred
  • InspectionStorage already supports multi-table accumulation in a single target/inspect.json
  • Table discovery exists in _list_schema() (dataface/ai/mcp/tools.py) via InspectConnection
  • dbt connection auto-detection exists in _detect_dbt_connection()
  • Non-dbt users configure sources in dataface.yml or _sources.yaml
  • Follow-on: The SQLGlot propagation task (M2) adds Phase 4 that extends dft inspect to cover derived models via static type propagation, so only base tables need real profiling. See static-semantic-type-propagation-through-sql-queries-via-sqlglot.md.
  • Table key format: dbt uses unique_id as resource_type.project.model_name (e.g., model.dataface_examples.stg_products). For inspect.json keys, follow dbt's convention in dbt projects. For non-dbt, use source.schema.table where source comes from dataface.yml. This prevents collisions when a project has multiple databases/schemas.
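The key convention from the last bullet can be sketched as a small helper. The function name and signature are illustrative, not part of the existing codebase:

```python
# Hypothetical helper for the inspect.json key convention: dbt projects reuse
# the dbt unique_id (resource_type.project.model_name); non-dbt projects use
# source.schema.table, with source taken from dataface.yml.
from typing import Optional

def table_key(
    table: str,
    schema: str,
    source: str,
    dbt_unique_id: Optional[str] = None,
) -> str:
    """Return a collision-free key for an inspect.json entry."""
    if dbt_unique_id:
        # dbt already guarantees uniqueness across the project,
        # e.g. "model.dataface_examples.stg_products"
        return dbt_unique_id
    return f"{source}.{schema}.{table}"

# Usage
assert table_key("products", "raw", "shop",
                 dbt_unique_id="model.dataface_examples.stg_products") \
    == "model.dataface_examples.stg_products"
assert table_key("products", "raw", "shop") == "shop.raw.products"
```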

Possible Solutions

A. dft inspect (bare) becomes:

  1. Discover all tables from the connected source
  2. Profile each table (stats, grain, semantic types, quality flags)
  3. Merge dbt schema.yml descriptions (if dbt project) or dataface.yml descriptions
  4. Run cross-table relationship detection + fanout risk scoring
  5. Save everything to target/inspect.json as one self-contained artifact
  6. Print summary with audit-style readiness scores

dft inspect table X (single table) should also update relationships — load existing profiles from inspect.json, add the new one, re-run relationship detection across all, save back.

  • Pros: Single command, self-contained artifact, MCP tools become pure readers
  • Cons: Slower for large databases — mitigate with --approximate auto and --include/--exclude filters

B. Keep relationships and descriptions as on-the-fly layers

  • Pros: No build step
  • Cons: Redundant work on every call, no single source of truth, relationships never persisted

Plan

Approach A.

Phase 1: Bake relationships into inspect.json

  1. After profiling a table via dft inspect table X, load all cached profiles from inspect.json
  2. Run detect_relationships() + enrich_relationships() + score_fanout_risk() across all cached profiles
  3. Update the relationships field on all affected TableInspection entries
  4. Save back to inspect.json
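The Phase 1 steps above can be sketched as a re-detection loop. The real detect_relationships() / enrich_relationships() / score_fanout_risk() signatures live in the inspector; the stand-in below only pairs columns by name to show the load → detect → write-back shape:

```python
# Sketch of the Phase 1 bake-in loop (helper signatures are assumptions).
from typing import Dict, List

def detect_relationships(profiles: Dict[str, dict]) -> List[dict]:
    """Stand-in detector: pair columns with matching names across tables."""
    rels = []
    tables = list(profiles)
    for i, left in enumerate(tables):
        for right in tables[i + 1:]:
            shared = set(profiles[left]["columns"]) & set(profiles[right]["columns"])
            for col in shared:
                rels.append({"left": left, "right": right, "column": col})
    return rels

def bake_relationships(profiles: Dict[str, dict]) -> Dict[str, dict]:
    """Re-run detection across ALL cached profiles and write results back
    onto every affected profile, so inspect.json stays self-contained."""
    for profile in profiles.values():
        profile["relationships"] = []
    for rel in detect_relationships(profiles):
        profiles[rel["left"]]["relationships"].append(rel)
        profiles[rel["right"]]["relationships"].append(rel)
    return profiles

# Usage: two cached profiles sharing a product_id column
profiles = {
    "shop.raw.orders": {"columns": ["order_id", "product_id"]},
    "shop.raw.products": {"columns": ["product_id", "name"]},
}
baked = bake_relationships(profiles)
```

The key point the sketch shows: detection always runs over the full cached set, so profiling one new table can update the relationships field on previously profiled tables too.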

Phase 2: Bake dbt descriptions into inspect.json

  1. During profiling, run DbtSchemaParser and merge_dbt_descriptions()
  2. Store merged descriptions in the profile (use description_candidates with provenance)
  3. catalog() reads descriptions from inspect.json instead of re-parsing
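The merge in step 2 follows the priority stack already noted in Context (dbt_schema_yml > database_comment > curated > inferred). A minimal sketch, assuming candidates are stored as a source-to-text mapping with provenance kept alongside the winner:

```python
# Sketch of the description priority stack from description_merge.py.
# The candidate structure and return shape are assumptions.
from typing import Dict, Optional, Tuple

PRIORITY = ["dbt_schema_yml", "database_comment", "curated", "inferred"]

def merge_description(candidates: Dict[str, str]) -> Optional[Tuple[str, str]]:
    """Pick the highest-priority non-empty candidate, keeping provenance."""
    for source in PRIORITY:
        text = candidates.get(source)
        if text:
            return text, source
    return None

# Usage: a dbt schema.yml description wins over an inferred one
candidates = {
    "inferred": "Looks like a product identifier",
    "dbt_schema_yml": "Primary key for the products model",
}
assert merge_description(candidates) == (
    "Primary key for the products model", "dbt_schema_yml"
)
```

Storing the winning source alongside the text is what lets catalog() report provenance without re-parsing schema.yml.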

Phase 3: Batch profiling (dft inspect bare)

  1. Make table_name optional in inspect_command() CLI
  2. When omitted, discover all tables via schema introspection
  3. Profile each table with progress indicator
  4. Run relationship detection across all at the end
  5. Add --include/--exclude glob filters for table name patterns
  6. Default to --approximate auto for batch
  7. Print summary: N tables profiled, description coverage %, relationship coverage, warnings
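The --include/--exclude filtering in step 5 can use stdlib fnmatch globs. The flag semantics sketched here (apply include first, then exclude) are an assumption:

```python
# Sketch of glob-based table filtering for batch mode.
from fnmatch import fnmatch
from typing import Iterable, List, Optional

def filter_tables(
    tables: Iterable[str],
    include: Optional[List[str]] = None,
    exclude: Optional[List[str]] = None,
) -> List[str]:
    """Keep tables matching any include pattern, then drop exclude matches."""
    selected = []
    for name in tables:
        if include and not any(fnmatch(name, pat) for pat in include):
            continue
        if exclude and any(fnmatch(name, pat) for pat in exclude):
            continue
        selected.append(name)
    return selected

# Usage
tables = ["stg_orders", "stg_products", "tmp_scratch"]
assert filter_tables(tables, include=["stg_*"]) == ["stg_orders", "stg_products"]
assert filter_tables(tables, exclude=["tmp_*"]) == ["stg_orders", "stg_products"]
```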

Phase 4: Simplify MCP catalog()

  1. catalog() reads from inspect.json as primary source — no live DB needed for profiled tables
  2. Fall back to live introspection only for tables NOT in inspect.json
  3. Remove on-the-fly DbtSchemaParser calls and _detect_catalog_relationships() re-computation
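The Phase 4 read path can be sketched as a cache-first lookup. `catalog_entry` and `live_introspect` are hypothetical stand-ins, not existing functions in tools.py:

```python
# Sketch of catalog() reading inspect.json first and falling back to live
# introspection only for unprofiled tables (names are assumptions).
import json
import tempfile
from pathlib import Path
from typing import Callable

def catalog_entry(
    table_key: str,
    inspect_path: Path,
    live_introspect: Callable[[str], dict],
) -> dict:
    cached = {}
    if inspect_path.exists():
        cached = json.loads(inspect_path.read_text())
    if table_key in cached:
        return cached[table_key]       # pure read, no DB round-trip
    return live_introspect(table_key)  # fallback for tables not yet profiled

# Usage with a hypothetical cached artifact
demo = Path(tempfile.mkdtemp()) / "inspect.json"
demo.write_text(json.dumps({"shop.raw.orders": {"grain": ["order_id"]}}))
assert catalog_entry("shop.raw.orders", demo,
                     live_introspect=lambda k: {})["grain"] == ["order_id"]
assert catalog_entry("shop.raw.unknown", demo,
                     live_introspect=lambda k: {"live": True}) == {"live": True}
```

Once every table is profiled, the fallback branch never fires and catalog() becomes a pure reader.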

Files to modify:

  • dataface/cli/commands/inspect.py — batch mode, relationship/dbt bake-in
  • dataface/core/inspect/inspector.py — populate relationships after profiling
  • dataface/core/inspect/storage.py — bulk save/update helpers
  • dataface/ai/mcp/tools.py — simplify catalog() to read from inspect.json
  • dataface/ai/mcp/catalog_enrichment.py — may become unnecessary

Implementation Progress

Review Feedback

  • [ ] Review cleared