Dataface Tasks

Refactor TableInspector inspection pipeline for maintainability

ID: INSPECT_PROFILER-REFACTOR_TABLEINSPECTOR_INSPECTION_PIPELINE_FOR_MAINTAINABILITY
Status: completed
Priority: p2
Milestone: m1-ft-analytics-analyst-pilot
Owner: sr-engineer-architect

Problem

Reduce the complexity of TableInspector._inspect_table_inner by extracting private helpers while preserving profiler behavior, output shape, and instrumentation.

Context

  • The current hotspot is dataface/core/inspect/inspector.py, specifically TableInspector._inspect_table_inner().
  • A local complexity scan put _inspect_table_inner() at the top of the core package by both branch count and length. It currently owns schema lookup, dialect enrichment, approximate/exact stats collection, sample fetching, per-column profiling, optional enum/histogram/date-distribution queries, fallback semantic typing, lifecycle detection, grain detection, and final instrumentation.
  • This refactor needs to preserve the existing profiler contract and output shape because the inspector is consumed by tests, the server, and the VS Code extension.
  • Relevant coverage already exists in:
      • tests/core/test_inspect_adapters.py
      • tests/core/test_inspect_distributions.py
      • tests/core/test_inspect_instrumentation.py
  • Constraints:
      • Keep changes surgical inside dataface/core/inspect/inspector.py unless tests need minor additions.
      • Preserve inspect_table() behavior, query-count accounting, enrichment behavior, and event emission.
      • Avoid changing the task scaffold or frontmatter manually outside the worksheet body.
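The worksheet does not record how the local complexity scan was run. As a rough illustration only, a scan of that kind can be sketched with the standard-library ast module. The function name scan_functions and the set of node types counted as branches are assumptions made for this example, not the tooling actually used.

```python
import ast

# Node types treated as "branches". This selection is an assumption for
# illustration; the scan described above may have counted differently.
BRANCH_NODES = (
    ast.If, ast.For, ast.While, ast.Try, ast.ExceptHandler,
    ast.BoolOp, ast.IfExp, ast.comprehension,
)


def scan_functions(source: str) -> list[tuple[str, int, int]]:
    """Return (name, branch_count, line_length) per function, worst first."""
    results = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            # Count every branch-like node anywhere inside the function body.
            branches = sum(isinstance(child, BRANCH_NODES) for child in ast.walk(node))
            length = node.end_lineno - node.lineno + 1
            results.append((node.name, branches, length))
    return sorted(results, key=lambda r: (r[1], r[2]), reverse=True)


SAMPLE = '''
def simple(x):
    return x + 1

def branchy(x):
    if x > 0:
        for i in range(x):
            if i % 2:
                return i
    return None
'''

print(scan_functions(SAMPLE))
# [('branchy', 3, 6), ('simple', 0, 2)]
```

Running the same function over the text of a module such as inspector.py would rank its methods by branch count first and length second, which is the ordering the Context bullet refers to.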

Possible Solutions

  • Extract a small helper object or new module to own the full inspection workflow.
      • Pros: strongest separation of concerns.
      • Cons: larger surface-area change, more file movement, higher regression risk for a maintainability-only task.
  • Extract focused private helper methods inside TableInspector for each pipeline stage while keeping the public API and file structure intact. (Recommended)
      • Pros: reduces cognitive load in _inspect_table_inner(), keeps behavior local, fits repo guidance to avoid splitting files unless needed, and is easy to regression test.
      • Cons: still leaves the module large overall, so the improvement is mostly in orchestration clarity rather than total file size.
  • Leave the code as-is and only add comments/tests.
      • Pros: minimal risk.
      • Cons: does not solve the maintainability problem that prompted the task.

Plan

  • Refactor TableInspector._inspect_table_inner() into a short orchestration method that delegates to private helpers.
  • Extract helpers for:
      • profiling stats/query tracking setup
      • approximate-vs-exact stats collection
      • sample row loading
      • per-column ColumnInspection construction
      • optional enum/histogram/date-distribution enrichment
      • final TableInspection assembly and instrumentation
  • Keep the implementation in dataface/core/inspect/inspector.py.
  • Add targeted tests for the refactor seams where behavior is easiest to accidentally change, especially profiling stats and approximate-path bookkeeping.
  • Run focused pytest coverage for inspector adapters, distributions, and instrumentation before marking the task complete.
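The shape this plan aims for can be sketched as follows. This is a simplified stand-in, not the real TableInspector: the class name TableInspectorSketch, the ProfileContext dataclass, every helper name, and the dict-based return value are hypothetical and chosen only to show a short orchestration method delegating to private helpers while a shared context keeps the query count.

```python
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class ProfileContext:
    """Hypothetical per-run state: query counter plus collected stats."""
    table: str
    query_count: int = 0
    stats: dict[str, Any] = field(default_factory=dict)


class TableInspectorSketch:
    """Illustrative stand-in; names and return shape are assumptions."""

    def __init__(self, run_query: Callable[[str], list[dict[str, Any]]]):
        self._run_query = run_query

    def _inspect_table_inner(self, table: str) -> dict[str, Any]:
        # Orchestration only: each stage lives in its own private helper.
        ctx = ProfileContext(table=table)
        self._collect_stats(ctx)
        rows = self._load_sample(ctx)
        names = list(rows[0]) if rows else []
        columns = [self._profile_column(name, rows) for name in names]
        return self._assemble(ctx, columns)

    def _query(self, ctx: ProfileContext, sql: str) -> list[dict[str, Any]]:
        # Single choke point so query-count accounting cannot drift.
        ctx.query_count += 1
        return self._run_query(sql)

    def _collect_stats(self, ctx: ProfileContext) -> None:
        rows = self._query(ctx, f"SELECT COUNT(*) AS n FROM {ctx.table}")
        ctx.stats["row_count"] = rows[0]["n"]

    def _load_sample(self, ctx: ProfileContext) -> list[dict[str, Any]]:
        return self._query(ctx, f"SELECT * FROM {ctx.table} LIMIT 100")

    def _profile_column(self, name: str, rows: list[dict[str, Any]]) -> dict[str, Any]:
        values = [row[name] for row in rows]
        return {"name": name, "distinct": len(set(values)), "nulls": values.count(None)}

    def _assemble(self, ctx: ProfileContext, columns: list[dict[str, Any]]) -> dict[str, Any]:
        return {
            "table": ctx.table,
            "row_count": ctx.stats["row_count"],
            "columns": columns,
            "query_count": ctx.query_count,
        }


def fake_run_query(sql: str) -> list[dict[str, Any]]:
    # Canned results standing in for a real database connection.
    if "COUNT(*)" in sql:
        return [{"n": 3}]
    return [{"id": 1, "kind": "a"}, {"id": 2, "kind": "a"}, {"id": 3, "kind": None}]


result = TableInspectorSketch(fake_run_query)._inspect_table_inner("orders")
print(result["query_count"], result["columns"])
```

Routing every statement through one _query helper is one way to keep query-count bookkeeping identical before and after extraction, since no helper can issue a query without incrementing the counter.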

Implementation Progress

  • 2026-03-13: Reviewed repo instructions and existing inspector tests. Chose the inspect-profiler workstream and created this task via the CLI.
  • 2026-03-13: Identified TableInspector._inspect_table_inner() as the highest-value refactor target based on complexity and existing test coverage.
  • 2026-03-13: Refactored dataface/core/inspect/inspector.py so _inspect_table_inner() now delegates to private helpers for context setup, stats collection, sample loading, per-column profile construction, optional enrichments, lifecycle post-processing, and final inspection assembly.
  • 2026-03-13: Preserved the existing profiler contract and profile_completed instrumentation flow while keeping the implementation in the same file.
  • 2026-03-13: Added a targeted DuckDB integration test in tests/core/test_inspect_adapters.py to lock down categorical top_values and enum_values behavior.
  • 2026-03-13: Verification:
      • uv run pytest tests/core/test_inspect_adapters.py tests/core/test_inspect_distributions.py tests/core/test_inspect_instrumentation.py -> 143 passed, 31 skipped
      • git diff --check -> clean
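The targeted test mentioned in the progress log is not reproduced in this worksheet. As a hedged illustration of the idea, a characterization test for categorical top values might look like the sketch below. It uses the standard-library sqlite3 module as a stand-in for DuckDB, and the top_values helper and its output shape are hypothetical, not the inspector's actual API.

```python
import sqlite3


def top_values(conn, table, column, limit=3):
    """Hypothetical helper: most frequent non-null values, ties broken by value."""
    sql = (
        f'SELECT "{column}" AS value, COUNT(*) AS n FROM "{table}" '
        f'WHERE "{column}" IS NOT NULL GROUP BY "{column}" '
        f"ORDER BY n DESC, value ASC LIMIT ?"
    )
    return [{"value": value, "count": n} for value, n in conn.execute(sql, (limit,))]


def test_top_values_shape_is_stable():
    # Pin the exact output so a refactor that reorders, renames, or drops
    # fields fails loudly instead of silently changing the contract.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (status TEXT)")
    conn.executemany(
        "INSERT INTO orders VALUES (?)",
        [("open",), ("open",), ("closed",), (None,)],
    )
    assert top_values(conn, "orders", "status") == [
        {"value": "open", "count": 2},
        {"value": "closed", "count": 1},
    ]
```

Asserting on the full list, rather than on individual fields, is what makes this a lock on output shape as well as on values.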

Review Feedback

  • 2026-03-13: Final self-review of the extracted helper flow and focused inspector test run found no regressions in the covered paths. No follow-up changes were required after review.

  • [x] Review cleared