Refactor TableInspector inspection pipeline for maintainability
Problem
Reduce the complexity of TableInspector._inspect_table_inner by extracting private helpers while preserving profiler behavior, output shape, and instrumentation.
Context
- The current hotspot is `dataface/core/inspect/inspector.py`, specifically `TableInspector._inspect_table_inner()`.
- A local complexity scan put `_inspect_table_inner()` at the top of the core package by both branch count and length. It currently owns schema lookup, dialect enrichment, approximate/exact stats collection, sample fetching, per-column profiling, optional enum/histogram/date-distribution queries, fallback semantic typing, lifecycle detection, grain detection, and final instrumentation.
- This refactor needs to preserve the existing profiler contract and output shape because the inspector is consumed by tests, the server, and the VS Code extension.
- Relevant coverage already exists in:
  - `tests/core/test_inspect_adapters.py`
  - `tests/core/test_inspect_distributions.py`
  - `tests/core/test_inspect_instrumentation.py`
- Constraints:
  - Keep changes surgical inside `dataface/core/inspect/inspector.py` unless tests need minor additions.
  - Preserve `inspect_table()` behavior, query-count accounting, enrichment behavior, and event emission.
  - Avoid changing the task scaffold or frontmatter manually outside the worksheet body.
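The constraints on behavior and query-count accounting can be checked mechanically with a parity test. The following is a minimal sketch, not the project's actual test code: `CountingConnection`, `inspect_legacy`, `inspect_refactored`, and `assert_same_contract` are hypothetical names, and the two SQL strings are placeholders rather than queries the real inspector is known to issue.

```python
from typing import Callable

# Placeholder queries for illustration only; not taken from the real inspector.
COUNT_SQL = "SELECT COUNT(*) FROM t"
SAMPLE_SQL = "SELECT * FROM t LIMIT 2"


class CountingConnection:
    """Hypothetical fake connection that records every SQL string it runs."""

    def __init__(self, responses: dict) -> None:
        self._responses = responses
        self.queries: list = []

    def execute(self, sql: str) -> list:
        self.queries.append(sql)
        return self._responses[sql]


def make_conn() -> CountingConnection:
    # Each implementation gets a fresh connection so query logs stay separate.
    return CountingConnection({COUNT_SQL: [(3,)], SAMPLE_SQL: [(1, "a"), (2, "b")]})


def inspect_legacy(conn: CountingConnection) -> dict:
    """Stand-in for the pre-refactor monolithic method."""
    count = conn.execute(COUNT_SQL)[0][0]
    sample = conn.execute(SAMPLE_SQL)
    return {"row_count": count, "sample_rows": len(sample)}


def _row_count(conn: CountingConnection) -> int:
    return conn.execute(COUNT_SQL)[0][0]


def _sample(conn: CountingConnection) -> list:
    return conn.execute(SAMPLE_SQL)


def inspect_refactored(conn: CountingConnection) -> dict:
    """Stand-in for the refactored method: same stages, extracted helpers."""
    return {"row_count": _row_count(conn), "sample_rows": len(_sample(conn))}


def assert_same_contract(
    legacy: Callable, refactored: Callable, conn_factory: Callable
) -> None:
    """Fail if the output or the exact query sequence differs."""
    old_conn, new_conn = conn_factory(), conn_factory()
    assert legacy(old_conn) == refactored(new_conn), "output shape changed"
    assert old_conn.queries == new_conn.queries, "query accounting changed"


assert_same_contract(inspect_legacy, inspect_refactored, make_conn)
```

Comparing the recorded query list, rather than only its length, also catches a refactor that reorders queries while keeping the count unchanged.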
Possible Solutions
- Extract a small helper object or new module to own the full inspection workflow.
  - Pros: strongest separation of concerns.
  - Cons: larger surface-area change, more file movement, higher regression risk for a maintainability-only task.
- Extract focused private helper methods inside `TableInspector` for each pipeline stage while keeping the public API and file structure intact. Recommended.
  - Pros: reduces cognitive load in `_inspect_table_inner()`, keeps behavior local, fits repo guidance to avoid splitting files unless needed, and is easy to regression test.
  - Cons: still leaves the module large overall, so the improvement is mostly in orchestration clarity rather than total file size.
- Leave the code as-is and only add comments/tests.
  - Pros: minimal risk.
  - Cons: does not solve the maintainability problem that prompted the task.
Plan
- Refactor `TableInspector._inspect_table_inner()` into a short orchestration method that delegates to private helpers.
- Extract helpers for:
  - profiling stats/query tracking setup
  - approximate-vs-exact stats collection
  - sample row loading
  - per-column `ColumnInspection` construction
  - optional enum/histogram/date-distribution enrichment
  - final `TableInspection` assembly and instrumentation
- Keep the implementation in `dataface/core/inspect/inspector.py`.
- Add targeted tests for the refactor seams where behavior is easiest to accidentally change, especially profiling stats and approximate-path bookkeeping.
- Run focused pytest coverage for inspector adapters, distributions, and instrumentation before marking the task complete.
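The shape of the planned orchestration method can be illustrated with a toy example. This is a sketch under stated assumptions, not the real `TableInspector`: `ToyInspector`, `ProfileContext`, `ColumnProfile`, and every helper name below are hypothetical, and the class profiles in-memory rows instead of querying a database.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Any


@dataclass
class ProfileContext:
    """Hypothetical per-run bookkeeping shared across pipeline stages."""
    query_count: int = 0
    approximate: bool = False


@dataclass
class ColumnProfile:
    """Hypothetical stand-in for a per-column inspection result."""
    name: str
    null_count: int
    distinct_count: int
    top_values: list = field(default_factory=list)


class ToyInspector:
    """Toy stand-in for TableInspector that works on a list of row dicts."""

    def __init__(self, rows: list, sample_size: int = 100) -> None:
        self._rows = rows
        self._sample_size = sample_size
        self.events: list = []

    def inspect_table(self, table_name: str) -> dict:
        # Orchestration only: each stage lives in its own private helper.
        ctx = self._start_profile()
        row_count = self._collect_row_count(ctx)
        sample = self._load_sample(ctx)
        names = list(sample[0]) if sample else []
        columns = [self._profile_column(name, sample) for name in names]
        self._enrich_categoricals(columns, sample)
        return self._assemble(table_name, row_count, columns, ctx)

    def _start_profile(self) -> ProfileContext:
        # Mark the run approximate when the sample cannot cover every row.
        return ProfileContext(approximate=len(self._rows) > self._sample_size)

    def _collect_row_count(self, ctx: ProfileContext) -> int:
        ctx.query_count += 1  # a real inspector would issue a query here
        return len(self._rows)

    def _load_sample(self, ctx: ProfileContext) -> list:
        ctx.query_count += 1
        return self._rows[: self._sample_size]

    @staticmethod
    def _profile_column(name: str, sample: list) -> ColumnProfile:
        values = [row.get(name) for row in sample]
        non_null = [v for v in values if v is not None]
        return ColumnProfile(name, len(values) - len(non_null), len(set(non_null)))

    @staticmethod
    def _enrich_categoricals(columns: list, sample: list, max_distinct: int = 5) -> None:
        # Optional enrichment: only low-cardinality columns get top values.
        for col in columns:
            if 0 < col.distinct_count <= max_distinct:
                counts = Counter(
                    row[col.name] for row in sample if row.get(col.name) is not None
                )
                col.top_values = [value for value, _ in counts.most_common(3)]

    def _assemble(self, table_name: str, row_count: int, columns: list,
                  ctx: ProfileContext) -> dict:
        # Final assembly and instrumentation stay together in one place.
        self.events.append(
            ("profile_completed", {"table": table_name, "query_count": ctx.query_count})
        )
        return {"table": table_name, "row_count": row_count, "columns": columns,
                "query_count": ctx.query_count, "approximate": ctx.approximate}
```

Passing a single context object through the stages keeps query accounting in one place, so a helper cannot silently skip the bookkeeping that the old monolithic method did inline.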
Implementation Progress
- 2026-03-13: Reviewed repo instructions and existing inspector tests. Chose `inspect-profiler` workstream and created this task via the CLI.
- 2026-03-13: Identified `TableInspector._inspect_table_inner()` as the highest-value refactor target based on complexity and existing test coverage.
- 2026-03-13: Refactored `dataface/core/inspect/inspector.py` so `_inspect_table_inner()` now delegates to private helpers for context setup, stats collection, sample loading, per-column profile construction, optional enrichments, lifecycle post-processing, and final inspection assembly.
- 2026-03-13: Preserved the existing profiler contract and `profile_completed` instrumentation flow while keeping the implementation in the same file.
- 2026-03-13: Added a targeted DuckDB integration test in `tests/core/test_inspect_adapters.py` to lock down categorical `top_values` and `enum_values` behavior.
- 2026-03-13: Verification:
  - `uv run pytest tests/core/test_inspect_adapters.py tests/core/test_inspect_distributions.py tests/core/test_inspect_instrumentation.py` -> `143 passed, 31 skipped`
  - `git diff --check` -> clean
Review Feedback
- 2026-03-13: Final self-review of the extracted helper flow and focused inspector test run found no regressions in the covered paths. No follow-up changes were required after review.
- [x] Review cleared