inspect profiler
Purpose
Database profiling, semantic typing, and context generation for analysts and AI. The inspect module connects to a user's warehouse, profiles tables/columns, detects semantic types (currency, email, timestamp, etc.), classifies data quality, and produces structured context that feeds into dashboard generation and MCP tools. This is the "understand the data" layer — it turns raw schema into rich metadata that analysts and AI agents use to ask better questions and build better dashboards. Adjacent to context-catalog-nimble (which defines the context architecture and Nimble methodology) and mcp-analyst-agent (which consumes inspect output as tool context).
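As a rough illustration of the semantic typing step described above, the sketch below labels a column from sampled string values. The patterns, the 0.9 threshold, and the names `PATTERNS` and `detect_semantic_type` are all assumptions made for this example; the inspect module's actual detection rules are not shown in this document.

```python
import re
from typing import Optional, Sequence

# Hypothetical patterns for illustration only; not the inspect module's real rules.
_EMAIL = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")
_CURRENCY = re.compile(r"^[$€£]\s?\d[\d,]*(\.\d+)?$")
_TIMESTAMP = re.compile(r"^\d{4}-\d{2}-\d{2}[T ]\d{2}:\d{2}(:\d{2})?")

PATTERNS = (("email", _EMAIL), ("currency", _CURRENCY), ("timestamp", _TIMESTAMP))


def detect_semantic_type(
    samples: Sequence[Optional[str]], threshold: float = 0.9
) -> Optional[str]:
    """Return the first type matched by at least `threshold` of non-empty samples."""
    values = [s.strip() for s in samples if s is not None and s.strip()]
    if not values:
        return None  # nothing to infer from an all-null or empty sample
    for name, pattern in PATTERNS:
        hits = sum(1 for v in values if pattern.match(v))
        if hits / len(values) >= threshold:
            return name
    return None
```

A threshold below 1.0 tolerates a few dirty values in an otherwise well-typed column; a column that matches no pattern often enough stays untyped rather than being guessed.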
Owner
- Sr Engineer Architect
Initiatives
- Inspector cleanup and open-source hardening — In Progress, M1 — 5T Internal Pilot Ready, 1 / 1 tasks complete (100%)
Tasks by Milestone
A runnable prototype path exists for warehouse profiling, semantic inference, and analyst-facing data context surfaces, with concrete artifacts that prove the flow works end-to-end in the current codebase. Core assumptions are documented, known constraints are explicit, and the team can explain what is real versus mocked without ambiguity.
- Prototype gaps and follow-on capture Completed — Document top gaps and risks in analyst-facing inspector experience that must be addressed next.
- Prototype implementation path Completed — Implement a runnable end-to-end prototype path for profiling pipeline.
- Prototype validation and proof Completed — Validate semantic inference and context quality with concrete proof artifacts and repeatable steps.
Internal analysts can execute at least one real weekly workflow that depends on warehouse profiling, semantic inference, and analyst-facing data context surfaces in the 5T Analytics environment, without bespoke engineering intervention for every run. Instrumentation and feedback capture are in place so failures, friction points, and adoption gaps are visible and triaged with owners.
- Enable profiler drill-in/out links across table, schema, and column dashboards Completed — Add working navigation links so analysts can move from table profiles into schema/column dashboards and back out withou…
- Profiler payload and UX contract ready for extension consumption Completed — Stabilize inspect/profiler output contract and UX assumptions so extension embedding is reliable for M1 pilot.
- Add histogram bins and date distributions to profiler Completed — Add profiler histogram bins and date distributions so analysts can understand value spread and temporal density at a gl…
- Add spark bar chart type for profiler column cards Completed — Add spark bar chart support for profiler column cards to improve compact distribution and completeness scanning.
- IDE inspector: use cached inspect.json before querying database Completed — All profiler surfaces should read from inspect.json as single source of truth. Never auto-profile — prompt user on cach…
- Inspector cleanup wave 1 architectural decomposition and contract hardening (Inspector cleanup and open-source hardening) Completed, PR #722, PR at 2026-03-22T23:23:53-07:00 — Plan and execute a deeper inspector cleanup pass that decomposes oversized inspector modules, tightens internal APIs, i…
- Inspector: fetch and display database column comments Completed — Fetch and display database column comments in inspector so semantic context from warehouses is visible during analysis.
- Eliminate all custom HTML - dataface YAML everywhere Completed — Replace all hand-crafted HTML across the extension and server with dataface YAML rendered through the normal compile/re…
- Refactor TableInspector inspection pipeline for maintainability Completed — Reduce the complexity of TableInspector._inspect_table_inner by extracting private helpers while preserving profiler be…
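The cache-first behavior named in the "IDE inspector" task above can be sketched as follows. This is a minimal sketch under stated assumptions: the task description is truncated, so what happens when the cache is absent is assumed here to be "ask the user, and profile only on consent". The function name `load_inspection` and the two callbacks are hypothetical and are not part of the inspect module's documented API.

```python
import json
from pathlib import Path
from typing import Any, Callable, Dict, Optional


def load_inspection(
    cache_path: Path,
    confirm_profile: Callable[[], bool],
    run_profile: Callable[[], Dict[str, Any]],
) -> Optional[Dict[str, Any]]:
    """Return cached profile data, profiling only after explicit user consent."""
    if cache_path.is_file():
        # Cache hit: serve from the file and issue no database query.
        return json.loads(cache_path.read_text(encoding="utf-8"))
    # Cache absent (assumed behavior): never profile automatically, ask first.
    if not confirm_profile():
        return None
    result = run_profile()
    cache_path.write_text(json.dumps(result), encoding="utf-8")
    return result
```

Passing the prompt and the profiler in as callbacks keeps the function free of UI and warehouse dependencies, so each surface can supply its own confirmation dialog.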
Warehouse profiling, semantic inference, and analyst-facing data context surfaces are hardened enough for regular use by multiple internal teams and initial design partners, with a predictable response loop for issues and requests. Quality expectations are documented, and prioritized improvements from real usage are actively incorporated into delivery.
- Adoption hardening for internal teams — Harden profiling pipeline for repeated use across multiple internal teams and first design partners.
- Design-partner feedback loop operations — Operationalize rapid feedback-to-fix loop for semantic inference and context quality with explicit decision logs.
- Increase profiler semantic coverage — Improve semantic typing and profile output quality used by analysts and agent workflows.
- Inspector template customization with eject command Completed — Provide an inspector template eject workflow so teams can customize profile UI safely while retaining upgrade paths.
- Quality standards and guardrails — Define and enforce quality standards for analyst-facing inspector experience to keep output consistent as contributors…
- Float confidence scores for statistical column characteristics — Replace binary boolean flags in ColumnInspection with float confidence scores 0.0-1.0 for statistical properties: is_se…
- Surface join multiplicity in AI schema context and clarify FK cardinality contract — Relationship edges baked into inspect.json already carry deterministic join cardinality: join_profile.multiplicity clas…
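To make the join-multiplicity task above concrete, here is one way a deterministic classification could be derived from the key values on each side of a relationship edge. The label strings and the function `classify_multiplicity` are assumptions for illustration; the actual values stored in `join_profile.multiplicity` are not given in this document.

```python
from collections import Counter
from typing import Hashable, Iterable


def classify_multiplicity(
    left_keys: Iterable[Hashable], right_keys: Iterable[Hashable]
) -> str:
    """Classify a join edge by whether key values repeat on either side.

    Labels are hypothetical; real join_profile.multiplicity values may differ.
    """
    # NULL keys never match in an equi-join, so they are ignored.
    left_counts = Counter(k for k in left_keys if k is not None)
    right_counts = Counter(k for k in right_keys if k is not None)
    left_many = any(c > 1 for c in left_counts.values())
    right_many = any(c > 1 for c in right_counts.values())
    if left_many and right_many:
        return "many_to_many"
    if left_many:
        return "many_to_one"
    if right_many:
        return "one_to_many"
    return "one_to_one"
```

For example, a foreign key such as an order's customer ID repeats on the left and is unique on the right, which this sketch labels `many_to_one`. As a simplification, it counts repeats across all non-null keys rather than only keys present on both sides.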
Launch scope for warehouse profiling, semantic inference, and analyst-facing data context surfaces is complete, externally explainable, and supportable: user-facing behavior is stable, documentation is publishable, and operational ownership is explicit. Remaining gaps are non-blocking, risk-assessed, and tracked as post-launch follow-up rather than unresolved launch debt.
- Launch docs and external readiness — Publish external-facing documentation and examples for semantic inference and context quality that are executable by ne…
- Launch operations and reliability readiness — Finalize operational readiness for analyst-facing inspector experience: telemetry, alerting, support ownership, and inc…
- Public launch scope completion — Complete launch-critical scope for profiling pipeline with production-safe behavior and rollback clarity.
- feat: chart decisions Phase 3 — inspector profile integration — Integrate chart-decision outputs into inspector profiles so profile insights can directly influence recommended visuali…
Post-launch stabilization is complete for warehouse profiling, semantic inference, and analyst-facing data context surfaces: recurring incidents are reduced, support burden is lower, and quality gates are enforced consistently before release. The team has a repeatable operating model for maintenance, regression prevention, and measured reliability improvements.
- Regression prevention and quality gates — Add or enforce regression gates around semantic inference and context quality so release quality is sustained automatic…
- Sustainable operating model — Document and adopt sustainable operating model for analyst-facing inspector experience across support, triage, and rele…
- v1.0 stability and defect burn-down — Run stability program for profiling pipeline with recurring defect burn-down and reliability trend tracking.
v1.2 delivers meaningful depth improvements in warehouse profiling, semantic inference, and analyst-facing data context surfaces based on observed usage and retention signals, not just roadmap intent. Enhancements improve real customer outcomes, and release readiness is demonstrated through metrics, regression coverage, and clear migration guidance where relevant.
- Quality and performance improvements — Ship measurable quality/performance improvements in semantic inference and context quality tied to user-facing outcomes.
- v1.2 depth expansion — Deliver depth expansion in profiling pipeline prioritized by observed usage and retention outcomes.
- v1.2 release and migration readiness — Prepare v1.2 release/migration readiness for analyst-facing inspector experience, including communication and upgrade g…
Long-horizon opportunities for warehouse profiling, semantic inference, and analyst-facing data context surfaces are captured as concrete hypotheses with user impact, prerequisites, and evaluation criteria. Ideas are ranked by strategic value and feasibility so future investment decisions can be made quickly with less rediscovery.
- Experiment design for future bets — Design validation experiments for analyst-facing inspector experience so future bets can be tested before major investm…
- Future opportunity research — Capture long-horizon opportunities for profiling pipeline with user impact and strategic fit.
- Prerequisite and dependency mapping — Map enabling prerequisites and dependencies for semantic inference and context quality to reduce future startup cost.