Context Catalog and Nimble
Purpose
Context architecture/catalog and Nimble integration into inspect, MCP, and generation flows. This workstream defines how data context is structured, stored, and surfaced across the product — the "catalog" of what we know about a user's data and how it's made available to humans and AI. Nimble is the methodology for lightweight dbt model linting and context enrichment. This workstream owns the context schema, the Nimble rule engine, and the integration points where context flows into inspect (profiler output), MCP (agent tools), and dashboard generation (smarter defaults). Adjacent to inspect-profiler (which produces raw context) and mcp-analyst-agent (which consumes context for AI workflows).
Owner
- Data AI Engineer Architect
Initiatives
- Profiling Foundation Layers 1-5 — Completed, M0 — Prototype, 2 / 2 tasks complete (100%)
- Description Enrichment Pipeline — Planned, M1 — 5T Internal Pilot Ready, 2 / 2 tasks complete (100%)
- Grain Inference and Fanout Risk — Ready For Eng, M1 — 5T Internal Pilot Ready, 2 / 2 tasks complete (100%)
- Layer 6 Relationship Mapping — Planned, M1 — 5T Internal Pilot Ready, 1 / 2 tasks complete (50%)
- MCP Catalog and Agent Tools — In Progress, M1 — 5T Internal Pilot Ready, 4 / 4 tasks complete (100%)
- Question-aware schema retrieval and narrowing — Planned, M2 — Internal Adoption + Design Partners, 0 / 4 tasks complete (0%)
- External Context Sources — Planned, MX — Far Future, 0 / 0 tasks complete (0%)
Tasks by Milestone
M0 — Prototype
A runnable prototype path exists for context schema/catalog contracts and Nimble enrichment flows across product surfaces, with concrete artifacts that prove the flow works end-to-end in the current codebase. Core assumptions are documented, known constraints are explicit, and the team can explain what is real versus mocked without ambiguity.
- AI_CONTEXT core MCP tools built MCP Catalog and Agent Tools Completed — Completed core MCP tool set (catalog, execute_query, render_dashboard, list_sources) for AI context consumption and act…
- AI_CONTEXT profiling layers 1-5 foundation built Profiling Foundation Layers 1-5 Completed — Completed baseline profiling foundation (schema, enrichment, stats, samples, semantic/quality inference) used by AI_CON…
- AI_CONTEXT schema context formatter and MCP resources built MCP Catalog and Agent Tools Completed — Completed token-efficient schema context formatter and MCP resources that expose pre-built AI context to agents.
- AI_CONTEXT table description ingestion built Profiling Foundation Layers 1-5 Completed — Completed ingestion of database table descriptions into profiling output as baseline semantic enrichment.
- Prototype gaps and follow-on capture Completed — Document top gaps and risks in cross-surface context contract that must be addressed next.
- Prototype implementation path Completed — Implement a runnable end-to-end prototype path for context schema model.
- Prototype validation and proof Completed — Validate context enrichment rules with concrete proof artifacts and repeatable steps.
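The "token-efficient schema context formatter" above can be sketched roughly as follows: render catalog entries as compact one-line-per-table text instead of verbose JSON, so more schema fits in an agent's context window. The input shape and output format are assumptions, not the shipped formatter.

```python
# Sketch of a token-efficient schema context formatter (assumed input shape).
def format_schema_context(tables: list[dict]) -> str:
    lines = []
    for t in tables:
        # One compact line per table: name(col:type [SEMANTIC], ...) rows=N
        cols = ", ".join(
            f"{c['name']}:{c['dtype']}"
            + (f" [{c['semantic_type']}]" if c.get("semantic_type") else "")
            for c in t["columns"]
        )
        lines.append(f"{t['name']}({cols})  rows={t['row_count']}")
    return "\n".join(lines)

catalog = [{
    "name": "orders",
    "row_count": 12000,
    "columns": [
        {"name": "order_id", "dtype": "BIGINT"},
        {"name": "amount", "dtype": "DECIMAL", "semantic_type": "CURRENCY"},
    ],
}]
print(format_schema_context(catalog))
# orders(order_id:BIGINT, amount:DECIMAL [CURRENCY])  rows=12000
```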
M1 — 5T Internal Pilot Ready
Internal analysts can execute at least one weekly real workflow that depends on context schema/catalog contracts and Nimble enrichment flows across product surfaces in the 5T Analytics environment, without bespoke engineering intervention for every run. Instrumentation and feedback capture are in place so failures, friction points, and adoption gaps are visible and triaged with owners.
- AI_CONTEXT beta health and readiness scorecard Completed — Define and track AI_CONTEXT beta health metrics so M1 go/no-go is based on coverage, quality, and analyst usability sig…
- AI_CONTEXT layer 6 relationship mapping for pilot datasets Layer 6 Relationship Mapping Completed — Implement cross-table relationship mapping in AI_CONTEXT so join graph context is available to agents during M1 workflo…
- AI_CONTEXT metadata contract v1 for pilot MCP Catalog and Agent Tools Completed — Solidify AI_CONTEXT data format into a versioned contract with clear field semantics and compatibility rules for beta u…
- Description priority merge in MCP context output Description Enrichment Pipeline Completed — Implement deterministic description-source merging in MCP context output so AI tools receive stable best-available sema…
- Ingest dbt schema.yml descriptions into AI_CONTEXT Description Enrichment Pipeline Completed — Merge dbt model and column descriptions into AI_CONTEXT so human-authored semantics are available during pilot analysis.
- AI_CONTEXT grain and fanout risk signals (beta subset) Grain Inference and Fanout Risk Completed — Ship grain candidate, join multiplicity, and fanout risk metadata in AI_CONTEXT to reduce unsafe aggregate query genera…
- dft inspect native CSV support via ephemeral DuckDB Completed — dft inspect cannot profile CSV sources today because the inspector only supports SQL databases. It should handle CSVs n…
- dft inspect: build complete self-contained catalog in target/inspect.json Completed — dft inspect should be the single command that builds a complete, self-contained catalog artifact in target/inspect.json…
- Incremental dft inspect with lineage-aware change detection Completed — dft inspect should skip re-profiling tables whose source data and upstream lineage have not changed since the last insp…
- Move playground examples to DuckDB and ship pre-built inspect.json Completed — Playground examples currently use raw CSV files via CsvAdapter with Python stdlib csv.DictReader - no SQL, no joins, no…
- search_dashboards MCP tool for pilot context workflows MCP Catalog and Agent Tools Completed — Add search_dashboards MCP tool so pilots can discover relevant existing dashboards and reuse validated query patterns.
- Research deterministic column fanout risk signals and AI context surfacing Grain Inference and Fanout Risk Completed — Synthesize how fanout risk maps to columns versus edges, the deterministic profiling and dbt signals available, and options to surface…
M2 — Internal Adoption + Design Partners
Context schema/catalog contracts and Nimble enrichment flows across product surfaces are hardened enough for regular use by multiple internal teams and initial design partners, with a predictable response loop for issues and requests. Quality expectations are documented, and prioritized improvements from real usage are actively incorporated into delivery.
- Adoption hardening for internal teams — Harden context schema model for repeated use across multiple internal teams and first design partners.
- Build question-aware schema search and isolation CLI over inspect.json and dbt metadata Question-aware schema retrieval and narrowing — Build a local file-and-CLI retrieval layer that composes inspect.json, dbt schema metadata, and lightweight docs into a…
- Compare text-to-SQL evals with question-aware retrieval vs full-context prompting Question-aware schema retrieval and narrowing Waiting on build-question-aware-schema-search-and-isolation-cli-over-inspect-json-and-dbt-metadata, wire-question-scoped-context-bundles-into-text-to-sql-eval-backends — After the retrieval CLI and bundle integration land, run paired local evals comparing question-aware retrieval-and-isol…
- Design-partner feedback loop operations — Operationalize rapid feedback-to-fix loop for context enrichment rules with explicit decision logs.
- feat: chart decisions Phase 4 — SQLGlot column lineage Layer 6 Relationship Mapping — Implement SQLGlot column lineage integration to enrich chart decisions with column-level dependency context.
- Quality standards and guardrails — Define and enforce quality standards for cross-surface context contract to keep output consistent as contributors expan…
- Stabilize context catalog schema v1 — Finalize context schema contracts and integration points across inspect, generation, and agent flows.
- Static semantic type propagation through SQL queries via SQLGlot — Use SQLGlot Expression.meta to propagate profiler-detected semantic types like CURRENCY, EMAIL, CREATED_AT through SQL…
- Wire question-scoped context bundles into text-to-SQL eval backends Question-aware schema retrieval and narrowing — Teach the shared SQL generation and eval backend layer to consume question-scoped context bundles from the retrieval CL…
- Iterate on question-aware retrieval with interface and result experiments Question-aware schema retrieval and narrowing Waiting on compare-text-to-sql-evals-with-question-aware-retrieval-vs-full-context-prompting — After the initial A/B eval comparing retrieval versus full-context prompting, run a small set of follow-up experiments…
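The core of question-aware narrowing can be sketched as scoring each catalog table by term overlap with the question and keeping the top-k, so the SQL generator sees only relevant schema instead of the full catalog. This is a deliberately naive bag-of-words baseline under an assumed catalog shape, not the planned retrieval CLI:

```python
import re

def narrow_tables(question: str, tables: list[dict], k: int = 2) -> list[str]:
    """Rank tables by keyword overlap with the question; keep the top-k."""
    q_terms = set(re.findall(r"[a-z]+", question.lower()))
    def score(t: dict) -> int:
        terms = set(re.findall(r"[a-z]+", t["name"].lower()))
        for col in t["columns"]:
            terms |= set(re.findall(r"[a-z]+", col.lower()))
        return len(q_terms & terms)
    ranked = sorted(tables, key=score, reverse=True)
    return [t["name"] for t in ranked[:k] if score(t) > 0]

catalog = [
    {"name": "orders", "columns": ["order_id", "customer_id", "amount"]},
    {"name": "customers", "columns": ["customer_id", "email"]},
    {"name": "web_events", "columns": ["event_id", "page"]},
]
print(narrow_tables("total order amount per customer", catalog))
# ['orders', 'customers']
```

The planned A/B evals would compare bundles built this way against full-context prompting on the same questions.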
Launch
Launch scope for context schema/catalog contracts and Nimble enrichment flows across product surfaces is complete, externally explainable, and supportable: user-facing behavior is stable, documentation is publishable, and operational ownership is explicit. Remaining gaps are non-blocking, risk-assessed, and tracked as post-launch follow-up rather than unresolved launch debt.
- Launch docs and external readiness — Publish external-facing documentation and examples for context enrichment rules that are executable by new users.
- Launch operations and reliability readiness — Finalize operational readiness for cross-surface context contract: telemetry, alerting, support ownership, and incident…
- Public launch scope completion — Complete launch-critical scope for context schema model with production-safe behavior and rollback clarity.
- Add join-path grounding to question-scoped context bundles Waiting on build-question-aware-schema-search-and-isolation-cli-over-inspect-json-and-dbt-metadata — Teach bundles to surface likely join paths and key relationships between retained tables so the SQL generator sees how…
- Build lightweight value hints retrieval from inspect artifacts Waiting on build-question-aware-schema-search-and-isolation-cli-over-inspect-json-and-dbt-metadata — Expose cheap static filter-disambiguation hints such as enum-like values, date ranges, and high-signal categorical member…
- QUERY_VALIDATOR foundation and first integrations — Build the first query validator path using SQLGlot plus schema, profile, grain, and relationship context for query review…
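The join-path grounding task above reduces to graph search: given Layer 6 relationship edges, find how two retained tables connect so the bundle can tell the SQL generator which joins to use. A sketch using breadth-first search over an assumed (table_a, key_a, table_b, key_b) edge format:

```python
from collections import deque

def find_join_path(edges, start, goal):
    """BFS over relationship edges; returns a list of join conditions or None."""
    graph = {}
    for a, ka, b, kb in edges:
        # Edges are undirected for path-finding purposes.
        graph.setdefault(a, []).append((b, f"{a}.{ka} = {b}.{kb}"))
        graph.setdefault(b, []).append((a, f"{a}.{ka} = {b}.{kb}"))
    queue, seen = deque([(start, [])]), {start}
    while queue:
        table, path = queue.popleft()
        if table == goal:
            return path
        for nxt, join in graph.get(table, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, path + [join]))
    return None  # no known relationship path

edges = [
    ("orders", "customer_id", "customers", "customer_id"),
    ("orders", "order_id", "order_items", "order_id"),
]
print(find_join_path(edges, "customers", "order_items"))
```

BFS keeps the path shortest, which matters because every extra hop is another fanout risk the validator would need to check.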
v1.0 — Post-Launch Stabilization
Post-launch stabilization is complete for context schema/catalog contracts and Nimble enrichment flows across product surfaces: recurring incidents are reduced, support burden is lower, and quality gates are enforced consistently before release. The team has a repeatable operating model for maintenance, regression prevention, and measured reliability improvements.
- Regression prevention and quality gates — Add or enforce regression gates around context enrichment rules so release quality is sustained automatically.
- Sustainable operating model — Document and adopt sustainable operating model for cross-surface context contract across support, triage, and release c…
- v1.0 stability and defect burn-down — Run stability program for context schema model with recurring defect burn-down and reliability trend tracking.
- Evaluate question-scoped bundle compression strategies Waiting on compare-text-to-sql-evals-with-question-aware-retrieval-vs-full-context-prompting — Compare raw narrowed dumps against more structured or explanation-rich bundle shapes so the team can pick the smallest…
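One way the regression-gate task could work: diff freshly generated enrichment output against a committed golden snapshot and fail the release check on any drift. The flattened key format and sample values below are illustrative assumptions:

```python
def check_against_golden(current: dict, golden: dict) -> list[str]:
    """Return human-readable diffs; an empty list means the gate passes."""
    diffs = []
    for key in sorted(set(current) | set(golden)):
        if current.get(key) != golden.get(key):
            diffs.append(f"{key}: {golden.get(key)!r} -> {current.get(key)!r}")
    return diffs

# Golden snapshot committed to the repo vs. freshly generated enrichment.
golden = {"orders.amount.semantic_type": "CURRENCY", "orders.grain": ["order_id"]}
current = {"orders.amount.semantic_type": "CURRENCY", "orders.grain": ["order_id", "line"]}
print(check_against_golden(current, golden))
```

A gate like this turns silent enrichment drift into an explicit review step before release.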
v1.2 — Depth Expansion
v1.2 delivers meaningful depth improvements in context schema/catalog contracts and Nimble enrichment flows across product surfaces based on observed usage and retention signals, not just roadmap intent. Enhancements improve real customer outcomes, and release readiness is demonstrated through metrics, regression coverage, and clear migration guidance where relevant.
- Quality and performance improvements — Ship measurable quality/performance improvements in context enrichment rules tied to user-facing outcomes.
- v1.2 depth expansion — Deliver depth expansion in context schema model prioritized by observed usage and retention outcomes.
- v1.2 release and migration readiness — Prepare v1.2 release/migration readiness for cross-surface context contract, including communication and upgrade guidan…
- Evaluate Snowflake semantic views and lineage as context sources — Investigate whether Snowflake semantic views, Cortex Analyst instructions, and GET_LINEAGE metadata should be ingested…
MX — Far Future
Long-horizon opportunities for context schema/catalog contracts and Nimble enrichment flows across product surfaces are captured as concrete hypotheses with user impact, prerequisites, and evaluation criteria. Ideas are ranked by strategic value and feasibility so future investment decisions can be made quickly with less rediscovery.
- Experiment design for future bets — Design validation experiments for cross-surface context contract so future bets can be tested before major investment.
- Future opportunity research — Capture long-horizon opportunities for context schema model with user impact and strategic fit.
- Prerequisite and dependency mapping — Map enabling prerequisites and dependencies for context enrichment rules to reduce future startup cost.