Ingest dbt schema.yml descriptions into AI_CONTEXT
Problem
dbt projects contain rich human-authored descriptions for models and columns in schema.yml files, but the AI_CONTEXT pipeline does not ingest them. Pilot analysts using Dataface against dbt-managed warehouses therefore get metadata that ignores the semantic documentation their dbt teams have already written. This is the highest-value description source for dbt-native users, and its absence significantly reduces context quality for the pilot's primary audience.
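For reference, this is the kind of documentation dbt teams author in schema.yml; the model and column names here are purely illustrative:

```yaml
version: 2
models:
  - name: orders
    description: "One row per customer order, deduplicated by order_id."
    columns:
      - name: order_id
        description: "Primary key; surrogate key from the source system."
      - name: ordered_at
        description: "UTC timestamp when the order was placed."
```

Ingestion would lift the `description` fields at both the model and column level into AI_CONTEXT.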
Context
- dbt model and column descriptions are ingested and linked to AI_CONTEXT entities.
- Conflicts with inferred descriptions are resolved via documented precedence rules.
- Pilot agents can retrieve dbt-authored semantics through MCP context outputs.
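The conflict-resolution requirement above can be sketched as a source-tagged merge. This is a hypothetical illustration of one plausible precedence rule (human-authored dbt text wins over profile-inferred text), not the project's documented rule set; the function and entity-id scheme are assumptions:

```python
def merge_description_sources(inferred, dbt):
    """Merge per-entity descriptions from two sources.

    Assumed precedence: dbt-authored text overrides inferred text;
    entities present in only one source pass through unchanged.
    Returns {entity_id: (description, source)} so callers can see
    which source won each conflict.
    """
    merged = {eid: (desc, "inferred") for eid, desc in inferred.items()}
    # dbt entries overwrite any inferred entry for the same entity id.
    merged.update({eid: (desc, "dbt") for eid, desc in dbt.items()})
    return merged
```

Tagging each winning description with its source also makes the precedence decision auditable in MCP context outputs.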
Plan
- Implement dbt schema parser/mapper for AI_CONTEXT entity IDs.
- Handle missing/renamed model mappings with explicit warnings.
- Add tests for merge behavior and precedence with profile-derived metadata.
- Document ingestion assumptions and required dbt project structure.
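The first two plan items can be sketched as a small mapping pass over an already-parsed schema.yml dict. This is a hypothetical sketch, not the real `DbtSchemaParser` API: the function name, entity-id format, and warning text are assumptions; it takes the dict form of schema.yml (e.g. from `yaml.safe_load`) plus the set of table names known to the catalog, and warns on unmapped models instead of failing silently:

```python
import warnings


def map_dbt_descriptions(schema, known_tables):
    """Map parsed schema.yml content to {entity_id: description}.

    Models with no matching warehouse table produce an explicit warning
    and are skipped, per the plan's missing/renamed-model handling.
    """
    out = {}
    for model in schema.get("models", []):
        name = model.get("name")
        if name not in known_tables:
            warnings.warn(f"dbt model {name!r} has no matching table; skipping")
            continue
        if model.get("description"):
            out[f"table:{name}"] = model["description"]
        for col in model.get("columns", []):
            if col.get("description"):
                out[f"column:{name}.{col['name']}"] = col["description"]
    return out
```

Keeping the parser pure (dict in, dict out) makes the merge/precedence behavior easy to unit-test without a live warehouse.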
Implementation Progress
- Implemented `dataface/core/inspect/dbt_schema.py` (`DbtSchemaParser` + `merge_dbt_descriptions`), wired into the MCP catalog via the `_enrich_with_dbt` helper
- 19 tests in `tests/core/test_dbt_schema.py` covering parser, mapping, and MCP integration
- Two cbox review rounds: R1 fixed dead code, swallowed warnings, and the not-profiled path skip, and added a cached-path test; R2 fixed a dead logging import and spurious single-table warnings
- 77 tests pass (19 new + 58 existing), no regressions
- Docs checklist item deferred — inline docstrings cover parser/warning semantics
- Description priority merge engine landed in PR #500 (consumer of these sources)
Review Feedback
- Two cbox review rounds completed, all blocking issues resolved
- Review verdict: APPROVED
- [x] Review cleared