Description priority merge in MCP context output

ID	M1-AICONTEXT-004
Status	completed
Priority	p0
Milestone	m1-ft-analytics-analyst-pilot
Owner	data-ai-engineer-architect
Initiative	description-enrichment
Completed by	dave
Completed	2026-03-22

Problem

Descriptions for tables and columns can come from multiple sources (database comments, dbt schema.yml, AI-generated summaries, source code docstrings), but the MCP context output has no deterministic merge strategy for selecting the best available description for each entity. Without priority-based merging, the description a consumer receives depends on ingestion order and may be stale, low-quality, or inconsistent, and adding a new description source risks overwriting a better existing one. AI agents need stable, predictable descriptions to generate reliable queries.

Context

MCP context output follows a documented priority stack for descriptions.
Provenance is retained so users can inspect source of merged descriptions.
Output is stable across repeated runs on unchanged inputs.
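A minimal sketch of a provenance record that would satisfy the two criteria above, assuming a Python dataclass representation. The class and field names (Candidate, DescriptionProvenance, selection_reason, to_json) are hypothetical and are not the shipped API in dataface/ai/description_merge.py; the from_dict classmethod echoes the one mentioned in the review notes, but its shape here is assumed. Serializing with sorted keys and fixed separators is one way to make equal records produce byte-identical output across runs.

```python
import json
from dataclasses import asdict, dataclass
from typing import Tuple


@dataclass(frozen=True)
class Candidate:
    """Hypothetical: one description offered by one source."""
    source: str
    text: str
    confidence: float = 1.0
    updated_at: str = ""  # ISO-8601 timestamp; "" means unknown


@dataclass(frozen=True)
class DescriptionProvenance:
    """Hypothetical provenance record kept alongside a merged description."""
    selected_source: str
    selection_reason: str
    candidates: Tuple[Candidate, ...] = ()  # every candidate that was considered
    warnings: Tuple[str, ...] = ()

    @classmethod
    def from_dict(cls, data: dict) -> "DescriptionProvenance":
        # Rebuild nested candidates so a JSON round trip is lossless.
        return cls(
            selected_source=data["selected_source"],
            selection_reason=data["selection_reason"],
            candidates=tuple(Candidate(**c) for c in data.get("candidates", ())),
            warnings=tuple(data.get("warnings", ())),
        )

    def to_json(self) -> str:
        # Sorted keys and fixed separators: equal records serialize identically.
        return json.dumps(asdict(self), sort_keys=True, separators=(",", ":"))


prov = DescriptionProvenance(
    selected_source="dbt_schema_yml",
    selection_reason="highest_priority",
    candidates=(
        Candidate("dbt_schema_yml", "Customer orders, one row per order."),
        Candidate("inferred", "Orders table.", confidence=0.4),
    ),
)
restored = DescriptionProvenance.from_dict(json.loads(prov.to_json()))
assert restored == prov and restored.to_json() == prov.to_json()
```

Keeping every candidate, not just the winner, is what lets a user inspect why a description was chosen over the alternatives.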

Possible Solutions

Plan

Implement merge policy in schema context formatter and related tool paths.
Add provenance fields and debugging hooks for selected description source.
Create tests for deterministic merge behavior across source combinations.
Update docs with practical examples of merge outcomes.

Implementation Progress

Merge engine implemented in dataface/ai/description_merge.py with priority stack: dbt_schema_yml > database_comment > curated > inferred
Selection rules: source priority first, then confidence, then recency, with text sort as the final tie-break
Wired into format_schema_context via format_table_context — table and column descriptions now appear in MCP context output
Full provenance metadata preserved (all candidates, selected source, selection reason, fallback warnings)
23 unit tests for merge engine + 10 integration tests for schema context
Remaining: documentation with practical merge outcome examples
PR: https://github.com/fivetran/dataface/pull/500
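The priority stack and selection rules above can be sketched as follows. This is a hypothetical illustration, not the code in dataface/ai/description_merge.py: the Candidate fields, the merge_descriptions signature, the reason strings, and the fallback-warning rule (warn whenever the winner is not from dbt_schema_yml) are all assumptions. It also assumes that higher confidence and a more recent updated_at win, and that timestamps share one UTC ISO-8601 format so string comparison matches time order. Ordering uses a chain of Python's stable sorts, least significant key first.

```python
from dataclasses import dataclass
from itertools import permutations
from typing import List, Optional, Tuple

# Priority stack from the ticket, highest first; lower rank wins.
PRIORITY = ("dbt_schema_yml", "database_comment", "curated", "inferred")
RANK = {source: rank for rank, source in enumerate(PRIORITY)}


@dataclass(frozen=True)
class Candidate:
    source: str
    text: str
    confidence: float = 1.0
    updated_at: str = ""  # uniform UTC ISO-8601; "" means unknown and sorts oldest


def merge_descriptions(
    candidates: List[Candidate],
) -> Tuple[Optional[Candidate], str, List[str]]:
    """Hypothetical merge: return (winner, selection_reason, warnings)."""
    warnings = [f"ignored unknown source {c.source!r}"
                for c in candidates if c.source not in RANK]
    usable = [c for c in candidates if c.source in RANK and c.text.strip()]
    if not usable:
        return None, "no_candidates", warnings + ["no usable description"]

    # Stable sorts, least significant key first: text asc, recency desc,
    # confidence desc, then source priority.
    ranked = sorted(usable, key=lambda c: c.text)
    ranked.sort(key=lambda c: c.updated_at, reverse=True)
    ranked.sort(key=lambda c: c.confidence, reverse=True)
    ranked.sort(key=lambda c: RANK[c.source])
    winner = ranked[0]

    # Record which rule actually separated the winner from the runner-up.
    reason = "only_candidate"
    if len(ranked) > 1:
        runner_up = ranked[1]
        if RANK[winner.source] != RANK[runner_up.source]:
            reason = "highest_priority"
        elif winner.confidence != runner_up.confidence:
            reason = "higher_confidence"
        elif winner.updated_at != runner_up.updated_at:
            reason = "more_recent"
        else:
            reason = "text_sort"

    if winner.source != PRIORITY[0]:
        warnings.append(f"fell back to {winner.source}")
    return winner, reason, warnings


cands = [
    Candidate("inferred", "Orders table.", confidence=0.4),
    Candidate("database_comment", "orders", updated_at="2025-01-01T00:00:00Z"),
    Candidate("database_comment", "Customer orders", updated_at="2026-02-01T00:00:00Z"),
]
winner, reason, warnings = merge_descriptions(cands)
# Ingestion order must not matter: every permutation picks the same winner.
assert all(merge_descriptions(list(p))[0] == winner for p in permutations(cands))
```

Because every tie eventually falls through to a text comparison, the winner is a function of the candidate set alone, which is the property the deterministic-merge tests are meant to pin down.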

Review Feedback

Applied cbox review feedback: aligned the source name to dbt_schema_yml, eliminated duplication via a from_dict classmethod, added column description truncation, fixed _invert_iso_string safety, wired previously dead code into the production path, and fixed false source attribution
Review verdict: APPROVED
[x] Review cleared