Dataface Tasks

Description priority merge in MCP context output

ID: M1-AICONTEXT-004
Status: completed
Priority: p0
Milestone: m1-ft-analytics-analyst-pilot
Owner: data-ai-engineer-architect
Initiative: description-enrichment
Completed by: dave
Completed: 2026-03-22

Problem

Descriptions for tables and columns can come from multiple sources (database comments, dbt schema.yml, AI-generated summaries, source code docstrings), but the MCP context output has no deterministic merge strategy for selecting the best available description for each entity. Without priority-based merging, consumers may receive stale, low-quality, or inconsistent descriptions depending on ingestion order, and adding a new description source risks overwriting a better existing one. AI agents need stable, predictable descriptions to generate reliable queries.

Context

  • MCP context output follows a documented priority stack for descriptions.
  • Provenance is retained so users can inspect source of merged descriptions.
  • Output is stable across repeated runs on unchanged inputs.
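
The stability requirement above can be illustrated with a small order-independence check. This is only a sketch: the function names and the `(source, text)` tuple shape are hypothetical and are not taken from the Dataface codebase; the source ranking reuses the priority stack listed under Implementation Progress.

```python
from itertools import permutations

# Priority stack from the Implementation Progress notes; lower rank wins.
RANK = {"dbt_schema_yml": 0, "database_comment": 1, "curated": 2, "inferred": 3}


def pick_by_priority(candidates):
    """Select a (source, text) pair by rank, then by text as a final tie-break."""
    return min(candidates, key=lambda c: (RANK[c[0]], c[1]))


def pick_last_write(candidates):
    """Naive strategy: whichever source was ingested last wins."""
    return candidates[-1]


def is_order_independent(pick, candidates):
    """True if `pick` returns the same result for every ingestion order."""
    return len({pick(list(p)) for p in permutations(candidates)}) == 1


# Illustrative data only; these descriptions are made up for the example.
candidates = [
    ("inferred", "Probably stores customer orders."),
    ("database_comment", "Orders placed via checkout."),
    ("dbt_schema_yml", "One row per completed order."),
]
```

With these candidates, `is_order_independent(pick_by_priority, candidates)` holds, while the last-write strategy fails the same check, which is the ingestion-order problem described in the Problem section.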

Possible Solutions

Plan

  • Implement merge policy in schema context formatter and related tool paths.
  • Add provenance fields and debugging hooks for selected description source.
  • Create tests for deterministic merge behavior across source combinations.
  • Update docs with practical examples of merge outcomes.

Implementation Progress

  • Merge engine implemented in dataface/ai/description_merge.py with priority stack: dbt_schema_yml > database_comment > curated > inferred
  • Tie-break rules: highest priority → confidence → recency → text sort
  • Wired into format_schema_context via format_table_context — table and column descriptions now appear in MCP context output
  • Full provenance metadata preserved (all candidates, selected source, selection reason, fallback warnings)
  • 23 unit tests for merge engine + 10 integration tests for schema context
  • Remaining: documentation with practical merge outcome examples
  • PR: https://github.com/fivetran/dataface/pull/500
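
The priority stack and tie-break order described above could be sketched roughly as follows. This is not the code in dataface/ai/description_merge.py: the `Candidate` class, `merge_descriptions`, and the returned field names are hypothetical, and the sketch assumes that higher confidence and a more recent timestamp win a tie.

```python
from dataclasses import dataclass
from datetime import datetime

# Priority stack as listed in the task notes; a lower index wins.
PRIORITY = ["dbt_schema_yml", "database_comment", "curated", "inferred"]


@dataclass(frozen=True)
class Candidate:
    """Hypothetical shape for one description candidate."""
    source: str
    text: str
    confidence: float = 1.0
    updated_at: str = "1970-01-01T00:00:00+00:00"  # ISO 8601 timestamp


def _sort_key(c):
    # Tie-break order: priority, then confidence, then recency, then text.
    # Negation makes higher confidence and newer timestamps sort first
    # (an assumption about the direction of each tie-break).
    updated = datetime.fromisoformat(c.updated_at)
    return (PRIORITY.index(c.source), -c.confidence, -updated.timestamp(), c.text)


def merge_descriptions(candidates):
    """Return the selected description with provenance, or None if none is usable."""
    usable = [c for c in candidates if c.source in PRIORITY and c.text.strip()]
    if not usable:
        return None
    ranked = sorted(usable, key=_sort_key)
    winner = ranked[0]
    return {
        "description": winner.text,
        "selected_source": winner.source,
        # Keep every candidate source, in ranked order, for inspection.
        "candidates": [c.source for c in ranked],
    }


# Illustrative data only.
example = [
    Candidate("inferred", "AI summary of the table.", 0.9, "2026-03-01T00:00:00+00:00"),
    Candidate("database_comment", "Orders table.", 1.0, "2025-01-01T00:00:00+00:00"),
    Candidate("dbt_schema_yml", "One row per order.", 1.0, "2024-06-01T00:00:00+00:00"),
]
result = merge_descriptions(example)
```

Because the sort key is a total order over the candidate fields, the result does not depend on the order in which candidates arrive, and the `candidates` list keeps enough provenance to show why a source was selected.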

Review Feedback

  • cbox review feedback applied: source name aligned to dbt_schema_yml, duplication eliminated via from_dict classmethod, column description truncation, _invert_iso_string safety, dead code wired into production path, false source attribution fixed
  • Review verdict: APPROVED

  • [x] Review cleared