Dataface Tasks

Surface join multiplicity in AI schema context and clarify FK cardinality contract

ID: INSPECT_PROFILER-SURFACE_JOIN_MULTIPLICITY_IN_AI_SCHEMA_CONTEXT_AND_CLARIFY_FK_CARDINALITY_CONTRACT
Status: not_started
Priority: p2
Milestone: m2-internal-adoption-design-partners
Owner: sr-engineer-architect

Problem

Downstream consumers assume “the profiler doesn’t know” whether a join is 1:N or N:1, but the artifact often does know; the information simply does not appear in the default AI schema narrative. That forces agents to guess join safety from column names alone. Separately, the same edge carries two multiplicity fields (RelationshipEdge.multiplicity and join_profile.multiplicity) with different derivations, so a consumer can read the wrong one without noticing.

Context

  • Persisted edges (InspectionStorage.update_relationships): each outgoing relationship dict includes join_profile with multiplicity (classify_multiplicity on both join columns’ uniqueness_ratio) and fanout_factor; fanout_risk is scored from that. See dataface/core/inspect/join_multiplicity.py, dataface/core/inspect/storage.py.
  • Edge-level multiplicity on RelationshipEdge is currently inferred from the FK column only (_infer_multiplicity: one-to-one vs many-to-one only). Full four-way labels live under join_profile.
  • MCP / catalog can return relationships on tables and a catalog-level list; multiplicity is present when profiles were baked.
  • Gap: in dataface/ai/schema_context.py, format_schema_context / format_table_summary output no relationship lines (only headers and columns), so get_schema_context() does not expose FK cardinality to prompts.
  • Docs: docs/docs/inspector/context-stack.md (Layer 4) already mentions join multiplicity and fanout; keep in sync with any contract change.
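
To make the two-field discrepancy concrete, here is a minimal sketch of how a four-way label derived from both columns can disagree with a two-way label derived from the FK column alone. The function names, the 0.999 threshold, and the exact label strings for the one-to-many and many-to-many cases are assumptions for illustration; they are not taken from join_multiplicity.py or _infer_multiplicity.

```python
UNIQUE_THRESHOLD = 0.999  # assumed cutoff; the real classifier may use another value


def four_way_multiplicity(left_ratio: float, right_ratio: float) -> str:
    """Label a join left -> right from both columns' uniqueness ratios."""
    left_unique = left_ratio >= UNIQUE_THRESHOLD
    right_unique = right_ratio >= UNIQUE_THRESHOLD
    if left_unique and right_unique:
        return "one-to-one"
    if right_unique:
        return "many-to-one"
    if left_unique:
        return "one-to-many"
    return "many-to-many"


def fk_only_multiplicity(fk_ratio: float) -> str:
    """Two-way label that looks at the FK (left) column alone."""
    return "one-to-one" if fk_ratio >= UNIQUE_THRESHOLD else "many-to-one"


# An FK column with repeated values joined to a target column that is not unique.
print(four_way_multiplicity(0.2, 0.6))  # many-to-many
print(fk_only_multiplicity(0.2))        # many-to-one
```

In this hypothetical case the FK-only label reports a safe many-to-one join while the two-column label flags a many-to-many fanout, which is the kind of silent misuse the Problem section describes.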

Possible Solutions

  1. Document only — Declare join_profile.multiplicity canonical; deprecate or redefine edge["multiplicity"] in CONTRACT.md. Pros: cheap. Cons: agents still miss it in schema text.

  2. Append a compact “Relationships” section to format_schema_context — After all tables, dedupe edges and print left_table.left_column → right_table.right_column + multiplicity (+ optional fanout_risk level). Pros: one prompt read. Cons: token cost; need caps for large catalogs.

  3. Per-table subsection — Under each ### table, list only that table’s outgoing relationships from the cached profile. Pros: locality; reuses baked per-table arrays. Cons: longer overall text when tables are many.

Recommended: (3) for default path with a global cap (max edges or max tables with rel sections) and (2) as optional flag later. (1) in the same change: contract + comment that join_profile.multiplicity is authoritative for cardinality.
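
The recommended per-table layout with a global cap might look roughly like the sketch below. The table dict shape (name, relationships, and the left/right keys), the helper name, and the default cap are assumptions for illustration; only join_profile.multiplicity is taken from the context above.

```python
from typing import Any


def relationship_lines_by_table(
    tables: list[dict[str, Any]], max_edges: int = 50
) -> dict[str, list[str]]:
    """Build per-table relationship lines, emitting at most max_edges in total."""
    out: dict[str, list[str]] = {}
    emitted = 0
    for table in tables:
        lines: list[str] = []
        for rel in table.get("relationships", []):
            if emitted >= max_edges:
                break
            profile = rel.get("join_profile") or {}
            multiplicity = profile.get("multiplicity")
            if multiplicity is None:
                continue  # omit the edge rather than guess its cardinality
            lines.append(
                f"- {rel['left_table']}.{rel['left_column']} -> "
                f"{rel['right_table']}.{rel['right_column']} ({multiplicity})"
            )
            emitted += 1
        if lines:
            out[table["name"]] = lines
    return out


# Hypothetical cached table dicts.
tables = [
    {"name": "orders", "relationships": [
        {"left_table": "orders", "left_column": "customer_id",
         "right_table": "customers", "right_column": "id",
         "join_profile": {"multiplicity": "many-to-one"}},
    ]},
    {"name": "customers", "relationships": []},
]
print(relationship_lines_by_table(tables))
```

A single counter shared across tables keeps the token cost bounded regardless of catalog size; edges without a baked join_profile are skipped here, though logging a warning instead is an equally valid choice.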

Plan

  1. Update dataface/core/inspect/CONTRACT.md (and/or inspector_schema.md) with canonical multiplicity rules and the FK→PK direction convention for interpreting N:1 vs 1:N.
  2. Extend format_schema_context (or helper) to merge relationships from cached table dicts; format multiplicity from join_profile when present, else omit or fall back with a warning in logs.
  3. Add unit tests on formatted output for a minimal two-table fixture with a known many-to-one edge.
  4. Re-run / extend any MCP or snapshot tests if schema context is asserted anywhere.
  5. Trim docs/docs/inspector/context-stack.md only if behavior diverges from what’s written.
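
For step 1, the direction convention can be pinned down with a small example: if a label is always read left to right with the FK column on the left and the PK column on the right, then reading the same edge from the other side must swap N:1 and 1:N. The sketch below assumes a flat edge dict and four label strings that are hypothetical, not the persisted schema.

```python
# Label seen from the opposite side of the same edge.
_REVERSED = {
    "one-to-one": "one-to-one",
    "many-to-one": "one-to-many",
    "one-to-many": "many-to-one",
    "many-to-many": "many-to-many",
}


def reverse_edge(edge: dict[str, str]) -> dict[str, str]:
    """Return the same relationship read from the PK side toward the FK side."""
    return {
        "left_table": edge["right_table"],
        "left_column": edge["right_column"],
        "right_table": edge["left_table"],
        "right_column": edge["left_column"],
        "multiplicity": _REVERSED[edge["multiplicity"]],
    }


# FK -> PK: many orders point at one customer.
fk_to_pk = {
    "left_table": "orders", "left_column": "customer_id",
    "right_table": "customers", "right_column": "id",
    "multiplicity": "many-to-one",
}
print(reverse_edge(fk_to_pk)["multiplicity"])  # one-to-many
```

A fixture like fk_to_pk could also serve as the known many-to-one edge in the step 3 unit test, since reversing it twice should return the original edge.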

Implementation Progress

QA Exploration

N/A — formatter and contract change; validate via unit tests and spot-check MCP/schema context consumers.

  • [x] QA exploration completed (or N/A for non-UI tasks)

Review Feedback

  • [ ] Review cleared