Surface join multiplicity in AI schema context and clarify FK cardinality contract
Problem
Downstream consumers assume “the profiler doesn’t know” whether a join is 1:N or N:1, but the artifact often does; the information just does not show up in the default AI schema narrative. That forces agents to guess join safety from column names alone. Separately, the same edge carries two multiplicity fields (RelationshipEdge.multiplicity and join_profile.multiplicity) with different derivations, which invites silent misuse.
Context
- Persisted edges (`InspectionStorage.update_relationships`): each outgoing relationship dict includes `join_profile` with `multiplicity` (`classify_multiplicity` on both join columns' `uniqueness_ratio`) and `fanout_factor`; `fanout_risk` is scored from that. See `dataface/core/inspect/join_multiplicity.py`, `dataface/core/inspect/storage.py`.
- Edge-level `multiplicity` on `RelationshipEdge` is currently inferred from the FK column only (`_infer_multiplicity` → `one-to-one` vs `many-to-one` only). Full four-way labels live under `join_profile`.
- MCP / catalog can return `relationships` on tables and a catalog-level list; multiplicity is present when profiles were baked.
- Gap: `dataface/ai/schema_context.py` → `format_schema_context` / `format_table_summary` output no relationship lines (only headers and columns), so `get_schema_context()` does not expose FK cardinality to prompts.
- Docs: `docs/docs/inspector/context-stack.md` (Layer 4) already mentions join multiplicity and fanout; keep in sync with any contract change.
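To make the difference between the two derivations concrete, here is a minimal sketch. It is not the code in `join_multiplicity.py`: the function names, the `UNIQUE_THRESHOLD` cutoff, and the `one-to-many` / `many-to-many` label spellings are assumptions for illustration; only `one-to-one` and `many-to-one` are named above.

```python
# Assumed cutoff: a column counts as unique when its uniqueness_ratio
# reaches this value. The real threshold may differ.
UNIQUE_THRESHOLD = 1.0


def classify_multiplicity_sketch(left_ratio: float, right_ratio: float) -> str:
    """Four-way label derived from the uniqueness of BOTH join columns."""
    left = "one" if left_ratio >= UNIQUE_THRESHOLD else "many"
    right = "one" if right_ratio >= UNIQUE_THRESHOLD else "many"
    return f"{left}-to-{right}"


def infer_multiplicity_sketch(fk_ratio: float) -> str:
    """Two-way label from the FK column alone.

    It implicitly assumes the referenced column is unique, so it can
    never report a "many" on the right-hand side.
    """
    return "one-to-one" if fk_ratio >= UNIQUE_THRESHOLD else "many-to-one"


# Same edge, two answers: the referenced column is not actually unique.
both_columns = classify_multiplicity_sketch(left_ratio=0.2, right_ratio=0.9)
fk_only = infer_multiplicity_sketch(fk_ratio=0.2)
```

Under these assumptions `both_columns` is `many-to-many` while `fk_only` is `many-to-one`, which is the kind of disagreement a consumer reading the wrong field would never notice.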
Possible Solutions
1. Document only: declare `join_profile.multiplicity` canonical; deprecate or redefine `edge["multiplicity"]` in CONTRACT.md. Pros: cheap. Cons: agents still miss it in schema text.
2. Append a compact “Relationships” section to `format_schema_context`: after all tables, dedupe edges and print `left_table.left_column → right_table.right_column` + multiplicity (+ optional fanout_risk level). Pros: one prompt read. Cons: token cost; need caps for large catalogs.
3. Per-table subsection: under each `### table`, list only that table’s outgoing `relationships` from the cached profile. Pros: locality; reuses baked per-table arrays. Cons: longer overall text when tables are many.
Recommended: (3) as the default path, with a global cap (max edges, or max tables with relationship sections), and (2) as an optional flag later. Do (1) in the same change: update the contract and add a comment stating that join_profile.multiplicity is authoritative for cardinality.
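A rough sketch of what the recommended per-table output with a global edge cap could look like. The function names, the `max_edges` parameter, the `name` key on the table dict, and the exact line layout are all hypothetical; the `relationships`, `join_profile`, and `left_*` / `right_*` keys follow the fields described above.

```python
from typing import Any


def format_table_relationships(
    table: dict[str, Any], budget: int
) -> tuple[list[str], int]:
    """Render one table's outgoing edges; return (lines, remaining budget)."""
    lines: list[str] = []
    for edge in table.get("relationships", []):
        if budget <= 0:
            break  # global cap reached; stop emitting edges
        profile = edge.get("join_profile") or {}
        multiplicity = profile.get("multiplicity")
        if multiplicity is None:
            continue  # no baked profile: omit rather than guess
        lines.append(
            f"- {edge['left_table']}.{edge['left_column']} → "
            f"{edge['right_table']}.{edge['right_column']} ({multiplicity})"
        )
        budget -= 1
    return lines, budget


def format_schema_with_relationships(
    tables: list[dict[str, Any]], max_edges: int = 50
) -> str:
    """Emit a '### table' header per table plus capped relationship lines."""
    out: list[str] = []
    budget = max_edges
    for table in tables:
        out.append(f"### {table['name']}")
        rel_lines, budget = format_table_relationships(table, budget)
        if rel_lines:
            out.append("Relationships:")
            out.extend(rel_lines)
    return "\n".join(out)
```

Threading a single budget through all tables keeps the cap global, so a catalog with many small tables cannot exceed the token limit any more than one with a single heavily connected table.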
Plan
- Update `dataface/core/inspect/CONTRACT.md` (and/or `inspector_schema.md`) with canonical multiplicity rules and the FK→PK direction convention for interpreting N:1 vs 1:N.
- Extend `format_schema_context` (or helper) to merge relationships from cached table dicts; format multiplicity from `join_profile` when present, else omit or fall back with a warning in logs.
- Add unit tests on formatted output for a minimal two-table fixture with a known `many-to-one` edge.
- Re-run / extend any MCP or snapshot tests if schema context is asserted anywhere.
- Trim `docs/docs/inspector/context-stack.md` only if behavior diverges from what’s written.
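The "prefer `join_profile`, else fall back with a warning" step and its tests might be sketched as follows. `resolve_multiplicity`, the logger name, and the test names are hypothetical, and this variant chooses the fallback branch of "omit or fall back"; the edge dict shape is assumed from the fields listed in Context.

```python
import logging
from typing import Any, Optional

logger = logging.getLogger("schema_context_sketch")  # hypothetical logger name


def resolve_multiplicity(edge: dict[str, Any]) -> Optional[str]:
    """Prefer join_profile.multiplicity; fall back to the edge-level field."""
    profile = edge.get("join_profile") or {}
    label = profile.get("multiplicity")
    if label is not None:
        return label
    fallback = edge.get("multiplicity")
    if fallback is not None:
        # The edge-level value is FK-only, so flag that it is less reliable.
        logger.warning(
            "no join_profile for %s.%s; using FK-only multiplicity %r",
            edge.get("left_table"), edge.get("left_column"), fallback,
        )
    return fallback


def test_prefers_join_profile() -> None:
    edge = {
        "multiplicity": "one-to-one",
        "join_profile": {"multiplicity": "many-to-one"},
    }
    assert resolve_multiplicity(edge) == "many-to-one"


def test_falls_back_to_edge_level() -> None:
    edge = {
        "left_table": "orders",
        "left_column": "customer_id",
        "multiplicity": "many-to-one",
    }
    assert resolve_multiplicity(edge) == "many-to-one"


def test_returns_none_when_unknown() -> None:
    assert resolve_multiplicity({}) is None
```

The test functions take no fixtures, so they can be collected by pytest or called directly.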
Implementation Progress
QA Exploration
N/A — formatter and contract change; validate via unit tests and spot-check MCP/schema context consumers.
- [x] QA exploration completed (or N/A for non-UI tasks)
Review Feedback
- [ ] Review cleared