M2 Internal Adoption + Design Partners
Planned: 25 / 120 (21%)
Exit criteria (0/4)
- Multiple internal teams using product weekly.
- 3-5 influencer companies active as design partners.
- Weekly feedback loop closes top blockers quickly.
- Pricing/packaging tested with real buyer conversations.
KPI targets (0/3)
- Activation trend improving week-over-week.
- Internal + design partner WAU stable or growing.
- Reliability metrics within pre-launch thresholds.
Tasks by Workstream
dft core (Sr Engineer Architect)
The YAML contract, compiler/normalizer, execution adapters, and release/versioning are
hardened enough for regular use by multiple internal teams and initial design partners,
with a predictable response loop for issues and requests. Quality expectations are
documented, and prioritized improvements from real usage are actively incorporated into
delivery.
- Implement YAML versioning and migrations — Add built-in schema version handling and migration workflow for dashboard YAML changes.
- Adoption hardening for internal teams — Harden YAML contract and normalizer for repeated use across multiple internal teams and first design partners.
- Design-partner feedback loop operations — Operationalize rapid feedback-to-fix loop for execution/runtime adapters with explicit decision logs.
- feat: chart decisions Phase 5 — YAML annotation overrides — Support YAML annotation overrides for chart decisions so analysts can steer defaults without patching generated specs.
- Highlight charts using branch dev schema versus prod fallback Completed (PR #746, 2026-03-23T21:46:57-07:00) — Add chart-level change highlighting in dashboard render paths by detecting whether each query resolved to branch dev sc…
- Quality standards and guardrails — Define and enforce quality standards for versioning and migrations to keep output consistent as contributors expand.
- refactor: Consolidate config system - single source of truth in YAML — Consolidate config loading into one YAML-driven source of truth to eliminate conflicting runtime settings paths.
- Return normalized Dataface format for JSON export (not Vega-Lite specs) — Make JSON export return normalized Dataface spec objects instead of raw Vega-Lite so tooling can round-trip safely.
- Add settings: attribute pattern for all chart types — Standardize a `settings` attribute pattern across chart types so YAML is consistent and easier to generate.
- On-demand result-set profiling in chart decisions pipeline — Expand the ColumnProfile in decisions.py _profile_columns to compute richer statistical characteristics on-demand from…
- Support bar-line combo charts (mixed marks per Y series) — Finance and ops dashboards often need one categorical or temporal x with multiple quantitative columns where some serie…
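The versioning-and-migrations task above can be sketched as a small migration chain over dashboard YAML loaded as dicts. This is an illustrative sketch only, assuming a hypothetical `schema_version` field and registry; none of these names are the actual dft API:

```python
# Hypothetical sketch: chained schema-version migrations for dashboard
# YAML documents (represented here as plain dicts). Names are illustrative.

MIGRATIONS = {}

def migration(from_version, to_version):
    """Register a migration step between two schema versions."""
    def register(fn):
        MIGRATIONS[from_version] = (to_version, fn)
        return fn
    return register

@migration(1, 2)
def rename_style_to_settings(doc):
    # Example step: v2 consolidates per-chart options under a `settings` key,
    # matching the "settings attribute pattern" task in this workstream.
    for chart in doc.get("charts", []):
        if "style" in chart:
            chart["settings"] = chart.pop("style")
    return doc

def migrate(doc, target_version):
    """Apply registered steps until the document reaches target_version."""
    version = doc.get("schema_version", 1)
    while version < target_version:
        if version not in MIGRATIONS:
            raise ValueError(f"no migration from schema_version {version}")
        version, step = MIGRATIONS[version]
        doc = step(doc)
        doc["schema_version"] = version
    return doc

legacy = {"schema_version": 1, "charts": [{"type": "bar", "style": {"color": "steel"}}]}
migrated = migrate(legacy, 2)
```

A chained registry like this keeps each YAML change small and lets old dashboard files upgrade one version at a time.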
cloud suite (UI Design and Frontend Dev)
The hosted user experience for onboarding, sharing, collaboration, and account/project flows is
hardened enough for regular use by multiple internal teams and initial design partners,
with a predictable response loop for issues and requests. Quality expectations are
documented, and prioritized improvements from real usage are actively incorporated into
delivery.
- Suite git branch and sync workflow for pilot teams — Enable pilot teams to branch, sync, and reconcile dashboard changes with git from Suite without manual backend interven…
- Adoption hardening for internal teams — Harden workspace and onboarding UX for repeated use across multiple internal teams and first design partners.
- Design-partner feedback loop operations — Operationalize rapid feedback-to-fix loop for sharing and collaboration surface with explicit decision logs.
- GitHub integration auth for repo sync — Add GitHub OAuth or GitHub App based integration authorization for Cloud Suite repo sync flows on top of the production…
- M1 Visual ASQL mode in cloud chart editor — Implement SQL/Visual query mode in cloud chart editor, integrating Visual ASQL alongside existing CodeMirror SQL mode.…
- Quality standards and guardrails — Define and enforce quality standards for account/project lifecycle flows to keep output consistent as contributors expa…
- Run internal analyst adoption loop — Set up weekly analyst usage review, friction capture, and release response cycle.
- Suite Git Integration (dbt Cloud Model) — Define and deliver Suite Git integration using a dbt Cloud-style collaboration model for analytics workflows.
- Suite Okta login via OIDC — Add Okta OIDC login to Cloud Suite as a secondary provider for internal Fivetran access, layered on top of the producti…
- Suite role sync and permission enforcement follow-ups — Track post-login authorization, role mapping, and audit/drift follow-up work after M1 Google login.
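The role-sync follow-up above amounts to mapping IdP groups from an OIDC/OAuth login onto Suite roles. A minimal sketch of that mapping, with a fallback role — group names and the mapping table are hypothetical, not the actual Suite configuration:

```python
# Hypothetical sketch: map identity-provider groups to Suite roles after
# login, with a conservative default when no mapped group is present.

GROUP_TO_ROLE = {
    "dataface-admins": "admin",     # illustrative group names
    "dataface-analysts": "editor",
}

def map_role(idp_groups, default="viewer"):
    """Return the first mapped role found, else the default role."""
    for group in idp_groups:
        if group in GROUP_TO_ROLE:
            return GROUP_TO_ROLE[group]
    return default

role = map_role(["engineering", "dataface-analysts"])
```

Logging cases where the default is applied gives a simple signal for the audit/drift follow-ups the task mentions.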
inspect profiler (Sr Engineer Architect)
Warehouse profiling, semantic inference, and analyst-facing data context surfaces are
hardened enough for regular use by multiple internal teams and initial design partners,
with a predictable response loop for issues and requests. Quality expectations are
documented, and prioritized improvements from real usage are actively incorporated into
delivery.
- Adoption hardening for internal teams — Harden profiling pipeline for repeated use across multiple internal teams and first design partners.
- Design-partner feedback loop operations — Operationalize rapid feedback-to-fix loop for semantic inference and context quality with explicit decision logs.
- Increase profiler semantic coverage — Improve semantic typing and profile output quality used by analysts and agent workflows.
- Inspector template customization with eject command Completed — Provide an inspector template eject workflow so teams can customize profile UI safely while retaining upgrade paths.
- Quality standards and guardrails — Define and enforce quality standards for analyst-facing inspector experience to keep output consistent as contributors…
- Float confidence scores for statistical column characteristics — Replace binary boolean flags in ColumnInspection with float confidence scores 0.0-1.0 for statistical properties: is_se…
- Surface join multiplicity in AI schema context and clarify FK cardinality contract — Relationship edges baked into inspect.json already carry deterministic join cardinality: join_profile.multiplicity clas…
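The float-confidence task above replaces hard booleans with scores in [0.0, 1.0]. A sketch of what that could look like — this is not the actual ColumnInspection class, and the scoring heuristics are illustrative assumptions:

```python
# Hypothetical sketch: statistical column characteristics as float
# confidences instead of binary flags, so consumers can choose thresholds.
from dataclasses import dataclass

@dataclass
class ColumnConfidence:
    is_sequential: float = 0.0  # share of gaps matching the modal gap
    is_unique: float = 0.0      # distinct count / row count

def profile(values):
    n = len(values)
    distinct = len(set(values))
    # Sequential confidence: fraction of consecutive deltas equal to the
    # most common delta (1.0 for a perfectly regular sequence).
    deltas = [b - a for a, b in zip(values, values[1:])]
    if deltas:
        modal = max(set(deltas), key=deltas.count)
        seq = sum(1 for d in deltas if d == modal) / len(deltas)
    else:
        seq = 0.0
    return ColumnConfidence(is_sequential=seq, is_unique=distinct / n if n else 0.0)

scores = profile([1, 2, 3, 4, 6])
```

A column with one irregular gap scores high but below 1.0, which is exactly the nuance a boolean flag throws away.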
mcp analyst agent (Data AI Engineer Architect)
AI agent tool interfaces, execution workflows, and eval-driven behavior tuning are hardened
enough for regular use by multiple internal teams and initial design partners, with a
predictable response loop for issues and requests. Quality expectations are documented, and
prioritized improvements from real usage are actively incorporated into delivery.
- Add 'describe' or 'text' render output format for AI agents Completed — Add a `describe`/`text` render mode so AI agents can request compact textual dashboard outputs instead of visual payloa…
- Add BIRD mini-dev adapter and runner support External Text-to-SQL Benchmarks and SOTA Calibration Waiting on plan-external-text-to-sql-benchmark-adoption-order-and-constraints, define-normalized-external-benchmark-case-and-result-contracts-for-apps-evals — Integrate BIRD mini-dev into apps/evals with a loader, benchmark-specific runner settings, and reproducible local basel…
- Add external benchmark provenance dashboards and cross-benchmark slices External Text-to-SQL Benchmarks and SOTA Calibration Waiting on define-normalized-external-benchmark-case-and-result-contracts-for-apps-evals, add-bird-mini-dev-adapter-and-runner-support, add-spider-2-lite-adapter-and-runner-support — Extend the eval leaderboard so external benchmark runs are comparable by benchmark, split, dialect, scorer, and environ…
- Add Spider 2.0 Lite adapter and runner support External Text-to-SQL Benchmarks and SOTA Calibration Waiting on plan-external-text-to-sql-benchmark-adoption-order-and-constraints, define-normalized-external-benchmark-case-and-result-contracts-for-apps-evals — Integrate Spider 2.0-Lite into apps/evals with benchmark-aware loading, environment assumptions, and reproducible basel…
- Adoption hardening for internal teams — Harden MCP tool execution model for repeated use across multiple internal teams and first design partners.
- Build bounded non-one-shot text-to-SQL stack for local evals Benchmark-Driven Text-to-SQL and Discovery Evals — Build an experimental local-only text-to-SQL backend that wraps the existing shared generator in a bounded plan - gener…
- Build text-to-SQL eval runner and deterministic scorer Completed — Build a Dataface text-to-SQL eval harness that runs agent/model prompts against the cleaned benchmark and scores output…
- Chat-First Home Page - Conversational AI Interface for Dataface Cloud AI Agent Surfaces Completed — Replace the current org home page (dashboard grid) with a chat-first interface. The home screen shows existing dashboar…
- Create cleaned dbt SQL benchmark artifact Completed — Create a reproducible benchmark-prep step that imports the raw dbt dataset from cto-research, filters out AISQL rows, r…
- Define normalized external benchmark case and result contracts for apps/evals External Text-to-SQL Benchmarks and SOTA Calibration Waiting on plan-external-text-to-sql-benchmark-adoption-order-and-constraints — Define the internal contracts that map external benchmark tasks, metadata, and outputs into Dataface eval types without…
- Design-partner feedback loop operations — Operationalize rapid feedback-to-fix loop for agent prompt/workflow behavior with explicit decision logs.
- Embeddable Dashboards in Chat - Inline Preview, Modal Expand, and Save to Repo AI Agent Surfaces Completed — Dashboards generated during chat conversations can be embedded inline as interactive previews. Users click to expand in…
- Extract shared text-to-SQL generation function Benchmark-Driven Text-to-SQL and Discovery Evals Completed — Extract a shared generate_sql(question, context_provider, model) function, wire render_dashboard and cloud AIService to…
- MCP and skills auto-install across all AI clients Completed — Expand dft mcp init to cover VS Code, Claude Code, and GitHub Copilot Coding Agent. Register MCP server programmaticall…
- Plan external text-to-SQL benchmark adoption order and constraints External Text-to-SQL Benchmarks and SOTA Calibration — Decide which public benchmarks to adopt first, what environments and licenses they require, and what the phase-1 integr…
- Quality standards and guardrails — Define and enforce quality standards for eval and guardrail framework to keep output consistent as contributors expand.
- Run agent eval loop with internal analysts — Establish repeatable agent-level eval workflow that tests the full loop (prompt → tool use → SQL generation → dashboard…
- Set up eval leaderboard dft project and dashboards Benchmark-Driven Text-to-SQL and Discovery Evals Completed — Create a dft project inside the eval output directory with dashboard faces that visualize eval results as a leaderboard…
- Task M2 schema-aware query planning for cloud chat questions AI Agent Surfaces — The home-page/org chat can answer some data questions, but it sometimes generates SQL against columns that do not exist…
- Terminal Agent TUI - dft agent Completed — Build a Claude Code-like terminal AI agent as a dft subcommand. The agent comes pre-loaded with Dataface MCP tools and…
- Add catalog discovery evals derived from SQL benchmark Completed — Adapt the dbt SQL benchmark into search/catalog discovery eval cases by extracting expected tables from gold SQL and ge…
- Add LiveSQLBench adapter and release-tracking workflow External Text-to-SQL Benchmarks and SOTA Calibration Waiting on plan-external-text-to-sql-benchmark-adoption-order-and-constraints, define-normalized-external-benchmark-case-and-result-contracts-for-apps-evals — Add a second-wave integration for LiveSQLBench with explicit handling for release versions, hidden-vs-open splits, and…
- Add persistent analyst memories and learned context AI Quality Experimentation and Context Optimization Completed — Design and implement a memories file that accumulates knowledge from analyst queries — table quirks, column semantics,…
- Chat Conversation Persistence and History AI Agent Surfaces Completed — Add ChatSession and ChatMessage Django models so chat conversations survive page refreshes. Show recent conversations i…
- Curate schema and table scope for eval benchmark AI Quality Experimentation and Context Optimization Completed — Decide which schemas, tables, and data layers (raw, silver/staging, gold/marts) to include in the eval scope and catalo…
- Dashboard linking v1 (dashboard-root paths, render-time rewrite, docs) Dashboard linking Completed — Implement cross-board linking for YAML dashboard markdown (dashboard-root paths, ../ relative, suffix strip, render-tim…
- Experiment: Catalog tool access with vs without tool AI Quality Experimentation and Context Optimization — Compare runs with and without catalog/schema tool access to measure whether the tool itself materially improves SQL gen…
- Experiment: Context ablation L0 vs L1 vs L3 vs L5 AI Quality Experimentation and Context Optimization Completed (PR #801, 2026-03-25T09:17:39-07:00) — Measure how schema context layers affect SQL quality on the canary set, starting with names-only versus richer descript…
- Experiment: Layer scope all vs gold-only vs gold+silver AI Quality Experimentation and Context Optimization — Test whether exposing all tables, gold-only tables, or gold-plus-silver scope produces better SQL quality and lower noi…
- Experiment: Model comparison GPT-4o vs Claude Sonnet AI Quality Experimentation and Context Optimization — Compare GPT-4o and Claude Sonnet on the same canary set, prompt, and context configuration to isolate model effects on…
- Experiment: Model comparison GPT-4o vs GPT-5 AI Quality Experimentation and Context Optimization — Compare GPT-4o and GPT-5 on the same canary set, prompt, and context configuration to quantify quality and cost differe…
- Experiment: Model comparison GPT-5 vs Claude Sonnet AI Quality Experimentation and Context Optimization — Compare GPT-5 and Claude Sonnet on the same canary set, prompt, and context configuration to isolate model effects on S…
- Experiment: Schema tool strategy profiled vs filtered vs INFORMATION_SCHEMA vs none AI Quality Experimentation and Context Optimization — Compare schema acquisition strategies to determine whether profiled catalog fields, filtered fields, live INFORMATION_S…
- Persist eval outputs for Dataface analysis and boards Cancelled — Define the canonical eval artifact schema for run metadata, per-case results, retrieval results, and summaries. Add loa…
- Persist eval runs by default and add a quick eval dashboard serve command Benchmark-Driven Text-to-SQL and Discovery Evals Completed — Make eval runs durable by default instead of transient output, and add a one-command local entrypoint for browsing the…
- Polish debug tool activity styling in cloud chat AI Agent Surfaces — Keep tool activity visible in debug mode, but replace the current raw/emojified presentation with a cleaner system stat…
- Quickstart dashboard pack — Salesforce dbt project pilot Quickstart dashboards Completed (PR #718, 2026-03-22T21:55:00-07:00) — Pilot the quickstart dashboard process on the Salesforce quickstart dbt repo: checkout, dft init, run the product-resea…
- Quickstart dashboard pack — Zendesk dbt project pilot Quickstart dashboards Completed (PR #717, 2026-03-22T21:52:54-07:00) — Second pilot for the quickstart dashboard process on the Zendesk quickstart dbt repo: same workflow as Salesforce pilot…
- Quickstart dashboards — program setup (workspace, skill, pilot process) Quickstart dashboards Completed — Establish a repeatable program for Dataface dashboard packs on Fivetran quickstart open-source dbt projects: workspace…
- Run context and model ablation experiments AI Quality Experimentation and Context Optimization Completed — Define and execute the initial experiment matrix using the eval system. Compare models (GPT-4o, GPT-5, Claude Sonnet, e…
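Several tasks in this workstream rely on the deterministic text-to-SQL scorer. A minimal sketch of the determinism idea — normalize, then compare — where the normalization rules are illustrative assumptions, not the actual apps/evals implementation:

```python
# Hypothetical sketch: a deterministic SQL scorer that normalizes
# whitespace, trailing semicolons, and case before exact comparison,
# so repeated eval runs always score identically.
import re

def normalize_sql(sql):
    sql = re.sub(r"\s+", " ", sql.strip())  # collapse all whitespace runs
    return sql.rstrip(";").lower()

def score(generated, gold):
    """Return 1.0 on a normalized exact match, else 0.0."""
    return 1.0 if normalize_sql(generated) == normalize_sql(gold) else 0.0

result = score("SELECT id\nFROM users;", "select id from users")
```

Exact-match scoring is intentionally strict; execution-based scoring (comparing result sets) is the usual looser alternative when semantically equivalent SQL should also pass.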
ft dash packs (Data Analysis Evangelist and AI Training)
Connector-specific dashboard packs and KPI narratives for Fivetran sources are hardened
enough for regular use by multiple internal teams and initial design partners, with a
predictable response loop for issues and requests. Quality expectations are documented, and
prioritized improvements from real usage are actively incorporated into delivery.
- Adoption hardening for internal teams — Harden connector pack coverage for repeated use across multiple internal teams and first design partners.
- Design-partner feedback loop operations — Operationalize rapid feedback-to-fix loop for dashboard narrative quality with explicit decision logs.
- Quality standards and guardrails — Define and enforce quality standards for pack publishing workflow to keep output consistent as contributors expand.
ide extension (Head of Engineering)
The analyst authoring workflow in VS Code/Cursor with preview, diagnostics, and assist is
hardened enough for regular use by multiple internal teams and initial design partners,
with a predictable response loop for issues and requests. Quality expectations are
documented, and prioritized improvements from real usage are actively incorporated into
delivery.
- Adoption hardening for internal teams — Harden editor + preview workflow for repeated use across multiple internal teams and first design partners.
- Copilot + MCP dashboard and query workflow in extension — Enable a reliable Copilot-driven flow in the extension where MCP tools can generate and iterate on dashboards and SQL q…
- Design-partner feedback loop operations — Operationalize rapid feedback-to-fix loop for IDE diagnostics and guidance with explicit decision logs.
- Quality standards and guardrails — Define and enforce quality standards for inspector/agent integration in IDE to keep output consistent as contributors e…
- Unify IDE preview and inspector on one extension-managed dft serve — Design and implement a single extension-managed dft serve runtime for both preview and inspector so the IDE stops mixin…
chart library (rj)
The chart library has distinct style packages, evaluation ownership, reusable design
assertions, semantic-type-aware behaviors, exposed and documented chart properties,
and a good table chart baseline. The system is coherent enough for regular use by
multiple internal teams and initial design partners, with a predictable response loop
for issues and requests.
- Dual style packages — Establish at least two distinct chart style packages that pair structure and theme: one minimal analytic package inspir…
- Lock visual language v1 with RJ — Finalize differentiated chart styling system, typography, and interaction defaults with RJ Andrews.
- Adoption hardening for internal teams — Harden visual language system for repeated use across multiple internal teams and first design partners.
- Chart evaluation ownership and improvement — Take ownership of the existing chart-evaluation tooling for the chart library: use it on the chart corpus, learn its st…
- Define context-aware style system — Define the downstream context-aware style system for Dataface charts: when style decisions should vary by chart family,…
- Design assertions foundation — Produce an initial chart-library design-assertions corpus from chart exploration, with assertions structured to support…
- Design-partner feedback loop operations — Operationalize rapid feedback-to-fix loop for chart default behavior with explicit decision logs.
- M2 conditional formatting and row-level styling engine M2 table formatting and linking architecture — Implement conditional formatting primitives and row-level style application with deterministic rule ordering and confli…
- M2 in-cell visual encodings and sparkline consolidation M2 table formatting and linking architecture — Standardize in-cell bar and sparkline configuration under the table formatting model, including compatibility with exis…
- M2 linking actions for headers, rows, and cells M2 table formatting and linking architecture — Define and implement link/action behavior for table headers, rows, and cells including parameter templating and safe fa…
- M2 table config schema and precedence model M2 table formatting and linking architecture — Design the JSON schema for table formatting and linking with clear precedence across table, column, row, and cell scope…
- M2 table formatting survey and option inventory M2 table formatting and linking architecture — Research BI products and table libraries, then produce a normalized option inventory for column, row, and cell formatti…
- Quality standards and guardrails — Define and enforce quality standards for interaction/accessibility polish to keep output consistent as contributors exp…
- Terminology standards and synonym handling — Define strict product terminology for chart and visualization concepts while preserving backward-compatible interpretat…
- Consolidate semantic chart defaults and rendering control-surface follow-ons M2 semantic type behaviors — Bundle the recent chart-rendering follow-on backlog into one M2 task covering semantic axis behavior, temporal-axis lab…
- Implement BI and editorial numeric notation families M2 semantic type behaviors Completed (PR #762, 2026-03-24T02:41:22-07:00) — Implement Dataface support for BI and editorial numeric notation families across charts and related surfaces. Define th…
- M2 docs, examples, and QA matrix for table formatting M2 table formatting and linking architecture — Publish docs and reference examples, then validate behavior with a QA matrix covering common and edge-case formatting a…
- M2 research perceptual graphic-emphasis analysis beyond contrast — Research whether Dataface should develop a chart-analysis tool that goes beyond standard accessibility contrast checks…
- Vega-Lite format parity for tables and Python formatters M2 table formatting and linking architecture — Implement or reuse Python number/date formatting aligned with Vega-Lite format and formatType so style.columns value op…
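The table config schema task above hinges on a precedence model across table, column, row, and cell scopes. A sketch of the deterministic merge order, assuming hypothetical key names rather than the actual JSON schema:

```python
# Hypothetical sketch: resolve effective cell style by merging scopes in
# a fixed order — table < column < row < cell — so conflicts always
# resolve the same way (deterministic rule ordering).

def resolve_style(table=None, column=None, row=None, cell=None):
    merged = {}
    # Later scopes overwrite earlier ones key-by-key.
    for scope in (table, column, row, cell):
        if scope:
            merged.update(scope)
    return merged

style = resolve_style(
    table={"align": "left", "color": "black"},
    column={"align": "right"},
    cell={"color": "red"},
)
```

Fixing the merge order in one place is what makes conditional formatting auditable: any surprising cell style can be traced to exactly one winning scope per key.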
context catalog nimble (Data AI Engineer Architect)
Context schema/catalog contracts and Nimble enrichment flows across product surfaces are
hardened enough for regular use by multiple internal teams and initial design partners,
with a predictable response loop for issues and requests. Quality expectations are
documented, and prioritized improvements from real usage are actively incorporated into
delivery.
- Adoption hardening for internal teams — Harden context schema model for repeated use across multiple internal teams and first design partners.
- Build question-aware schema search and isolation CLI over inspect.json and dbt metadata Question-aware schema retrieval and narrowing — Build a local file-and-CLI retrieval layer that composes inspect.json, dbt schema metadata, and lightweight docs into a…
- Compare text-to-SQL evals with question-aware retrieval vs full-context prompting Question-aware schema retrieval and narrowing Waiting on build-question-aware-schema-search-and-isolation-cli-over-inspect-json-and-dbt-metadata, wire-question-scoped-context-bundles-into-text-to-sql-eval-backends — After the retrieval CLI and bundle integration land, run paired local evals comparing question-aware retrieval-and-isol…
- Design-partner feedback loop operations — Operationalize rapid feedback-to-fix loop for context enrichment rules with explicit decision logs.
- feat: chart decisions Phase 4 — SQLGlot column lineage Layer 6 Relationship Mapping — Implement SQLGlot column lineage integration to enrich chart decisions with column-level dependency context.
- Quality standards and guardrails — Define and enforce quality standards for cross-surface context contract to keep output consistent as contributors expan…
- Stabilize context catalog schema v1 — Finalize context schema contracts and integration points across inspect, generation, and agent flows.
- Static semantic type propagation through SQL queries via SQLGlot — Use SQLGlot Expression.meta to propagate profiler-detected semantic types like CURRENCY, EMAIL, CREATED_AT through SQL…
- Wire question-scoped context bundles into text-to-SQL eval backends Question-aware schema retrieval and narrowing — Teach the shared SQL generation and eval backend layer to consume question-scoped context bundles from the retrieval CL…
- Iterate on question-aware retrieval with interface and result experiments Question-aware schema retrieval and narrowing Waiting on compare-text-to-sql-evals-with-question-aware-retrieval-vs-full-context-prompting — After the initial A/B eval comparing retrieval versus full-context prompting, run a small set of follow-up experiments…
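The semantic-type propagation task above attaches profiler-detected types (CURRENCY, EMAIL, CREATED_AT) to query outputs. This simplified sketch shows the propagation idea through an alias mapping only — it deliberately omits SQLGlot, and all table/column names are hypothetical:

```python
# Hypothetical sketch: carry profiler-detected semantic types from source
# columns to query output columns through a resolved alias mapping.
# (The real task uses SQLGlot Expression.meta; this stand-in shows the
# propagation step itself.)

SEMANTIC_TYPES = {
    ("orders", "amount"): "CURRENCY",  # illustrative profiler output
    ("users", "email"): "EMAIL",
}

def propagate(select_list):
    """select_list: (output_alias, source_table, source_column) tuples."""
    out = {}
    for alias, table, column in select_list:
        semantic = SEMANTIC_TYPES.get((table, column))
        if semantic:
            out[alias] = semantic
    return out

types = propagate([("total", "orders", "amount"), ("contact", "users", "email")])
```

The hard part the sketch elides is resolving aliases through nested subqueries and expressions, which is exactly what SQLGlot's lineage machinery provides.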
dashboard factory (Data Analysis Evangelist and AI Training)
The repeatable process for producing, reviewing, and publishing quickstarts/examples is
hardened enough for regular use by multiple internal teams and initial design partners,
with a predictable response loop for issues and requests. Quality expectations are
documented, and prioritized improvements from real usage are actively incorporated into
delivery.
- Adoption hardening for internal teams — Harden template production pipeline for repeated use across multiple internal teams and first design partners.
- Artifact digest and Slack sharing workflow — Design a durable artifact review workflow that publishes shareable PNG/SVG dashboard snapshots with brief commentary to…
- Define dashboard quality rubric v1 — Create rubric/checklist for quickstarts/examples used for internal and design-partner review.
- Design-partner feedback loop operations — Operationalize rapid feedback-to-fix loop for quality rubric + review process with explicit decision logs.
- Quality standards and guardrails — Define and enforce quality standards for publication throughput operations to keep output consistent as contributors ex…
- Define dashboard reference boundary and canon strategy — Decide which third-party dashboard artifacts stay external, which lessons should be distilled into Dataface guidance, a…
- Looker-to-Dataface migration skill via Looker API or CLI — Deliver a repo skill and/or CLI workflow that uses the Looker API or Looker CLI to export dashboards and map them to Da…
- Study dashboard composition references and extract reusable lessons — Define a repeatable way to study admired dashboard compositions larger than single charts, collect reference artifacts,…
infra tooling (Sr Engineer Architect)
Developer tooling, local workflow reliability, and deployment execution safety are hardened
enough for regular use by multiple internal teams and initial design partners, with a
predictable response loop for issues and requests. Quality expectations are documented, and
prioritized improvements from real usage are actively incorporated into delivery.
- M2 design-partner closure and readiness decision — Own the M2 closure artifact: verify internal-adoption and design-partner readiness, maintain the operating checklist, t…
- Master Plans CLI next-stage guidance command Completed (PR #781, 2026-03-24T20:50:47-07:00) — Add an advisory `plans task check` command that inspects a task's narrative sections, reports which are incomplete, ide…
integrations platform (Head of Engineering)
Deployment, billing, 5T connectivity, and operational reliability/launch integration are
hardened enough for regular use by multiple internal teams and initial design partners,
with a predictable response loop for issues and requests. Quality expectations are
documented, and prioritized improvements from real usage are actively incorporated into
delivery.
- Define pricing and packaging model — Establish pricing tiers, packaging constraints, and internal approval for launch packaging.
- Finalize product name and procure domain — Decide final product name, verify legal/domain availability, and secure primary domain + redirects.
- Adoption hardening for internal teams — Harden platform deployment/integration path for repeated use across multiple internal teams and first design partners.
- Design-partner feedback loop operations — Operationalize rapid feedback-to-fix loop for billing/connectivity operations with explicit decision logs.
- M2 RunCache integration contract and adoption plan Completed — Define and implement an M2 integration contract for Dataface with Run Cache, including analytics repo compatibility, ca…
- Quality standards and guardrails — Define and enforce quality standards for reliability + launch operations to keep output consistent as contributors expa…