Dataface Tasks

Wire Dataface to internal analytics repo and BigQuery source

IDMCP_ANALYST_AGENT-WIRE_DATAFACE_TO_INTERNAL_ANALYTICS_REPO_AND_BIGQUERY_SOURCE
Status: cancelled
Priority: p1
Milestone: m1-ft-analytics-analyst-pilot
Owner: data-ai-engineer-architect

Problem

CANCELLED — merged into "Create analytics repo Dataface branch and bootstrap workflow" (dashboard-factory workstream). The BQ connection and the repo bootstrap are one deliverable: "I can dft serve in the analytics repo and get dashboards against BigQuery." All context from this task has been incorporated into the analytics bootstrap task.

Context (preserved for reference)

  • This connection is for analyst dashboarding — giving dft serve a working BQ data source so analysts can build and preview dashboards against real warehouse data in the analytics repo.
  • This is not an eval infrastructure dependency. The text-to-SQL eval runner is designed to work offline (deterministic scoring against gold SQL, no warehouse execution). The eval workstream (benchmark prep, eval runner, eval persistence) can proceed independently of this task.
  • The analytics repo lives at /Users/dave.fowler/Fivetran/analytics with the dbt project at dbt_ft_prod/.
  • cto-research already uses this repo as a metadata source for LookML and dbt connectors.
  • Dataface source configuration uses dataface.yml or environment-based connection strings.

This is purely for analyst dashboarding, not evals

The eval workstream has its own separate data path (DuckDB over local JSONL files). The eval runner and leaderboard dashboards have zero dependency on BigQuery. This task is only about giving analysts a working BQ data source to build dashboards against real warehouse data.

The eval leaderboard has its own dft project (apps/evals/) with its own DuckDB source. This task's dft project lives in the analytics repo. They are completely separate connection configs.

Analytics repo structure (from inspection)

The analytics repo at /Users/dave.fowler/Fivetran/analytics has:

  • dbt project at dbt_ft_prod/ (project name: prj_production, profile: fivetran)
  • bi_core/ — 45 gold-layer models (accounts, opportunities, connections, goals, etc.)
  • staging/ — 143 models across 18 source packages (salesforce, zendesk, segment, github, jira, etc.)
  • intermediate/ — minimal (just run_cache)
  • Additional model dirs: marketing, product, engineering, feature_adoption, support, etc.

Scope decisions (resolved)

Initial BQ dataset scope: Start with the models that bi_core/ queries — these are the gold-layer analytical models that analysts actually use. The staging models are useful context for the agent (column types, values) but the dashboards will primarily query bi_core. Don't try to allowlist individual tables — let the catalog/inspector discover what's available and start building dashboards against bi_core models. Expand scope as analysts hit needs.

Connection config location: The dataface.yml goes in the analytics repo alongside dbt_project.yml at dbt_ft_prod/dataface.yml (or at the analytics repo root if Dataface discovers it there). This is the analytics-repo bootstrap task's territory, but this task needs to produce a working BQ connection that the bootstrap task wires up. The BQ connection itself uses gcloud application-default credentials or a service account key path — either way, the sensitive credential stays in environment/local config, not checked into the repo. The dataface.yml just declares the source type and project ID.
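As a rough illustration of the "declare source type and project ID, keep credentials out of the repo" split, a dataface.yml might look like the following. The key names here are assumptions — the actual Dataface config schema is not confirmed in this task:

```yaml
# Hypothetical dbt_ft_prod/dataface.yml — key names are illustrative,
# not confirmed Dataface schema. Intent: the file declares only the source
# type and GCP project; no key material is checked into the repo.
source:
  type: bigquery
  project: fivetran-warehouse   # placeholder project ID
  # Auth is resolved from the environment, via either:
  #   - gcloud application-default credentials, or
  #   - GOOGLE_APPLICATION_CREDENTIALS pointing at a service-account key file
```

Either auth path keeps the sensitive credential in local/environment config, which matches the scope decision above.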

Possible Solutions

Plan

Implementation Progress

QA Exploration

  • [x] QA exploration completed (or N/A for non-UI tasks)

N/A for browser QA. Validation is CLI/connection-level: dft serve against the analytics repo with BQ source should list tables and execute a simple query.
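The validation flow above could look roughly like this at the command line. Only `dft serve` is named in this task; the gcloud step is the standard application-default-credentials setup, and the exact dft behavior is an assumption:

```shell
# Sketch of the CLI/connection-level validation. Requires local GCP access;
# not runnable without credentials.
gcloud auth application-default login    # one-time local BQ credential
cd /Users/dave.fowler/Fivetran/analytics
dft serve                                # expect: tables listed from the BQ
                                         # source and a simple query succeeds
```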

Review Feedback

  • [ ] Review cleared