tasks/workstreams/inspect-profiler/initiatives/m1-analyst-workflow-specification.md

M1 Analyst Workflow Specification — Inspect Profiler

Task: M1_FT_ANALYTICS_ANALYST_PILOT-INSPECT_PROFILER-01
Status: Active
Owner: Sr Engineer Architect
Last updated: 2026-03-07

Scope

Enable internal analysts to execute a repeatable weekly profiling workflow against the 5T Analytics warehouse without bespoke engineering intervention. The scope is limited to the inspect-profiler workstream's contribution to the M1 analyst pilot; connectivity, IDE surfaces, and MCP tooling are owned by their respective workstreams.

In scope

Out of scope (tracked elsewhere)

Expected end-state

An internal analyst with access to the 5T Analytics BigQuery warehouse can:

  1. Run dft inspect table <schema.table> --dialect bigquery --connection <conn> from a terminal or IDE task.
  2. See a complete profile: row count, column stats, semantic types, quality classifications, null rates, distributions.
  3. Navigate from the model overview dashboard to individual column drill-in views and back.
  4. Persist results to target/inspect.json and re-render without re-querying.
  5. Eject and customize profile templates when default views are insufficient.
  6. Repeat this workflow weekly with no engineering assistance beyond initial environment setup.

Analyst workflow — step by step

Prerequisites (one-time, engineering-assisted)

  1. 5T Analytics BigQuery connectivity is configured (service account, credentials, project/dataset).
  2. dft CLI is installed in the analyst's environment (local or hosted).
  3. Analyst has read access to target BigQuery datasets.
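A minimal preflight sketch can confirm these prerequisites before an analyst starts the weekly loop. `GOOGLE_APPLICATION_CREDENTIALS` is the conventional BigQuery service-account variable and `BQ_CONN` is the connection name used in the commands below; both variable names are assumptions about the local setup, not part of dft itself.

```shell
#!/usr/bin/env sh
# Preflight sketch: verify the one-time prerequisites are in place.
# Variable names are assumptions about the local setup, not dft contracts.
missing=0
command -v dft >/dev/null 2>&1 || { echo "MISSING: dft CLI not on PATH"; missing=1; }
[ -n "${GOOGLE_APPLICATION_CREDENTIALS:-}" ] || { echo "MISSING: GOOGLE_APPLICATION_CREDENTIALS (service-account key)"; missing=1; }
[ -n "${BQ_CONN:-}" ] || { echo "MISSING: BQ_CONN (warehouse connection name)"; missing=1; }
if [ "$missing" -eq 0 ]; then echo "preflight OK"; else echo "preflight FAILED"; fi
```

Read access to the target datasets (prerequisite 3) still has to be confirmed against the warehouse itself.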

Weekly workflow (self-service)

Step 1: Profile a table
  $ dft inspect table analytics.orders --dialect bigquery --connection "$BQ_CONN"

Step 2: Review terminal summary
  → Row count, column count, null rates, semantic types, quality flags

Step 3: Render profile dashboard
  $ dft inspect table analytics.orders --format html --dialect bigquery --connection "$BQ_CONN"
  → Opens model overview dashboard in browser

Step 4: Drill into columns of interest
  → Click column name → numeric/date/categorical/string column detail view
  → Review distributions, outliers, top values, data quality flags

Step 5: Check data quality view
  → Navigate to quality dashboard for null/empty/issue summary

Step 6: Drill back to model overview
  → Back links return to table-level context

Step 7: Persist and share
  → Profile saved to target/inspect.json
  → Re-render later: dft inspect table analytics.orders --format html (uses cached profile)
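Taken together, Steps 1-7 can be captured in a small wrapper script. This is a sketch using only the flags shown above; it falls back to printing the commands (dry run) when the dft CLI is not installed, so the sequence can be reviewed without warehouse access.

```shell
#!/usr/bin/env sh
# weekly_profile.sh (sketch): the weekly loop for one table.
# Usage: BQ_CONN=<conn> sh weekly_profile.sh analytics.orders
TABLE="${1:-analytics.orders}"
BQ_CONN="${BQ_CONN:-my_bq_conn}"              # placeholder connection name
command -v dft >/dev/null 2>&1 || DRY_RUN=1   # no CLI: print, don't execute

run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then echo "+ $*"; else "$@"; fi
}

# Steps 1-2: profile the table, review the terminal summary
run dft inspect table "$TABLE" --dialect bigquery --connection "$BQ_CONN"

# Step 3: render the dashboard for drill-in review (Steps 4-6)
run dft inspect table "$TABLE" --format html --dialect bigquery --connection "$BQ_CONN"

# Step 7: re-render later from target/inspect.json without re-querying
run dft inspect table "$TABLE" --format html
```

Setting DRY_RUN=1 explicitly is also a cheap way for analysts to double-check the exact commands before spending warehouse queries.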

Optional: customize templates

$ dft inspect eject model quality numeric_column
→ Copies templates to faces/inspect/ for local editing
→ Manifest tracks changes for upstream compatibility
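After ejecting, a quick listing of the copied templates confirms what is now locally owned. faces/inspect/ is the eject target named above; the listing itself is a convenience sketch, not a dft command.

```shell
#!/usr/bin/env sh
# Sketch: list locally ejected templates so edits can be reviewed
# (and committed) before the manifest compatibility check runs.
EJECT_DIR="faces/inspect"
if [ -d "$EJECT_DIR" ]; then
  find "$EJECT_DIR" -type f | sort
else
  echo "no ejected templates yet; run 'dft inspect eject ...' first"
fi
```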

Current capability status

| Capability | Status | Evidence |
| --- | --- | --- |
| BigQuery profiling (stats, samples, semantic detection) | Done | test_inspect_adapters.py (BigQuery approximate profiling, TABLESAMPLE) |
| Profiler response contract v1.0 | Done | CONTRACT.md, test_inspect_contract.py lock tests |
| 27 semantic type detectors | Done | semantic_detector.py, confidence scoring |
| Quality classification (4 axes + 13 flags) | Done | quality_detector.py, test_inspect_adapters.py |
| Drill-in/out navigation links | Done | PR #485, test_profiler_nav_links.py |
| Template eject + manifest validation | Done | PR #474 (#302), test_inspect_cli.py |
| Spark bar charts for profiler cards | Done | Issue #282 (completed) |
| Column comments fetch/display | Done | Issue #386 (completed) |
| Storage and re-render from JSON | Done | storage.py, renderer.py |
| HTML output format | Done | --format html in CLI |
| Extension contract version guard | Done | PR #484 |
| Joint extension integration validation | In progress | M1-PROFILER-001 remaining items |

Dependencies

| Dependency | Owner | Workstream | Status | Risk |
| --- | --- | --- | --- | --- |
| BigQuery connectivity (credentials, project config) | Head of Engineering | integrations-platform | Not started | Blocking: analysts cannot profile without warehouse access |
| IDE extension profiler panel | UI/Design Frontend Dev | ide-extension | Not started | Non-blocking for the CLI workflow; blocking for the IDE workflow |
| Extension integration validation | Sr Engineer Architect | inspect-profiler | In progress | Low: the contract is locked; remaining work is the joint test pass |
| Pilot environment (hosted dft runtime) | Head of Engineering | infra-tooling | Not started | Blocking if analysts lack a local install path |

Risks

| Risk | Severity | Mitigation | Owner |
| --- | --- | --- | --- |
| BigQuery connectivity not ready by pilot start | High | Validate the CLI workflow against a local DuckDB replica of 5T sample data in the interim | Sr Engineer Architect |
| BigQuery cost from repeated full-table profiling | Medium | Approximate profiling mode is implemented (APPROX + TABLESAMPLE); document cost-aware defaults for analysts | Sr Engineer Architect |
| Semantic type detection accuracy on 5T data | Medium | Tracked for M2 depth expansion (M2-INSPECT-001); M1 accepts the current 27-detector coverage with known gaps | Sr Engineer Architect |
| Template customization complexity for non-engineers | Low | Eject flow and manifest validation reduce risk; runbook coverage lands in task 03 | Sr Engineer Architect |

Follow-on tasks (not in this task's scope)