Dataface Tasks

Vendor faketran as a monorepo lib and replace mockusign/gruber datasets

ID: FT_DASH_PACKS-VENDOR_FAKETRAN_EXAMPLES_AND_REPLACE_MOCKUSIGN_GRUBER_DATASETS
Status: completed
Priority: p1
Milestone: m1-ft-analytics-analyst-pilot
Owner: data-analysis-evangelist-ai-training

Problem

The repo still relies on the lightweight mockusign and gruber example datasets for fixtures, dbt examples, and local demo flows. Those examples are simpler than the faketran datasets and do not reflect the richer schemas and cross-table realism available in https://github.com/fivetran/faketran. That leaves demos, dashboard-pack examples, and seed-based onboarding flows anchored to lower-fidelity data than the team now wants to showcase.

We need to pull faketran into this repository as a monorepo-owned library/module rather than leave it as a separate external repo dependency. The imported library should hold the canonical upstream-derived assets, while the runnable Dataface demo projects should live in this repo's examples/ tree as first-class repo-owned examples. We also need to replace the current mockusign/gruber pair with stronger datasets, starting with dundersign and optionally a second faketran example. The change needs to preserve working local examples, keep the data-shape boundary intact, and update any docs, tests, seed commands, and references that assume the old dataset names.

Context

  • faketran is not currently present in this repo as a monorepo library, submodule, or checked-in example directory.
  • Current example assets live in:
    • examples/mockusign_dbt/
    • examples/gruber_dbt/
    • apps/cloud/fixtures/data/mockusign/
    • apps/cloud/fixtures/data/gruber/
  • Seeded demo/org references also appear in apps/cloud/apps/projects/management/commands/seed_dev_data.py.
  • Integration tests explicitly mention the current examples in tests/integration/test_examples.py.
  • There is already related work in tasks/workstreams/ft-dash-packs/tasks/issue-306-transform-mockusign-dbt-into-realistic-dbt-project-with-staging-.md, but that task upgrades mockusign_dbt in place rather than replacing the example source with vendored faketran datasets.
  • Target structure:
    • libs/faketran/ or equivalent monorepo lib path for imported canonical faketran assets.
    • examples/ for promoted runnable Dataface examples such as dundersign_dbt.
    • Avoid long-term duplication where the same example lives both inside the vendored lib and as a separate repo example.
  • Constraint: the repo should keep ownership of query semantics and model shape while using richer example data; the visualization layer should not take on data-meaning responsibilities.
  • Constraint: any direct edits to task files must preserve frontmatter and validate cleanly through the task CLI.
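
The "avoid long-term duplication" point in the target structure could be checked mechanically. The following is a minimal sketch, not an existing repo tool: the function name `duplicated_examples` and its default paths are assumptions for illustration, based on the libs/faketran/ and examples/ locations named above. It reports directory names that appear both as a promoted example and somewhere inside the vendored lib.

```python
from pathlib import Path


def duplicated_examples(
    repo_root: Path,
    vendored_dir: str = "libs/faketran",  # assumed vendored location
    examples_dir: str = "examples",       # promoted example projects
) -> list[str]:
    """Return directory names present both in examples/ and under the vendored lib."""
    examples = repo_root / examples_dir
    vendored = repo_root / vendored_dir
    if not examples.is_dir() or not vendored.is_dir():
        return []
    # Names of promoted examples, e.g. "dundersign_dbt".
    promoted = {p.name for p in examples.iterdir() if p.is_dir()}
    # Every directory name at any depth inside the vendored lib.
    vendored_names = {p.name for p in vendored.rglob("*") if p.is_dir()}
    return sorted(promoted & vendored_names)


if __name__ == "__main__":
    for name in duplicated_examples(Path(".")):
        print(f"duplicate example directory: {name}")
```

Matching on directory name alone is deliberately coarse; it flags candidates for a human to review rather than proving that the contents are identical.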

Possible Solutions

  • Recommended: Pull faketran into this monorepo as a repo-owned library/module with a clear internal location and provenance notes, then promote the chosen runnable examples into this repo's examples/ tree while replacing mockusign and gruber references with dundersign plus one other selected faketran dataset.
    • Pros: keeps upstream-derived assets separate from Dataface-owned example projects, preserves examples/ as the obvious place for demos, and makes schema/dbt/docs/test updates easier to coordinate in one change.
    • Cons: requires choosing a stable home for the imported library/module, defining how examples consume the canonical data, and doing a one-time migration sweep across example seeds, fixtures, tests, and docs.
  • Add faketran as a git submodule or external checkout and point the repo at it.
    • Pros: simpler upstream syncing with the source repo.
    • Cons: makes local setup and CI more fragile, keeps examples split across repos, and complicates stable demo/test inputs.
  • Keep mockusign/gruber but backfill them with richer schemas inspired by faketran.
    • Pros: smaller naming migration.
    • Cons: duplicates effort already embodied in faketran, keeps the weaker branding/examples, and still requires reconstructing better datasets manually.

Plan

  • Decide where faketran should live inside the monorepo and import it there as a repo-owned library/module with clear provenance and update guidance.
  • Inventory all references to mockusign and gruber across example projects, seed fixtures, seeded org/project metadata, tests, and docs.
  • Review faketran and select the target replacement datasets, with dundersign as the default primary example and a second dataset chosen based on schema quality and demo usefulness.
  • Promote the selected runnable examples into examples/ as first-class Dataface examples instead of treating the vendored lib directory as the public demo surface.
  • Wire the vendored faketran library/module into the repo’s example/fixture workflows so dataset consumers use the in-repo source of truth without unnecessary duplication.
  • Update dbt example projects, seed files, and any fixture-loading flows to use the new datasets and names.
  • Update seeded app/demo metadata and project references so local environments expose the new examples instead of mockusign/gruber.
  • Refresh docs and tests that reference the old examples; add or update checks that prove the new examples still build and render.
  • Document the vendoring/update policy so future faketran syncs are intentional rather than ad hoc.
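
The inventory step in the plan above can be sketched as a small script. This is an illustrative helper, not something that exists in the repo: the name `find_old_references` and the skip list are assumptions. It walks a directory tree and records every file whose path or contents mention mockusign or gruber.

```python
import re
from collections import defaultdict
from pathlib import Path

# Old dataset names come from the task; the skip list is an assumption.
OLD_NAMES = re.compile(r"mockusign|gruber", re.IGNORECASE)
SKIP_DIRS = {".git", "node_modules", ".venv", "__pycache__"}


def find_old_references(root: Path) -> dict[str, list[tuple[int, str]]]:
    """Map each file under root to the (line number, text) pairs naming an old dataset."""
    hits = defaultdict(list)
    for path in sorted(root.rglob("*")):
        rel = path.relative_to(root)
        if not path.is_file() or SKIP_DIRS.intersection(rel.parts):
            continue
        key = rel.as_posix()
        # A path such as examples/gruber_dbt/... counts even if the contents do not.
        if OLD_NAMES.search(key):
            hits[key].append((0, "<path match>"))
        try:
            text = path.read_text(encoding="utf-8")
        except (UnicodeDecodeError, OSError):
            continue  # skip binary or unreadable files
        for lineno, line in enumerate(text.splitlines(), start=1):
            if OLD_NAMES.search(line):
                hits[key].append((lineno, line.strip()))
    return dict(hits)


if __name__ == "__main__":
    for file, matches in find_old_references(Path(".")).items():
        for lineno, line in matches:
            print(f"{file}:{lineno}: {line}")
```

Run from the repo root, the output is grouped by file and can serve as the migration checklist. Line number 0 marks a match on the path itself rather than on a line of content.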

Implementation Progress

  • Task created from user request to pull faketran into the monorepo and replace the current lower-fidelity example datasets.
  • Initial repo audit found mockusign and gruber in checked-in fixtures, dbt example directories, seed_dev_data setup, docs, and integration test skips.
  • Initial repo audit found no checked-in faketran library/submodule/example directory in this worktree.
  • Vendored the upstream generator framework into libs/faketran/, preserved upstream README provenance, and added a Dataface-specific faketran-export-examples command for curated seed export.
  • Promoted new repo-owned example projects under examples/dundersign_dbt/ and examples/pied_piper_dbt/ with dashboard YAML tailored to the curated faketran exports.
  • Exported curated CSV seeds and matching cloud fixtures for both dundersign and pied_piper (daily_metrics, users, documents, subscriptions, opportunities, tickets, workforce).
  • Updated seed_dev_data to seed dundersign/signing-analytics and piedpiper/platform-analytics, and removed the old gruber / mockusign example directories and fixtures.
  • Fixed an upstream syntax defect in libs/faketran/fake_companies/pied_piper/generate.py that blocked the vendored Pied Piper generator import.
  • Updated the example integration test harness so _dbt examples with local seeds/ resolve relative assets correctly.
  • Validated:
    • uv run --extra dev pytest tests/integration/test_examples.py -q
    • PYTHONPATH=/Users/dave.fowler/.codex/worktrees/24ad/dataface uv run --extra cloud python -m apps.cloud.manage migrate
    • PYTHONPATH=/Users/dave.fowler/.codex/worktrees/24ad/dataface uv run --extra cloud python -m apps.cloud.manage seed_dev_data --reset
    • uv run --extra dev ruff check libs/faketran/faketran/export_dataface_examples.py apps/cloud/apps/projects/management/commands/seed_dev_data.py tests/integration/test_examples.py
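
The harness change noted above (letting _dbt examples with a local seeds/ directory resolve relative assets) is not shown in this task. The sketch below is one way such resolution could work, under stated assumptions: both function names are hypothetical and are not taken from tests/integration/test_examples.py.

```python
from pathlib import Path


def discover_seeded_examples(examples_root: Path) -> list[Path]:
    """Return example directories named *_dbt that ship a local seeds/ directory."""
    return sorted(
        p
        for p in examples_root.iterdir()
        if p.is_dir() and p.name.endswith("_dbt") and (p / "seeds").is_dir()
    )


def resolve_example_asset(example_dir: Path, asset: str) -> Path:
    """Resolve an asset relative to the example directory, refusing paths that escape it."""
    root = example_dir.resolve()
    candidate = (root / asset).resolve()
    # Path.is_relative_to requires Python 3.9 or newer.
    if not candidate.is_relative_to(root):
        raise ValueError(f"{asset!r} resolves outside {root}")
    return candidate


if __name__ == "__main__":
    for example in discover_seeded_examples(Path("examples")):
        print(example.name, "->", resolve_example_asset(example, "seeds"))
```

Anchoring resolution at the example directory instead of the current working directory keeps the result the same whether the tests run from the repo root or from inside an example.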

Review Feedback

  • Fixed the vendored Pied Piper generator syntax error before exporting curated seeds so the second promoted example could be generated/exported cleanly from the monorepo copy.
  • Local validation passed for example compilation/rendering, targeted lint, Django migrations, and seed_dev_data against the renamed example projects.

  • [x] Review cleared