Dataface Tasks

Populate Faketran application database models for fake companies

ID: FT_DASH_PACKS-POPULATE_FAKETRAN_APPLICATION_DATABASE_MODELS_FOR_FAKE_COMPANIES
Status: completed
Priority: p1
Milestone: m1-ft-analytics-analyst-pilot
Owner: data-analysis-evangelist-ai-training
Completed by: dave
Completed: 2026-03-16

Problem

Audit vendored Faketran fake-company sources for internal application database coverage, then populate and validate the missing product_db tables so generated company databases include realistic app-side data beyond connector syncs.

Context

  • Faketran already vendors raw connector-side source models under libs/faketran/faketran/sources/, and both fake companies (dundersign, pied_piper) generate cross-system SaaS data.
  • There is also an internal product_db source intended to represent the fake company's own application database, analogous to the operational PostgreSQL schema behind the app.
  • product_db defines these app tables today: user, team, document, document_recipient, signature, session, template, and audit_log.
  • The gap is not schema definition; it is generation coverage. Before this task, the generators only populated user, document, document_recipient, and session, while lifecycle state lived separately in ctx.user_plans.
  • As a result, the generated warehouse/database lacked realistic app-side entities such as workspaces, templates, signatures, and audit history. Even the persisted product_db.user.plan values drifted from the actual simulated lifecycle.
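The audit described above amounts to comparing the declared product_db tables against the tables that actually received rows. Below is a minimal sketch. It assumes generated rows are available as a mapping from table name to a list of rows. `rows_by_table` and `audit_coverage` are hypothetical names, not Faketran's API.

```python
from typing import Iterable, Mapping, Sequence

# Table names as listed for the product_db source in this task.
PRODUCT_DB_TABLES: tuple[str, ...] = (
    "user", "team", "document", "document_recipient",
    "signature", "session", "template", "audit_log",
)


def audit_coverage(
    declared: Sequence[str], rows_by_table: Mapping[str, Iterable[object]]
) -> list[str]:
    """Return declared tables that have no generated rows, in declared order."""
    missing = []
    for table in declared:
        if not list(rows_by_table.get(table, ())):
            missing.append(table)
    return missing


# Coverage before this task: only four tables received rows.
before = {
    "user": [{"id": 1}],
    "document": [{"id": 10}],
    "document_recipient": [{"id": 100}],
    "session": [{"id": 1000}],
}
print(audit_coverage(PRODUCT_DB_TABLES, before))
# ['team', 'signature', 'template', 'audit_log']
```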

Possible Solutions

  1. Patch each fake-company generator inline to manually create the missing product_db rows. Trade-off: fastest for one company, but duplicates app-database logic across Dundersign and Pied Piper and will drift again.
  2. Expand shared product_db source actions so generators call a single lifecycle-aware API for sessions, plan changes, documents, signatures, templates, teams, and audit logs. Recommended because both companies already share product_db, and the missing behavior is source-level business logic, not company-specific connector logic.
  3. Leave generation as-is and only document the intended PostgreSQL schema. Trade-off: answers the question on paper but does not fix the generated datasets or example databases.
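Option 2 can be sketched as a single shared object that owns both the lifecycle state and the persisted rows. That way, a plan change cannot update one without the other.

This is a hypothetical stand-in, not the real module. `ProductDBSketch`, `change_plan`, `generate_company`, and the plan names "free"/"pro" are all assumptions. The real helpers live in libs/faketran/faketran/sources/product_db/__init__.py and may differ.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta


@dataclass
class ProductDBSketch:
    """Hypothetical in-memory stand-in for the shared product_db source."""

    users: dict[int, dict] = field(default_factory=dict)
    sessions: list[dict] = field(default_factory=list)
    audit_log: list[dict] = field(default_factory=list)
    # Plays the role of ctx.user_plans: the simulated lifecycle state.
    user_plans: dict[int, str] = field(default_factory=dict)

    def create_user(self, user_id: int, email: str, plan: str, at: datetime) -> None:
        self.users[user_id] = {"id": user_id, "email": email, "plan": plan, "created_at": at}
        self.user_plans[user_id] = plan

    def change_plan(self, user_id: int, new_plan: str, at: datetime) -> None:
        """Update lifecycle state and the persisted user row together, then audit it."""
        old_plan = self.user_plans[user_id]
        self.user_plans[user_id] = new_plan
        self.users[user_id]["plan"] = new_plan  # persisted row cannot drift
        self.audit_log.append(
            {"user_id": user_id, "action": "plan_changed",
             "detail": f"{old_plan}->{new_plan}", "at": at}
        )

    def log_session(self, user_id: int, at: datetime) -> None:
        self.sessions.append({"user_id": user_id, "started_at": at})


def generate_company(company: str, start: datetime) -> ProductDBSketch:
    """Stand-in for a company generator: company-specific data, shared API calls."""
    db = ProductDBSketch()
    db.create_user(1, f"alice+{company}@example.com", "free", start)
    db.log_session(1, start)
    db.change_plan(1, "pro", start + timedelta(days=14))
    db.log_session(1, start + timedelta(days=15))
    return db


dbs = {c: generate_company(c, datetime(2026, 1, 1)) for c in ("dundersign", "pied_piper")}
for db in dbs.values():
    # Persisted plan matches lifecycle state for every user.
    assert all(db.users[u]["plan"] == p for u, p in db.user_plans.items())
```

Each company generator only calls the shared methods. A fix to plan syncing therefore lands in both Dundersign and Pied Piper at once, which is the drift risk that option 1 carries.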

Plan

  • Extend libs/faketran/faketran/sources/product_db/__init__.py with shared helpers that sync persisted user lifecycle state and create app-side workflow records.
  • Update Dundersign and Pied Piper generators to use those helpers for plan transitions, sessions, and document activity generation.
  • Add focused Faketran tests that prove the missing product tables are populated and written into the generated database.
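The database-write check in the plan can be sketched with an in-memory SQLite database standing in for the generated company database. This is not the contents of tests/faketran/test_application_models.py. `write_tables`, `row_counts`, and the single JSON payload column are assumptions for illustration.

```python
import json
import sqlite3
from typing import Mapping, Sequence

PRODUCT_DB_TABLES = (
    "user", "team", "document", "document_recipient",
    "signature", "session", "template", "audit_log",
)


def write_tables(conn: sqlite3.Connection, rows_by_table: Mapping[str, Sequence[dict]]) -> None:
    """Write each table's rows as JSON payloads into a simple two-column table."""
    for table, rows in rows_by_table.items():
        conn.execute(f'CREATE TABLE IF NOT EXISTS "{table}" (id INTEGER PRIMARY KEY, payload TEXT)')
        conn.executemany(
            f'INSERT INTO "{table}" (payload) VALUES (?)',
            [(json.dumps(row, default=str),) for row in rows],
        )
    conn.commit()


def row_counts(conn: sqlite3.Connection, tables: Sequence[str]) -> dict[str, int]:
    """Count rows per table; a table that was never created counts as 0."""
    counts: dict[str, int] = {}
    for table in tables:
        exists = conn.execute(
            "SELECT 1 FROM sqlite_master WHERE type = 'table' AND name = ?", (table,)
        ).fetchone()
        counts[table] = (
            conn.execute(f'SELECT COUNT(*) FROM "{table}"').fetchone()[0] if exists else 0
        )
    return counts


def test_application_tables_written() -> None:
    # Hypothetical generator output: one row per product_db table.
    generated = {table: [{"table": table, "n": 1}] for table in PRODUCT_DB_TABLES}
    conn = sqlite3.connect(":memory:")
    write_tables(conn, generated)
    empty = [t for t, n in row_counts(conn, PRODUCT_DB_TABLES).items() if n == 0]
    assert not empty, f"product_db tables with no rows: {empty}"


test_application_tables_written()
```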

Implementation Progress

  • Confirmed product_db already models the fake application's own database schema: user, team, document, document_recipient, signature, session, template, audit_log.
  • Confirmed only a subset of those tables were populated by the generators, and product_db.user.plan was out of sync with the simulated lifecycle state in ctx.user_plans.
  • Added shared ProductDB helpers to: sync persisted user plans, create/reuse teams, generate templates, emit signatures and audit logs for document workflows, and log user sessions.
  • Updated both fake-company generators to use shared ProductDB helpers instead of directly mutating lifecycle state or creating bare sessions/documents.
  • Fixed an incidental HubSpot package import bug exposed by the new tests: STORY_TABLES belongs to hubspot.schema, not hubspot.models.
  • Added focused Faketran tests covering both fake companies plus a database-write assertion for the generated application tables.
  • Validation: uv run pytest tests/faketran/test_application_models.py passed.
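The document-workflow helpers listed above (team reuse, templates, signatures, audit logs) can be sketched as follows. `WorkflowDB`, its method names, and the one-hour gap between signatures are hypothetical. The real ProductDB helpers may use different signatures, timing, and audit actions.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from itertools import count


@dataclass
class WorkflowDB:
    """Hypothetical stand-in for the product_db tables a document workflow touches."""

    teams: dict[str, int] = field(default_factory=dict)  # team name -> team id
    templates: list[dict] = field(default_factory=list)
    documents: list[dict] = field(default_factory=list)
    recipients: list[dict] = field(default_factory=list)
    signatures: list[dict] = field(default_factory=list)
    audit_log: list[dict] = field(default_factory=list)
    _ids: count = field(default_factory=lambda: count(1))  # shared id sequence

    def get_or_create_team(self, name: str) -> int:
        """Reuse a team by name rather than creating a duplicate workspace."""
        if name not in self.teams:
            self.teams[name] = next(self._ids)
        return self.teams[name]

    def create_template(self, team_id: int, title: str) -> int:
        template_id = next(self._ids)
        self.templates.append({"id": template_id, "team_id": team_id, "title": title})
        return template_id

    def _audit(self, actor: str, action: str, document_id: int, at: datetime) -> None:
        self.audit_log.append(
            {"actor": actor, "action": action, "document_id": document_id, "at": at}
        )

    def run_document_workflow(
        self, sender_id: int, team_id: int, template_id: int,
        signer_emails: list[str], sent_at: datetime,
    ) -> int:
        """Create a document from a template, then one recipient, signature,
        and audit entry per signer."""
        doc_id = next(self._ids)
        self.documents.append({"id": doc_id, "team_id": team_id, "template_id": template_id,
                               "sender_id": sender_id, "status": "sent"})
        self._audit(f"user:{sender_id}", "document_sent", doc_id, sent_at)
        for hours, email in enumerate(signer_emails, start=1):
            recipient_id = next(self._ids)
            signed_at = sent_at + timedelta(hours=hours)  # assumed: signers sign an hour apart
            self.recipients.append({"id": recipient_id, "document_id": doc_id, "email": email})
            self.signatures.append(
                {"document_id": doc_id, "recipient_id": recipient_id, "signed_at": signed_at}
            )
            self._audit(email, "document_signed", doc_id, signed_at)
        if signer_emails:
            self.documents[-1]["status"] = "completed"
        return doc_id


db = WorkflowDB()
team_id = db.get_or_create_team("Sales")
template_id = db.create_template(team_id, "Mutual NDA")
doc_id = db.run_document_workflow(
    1, team_id, template_id, ["a@example.com", "b@example.com"], datetime(2026, 3, 1, 9)
)
```

Here one workflow call fills document, document_recipient, signature, and audit_log together. Those rows therefore cannot fall out of step the way separately generated tables can.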

Review Feedback

  • No separate cbox review was run in this thread; validation relied on focused Faketran tests.

  • [x] Review cleared