Host Dataface on Fivetran GCP
Problem
The deploy infrastructure for hosting Dataface on GCP Cloud Run is functionally in place,
and the first Fivetran-hosted runtime has already been deployed to internal-dataface-eng.
The remaining work is hardening and operational polish: static-site publication,
Workload Identity Federation migration, monitoring/alerts, internal-access/domain setup,
and landing the local deploy fixes into main.
Dataface needs a canonical Fivetran-hosted runtime for the internal analyst pilot (M1). Prior to this work, the only deploy path was Railway (Playground only). The pilot requires all three services (Playground, Cloud/Django, A lIe) running on Fivetran-controlled GCP infrastructure with proper secrets management, IAM, and rollback capability.
Context
- A first hosted runtime already exists on Fivetran GCP, so the remaining work is not greenfield deploy setup but operational hardening and repeatable ownership.
- This task now sits at the boundary between deployment reliability, secrets/config, monitoring, and the runbook needed to operate the hosted environment safely.
- Because the task is blocked, the plan should make the remaining blockers explicit rather than pretending deployment itself is still undefined.
Possible Solutions
- Railway — Already used for Playground. Quick but no Fivetran ownership, limited secrets/IAM integration, no multi-service support. Rejected for pilot.
- GKE (Kubernetes) — Full-featured but heavy operational overhead for a 3-service internal pilot. Rejected as overkill at this stage.
- Cloud Run + Artifact Registry — Serverless, auto-scaling, simple deploy model, native GCP IAM and Secret Manager integration. Chosen for pilot.
Plan
Cloud Run with Artifact Registry, deployed through GitHub Actions: manual dispatch for the runtime services and automatic deploys for the static sites.
Architecture:
- 3 Cloud Run services: dataface-playground (public), dataface-cloud (IAM-gated),
dataface-alie (IAM-gated)
- Static sites: Docs + Tasks on GCS buckets (separate workflow)
- Artifact Registry: Docker images tagged with git short SHA
- Secrets: GCP Secret Manager, injected at deploy time via --set-secrets
- Database: Cloud SQL PostgreSQL for Django (via DATABASE_URL env var)
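The architecture above can be captured as data. The sketch below is a hypothetical Python helper, not the repo's Justfile logic. It does three things:

- models the three services and their auth posture;
- builds an Artifact Registry image URI tagged with a git short SHA;
- formats a value for `--set-secrets`.

The URI pattern `REGION-docker.pkg.dev/PROJECT/REPO/IMAGE:TAG` is the standard Artifact Registry Docker format. Several details are assumptions: that image names equal service names, the `SECRET_KEY` to `DJANGO_SECRET_KEY` secret mapping, and the `abc1234` tag.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Service:
    """One Cloud Run service in the pilot topology (illustrative only)."""
    name: str
    dockerfile: str
    public: bool  # True -> --allow-unauthenticated, False -> IAM-gated


# Topology from the architecture list; image names mirror service names (assumption).
SERVICES = [
    Service("dataface-playground", "Dockerfile.playground", public=True),
    Service("dataface-cloud", "Dockerfile.cloud", public=False),
    Service("dataface-alie", "Dockerfile.alie", public=False),
]


def image_uri(region: str, project: str, repo: str, image: str, short_sha: str) -> str:
    """Artifact Registry Docker image path, tagged with the git short SHA."""
    return f"{region}-docker.pkg.dev/{project}/{repo}/{image}:{short_sha}"


def set_secrets_value(mapping: dict[str, str], version: str = "latest") -> str:
    """Format ENV_VAR=SECRET_NAME:VERSION pairs for `gcloud run deploy --set-secrets`."""
    return ",".join(f"{env}={secret}:{version}" for env, secret in sorted(mapping.items()))


def auth_flag(service: Service) -> str:
    """Cloud Run flag that keeps a service public or IAM-gated."""
    return "--allow-unauthenticated" if service.public else "--no-allow-unauthenticated"


if __name__ == "__main__":
    print(image_uri("us-west1", "internal-dataface-eng", "dataface", "dataface-cloud", "abc1234"))
    print(set_secrets_value({"SECRET_KEY": "DJANGO_SECRET_KEY"}))
    print([auth_flag(s) for s in SERVICES])
```

Pinning secrets to `latest` is convenient for a pilot. Pinning explicit versions makes rollbacks reproducible at the cost of an extra deploy per rotation.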
Implementation Progress
- Confirm target runtime architecture and ownership boundaries.
- Implement deploy path, secrets wiring, and environment promotion flow.
- Publish runbooks for deploy, rollback, and incident triage.
- Provision Fivetran GCP project and wire GitHub secrets.
- First successful deploy of all three services to Fivetran GCP.
- Prepare GitHub Actions auth for Workload Identity Federation.
- Add uptime/error-budget monitoring and alerting.
- Internal pilot environment can be deployed from main with documented steps.
- Deployed and running on a Fivetran-owned GCP project.
- Runtime health and failure signals are visible in monitoring/alerts.
- Rollback path is tested and documented.
- Coordinate with cloud-suite and integrations-platform for auth/network prerequisites.
- Remaining blocker: internal domains/access model, plus any live GCP changes that require fresh operator auth in gcloud.
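For the uptime/error-budget item, the budget is the allowed failure fraction multiplied by the window. This is a minimal sketch. It assumes a hypothetical 99.5% availability target over 30 days, because no SLO has been chosen yet. Under that target the budget is (1 - 0.995) × 43,200 min, roughly 216 minutes per window.

```python
def error_budget_minutes(slo: float, window_days: int) -> float:
    """Minutes of allowed unavailability for an availability SLO over a window."""
    if not 0.0 < slo < 1.0:
        raise ValueError("slo must be a fraction strictly between 0 and 1")
    return (1.0 - slo) * window_days * 24 * 60


def budget_remaining(slo: float, window_days: int, bad_minutes: float) -> float:
    """Fraction of the error budget still unspent (negative means overspent)."""
    budget = error_budget_minutes(slo, window_days)
    return (budget - bad_minutes) / budget
```

Alerting on how fast this fraction drops (burn rate) tends to be less noisy than alerting on every failed uptime probe.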
Related task:
tasks/workstreams/integrations-platform/tasks/task-m1-fivetran-gcp-deploy-path.md
What's already built and merged to main:
Dockerfiles (all 3 services):
- Dockerfile.cloud — Django via gunicorn, runs migrate + collectstatic on startup
- Dockerfile.playground — FastAPI via uvicorn
- Dockerfile.alie — FastAPI via uvicorn
- All use ghcr.io/astral-sh/uv:python3.13-bookworm-slim base, non-root user
Justfile deploy recipes:
- just deploy-cloud-run-service <name> <dockerfile> — generic deploy (build, push to
Artifact Registry, gcloud run deploy)
- just deploy-playground / just deploy-cloud / just deploy-alie / just deploy-all-services
- just build-static-sites / just deploy-static-sites — MkDocs → GCS buckets
- just publish / just publish-setup — Artifact Registry package publishing
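The generic deploy-cloud-run-service recipe is described as three steps: build, push to Artifact Registry, then `gcloud run deploy`. The sketch below expresses that sequence as argv lists. It is an illustration rather than a transcription of the Justfile, which may build differently (for example through Cloud Build). It makes three assumptions:

- a local `docker build`/`docker push`;
- the repo root as the build context;
- Docker already authenticated to Artifact Registry (e.g. via `gcloud auth configure-docker us-west1-docker.pkg.dev`).

```python
import shlex


def deploy_commands(service: str, dockerfile: str, image: str,
                    region: str, project: str, public: bool) -> list[list[str]]:
    """Return build -> push -> deploy steps as argv lists (hypothetical helper)."""
    auth = "--allow-unauthenticated" if public else "--no-allow-unauthenticated"
    return [
        # Build from the repo root with the service-specific Dockerfile.
        ["docker", "build", "-f", dockerfile, "-t", image, "."],
        # Push the SHA-tagged image to Artifact Registry.
        ["docker", "push", image],
        # Point the Cloud Run service at the freshly pushed image.
        ["gcloud", "run", "deploy", service, "--image", image,
         "--region", region, "--project", project, auth],
    ]


if __name__ == "__main__":
    img = "us-west1-docker.pkg.dev/internal-dataface-eng/dataface/dataface-alie:abc1234"
    for argv in deploy_commands("dataface-alie", "Dockerfile.alie", img,
                                "us-west1", "internal-dataface-eng", public=False):
        # Dry run: print only. Pass argv to subprocess.run(argv, check=True) to execute.
        print(shlex.join(argv))
```

Deploying by immutable image tag, rather than from source, is what makes the later revision-based rollback meaningful. Each revision maps to exactly one commit.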
GitHub Actions workflows:
- .github/workflows/deploy-cloud-run.yml — triggers on push to main or manual dispatch;
matrix strategy deploys all 3 services; uses SA JSON key auth
- .github/workflows/deploy-static-sites.yml — triggers on push to main (docs/plans paths)
or manual dispatch; builds MkDocs sites, syncs to GCS
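The static-site workflow's final step syncs built MkDocs output into a bucket. The sketch below shows one way to express it. It assumes the `gcloud storage rsync` command and hypothetical per-site output directories under `site/`; the workflow may use a different tool or layout. Bucket names come from the DOCS_BUCKET / TASKS_BUCKET / ASQL_DOCS_BUCKET secrets.

```python
import os

# Bucket variables from the GitHub secrets inventory; site directories are hypothetical.
SITES = {"DOCS_BUCKET": "site/docs", "TASKS_BUCKET": "site/tasks", "ASQL_DOCS_BUCKET": "site/asql"}


def sync_command(site_dir: str, bucket: str, delete_stale: bool = True) -> list[str]:
    """Build a `gcloud storage rsync` argv that mirrors site_dir into gs://bucket."""
    if bucket.startswith("gs://"):
        bucket = bucket[len("gs://"):]
    argv = ["gcloud", "storage", "rsync", site_dir, f"gs://{bucket}", "--recursive"]
    if delete_stale:
        # Remove objects for pages that no longer exist in the build output.
        argv.append("--delete-unmatched-destination-objects")
    return argv


def plan_syncs(env=os.environ) -> list[list[str]]:
    """One sync per site whose bucket variable is set; unset buckets are skipped."""
    return [sync_command(d, env[var]) for var, d in SITES.items() if env.get(var)]
```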
Startup scripts:
- scripts/cloud_run_start_cloud.sh — Django entrypoint: validates SECRET_KEY and
ALLOWED_HOSTS, runs migrate + collectstatic, starts gunicorn
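scripts/cloud_run_start_cloud.sh is a shell script. The Python sketch below only mirrors the checks it is described as performing, so the fail-fast behavior is easy to see. Treating ALLOWED_HOSTS as a comma-separated list is an assumption about the variable's format.

```python
import os


def validate_startup_env(env=None) -> list[str]:
    """Return problems that should stop the Django container before migrate runs."""
    env = os.environ if env is None else env
    problems = []
    if not env.get("SECRET_KEY", "").strip():
        problems.append("SECRET_KEY is missing or empty")
    # Assumed format: comma-separated host names, blanks ignored.
    hosts = [h.strip() for h in env.get("ALLOWED_HOSTS", "").split(",") if h.strip()]
    if not hosts:
        problems.append("ALLOWED_HOSTS is missing or empty")
    return problems
```

An entrypoint would exit non-zero when the list is non-empty, so Cloud Run marks the new revision unhealthy. Traffic then stays on the previous revision instead of serving a misconfigured Django app.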
Documentation:
- apps/docs/CLOUD_RUN_DEPLOYMENT.md — full deployment guide (one-time setup, manual deploy,
automated deploy, prerequisites, runtime entrypoints, hardening roadmap)
- apps/docs/GCP_HOSTING_RUNBOOK.md — operational runbook (topology, services, env vars,
GitHub secrets inventory, recommended Django env/secrets)
- docs/RAILWAY_DEPLOYMENT.md — legacy Railway docs (Playground only)
Environment config:
- .env.example — documents all GCP-related vars (GCP_PROJECT_ID, GCP_REGION,
ARTIFACT_REPO, service name overrides, Cloud SQL connection, bucket names)
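The Cloud SQL connectivity model is still open. One common option on Cloud Run is the Unix socket that Cloud Run mounts under /cloudsql/ when a service is deployed with `--add-cloudsql-instances`. libpq accepts that socket directory as a `host` query parameter. Two things are assumptions to verify: that the project's Django settings parser accepts this URL form, and the example user/database/instance names. The password is percent-encoded so characters like `@` and `/` survive parsing.

```python
from urllib.parse import quote


def cloud_sql_socket_url(user: str, password: str, database: str,
                         instance_connection_name: str) -> str:
    """postgresql:// URL that reaches Cloud SQL through the /cloudsql Unix socket."""
    if instance_connection_name.count(":") != 2:
        raise ValueError("expected PROJECT:REGION:INSTANCE")
    socket_dir = f"/cloudsql/{instance_connection_name}"
    return (f"postgresql://{quote(user, safe='')}:{quote(password, safe='')}"
            f"@/{database}?host={socket_dir}")
```

Because the password is embedded, the assembled URL belongs in Secret Manager as a whole. It should not be set as a plain env var.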
What remains (unblocked items first):
- Publish static sites — push docs, tasks, and ASQL docs into the provisioned buckets and verify the hosting path.
- Land deploy fixes — commit the local Docker/workflow/runtime changes that made the hosted services healthy.
- Migrate to Workload Identity Federation — create the provider/binding in GCP and switch GitHub secrets from GCP_SA_KEY_JSON to WIF secrets.
- Monitoring — uptime checks, error-budget alerts, Cloud Run metrics dashboards.
- Rollback testing — verify gcloud run services update-traffic for instant rollback.
- Internal access/domains — finish the Fivetran-standard subdomain + access model once infra confirms the DNS/auth path.
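On Cloud Run, rollback means moving traffic back to an earlier revision rather than redeploying. This sketch builds the commands. It assumes the operator already knows the last-good revision name (for example from `gcloud run revisions list`). The revision names used here are hypothetical.

```python
def rollback_command(service: str, revision: str, region: str, project: str) -> list[str]:
    """Send 100% of traffic to a known-good revision (instant rollback)."""
    if not revision.startswith(f"{service}-"):
        # Cloud Run revision names are prefixed with the service name.
        raise ValueError(f"{revision!r} does not look like a revision of {service!r}")
    return ["gcloud", "run", "services", "update-traffic", service,
            f"--to-revisions={revision}=100", "--region", region, "--project", project]


def roll_forward_command(service: str, region: str, project: str) -> list[str]:
    """Undo a rollback by routing all traffic back to the latest ready revision."""
    return ["gcloud", "run", "services", "update-traffic", service,
            "--to-latest", "--region", region, "--project", project]
```

Testing both directions covers the "Rollback path is tested" criterion. A traffic split pinned to an old revision persists across later deploys until `--to-latest` is applied.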
Go-live checklist
- [x] Confirm target project: internal-dataface-eng
- [ ] Confirm deploy region (default in repo is us-west1 unless infra chooses otherwise)
- [x] Enable required APIs:
- Cloud Run API
- Cloud Build API
- Artifact Registry API
- Secret Manager API
- Cloud SQL Admin API
- [x] Create deploy service account for CI/manual deploys
- [x] Grant deploy service account roles:
- Cloud Run Admin
- Service Account User
- Cloud Build Editor
- Artifact Registry Writer
- [ ] Decide auth model for deploy:
- Short-term: GitHub Actions JSON key (GCP_SA_KEY_JSON)
- Follow-up hardening: Workload Identity Federation
- [x] Create Artifact Registry repo (dataface, unless renamed)
- [x] Provision Cloud SQL Postgres for dataface-cloud
- [ ] Decide Cloud SQL connectivity model and connection string delivery
- [~] Create Secret Manager secrets:
- [x] DJANGO_SECRET_KEY
- [ ] OPENAI_API_KEY if AI features are enabled
- [x] database credential secret(s) or fully assembled DATABASE_URL
- [x] Ensure the runtime service account used by Cloud Run can read required secrets
- [x] Create GCS buckets:
- docs bucket
- tasks bucket
- ASQL docs bucket
- [x] Grant deploy identity bucket write access for static-site deploys
- [~] Configure GitHub repository secrets:
- GCP_PROJECT_ID
- GCP_REGION
- ARTIFACT_REPO
- GCP_SA_KEY_JSON if using JSON-key auth
- GCP_WORKLOAD_IDENTITY_PROVIDER if using WIF
- GCP_SERVICE_ACCOUNT_EMAIL if using WIF
- DOCS_BUCKET
- TASKS_BUCKET
- ASQL_DOCS_BUCKET
- CLOUD_RUN_PLAYGROUND_SERVICE_NAME if overriding default
- CLOUD_RUN_CLOUD_SERVICE_NAME if overriding default
- CLOUD_RUN_ALIE_SERVICE_NAME if overriding default
- CLOUD_RUN_ASQL_PLAYGROUND_SERVICE_NAME if overriding default
- [x] Configure Cloud Run runtime env/secrets for Playground as needed
- [x] Configure Cloud Run runtime env/secrets for A lIe as needed
- [x] Configure Cloud Run runtime env/secrets for Django Cloud service:
- DEBUG=False
- ALLOWED_HOSTS=<cloud-run-host-or-custom-domain>
- DATABASE_URL=<postgres-connection-string>
- REPOS_ROOT=/tmp/dataface-repos
- SECRET_KEY from Secret Manager
- OPENAI_API_KEY from Secret Manager if AI features are enabled
- [x] Verify whether Cloud Run services should remain:
- Playground: public (--allow-unauthenticated)
- Cloud: IAM-gated by default
- A lIe: IAM-gated by default
- [x] Run first deploy:
- just deploy-playground
- just deploy-cloud
- just deploy-alie
- just deploy-asql-playground
- or GitHub workflow dispatch for selected services
- [x] Smoke test deployed URLs and auth expectations
- [x] Verify Django migrations and static collection complete on startup
- [ ] Verify rollback path with Cloud Run revision traffic switch
- [ ] Add monitoring:
- uptime checks
- error alerting
- service dashboards
- [ ] Track WIF migration as post-first-deploy hardening
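For the WIF hardening item, most of the switch is a change in which GitHub secrets the workflows expect. The sketch below has two parts:

- A hypothetical checker over the secret inventory listed in this checklist.
- A format check for the provider resource name. The expected form is `projects/NUMBER/locations/global/workloadIdentityPools/POOL/providers/PROVIDER`, which the google-github-actions/auth action takes as `workload_identity_provider`.

The ID character pattern is deliberately loose and is an assumption.

```python
import re

# Secrets every workflow needs, taken from the repository-secret inventory.
COMMON = {"GCP_PROJECT_ID", "GCP_REGION", "ARTIFACT_REPO",
          "DOCS_BUCKET", "TASKS_BUCKET", "ASQL_DOCS_BUCKET"}
BY_AUTH_MODE = {
    "json-key": {"GCP_SA_KEY_JSON"},
    "wif": {"GCP_WORKLOAD_IDENTITY_PROVIDER", "GCP_SERVICE_ACCOUNT_EMAIL"},
}
# WIF providers are addressed by numeric project *number*, not project ID.
PROVIDER_RE = re.compile(
    r"projects/\d+/locations/global/workloadIdentityPools/[a-z0-9-]+/providers/[a-z0-9-]+"
)


def missing_secrets(auth_mode: str, configured: set[str]) -> set[str]:
    """Secrets still needed for the chosen auth mode ('json-key' or 'wif')."""
    return (COMMON | BY_AUTH_MODE[auth_mode]) - configured


def leftover_key(auth_mode: str, configured: set[str]) -> bool:
    """After moving to WIF, flag a JSON key that should be deleted and revoked."""
    return auth_mode == "wif" and "GCP_SA_KEY_JSON" in configured


def valid_provider(name: str) -> bool:
    """True if name is a full workload identity provider resource name."""
    return PROVIDER_RE.fullmatch(name) is not None
```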
Key commits (chronological):
- 21504dd7 — switch Cloud Run deploy to image-based flow
- e84112df — add publish script + just publish for Artifact Registry
- efe5b6fe — multi-agent rollup: Cloud Run deploy, cbox, docs, examples
- 5a3e5fe0 — fix round 2 review: container hardening, auth control, deploy guards
- 999ea8e0 — fix round 3 review: docs accuracy, stale docstrings
- 0ee1ea09 — multi-agent rollup: Cloud Run deploy, cbox, docs, examples (#483)
Review Feedback
- 2026-03-07: Investigation by agent. All deploy infrastructure confirmed present in main. No GCP project has been provisioned yet; blocked on receiving a Fivetran project ID. The duplicate task at integrations-platform/tasks/task-m1-fivetran-gcp-deploy-path.md covers the same ground and should be consolidated or cross-referenced.
Pending — blocked on GCP project provisioning.
- [ ] Review cleared
2026-03-22 Triage Decision
- Status set to blocked (not obsolete): deploy/tooling/runbook groundwork is merged, but operational cutover is blocked on external infra ownership inputs (Fivetran GCP project provisioning and secrets wiring).