Dataface Tasks

Operationalize Fivetran GCP deploy path

IDM1-INTPLAT-002
Statuscompleted
Priorityp0
Milestonem1-ft-analytics-analyst-pilot
Ownerhead-of-engineering

Problem

Dataface now has a real deploy path to Fivetran's GCP infrastructure, and the pilot runtime has already been exercised on Cloud Run. The remaining risk is that the path is still only partially operationalized: runtime deploys rely on JSON-key auth by default, rollback/monitoring are not fully verified, and the final static-site / ops handoff steps have not been completed. Without closing those gaps, each future deploy still carries avoidable operator risk.

Context

  • This task is about deployment mechanics and operational runbooks for the pilot runtime on Fivetran GCP.
  • Warehouse/data connectivity is tracked separately in task-m1-ft-analytics-connectivity.md.
  • host-dataface-on-fivetran-gcp.md (infra-tooling) captures the broader hosting program; this task narrows integrations-platform scope to the repeatable deploy/rollback/ops path.
  • All four Cloud Run services (Playground, Cloud, A lIe, ASQL Playground) and three static sites (docs, tasks, ASQL docs) are already containerized and deployed to internal-dataface-eng.
  • GitHub Actions workflows exist for both runtime (deploy-cloud-run.yml, manual dispatch) and static sites (deploy-static-sites.yml, auto on push to main).
  • Justfile recipes cover build, push, and deploy for each service individually or all at once.
  • Auth currently uses SA JSON key; WIF support is wired in workflows but not yet activated in GCP.
  • Runbook and deployment guide docs exist at apps/docs/GCP_HOSTING_RUNBOOK.md and apps/docs/CLOUD_RUN_DEPLOYMENT.md.

Possible Solutions

  1. Docs-only operationalization — Fill gaps in existing runbooks (rollback, ASQL service coverage, Cloud SQL vars), fix stale deployment guide references, keep justfile as-is. Operators memorize gcloud rollback commands.
  2. Docs + rollback recipe (Recommended) — Same doc fixes plus a just rollback <service> recipe that wraps gcloud run services update-traffic for instant previous-revision rollback. Eliminates the most common failure mode (operator forgets revision name syntax) with zero additional infrastructure.
  3. Full automation — Add health-check gates, automated canary rollback, monitoring dashboards. Overkill for a pilot with 4 services and <10 users; better deferred to post-pilot hardening.

Plan

  1. Add just rollback <service> recipe to the justfile — lists the two most recent revisions, shifts 100% traffic to the previous one.
  2. Update GCP_HOSTING_RUNBOOK.md: add ASQL Playground service, ASQL_DOCS_BUCKET, Cloud SQL instance vars, rollback procedure (recipe + manual gcloud), clarify trigger models.
  3. Fix CLOUD_RUN_DEPLOYMENT.md: correct stale "push to main" trigger claim for Cloud Run (now manual-only), add static sites workflow reference, add rollback section pointing to runbook.
  4. Update task file with filled Context/Solutions/Plan/Progress.
  5. Validate task file, commit, and create PR.

Implementation Progress

  • Boundary clarified: this task owns deploy/runbook reliability, not warehouse credentials or application-level connectivity.
  • Pilot runtime has been deployed successfully on internal-dataface-eng.
  • Runtime workflows are now manual-only for Cloud Run services to avoid accidental deploys on every main push.
  • GitHub Actions workflows have been prepared to support either SA JSON key auth or Workload Identity Federation.
  • [x] Added just rollback <service> recipe to justfile — lists two most recent revisions, shifts traffic to previous.
  • [x] Updated GCP_HOSTING_RUNBOOK.md: added ASQL Playground service, ASQL_DOCS_BUCKET, ASQL_PLAYGROUND_SERVICE_NAME, Cloud SQL instance connection secrets, full rollback procedure, corrected trigger/auth notes.
  • [x] Fixed CLOUD_RUN_DEPLOYMENT.md: corrected Cloud Run workflow trigger from "push to main" to manual-only, added static sites workflow reference, added rollback section.
  • Remaining external work (blocked on GCP project provisioning): live WIF activation, monitoring/alerting setup, internal domain configuration.

Review Feedback

  • [ ] Review cleared