Operationalize Fivetran GCP deploy path
Problem
Dataface now has a real deploy path to Fivetran's GCP infrastructure, and the pilot runtime has already been exercised on Cloud Run. The remaining risk is that the path is still only partially operationalized: runtime deploys rely on JSON-key auth by default, rollback/monitoring are not fully verified, and the final static-site / ops handoff steps have not been completed. Without closing those gaps, each future deploy still carries avoidable operator risk.
Context
- This task is about deployment mechanics and operational runbooks for the pilot runtime on Fivetran GCP.
- Warehouse/data connectivity is tracked separately in `task-m1-ft-analytics-connectivity.md`. `host-dataface-on-fivetran-gcp.md` (infra-tooling) captures the broader hosting program; this task narrows integrations-platform scope to the repeatable deploy/rollback/ops path.
- All four Cloud Run services (Playground, Cloud, A lIe, ASQL Playground) and three static sites (docs, tasks, ASQL docs) are already containerized and deployed to `internal-dataface-eng`.
- GitHub Actions workflows exist for both runtime (`deploy-cloud-run.yml`, manual dispatch) and static sites (`deploy-static-sites.yml`, auto on push to `main`).
- Justfile recipes cover build, push, and deploy for each service individually or all at once.
- Auth currently uses SA JSON key; WIF support is wired in workflows but not yet activated in GCP.
- Runbook and deployment guide docs exist at `apps/docs/GCP_HOSTING_RUNBOOK.md` and `apps/docs/CLOUD_RUN_DEPLOYMENT.md`.
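The dual auth setup noted above (SA JSON key today, WIF once activated) could be expressed as a small selection step. This is only a sketch of one possible fallback order, not the logic in the actual workflows; the variable names `WIF_PROVIDER`, `WIF_SERVICE_ACCOUNT`, and `GCP_SA_KEY` are hypothetical and do not come from this task.

```shell
# Decide which auth mode a deploy should use.
# ASSUMPTION: WIF_PROVIDER, WIF_SERVICE_ACCOUNT and GCP_SA_KEY are
# hypothetical variable names, not taken from the real workflows.
select_auth_mode() {
  if [ -n "${WIF_PROVIDER:-}" ] && [ -n "${WIF_SERVICE_ACCOUNT:-}" ]; then
    # Prefer keyless Workload Identity Federation when fully configured.
    echo "wif"
  elif [ -n "${GCP_SA_KEY:-}" ]; then
    # Fall back to the service account JSON key.
    echo "json-key"
  else
    echo "no GCP credentials configured" >&2
    return 1
  fi
}
```

Preferring WIF whenever both of its settings are present would let the JSON key remain as a fallback until WIF is live, after which the key could be removed without touching the selection logic.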
Possible Solutions
- Docs-only operationalization: Fill gaps in existing runbooks (rollback, ASQL service coverage, Cloud SQL vars), fix stale deployment guide references, keep justfile as-is. Operators memorize `gcloud` rollback commands.
- Docs + rollback recipe (Recommended): Same doc fixes plus a `just rollback <service>` recipe that wraps `gcloud run services update-traffic` for instant previous-revision rollback. Eliminates the most common failure mode (operator forgets revision name syntax) with zero additional infrastructure.
- Full automation: Add health-check gates, automated canary rollback, monitoring dashboards. Overkill for a pilot with 4 services and <10 users; better deferred to post-pilot hardening.
Plan
- Add `just rollback <service>` recipe to the justfile: lists the two most recent revisions, shifts 100% traffic to the previous one.
- Update `GCP_HOSTING_RUNBOOK.md`: add ASQL Playground service, `ASQL_DOCS_BUCKET`, Cloud SQL instance vars, rollback procedure (recipe + manual gcloud), clarify trigger models.
- Fix `CLOUD_RUN_DEPLOYMENT.md`: correct stale "push to main" trigger claim for Cloud Run (now manual-only), add static sites workflow reference, add rollback section pointing to runbook.
- Update task file with filled Context/Solutions/Plan/Progress.
- Validate task file, commit, and create PR.
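The rollback step in the plan above (list the two most recent revisions, then shift 100% of traffic to the previous one) might look roughly like the following shell functions. This is a minimal sketch and not the actual justfile recipe: the function names and the `us-central1` default region are assumptions, and "previous" here means second-newest by creation time, which may differ from the revision that last served traffic.

```shell
set -eu

# ASSUMPTION: the region is not stated in this task; us-central1 is a placeholder.
REGION="${REGION:-us-central1}"

# Read revision names (newest first) on stdin and print the second one.
previous_revision() {
  sed -n '2p'
}

# Print the two most recent revisions of a Cloud Run service, newest first.
list_recent_revisions() {
  gcloud run revisions list \
    --service "$1" \
    --region "$REGION" \
    --sort-by '~metadata.creationTimestamp' \
    --limit 2 \
    --format 'value(metadata.name)'
}

# Shift 100% of traffic to the second-newest revision of a service.
rollback() {
  local service revisions previous
  service="$1"
  revisions="$(list_recent_revisions "$service")"
  previous="$(printf '%s\n' "$revisions" | previous_revision)"
  if [ -z "$previous" ]; then
    echo "no previous revision found for $service" >&2
    return 1
  fi
  echo "Rolling $service back to $previous"
  gcloud run services update-traffic "$service" \
    --region "$REGION" \
    --to-revisions "${previous}=100"
}
```

After sourcing these functions, `rollback <service>` performs the traffic shift. A justfile recipe could wrap the same two `gcloud` calls, so operators would not need to look up revision names by hand.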
Implementation Progress
- Boundary clarified: this task owns deploy/runbook reliability, not warehouse credentials or application-level connectivity.
- Pilot runtime has been deployed successfully on `internal-dataface-eng`.
- Runtime workflows are now manual-only for Cloud Run services to avoid accidental deploys on every `main` push.
- GitHub Actions workflows have been prepared to support either SA JSON key auth or Workload Identity Federation.
- [x] Added `just rollback <service>` recipe to justfile: lists two most recent revisions, shifts traffic to previous.
- [x] Updated `GCP_HOSTING_RUNBOOK.md`: added ASQL Playground service, `ASQL_DOCS_BUCKET`, `ASQL_PLAYGROUND_SERVICE_NAME`, Cloud SQL instance connection secrets, full rollback procedure, corrected trigger/auth notes.
- [x] Fixed `CLOUD_RUN_DEPLOYMENT.md`: corrected Cloud Run workflow trigger from "push to main" to manual-only, added static sites workflow reference, added rollback section.
- Remaining external work (blocked on GCP project provisioning): live WIF activation, monitoring/alerting setup, internal domain configuration.
Review Feedback
- [ ] Review cleared