We deleted every static GCP service account key in our org over six weeks. Here's the migration plan, the gotchas, and the policies we now enforce.
Six weeks ago we had 34 GCP service account JSON keys scattered across CI systems, developer laptops, and one regrettable Slack DM. Today we have zero. Every workload — CI, on-prem agents, third-party SaaS — authenticates via Workload Identity Federation. Here's the migration plan, the friction we hit, and the policies that now keep this from regressing.
Three near-misses in the last year:
…creds.json and was pushed to a public ECR mirror. Caught in 23 minutes by GCP's automated detection, but still.
…Editor on the dev project.
Static credentials don't expire and don't know who's using them. They're the worst of all worlds.
WIF lets external identities (GitHub OIDC tokens, AWS IAM roles, OIDC from Kubernetes, SAML, etc.) impersonate GCP service accounts without any long-lived credential.
The flow:
External Identity (GitHub OIDC token)
│
▼
Workload Identity Pool + Provider (provider validates issuer + signature, checks attribute_condition)
│
▼
Security Token Service (maps claims → attributes, returns a federated token)
│
▼
Service Account Impersonation (short-lived token, ≤1h)
│
▼
GCP API call
The "credential" your workload sees is an identity token from its native trust system (GitHub, AWS, K8s). GCP validates that token and issues a short-lived access token in exchange.
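Under the hood, the exchange is two HTTPS calls. First, the Security Token Service trades the external token for a federated token. Then the IAM Credentials API trades that federated token for a service account access token. The sketch below only builds the request payloads and sends nothing. The project number, pool, provider, and service account name are placeholders. In practice, google-github-actions/auth or the client libraries make these calls for you.

```python
# Placeholders only: substitute your own project number, pool, provider, and SA.
PROJECT_NUMBER = "123456789012"
POOL_ID = "github-actions-pool"
PROVIDER_ID = "github"
SERVICE_ACCOUNT = "ci-deployer@my-project.iam.gserviceaccount.com"

STS_URL = "https://sts.googleapis.com/v1/token"
CLOUD_PLATFORM_SCOPE = "https://www.googleapis.com/auth/cloud-platform"


def provider_audience(project_number: str, pool_id: str, provider_id: str) -> str:
    """Audience STS expects: the full resource name of the WIF provider."""
    return (
        f"//iam.googleapis.com/projects/{project_number}/locations/global/"
        f"workloadIdentityPools/{pool_id}/providers/{provider_id}"
    )


def sts_exchange_body(external_jwt: str) -> dict:
    """Step 1: trade the external OIDC token for a federated access token."""
    return {
        "grantType": "urn:ietf:params:oauth:grant-type:token-exchange",
        "audience": provider_audience(PROJECT_NUMBER, POOL_ID, PROVIDER_ID),
        "scope": CLOUD_PLATFORM_SCOPE,
        "requestedTokenType": "urn:ietf:params:oauth:token-type:access_token",
        "subjectTokenType": "urn:ietf:params:oauth:token-type:jwt",
        "subjectToken": external_jwt,
    }


def impersonation_request(federated_token: str) -> tuple[str, dict, dict]:
    """Step 2: use the federated token to mint a short-lived SA access token."""
    url = (
        "https://iamcredentials.googleapis.com/v1/projects/-/"
        f"serviceAccounts/{SERVICE_ACCOUNT}:generateAccessToken"
    )
    headers = {"Authorization": f"Bearer {federated_token}"}
    body = {"scope": [CLOUD_PLATFORM_SCOPE], "lifetime": "3600s"}  # 1h default cap
    return url, headers, body
```

POST the first body as JSON to STS_URL. The access_token in the response is the federated token you pass to impersonation_request(). The accessToken returned by generateAccessToken is what your GCP API calls finally use.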
# Find every user-managed service account key in every project
for proj in $(gcloud projects list --format="value(projectId)"); do
  gcloud iam service-accounts list --project="$proj" --format="value(email)" \
    | while read -r sa; do
        gcloud iam service-accounts keys list --iam-account="$sa" --project="$proj" \
          --filter="keyType=USER_MANAGED" --format=json
      done
done | jq -s 'add // []' > all-keys.json  # merge the per-account arrays into one JSON array
We found 34 user-managed keys across 5 projects. For each, we mapped: who downloaded it, where it lives now, what it's used for.
Two keys turned out to be completely unused — leftover from migrations years ago. Delete them first; instant security win.
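With 34 keys a spreadsheet works, but a small script makes the first pass faster. This sketch reads all-keys.json and groups key IDs by service account, with each key's age in days. It accepts either a single array or the concatenated per-account arrays that an unmerged loop produces. It relies on the name and validAfterTime fields in the gcloud JSON output. Age is only a hint: an old key is not necessarily an unused one.

```python
import json
from collections import defaultdict
from datetime import datetime, timezone


def load_key_stream(text: str) -> list[dict]:
    """Parse one JSON array, or several concatenated ones, into a flat list of keys."""
    decoder = json.JSONDecoder()
    keys, pos = [], 0
    while True:
        # Skip whitespace between concatenated JSON documents.
        while pos < len(text) and text[pos].isspace():
            pos += 1
        if pos >= len(text):
            return keys
        value, pos = decoder.raw_decode(text, pos)
        keys.extend(value if isinstance(value, list) else [value])


def keys_by_account(keys: list[dict], now: datetime) -> dict[str, list[tuple[str, int]]]:
    """Group keys by service account; each entry is (key_id, age_in_days)."""
    grouped = defaultdict(list)
    for key in keys:
        # name looks like projects/PROJECT/serviceAccounts/SA_EMAIL/keys/KEY_ID
        parts = key["name"].split("/")
        account, key_id = parts[3], parts[5]
        created = datetime.fromisoformat(key["validAfterTime"].replace("Z", "+00:00"))
        grouped[account].append((key_id, (now - created).days))
    return dict(grouped)


def summarize(path: str) -> None:
    """Print one line per service account: key count and oldest key age."""
    with open(path) as f:
        report = keys_by_account(load_key_stream(f.read()), datetime.now(timezone.utc))
    for account, entries in sorted(report.items()):
        oldest = max(age for _, age in entries)
        print(f"{account}: {len(entries)} key(s), oldest {oldest} days")
```

Call summarize("all-keys.json") to get one line per service account. That list becomes the starting point for the who/where/what mapping.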
resource "google_iam_workload_identity_pool" "github" {
  workload_identity_pool_id = "github-actions-pool"
  display_name              = "GitHub Actions"
  description               = "Identity pool for GitHub Actions OIDC"
}
resource "google_iam_workload_identity_pool_provider" "github" {
  workload_identity_pool_id          = google_iam_workload_identity_pool.github.workload_identity_pool_id
  workload_identity_pool_provider_id = "github"
  attribute_mapping = {
    "google.subject"             = "assertion.sub"
    "attribute.repository"       = "assertion.repository"
    "attribute.repository_owner" = "assertion.repository_owner"
    "attribute.workflow_ref"     = "assertion.workflow_ref"
  }
  attribute_condition = "assertion.repository_owner == 'kirilurbonas'"
  oidc {
    issuer_uri = "https://token.actions.githubusercontent.com"
  }
}
That attribute_condition is the most important line. Without it, any GitHub Actions workflow on the entire internet can authenticate to your pool. Restrict to your org.
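To see why that one line matters, here is a plain-Python stand-in for what the provider does with each incoming token. GCP evaluates the real attribute_condition as a CEL expression. This sketch mirrors only the single equality check above, and the claim values in it are hypothetical examples shaped like GitHub's OIDC claims.

```python
from typing import Optional

# Mirrors the attribute_mapping above: GCP attribute -> GitHub token claim.
ATTRIBUTE_MAPPING = {
    "google.subject": "sub",
    "attribute.repository": "repository",
    "attribute.repository_owner": "repository_owner",
    "attribute.workflow_ref": "workflow_ref",
}
ALLOWED_OWNER = "kirilurbonas"


def map_attributes(claims: dict) -> dict:
    """Copy the selected token claims into GCP attributes, like attribute_mapping."""
    return {attr: claims[claim] for attr, claim in ATTRIBUTE_MAPPING.items() if claim in claims}


def condition_allows(claims: dict, allowed_owner: Optional[str] = ALLOWED_OWNER) -> bool:
    """Stand-in for attribute_condition; None models a provider with no condition."""
    if allowed_owner is None:
        # No condition: any token GitHub issues, for any repo on GitHub, passes.
        return True
    return claims.get("repository_owner") == allowed_owner
```

With allowed_owner=None, a token minted for someone else's repository passes just as easily as one of yours. That is the "entire internet" case.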
For each service account that CI used:
resource "google_service_account_iam_binding" "ci_deployer" {
  # Authoritative for this role on this SA: members not listed here are removed
  service_account_id = google_service_account.ci_deployer.name
  role               = "roles/iam.workloadIdentityUser"
  members = [
    "principalSet://iam.googleapis.com/projects/${var.project_number}/locations/global/workloadIdentityPools/${google_iam_workload_identity_pool.github.workload_identity_pool_id}/attribute.repository/kirilurbonas/devopsness",
  ]
}
This binding says: any GitHub Actions workflow in kirilurbonas/devopsness can impersonate ci_deployer@. Not other repos in our org. Specific.
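The member string decides how wide a grant is. WIF accepts three shapes for a pool: every identity in it, every identity sharing one mapped attribute value, or one exact google.subject. This sketch builds each shape. The project number, pool ID, and subject used with it are placeholders.

```python
BASE = "iam.googleapis.com/projects/{num}/locations/global/workloadIdentityPools/{pool}"


def pool_members(project_number: str, pool_id: str) -> str:
    """Every identity in the pool: the broadest grant."""
    return f"principalSet://{BASE.format(num=project_number, pool=pool_id)}/*"


def attribute_members(project_number: str, pool_id: str, attribute: str, value: str) -> str:
    """Every identity whose mapped attribute.<attribute> equals value."""
    base = BASE.format(num=project_number, pool=pool_id)
    return f"principalSet://{base}/attribute.{attribute}/{value}"


def subject_member(project_number: str, pool_id: str, subject: str) -> str:
    """Exactly one google.subject: the narrowest grant."""
    return f"principal://{BASE.format(num=project_number, pool=pool_id)}/subject/{subject}"
```

Binding on attribute.repository, as above, admits any branch or tag of that repo. A subject member pins one exact sub value (for example, a single branch), at the cost of one binding per value.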
Old workflow:
- uses: google-github-actions/auth@v2
  with:
    credentials_json: ${{ secrets.GCP_SA_KEY }}
New workflow:
permissions:
  id-token: write   # required for GitHub OIDC
  contents: read
steps:
  - uses: google-github-actions/auth@v2
    with:
      workload_identity_provider: projects/123456789012/locations/global/workloadIdentityPools/github-actions-pool/providers/github
      # Service account IDs allow only lowercase letters, digits, and hyphens (no underscores)
      service_account: ci-deployer@my-project.iam.gserviceaccount.com
No secrets. The OIDC token is generated automatically by GitHub for each job.
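When a token exchange fails, the first question is what the token actually claimed. This sketch decodes a JWT payload without verifying the signature. It is for inspection only and must never be used for an auth decision. It picks out the claims the provider above maps or checks. Inside a job with id-token: write, GitHub exposes ACTIONS_ID_TOKEN_REQUEST_URL and ACTIONS_ID_TOKEN_REQUEST_TOKEN for fetching such a token. Print the claims, never the raw token.

```python
import base64
import json


def jwt_claims(token: str) -> dict:
    """Decode a JWT's payload segment WITHOUT verifying the signature (debugging only)."""
    payload = token.split(".")[1]
    payload += "=" * (-len(payload) % 4)  # restore the base64url padding JWTs strip
    return json.loads(base64.urlsafe_b64decode(payload))


def claims_of_interest(token: str) -> dict:
    """Return the claims our attribute_mapping and attribute_condition rely on."""
    claims = jwt_claims(token)
    wanted = ("iss", "aud", "sub", "repository", "repository_owner", "workflow_ref")
    return {name: claims.get(name) for name in wanted}
```

A claim that comes back as None, or with an unexpected value, is usually the one the provider rejected.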
We had a Jenkins instance running outside any cloud. WIF requires an OIDC issuer, which Jenkins doesn't natively provide.
Solution: stand up a small OIDC provider next to Jenkins (we used spiffe/spire) that issues short-lived JWTs to Jenkins jobs. Configure GCP WIF to trust that issuer. Jenkins jobs now get GCP credentials via WIF without ever holding a static key.
The setup took a week. Worth it: this Jenkins instance was the holder of two of the most powerful service account keys we had.
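On the Jenkins side, Google's auth libraries consume WIF through an external_account credential configuration file, which GOOGLE_APPLICATION_CREDENTIALS points to. `gcloud iam workload-identity-pools create-cred-config` normally generates this file. The sketch below builds an equivalent one by hand, assuming each job writes its SPIRE-issued JWT to a local file. The pool, provider, service account, and token path are hypothetical.

```python
import json

# Hypothetical values for a Jenkins/SPIRE setup; none of these come from the article.
PROJECT_NUMBER = "123456789012"
POOL_ID = "jenkins-pool"
PROVIDER_ID = "spire"
SERVICE_ACCOUNT = "jenkins-deployer@my-project.iam.gserviceaccount.com"
TOKEN_PATH = "/var/run/secrets/spire/jwt-svid.token"


def external_account_config() -> dict:
    """Credential config telling Google auth libraries to run the WIF exchange."""
    audience = (
        f"//iam.googleapis.com/projects/{PROJECT_NUMBER}/locations/global/"
        f"workloadIdentityPools/{POOL_ID}/providers/{PROVIDER_ID}"
    )
    return {
        "type": "external_account",
        "audience": audience,
        "subject_token_type": "urn:ietf:params:oauth:token-type:jwt",
        "token_url": "https://sts.googleapis.com/v1/token",
        "service_account_impersonation_url": (
            "https://iamcredentials.googleapis.com/v1/projects/-/"
            f"serviceAccounts/{SERVICE_ACCOUNT}:generateAccessToken"
        ),
        # The job must keep this file fresh: the JWT inside is short-lived too.
        "credential_source": {"file": TOKEN_PATH, "format": {"type": "text"}},
    }


def write_config(path: str) -> None:
    """Write the config; the file holds no secret, only where to find the JWT."""
    with open(path, "w") as f:
        json.dump(external_account_config(), f, indent=2)
```

The design point is that this file is safe to leave on the agent. Without a fresh JWT from SPIRE at TOKEN_PATH, it grants nothing.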
# For each migrated service account:
for sa in ci_deployer@... jenkins_deployer@... ; do
  # basename() reduces projects/.../keys/KEY_ID to the bare key ID that keys delete takes
  gcloud iam service-accounts keys list --iam-account="$sa" \
    --filter="keyType=USER_MANAGED" --format='value(name.basename())' \
    | while read -r keyId; do
        gcloud iam service-accounts keys delete "$keyId" \
          --iam-account="$sa" --quiet
      done
done
We waited two weeks after this before deleting the now-disabled service accounts entirely. Nothing broke.
# Organization Policy (iam.disableServiceAccountKeyCreation is a boolean constraint)
constraint: constraints/iam.disableServiceAccountKeyCreation
booleanPolicy:
  enforced: true
This blocks creation of new user-managed service account keys org-wide. The single exception: a tagged emergency-only project where keys are allowed but require a 2-person approval to issue.
Within 6 weeks we went from "trust the team to do the right thing" to "the platform doesn't permit the wrong thing."
Each WIF token exchange adds ~150ms to a job's startup. For short jobs (< 30s), that's noticeable. We mitigated by reusing the access token within a single job (gcloud config set instead of re-authenticating per command).
Older client libraries fetched a token once and held it. After 1 hour, calls failed. We had to upgrade three Python services to google-cloud-* libraries that auto-refresh.
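Both fixes come down to one pattern: hold a token, reuse it while it is valid, and refresh it shortly before it expires rather than after a call fails. The google-cloud-* libraries do this internally. The sketch below shows the pattern for code that manages tokens itself. The fetch callable (whatever performs the WIF exchange) and the five-minute margin are assumptions. The clock is injectable so the logic can be tested.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class AccessToken:
    value: str
    expires_at: float  # seconds since the epoch


class CachedToken:
    """Reuse one access token until shortly before expiry, then fetch a new one.

    `fetch` stands in for whatever performs the WIF exchange (hypothetical here).
    """

    def __init__(self, fetch: Callable[[], AccessToken],
                 margin_s: float = 300.0,
                 clock: Callable[[], float] = time.time):
        self._fetch = fetch
        self._margin = margin_s
        self._clock = clock
        self._token: Optional[AccessToken] = None

    def get(self) -> str:
        now = self._clock()
        # Refresh on first use, or once we are inside the safety margin.
        if self._token is None or now >= self._token.expires_at - self._margin:
            self._token = self._fetch()
        return self._token.value
```

Within a single job, every command calls get() on the same instance. The exchange cost is paid once per token rather than once per command.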
Every WIF token exchange is logged, and for an active CI fleet that is a lot of events. We added a log sink to BigQuery; now 90% of our audit-log queries hit BQ instead of Cloud Logging, which is significantly cheaper.
Two SaaS tools (a security scanner and a deploy automation tool) only support static keys. We:
…gcloud iam service-accounts keys create --validity=72h) on a weekly schedule via Cloud Scheduler. Not perfect, but the blast radius is bounded.
…constraints/iam.disableServiceAccountKeyCreation at the organization level before you migrate. Otherwise old patterns creep back in.
…attribute_condition on every WIF provider. Restrict it to your repos/accounts.
…gcloud iam service-accounts keys create.
Permission-denied errors are generic; you have to enable detailed audit logs to see which claim didn't match.
These costs are paid once. The benefits compound forever.
Without question. The single biggest security improvement we shipped this year — and the second-biggest reduction in toil. No more "rotate this key" Jira tickets. No more wondering where a credential file ended up. No more 11-month-old access from former employees.
If you have any service account keys in your org, start with the inventory query above. The number will be larger than you think; the migration will be smaller than you fear.