velero: "Error getting backup store for this location"

What steps did you take and what happened:

Note: I’m deploying Velero in a GKE private cluster. Images for velero and velero-plugin-for-gcp have been copied over to the internal GCR repo. We’re also using WorkloadIdentity.

velero install --image gcr.io/foo/velero:v1.2.0 --provider gcp --plugins gcr.io/foo/velero-plugin-for-gcp:v1.0.0 --bucket $BUCKET --no-secret --sa-annotations iam.gke.io/gcp-service-account=velero@foo.iam.gserviceaccount.com --backup-location-config serviceAccount=velero@foo.iam.gserviceaccount.com

Results in the following error, and the server halts:

An error occurred: some backup storage locations are invalid: error getting backup store for location "default": unable to locate ObjectStore plugin named velero.io/gcp
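A quick sanity check on the plugin side: velero install adds each --plugins image as an init container on the Velero Deployment, which copies the plugin binary into a shared volume, so the mirrored image should show up here (namespace and Deployment name below are the velero install defaults):

$ kubectl -n velero get deployment velero -o jsonpath='{.spec.template.spec.initContainers[*].image}'

If gcr.io/foo/velero-plugin-for-gcp:v1.0.0 is missing from that list, the server has no velero.io/gcp ObjectStore plugin to register, which would match the error above.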

More details:

apiVersion: velero.io/v1
kind: BackupStorageLocation
metadata:
  creationTimestamp: 2020-02-24T04:44:56Z
  generation: 1
  labels:
    component: velero
  name: default
  namespace: velero
  resourceVersion: "3283695"
  selfLink: /apis/velero.io/v1/namespaces/velero/backupstoragelocations/default
  uid: 68071597-56c0-11ea-91f8-4201ac107009
spec:
  config:
    serviceAccount: velero@foo.iam.gserviceaccount.com
  objectStorage:
    bucket: <foo_bucket>
  provider: gcp
status: {}
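For reference, the object above can be dumped with:

$ kubectl -n velero get backupstoragelocation default -o yaml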

What did you expect to happen: Velero up and running, ready to receive further instructions.

The output of the following commands will help us better understand what’s going on:

Anything else you would like to add:

Deleting the default BackupStorageLocation and replacing it with the same definition, named gcp, fixes the reported issue; however, a new error appears later on:

$ kubectl -n velero delete backupstoragelocation default

$ kubectl apply -f <(echo -n "
apiVersion: velero.io/v1
kind: BackupStorageLocation
metadata:
  labels:
    component: velero
  name: gcp
  namespace: velero
spec:
  config:
    name: gcp
    serviceAccount: velero@foo.iam.gserviceaccount.com
  objectStorage:
    bucket: <foo_bucket>
  provider: velero.io/gcp")
backupstoragelocation.velero.io/gcp unchanged
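As a sanity check, the locations the server sees can also be listed (assuming the default velero namespace):

$ velero backup-location get
$ kubectl -n velero get backupstoragelocations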

Server starts but other errors appear:

time="2020-02-27T08:40:25Z" level=error msg="Error getting backup store for this location" backupLocation=gcp controller=backup-sync error="unable to locate ObjectStore plugin named velero.io/gcp" logSource="pkg/controller/backup_sync_controller.go:167"

Environment:

  • Velero version (use velero version):
Client:
	Version: v1.2.0
	Git commit: 5d008491bbf681658d3e372da1a9d3a21ca4c03c
Server:
	Version: v1.2.0
  • Velero features (use velero client config get features):
features: <NOT SET>
  • Kubernetes version (use kubectl version):
Client Version: version.Info{Major:"1", Minor:"17", GitVersion:"v1.17.3", GitCommit:"06ad960bfd03b39c8310aaf92d1e7c12ce618213", GitTreeState:"clean", BuildDate:"2020-02-11T18:14:22Z", GoVersion:"go1.13.6", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"15+", GitVersion:"v1.15.9-gke.9", GitCommit:"a9973cbb2722793e2ea08d20880633ca61d3e669", GitTreeState:"clean", BuildDate:"2020-02-07T22:35:02Z", GoVersion:"go1.12.12b4", Compiler:"gc", Platform:"linux/amd64"}
  • Kubernetes installer & version: 1.15.9-gke.9

  • Cloud provider or hardware configuration: GKE

About this issue

  • Original URL
  • State: closed
  • Created 4 years ago
  • Comments: 19 (7 by maintainers)

Most upvoted comments

Hey folks, I found the issue (and workaround):

DNS resolution within the pod wasn’t working (getent returned nothing):

nobody@velero-599bf9ff5d-lgtpd:/$ getent hosts metadata.google.internal

I spun up the workload-identity-test pod, following the GCP documentation, to test WorkloadIdentity:

$ kubectl -n velero exec  workload-identity-test -it -- /bin/bash

root@workload-identity-test:/# curl -H "Metadata-Flavor: Google" http://metadata.google.internal/computeMetadata/v1/instance/service-accounts/default/token
curl: (6) Could not resolve host: metadata.google.internal
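For reference, the test pod was created roughly along these lines, following the GKE Workload Identity guide; the image and sleep command are assumptions taken from that guide, and serviceAccountName points at Velero’s Kubernetes service account so the pod shares its identity:

$ kubectl apply -f <(echo -n "
apiVersion: v1
kind: Pod
metadata:
  name: workload-identity-test
  namespace: velero
spec:
  serviceAccountName: velero
  containers:
  - name: workload-identity-test
    image: google/cloud-sdk:slim
    command:
    - sleep
    - infinity")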

So, I patched the Velero Deployment config, changing the dnsPolicy (which was ClusterFirst):

$ kubectl -n velero patch deployment velero -p '{"spec":{"template":{"spec":{"dnsPolicy": "Default"}}}}'

Which fixed the name resolution:

nobody@velero-7b87568cdc-rtsl9:/$ getent hosts metadata.google.internal
169.254.169.254 metadata.google.internal

After this, Velero started working successfully using WorkloadIdentity.
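For setups where the Deployment manifest is managed declaratively (as in our Anthos case below) rather than patched in place, the same fix is just the dnsPolicy field in the pod template; a minimal fragment, with the rest of the Deployment elided:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: velero
  namespace: velero
spec:
  template:
    spec:
      dnsPolicy: Default  # was ClusterFirst; Default resolves via the node's DNS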

Thanks @ashish-amarnath! No worries at all 😃

We use Anthos to manage the objects in GKE. Therefore, we use velero install ... --dry-run -o yaml to generate this file: https://gist.github.com/guilledipa/25a64d86bedf8c2364f28db302c707da (which is then enforced by Anthos).

The command is:

velero install --image gcr.io/foo/velero:v1.2.0 --provider gcp --plugins gcr.io/foo/velero-plugin-for-gcp:v1.0.0 --bucket velero-foo --no-secret --sa-annotations iam.gke.io/gcp-service-account=velero@foo.iam.gserviceaccount.com --backup-location-config serviceAccount=velero@foo.iam.gserviceaccount.com --dry-run -o yaml

gcr.io/foo/velero-plugin-for-gcp:v1.0.0 is just a copy of velero/velero-plugin-for-gcp:v1.0.0 pushed into our own GCR:

$ docker pull velero/velero-plugin-for-gcp:v1.0.0
$ docker tag <ID> gcr.io/foo/velero-plugin-for-gcp:v1.0.0
$ docker push gcr.io/foo/velero-plugin-for-gcp:v1.0.0

I’ll verify the health of these images and report back.
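A few standard commands that can be used for that check (image paths as above):

$ gcloud container images describe gcr.io/foo/velero-plugin-for-gcp:v1.0.0
$ gcloud container images list-tags gcr.io/foo/velero-plugin-for-gcp
$ docker pull gcr.io/foo/velero-plugin-for-gcp:v1.0.0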

Thanks very much!