velero: Using CMEK on GKE, Velero does not seem to encrypt the restored cluster with the correct key

I am running Velero on GKE.

I have 2 clusters and 2 StorageClasses, one on each cluster. This is the one on the first cluster (redacted):

  name: csi-gce-pd-cmek
  resourceVersion: "39305929"
  uid: ba3246c2-ead1-4acf-94a5-560b76167b80
parameters:
  disk-encryption-kms-key: projects/myprj/locations/europe-west1/keyRings/keyring-test/cryptoKeys/key-keyring-test
  type: pd-standard
provisioner: pd.csi.storage.gke.io
reclaimPolicy: Delete
volumeBindingMode: WaitForFirstConsumer

and this one on the second cluster:

  name: csi-gce-pd-cmek-secondary
  resourceVersion: "2157798"
  uid: 17bb8a72-fe80-44c9-b2c3-f1002db72270
parameters:
  disk-encryption-kms-key: projects/myprj/locations/europe-west1/keyRings/keyring-secondary-test/cryptoKeys/key-secondary-test
  type: pd-standard
provisioner: pd.csi.storage.gke.io
reclaimPolicy: Delete
volumeBindingMode: WaitForFirstConsumer

I have created 2 GCP CMEK keys, as you can see above, and passed one to each StorageClass (parameter: disk-encryption-kms-key).

I run a simple workload (nginx) on the first cluster using a PVC. The PVC is bound to a PV that uses the first StorageClass, and I can write data to the PV.

Then I take a Velero backup on the primary cluster and restore it on the secondary. Before the restore on the second cluster, I also apply this ConfigMap to map the two StorageClasses:

apiVersion: v1
kind: ConfigMap
metadata:
  name: change-storage-class-config
  namespace: velero
  labels:
    velero.io/plugin-config: ""
data:
  csi-gce-pd-cmek: csi-gce-pd-cmek-secondary

At the end of the restore process, I can see the data correctly restored on my secondary cluster.

My problem is the GCP CMEK though!

When I look at the disk on the node where nginx runs on the first cluster, I see that the disk is encrypted with a customer-managed key.

Whereas when I look at the disk on the node where nginx runs on the second cluster, I see that the disk is encrypted with a Google-managed key (not CMEK!!).

I am surprised, since the PVC, PV and StorageClasses all seem correct. Their YAML files and their status all look fine. The data is available, but I suspect that something is wrong because GCP does not show the second cluster's node disk as encrypted with CMEK (key-secondary-test).

Another parameter supporting my analysis is the following:

For the second cluster, Protected Resource is 0.

Whereas on the first cluster this metric is not 0.

Is this a bug in Velero, or why is this happening?

Environment:

  • velero version: Client v1.12.0 (Git commit 7112c62e493b0f7570f0e7cd2088f8cad968db99), Server v1.11.1
  • kubectl version: Client v1.28.2, Kustomize v5.0.4-0.20230601165947-6ce0bf390ce3, Server v1.27.3-gke.100
  • GKE
  • Container-Optimized OS from Google

About this issue

  • State: open
  • Created 8 months ago
  • Comments: 20 (10 by maintainers)

Most upvoted comments

@sseago @reasonerjt @Lyndon-Li Could you share some insight on this issue?

The scenario: a volume protected by CMEK is backed up, then restored with a StorageClass that specifies a different CMEK. Unexpectedly, the restored volume is not protected by the CMEK specified by the restore StorageClass; it is protected by the Google-managed key. The reason is analyzed here https://github.com/vmware-tanzu/velero/issues/6982#issuecomment-1782578207.

Looks like this is a common issue for all Velero cloud-provider plugins.

Is it possible to have a temporary solution without modifying the VolumeSnapshotter plugin interface?

@smoms Yes. I agree that it is unexpected for the restored volumes not to use the CMEK specified by the StorageClass. Please see my previous comment. Adding new parameters to the Velero plugin interface would make a newer Velero server stop working with older versions of the Velero plugins. I think we need the other maintainers' opinions on how to fix it, and on whether the breaking change is acceptable.

Second, if we want to fix this issue by specifying the CMEK when creating the volume from the snapshot, we need to modify the VolumeSnapshotter interface: encryptionKey should be added to the input parameter list of the CreateVolumeFromSnapshot function. This may be a common issue for all Velero cloud provider plugins.
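To illustrate why adding encryptionKey to CreateVolumeFromSnapshot is a breaking change, here is a minimal, self-contained Go sketch. It is not Velero's code: the two interfaces are trimmed to the single method under discussion, the first signature is written from memory of the plugin interface and should be checked against the Velero source, and proposedSnapshotter, oldPlugin, newPlugin and supportsEncryptionKey are hypothetical names made up for this example. It also ignores how Velero actually talks to plugins over RPC and only shows the mismatch at the Go type level.

```go
package main

import "fmt"

// currentSnapshotter keeps only the method relevant to this issue.
// Assumption: this matches the existing CreateVolumeFromSnapshot signature.
type currentSnapshotter interface {
	CreateVolumeFromSnapshot(snapshotID, volumeType, volumeAZ string, iops *int64) (string, error)
}

// proposedSnapshotter is hypothetical: the same method with an extra
// encryptionKey parameter, as suggested in the comment above.
type proposedSnapshotter interface {
	CreateVolumeFromSnapshot(snapshotID, volumeType, volumeAZ string, iops *int64, encryptionKey string) (string, error)
}

// oldPlugin stands in for a provider plugin built against the current interface.
type oldPlugin struct{}

func (oldPlugin) CreateVolumeFromSnapshot(snapshotID, volumeType, volumeAZ string, iops *int64) (string, error) {
	// No key is passed, so the provider would fall back to its default encryption.
	return "vol-from-" + snapshotID, nil
}

// newPlugin stands in for a plugin rebuilt against the proposed interface.
type newPlugin struct{}

func (newPlugin) CreateVolumeFromSnapshot(snapshotID, volumeType, volumeAZ string, iops *int64, encryptionKey string) (string, error) {
	// A real plugin would pass encryptionKey to the cloud API when creating the disk.
	return "vol-from-" + snapshotID + "-with-" + encryptionKey, nil
}

// supportsEncryptionKey reports whether a plugin satisfies the proposed interface.
func supportsEncryptionKey(p interface{}) bool {
	_, ok := p.(proposedSnapshotter)
	return ok
}

// Compile-time check that oldPlugin satisfies the current interface.
var _ currentSnapshotter = oldPlugin{}

func main() {
	fmt.Println("old plugin supports encryptionKey:", supportsEncryptionKey(oldPlugin{}))
	fmt.Println("new plugin supports encryptionKey:", supportsEncryptionKey(newPlugin{}))
}
```

Because Go method sets must match exactly, a plugin compiled against the current signature cannot satisfy the extended one, which is the compatibility concern raised above. One option that avoids changing the signature would be to carry the key in the plugin's existing configuration, but whether that fits Velero's design is for the maintainers to decide.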