kubernetes: After cluster upgrade from 1.18.9 to 1.19.7, azureFile volume secrets are searched for in the wrong namespace
What happened:
After a cluster upgrade from 1.18.9 to 1.19.7, azureFile volume secrets are searched for in the wrong namespace.
We have a few pods/deployments that mount a volume pointing to an Azure file share. These pods live in several different namespaces. The volumes are specified directly in the deployment spec, e.g. like this:
```yaml
volumes:
  - name: db-backups
    azureFile:
      secretName: azure-file-secret
      shareName: db-backups
      readOnly: false
```
In the same namespace as the deployment we have the secret azure-file-secret. On Kubernetes 1.18.9 this worked perfectly. After upgrading to Kubernetes 1.19.7 the pods failed to start. Looking at the events we saw messages like this:
```
MountVolume.SetUp failed for volume "db-backups" : Couldn't get secret default/azure-file-secret
```
As you can see, the volume is searching for the secret in the default namespace instead of the namespace of the deployment. Creating the secret in the default namespace allowed the pods to successfully start.
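For reference, the workaround is simply a copy of the per-namespace secret placed in the default namespace. A minimal sketch (the storage-account values are placeholders; azurestorageaccountname/azurestorageaccountkey are the fields an azureFile volume expects in its secret):

```yaml
# Workaround sketch: duplicate the secret into the default namespace so
# that the (incorrect) lookup there succeeds. Values are placeholders.
apiVersion: v1
kind: Secret
metadata:
  name: azure-file-secret
  namespace: default
type: Opaque
stringData:
  azurestorageaccountname: <storage-account-name>  # placeholder
  azurestorageaccountkey: <storage-account-key>    # placeholder
```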
What you expected to happen:
The pods should have used the secret from their own namespace, as they did on 1.18.9.
How to reproduce it (as minimally and precisely as possible):
- create a cluster in AKS
- create a new namespace "bug"
- create an Azure file share
- create a secret with the required fields in the "bug" namespace (e.g. like in this example)
- create a pod with an azureFile volume in the "bug" namespace (e.g. like in this example; a sketch follows this list)
- see the pod fail to start
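For concreteness, a minimal repro pod could look like this (the pod name and image are arbitrary; the secret is shaped like the workaround sketch above, but created in the "bug" namespace):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: azurefile-repro
  namespace: bug
spec:
  containers:
    - name: app
      image: nginx  # arbitrary; any long-running container works
      volumeMounts:
        - name: db-backups
          mountPath: /mnt/backups
  volumes:
    - name: db-backups
      azureFile:
        secretName: azure-file-secret  # exists only in the "bug" namespace
        shareName: db-backups
        readOnly: false
```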
Anything else we need to know?:
We saw in the documentation and examples that the alternative approach of manually creating a PVC and PV for the file share allows specifying a secretNamespace. Maybe this field should also be allowed when specifying the volume directly in a pod/deployment spec?
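For reference, a rough sketch of that alternative (names and the capacity are illustrative): the azureFile source on a PersistentVolume accepts secretNamespace, while the inline pod-spec variant does not.

```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: db-backups-pv
spec:
  capacity:
    storage: 5Gi  # illustrative value
  accessModes:
    - ReadWriteMany
  azureFile:
    secretName: azure-file-secret
    secretNamespace: bug  # supported here, but not on inline pod volumes
    shareName: db-backups
    readOnly: false
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: db-backups-pvc
  namespace: bug
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: ""  # bind statically to the PV above
  volumeName: db-backups-pv
  resources:
    requests:
      storage: 5Gi
```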
Environment:
- Kubernetes version: 1.19.7
- Cloud provider or hardware configuration: AKS
About this issue
- Original URL
- State: closed
- Created 3 years ago
- Reactions: 9
- Comments: 41 (23 by maintainers)
@andyzhangx While it would probably be possible to use that approach, declaring the volume directly in the pod spec is still mentioned in the docs/examples, so I would expect it to keep working. And it would take quite some time (which I don't have right now) to migrate all our deployments to the alternative approach.
As one comment in the linked issue mentioned, it is totally unacceptable to introduce a breaking change like this in a minor (or was it even only a patch?) update. I understand the previous behavior was considered a bug, but unfortunately quite a few people already depend on that buggy behavior. In any case, it should be explicitly mentioned in the changelog as a breaking change, and ideally the "inline declaration in pod spec" approach should be properly deprecated if you prefer people to use the explicit PV approach.
We were also hit by this one. While we have worked around the problem, I think it should be fixed. There are many real-world deployments that can run into this.
@MrWolfZ could you use the workaround mentioned here: https://github.com/Azure/AKS/issues/2108#issuecomment-776672612
@galiacheng the fix is still rolling out; you could run the following command to disable CSI migration on 1.22, which could mitigate the issue:
@StevenJDH are you referring to this issue? https://github.com/Azure/AKS/issues/2871
@fire2 thanks for the catch, the doc is outdated; I will update it. From 1.21.0, this driver searches for the secret in the same namespace as the pod.
@ramondeklein I just discovered a few days ago that the new AKS versions are now available: