kubernetes: CSI E2E tests fail with upcoming CSI release

Is this a BUG REPORT or FEATURE REQUEST?:

/kind bug

What happened:

When testing k8s master with the “canary” CSI images (i.e. the images that will be tagged as the next release at the end of this week), the CSI volume tests fail.

PVCs remain pending because the external-attacher runs into a permission issue (from its log):

E0919 08:34:56.218106       1 leaderelection.go:224] error retrieving resource lock e2e-tests-csi-mock-plugin-l84wg/csi-hostpath: endpoints "csi-hostpath" is forbidden: User "system:serviceaccount:e2e-tests-csi-mock-plugin-l84wg:csi-hostpath-service-account" cannot get resource "endpoints" in API group "" in the namespace "e2e-tests-csi-mock-plugin-l84wg"
I0919 08:34:56.218136       1 leaderelection.go:180] failed to acquire lease e2e-tests-csi-mock-plugin-l84wg/csi-hostpath
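
For reference, one way to express the missing permission is an RBAC Role bound to the service account named in the log. This is a minimal, hypothetical sketch: the namespace and service-account names are copied from the error message above, the Role name and the exact verb list are illustrative, and the authoritative rules should come from the upstream CSI deployment manifests once they are updated.

```yaml
# Hypothetical Role granting the endpoints-based leader-election access
# that the external-attacher log above complains about. Namespace and
# service-account names are copied verbatim from the error message.
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: csi-hostpath-leaderelection        # illustrative name
  namespace: e2e-tests-csi-mock-plugin-l84wg
rules:
- apiGroups: [""]
  resources: ["endpoints"]
  verbs: ["get", "watch", "list", "create", "update", "patch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: csi-hostpath-leaderelection        # illustrative name
  namespace: e2e-tests-csi-mock-plugin-l84wg
subjects:
- kind: ServiceAccount
  name: csi-hostpath-service-account
  namespace: e2e-tests-csi-mock-plugin-l84wg
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: csi-hostpath-leaderelection
```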

What you expected to happen:

The tests should pass: the external-attacher's service account should have all the permissions it needs for leader election.

How to reproduce it (as minimally and precisely as possible):

  • build k8s master
  • start cluster: RUNTIME_CONFIG= ALLOW_PRIVILEGED=1 FEATURE_GATES="BlockVolume=true,MountPropagation=true,KubeletPluginsWatcher=true" hack/local-up-cluster.sh -O
  • run tests: make WHAT=test/e2e/e2e.test && go run hack/e2e.go -- --provider=local --test --test_args="--ginkgo.focus=CSI.plugin.test.using.CSI.driver:.hostPath -csiImageVersion=canary"

Anything else we need to know?:

It works with the latest released versions of the CSI containers (i.e. without -csiImageVersion=canary). That is how this test currently runs in the k8s CI.

If new permissions are needed, then https://kubernetes-csi.github.io/docs/Example.html also needs to be updated.

Most upvoted comments

Hi all, I noticed that the conversation has floated between this issue and two other PRs, so in an effort to understand what was going on and what was decided, I made a little summary that I will share here. Let me know if there is any misunderstanding or omission and I can edit this comment.

Assumptions:

  1. Each driver may require a different set of cluster roles to function
  2. Tests should be able to run in parallel
  3. Cluster roles should be easy(ish) to patch
  4. ClusterRoles in production deployment and test should not differ

Conclusions:

  1. We should use manifest YAMLs instead of bootstrapped roles or roles in code (1) (3) (https://github.com/kubernetes/kubernetes/pull/68821#issuecomment-423252660); see the sketch after this list
  2. We should deprecate the bootstrapped cluster roles ASAP (3) (1) (https://github.com/kubernetes/kubernetes/issues/68819#issuecomment-422834929)
  3. Cluster role names should be unique, potentially even appending UUIDs to the names when creating from manifests (2) (1)
  4. Ideally, there should be some sort of sync/import of the manifests between the external repo (the source of truth) and the test manifests (4)
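
To make conclusions 1 and 3 concrete, here is a hedged sketch of a role shipped as a manifest with a per-test unique name. The "<suffix>" placeholder and the rule set shown are illustrative, not the authoritative external-attacher rules:

```yaml
# Illustrative ClusterRole shipped as a manifest. "<suffix>" stands for
# a per-test unique string (e.g. a UUID) that the test harness would
# substitute before applying, so parallel tests do not collide.
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: external-attacher-runner-<suffix>   # hypothetical naming scheme
rules:
- apiGroups: [""]
  resources: ["persistentvolumes"]
  verbs: ["get", "list", "watch", "update"]
- apiGroups: ["storage.k8s.io"]
  resources: ["volumeattachments"]
  verbs: ["get", "list", "watch", "update", "patch"]
```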

David Zhu notifications@github.com writes:

[quotes the assumptions and conclusions above]

I agree that this is a good direction. Both PRs address a subset of this (Jan’s starts to use manifests, mine drops usage of the bootstrapped cluster roles). I’m fine with merging PR #68887 first and then continuing the work based on that.

@wongma7, that means a new external-storage release and a rebase of external-provisioner, right? Do we still have time for that in 1.12, or shall we stick to endpoints there and move everything to Lease in 1.13? It’s ugly, but perhaps it’s better than rushed releases.

Yes, it means another release. Actually, I didn’t know about this Lease object, so if we are going to move to it in 1.13 anyway, let’s keep external-provisioner on endpoints for one more release and avoid rushing yet another release within the 1.12 timeframe. Waiting until 1.13 will be easier to communicate and easier for users to stomach: they can update external-attacher and external-provisioner in one go. So let’s stick to endpoints for now, just one release. 👍
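
For context, the Lease object discussed above is a dedicated leader-election resource in the coordination.k8s.io API group (v1beta1 around the 1.12/1.13 timeframe discussed here, coordination.k8s.io/v1 in later releases), so sidecars that switch to it need RBAC on leases instead of endpoints. A minimal sketch with illustrative names and values:

```yaml
# What an acquired leader-election Lease looks like (values illustrative).
apiVersion: coordination.k8s.io/v1
kind: Lease
metadata:
  name: csi-hostpath               # lock name, analogous to the endpoints lock above
  namespace: default
spec:
  holderIdentity: external-attacher-0
  leaseDurationSeconds: 15
---
# The matching RBAC rule targets leases instead of endpoints.
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: csi-hostpath-leases        # hypothetical name
  namespace: default
rules:
- apiGroups: ["coordination.k8s.io"]
  resources: ["leases"]
  verbs: ["get", "watch", "list", "create", "update", "patch"]
```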