vsphere-csi-driver: CnsFault error: VSLM task failed

Is this a BUG REPORT or FEATURE REQUEST?:

/kind bug

What happened: I am new to this. I followed the documentation without problems up to this step:

https://docs.vmware.com/en/VMware-vSphere-Container-Storage-Plug-in/2.0/vmware-vsphere-csp-getting-started/GUID-606E179E-4856-484C-8619-773848175396.html

Then an error was reported when creating the PV.

{"level":"info","time":"2021-11-12T19:42:02.055011665Z","caller":"volume/manager.go:407","msg":"CreateVolume: VolumeName: \"pvc-bca23169-66d8-4685-b045-1fd47397e619\", opId: \"06ed1681\"","TraceId":"353215b1-a18c-4432-a454-a5994d1b3ee9"}
{"level":"info","time":"2021-11-12T19:42:02.055060708Z","caller":"volume/util.go:364","msg":"Extract vimfault type: +types.CnsFault vimFault: +{<nil> VSLM task failed} Fault: &{DynamicData:{} Fault:{BaseMethodFault:<nil> Reason:VSLM task failed} LocalizedMessage:CnsFault error: VSLM task failed} from resp: +&{{} {{} } 0xc000701e80}","TraceId":"353215b1-a18c-4432-a454-a5994d1b3ee9"}

{"level":"error","time":"2021-11-12T19:42:02.055090708Z","caller":"volume/util.go:291","msg":"failed to create volume with fault: \"(*types.LocalizedMethodFault)(0xc000701e80)({\\n DynamicData: (types.DynamicData) {\\n },\\n Fault: (types.CnsFault) {\\n BaseMethodFault: (types.BaseMethodFault) <nil>,\\n Reason: (string) (len=16) \\\"VSLM task failed\\\"\\n },\\n LocalizedMessage: (string) (len=32) \\\"CnsFault error: VSLM task failed\\\"\\n})\\n\"","TraceId":"353215b1-a18c-4432-a454-a5994d1b3ee9","stacktrace":"sigs.k8s.io/vsphere-csi-driver/v2/pkg/common/cns-lib/volume.validateCreateVolumeResponseFault\n\t/build/pkg/common/cns-lib/volume/util.go:291\nsigs.k8s.io/vsphere-csi-driver/v2/pkg/common/cns-lib/volume.(*defaultManager).createVolumeWithImprovedIdempotency\n\t/build/pkg/common/cns-lib/volume/manager.go:424\nsigs.k8s.io/vsphere-csi-driver/v2/pkg/common/cns-lib/volume.(*defaultManager).CreateVolume.func1\n\t/build/pkg/common/cns-lib/volume/manager.go:567\nsigs.k8s.io/vsphere-csi-driver/v2/pkg/common/cns-lib/volume.(*defaultManager).CreateVolume\n\t/build/pkg/common/cns-lib/volume/manager.go:572\nsigs.k8s.io/vsphere-csi-driver/v2/pkg/csi/service/common.CreateBlockVolumeUtil\n\t/build/pkg/csi/service/common/vsphereutil.go:242\nsigs.k8s.io/vsphere-csi-driver/v2/pkg/csi/service/vanilla.(*controller).createBlockVolume\n\t/build/pkg/csi/service/vanilla/controller.go:541\nsigs.k8s.io/vsphere-csi-driver/v2/pkg/csi/service/vanilla.(*controller).CreateVolume.func1\n\t/build/pkg/csi/service/vanilla/controller.go:830\nsigs.k8s.io/vsphere-csi-driver/v2/pkg/csi/service/vanilla.(*controller).CreateVolume\n\t/build/pkg/csi/service/vanilla/controller.go:832\ngithub.com/container-storage-interface/spec/lib/go/csi._Controller_CreateVolume_Handler\n\t/go/pkg/mod/github.com/container-storage-interface/spec@v1.4.0/lib/go/csi/csi.pb.go:5589\ngoogle.golang.org/grpc.(*Server).processUnaryRPC\n\t/go/pkg/mod/google.golang.org/grpc@v1.27.1/server.go:1024\ngoogle.golang.org/grpc.(*Server).handleStream\n\t/go/pkg/mod/google.golang.org/grpc@v1.27.1/server.go:1313\ngoogle.golang.org/grpc.(*Server).serveStreams.func1.1\n\t/go/pkg/mod/google.golang.org/grpc@v1.27.1/server.go:722"}
{"level":"error","time":"2021-11-12T19:42:02.059953685Z","caller":"common/vsphereutil.go:244","msg":"failed to create disk pvc-bca23169-66d8-4685-b045-1fd47397e619 with error failed to create volume with fault: \"(*types.LocalizedMethodFault)(0xc000701e80)({\\n DynamicData: (types.DynamicData) {\\n },\\n Fault: (types.CnsFault) {\\n BaseMethodFault: (types.BaseMethodFault) <nil>,\\n Reason: (string) (len=16) \\\"VSLM task failed\\\"\\n },\\n LocalizedMessage: (string) (len=32) \\\"CnsFault error: VSLM task failed\\\"\\n})\\n\" faultType \"vim.fault.CnsFault\"","TraceId":"353215b1-a18c-4432-a454-a5994d1b3ee9","stacktrace":"sigs.k8s.io/vsphere-csi-driver/v2/pkg/csi/service/common.CreateBlockVolumeUtil\n\t/build/pkg/csi/service/common/vsphereutil.go:244\nsigs.k8s.io/vsphere-csi-driver/v2/pkg/csi/service/vanilla.(*controller).createBlockVolume\n\t/build/pkg/csi/service/vanilla/controller.go:541\nsigs.k8s.io/vsphere-csi-driver/v2/pkg/csi/service/vanilla.(*controller).CreateVolume.func1\n\t/build/pkg/csi/service/vanilla/controller.go:830\nsigs.k8s.io/vsphere-csi-driver/v2/pkg/csi/service/vanilla.(*controller).CreateVolume\n\t/build/pkg/csi/service/vanilla/controller.go:832\ngithub.com/container-storage-interface/spec/lib/go/csi._Controller_CreateVolume_Handler\n\t/go/pkg/mod/github.com/container-storage-interface/spec@v1.4.0/lib/go/csi/csi.pb.go:5589\ngoogle.golang.org/grpc.(*Server).processUnaryRPC\n\t/go/pkg/mod/google.golang.org/grpc@v1.27.1/server.go:1024\ngoogle.golang.org/grpc.(*Server).handleStream\n\t/go/pkg/mod/google.golang.org/grpc@v1.27.1/server.go:1313\ngoogle.golang.org/grpc.(*Server).serveStreams.func1.1\n\t/go/pkg/mod/google.golang.org/grpc@v1.27.1/server.go:722"}
{"level":"error","time":"2021-11-12T19:42:02.060009157Z","caller":"vanilla/controller.go:544","msg":"failed to create volume. Error: failed to create volume with fault: \"(*types.LocalizedMethodFault)(0xc000701e80)({\\n DynamicData: (types.DynamicData) {\\n },\\n Fault: (types.CnsFault) {\\n BaseMethodFault: (types.BaseMethodFault) <nil>,\\n Reason: (string) (len=16) \\\"VSLM task failed\\\"\\n },\\n LocalizedMessage: (string) (len=32) \\\"CnsFault error: VSLM task failed\\\"\\n})\\n\"","TraceId":"353215b1-a18c-4432-a454-a5994d1b3ee9","stacktrace":"sigs.k8s.io/vsphere-csi-driver/v2/pkg/csi/service/vanilla.(*controller).createBlockVolume\n\t/build/pkg/csi/service/vanilla/controller.go:544\nsigs.k8s.io/vsphere-csi-driver/v2/pkg/csi/service/vanilla.(*controller).CreateVolume.func1\n\t/build/pkg/csi/service/vanilla/controller.go:830\nsigs.k8s.io/vsphere-csi-driver/v2/pkg/csi/service/vanilla.(*controller).CreateVolume\n\t/build/pkg/csi/service/vanilla/controller.go:832\ngithub.com/container-storage-interface/spec/lib/go/csi._Controller_CreateVolume_Handler\n\t/go/pkg/mod/github.com/container-storage-interface/spec@v1.4.0/lib/go/csi/csi.pb.go:5589\ngoogle.golang.org/grpc.(*Server).processUnaryRPC\n\t/go/pkg/mod/google.golang.org/grpc@v1.27.1/server.go:1024\ngoogle.golang.org/grpc.(*Server).handleStream\n\t/go/pkg/mod/google.golang.org/grpc@v1.27.1/server.go:1313\ngoogle.golang.org/grpc.(*Server).serveStreams.func1.1\n\t/go/pkg/mod/google.golang.org/grpc@v1.27.1/server.go:722"}
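Each controller log entry above is a single JSON object, so the fault reason can be pulled out programmatically rather than read by eye. The following is a minimal sketch, assuming one valid JSON entry per line; the `summarize` helper and the shortened `SAMPLE` line (stack trace and some fields omitted) are illustrative, not part of the driver.

```python
import json
import re

# Pulls the text after "Reason: (string) (len=N)" out of the dumped fault.
# The quotes in the message may still carry escaping backslashes.
REASON_RE = re.compile(r'Reason: \(string\) \(len=\d+\) \\*"([^"\\]*)')

# Shortened sample shaped like the error entries above (assumption: the
# stack trace and some fault fields are left out for brevity).
SAMPLE = r'''{"level":"error","time":"2021-11-12T19:42:02.060009157Z","caller":"vanilla/controller.go:544","msg":"failed to create volume. Error: failed to create volume with fault: \"(*types.LocalizedMethodFault)(0xc000701e80)({\\n Fault: (types.CnsFault) {\\n Reason: (string) (len=16) \\\"VSLM task failed\\\"\\n }\\n})\\n\"","TraceId":"353215b1-a18c-4432-a454-a5994d1b3ee9"}'''


def summarize(line):
    """Return the level, caller, trace ID, and fault reason of one log entry."""
    entry = json.loads(line)
    match = REASON_RE.search(entry.get("msg", ""))
    return {
        "level": entry.get("level"),
        "caller": entry.get("caller"),
        "trace_id": entry.get("TraceId"),
        "reason": match.group(1) if match else None,
    }


if __name__ == "__main__":
    print(summarize(SAMPLE))
```

Grouping the summaries by `trace_id` shows that all of the error entries belong to the same CreateVolume request.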

What you expected to happen:

How to reproduce it (as minimally and precisely as possible): This is only a test environment. The vSAN was built on a single node.

Anything else we need to know?:

Environment:

  • csi-vsphere version:

  • vsphere-cloud-controller-manager version: image: gcr.io/cloud-provider-vsphere/cpi/release/manager:v1.21.1

  • Kubernetes version: 1.21.5

  • vSphere version: 7.0.3.00100

  • OS (e.g. from /etc/os-release): Photon OS 4.0

  • Kernel (e.g. uname -a): 5.10

  • Install tools:

  • Others:

About this issue

  • State: closed
  • Created 3 years ago
  • Comments: 20 (2 by maintainers)

Most upvoted comments

Enabling Changed Block Tracking (CBT) did the trick for us too. Thanks a bunch.

Solution: Adding the ctkEnabled = "TRUE" flag to the Rancher node template fixes it for all new nodes.
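To confirm whether a node VM actually carries the flag, its configuration parameters can be checked for `ctkEnabled` entries. This is a minimal sketch that scans .vmx-style text; the per-disk key form (`scsi0:0.ctkEnabled`) is assumed from the CBT KB article linked in this thread, and `worker-1` plus the sample values are made up for illustration.

```python
import re

# Matches lines such as:  ctkEnabled = "TRUE"  or  scsi0:0.ctkEnabled = "TRUE"
CTK_RE = re.compile(
    r'^\s*((?:[a-z]+\d+:\d+\.)?ctkEnabled)\s*=\s*"([^"]*)"\s*$',
    re.IGNORECASE,
)

# Hypothetical configuration text; not taken from a real VM.
SAMPLE_VMX = '''\
displayName = "worker-1"
ctkEnabled = "TRUE"
scsi0:0.ctkEnabled = "TRUE"
scsi0:1.ctkEnabled = "FALSE"
'''


def cbt_settings(vmx_text):
    """Map each ctkEnabled key found in the text to True or False."""
    settings = {}
    for line in vmx_text.splitlines():
        match = CTK_RE.match(line)
        if match:
            settings[match.group(1)] = match.group(2).upper() == "TRUE"
    return settings


if __name__ == "__main__":
    for key, enabled in cbt_settings(SAMPLE_VMX).items():
        print(f"{key}: {'enabled' if enabled else 'disabled'}")
```

A VM with no `ctkEnabled` key at all yields an empty result, which is worth treating as "CBT not enabled" when comparing nodes.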

Hi all,

VMware support resolved (or at least worked around) this issue by manually enabling Changed Block Tracking (CBT) on our existing worker nodes. See https://kb.vmware.com/s/article/1020128: "In some cases, such as a power failure or hard shutdown while virtual machines are powered on, CBT might reset and lose track of incremental changes."

According to VMware support, CBT is sometimes reset (disabled) after a power failure or a hard shutdown of powered-on VMs. In our case, we had a few PSODs on our ESXi hosts caused by issues in ESXi 7.0 Update 3b.

I hope this helps @Moezenka @torbendury @lakxtxue @jwhb

@ThoSap this also seems to work for us.

I am experiencing the same problem and I have opened a ticket with VMware.

I was told that the engineering team suspects that this is a vCenter issue and that it should be solved in the coming versions.

I had this issue too, and the second resolution works like a charm in my case: https://kb.vmware.com/s/article/88193

Disable Changed Block Tracking on the First Class Disk (FCD). To do this, perform the following steps:

1. Go to https://<vc_fqdn>/mob/?moid=VStorageObjectManager&method=clearVStorageObjectControlFlags
2. Provide the Volume id in the id field along with the datastore ManagedObjectReference in the datastore field
3. Provide enableChangedBlockTracking in the controlFlags field as follows
<controlFlags>enableChangedBlockTracking</controlFlags>
4. Then click on Invoke Method
5. Recheck the FCD changedBlockTrackingEnabled setting via the MOB to ensure it now shows false, as outlined in the Symptoms section of the KB article.
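When several volumes need the same treatment, the MOB URL from step 1 and the field value from step 3 can be generated rather than typed by hand. This is a small sketch that only builds the strings; it does not call vCenter, and `vc.example.com` is a placeholder host name.

```python
from urllib.parse import urlencode


def clear_flags_mob_url(vc_fqdn):
    """Build the MOB URL for clearVStorageObjectControlFlags (step 1)."""
    query = urlencode({
        "moid": "VStorageObjectManager",
        "method": "clearVStorageObjectControlFlags",
    })
    return f"https://{vc_fqdn}/mob/?{query}"


def control_flags_field(flag="enableChangedBlockTracking"):
    """Build the value to paste into the controlFlags field (step 3)."""
    return f"<controlFlags>{flag}</controlFlags>"


if __name__ == "__main__":
    # vc.example.com is a placeholder; substitute your vCenter FQDN.
    print(clear_flags_mob_url("vc.example.com"))
    print(control_flags_field())
```

The volume ID and datastore ManagedObjectReference from step 2 still have to be entered in the MOB form for each disk.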

I think people who had success with setting ctkEnabled=TRUE were dealing with the issue described here: https://kb.vmware.com/s/article/88193. In short, attaching an FCD with CBT enabled to a VM with CBT disabled won’t work.

In my case, I found this issue after getting the same “VSLM task failed” error in the output of kubectl get events. Unfortunately, setting ctkEnabled=TRUE didn’t fix it for me. What did work was a rescan of storage on my ESXi cluster (in the inventory, right-click the ESXi cluster -> Storage -> Rescan Storage).

I strongly suggest anyone stumbling on this issue because of a “VSLM task failed” error first try a storage rescan, and bear in mind that if you do decide to try setting ctkEnabled=TRUE, disabling it again requires manual work for each attached FCD as detailed in the KB link above.