etcd: etcd fails to come up when using a CIFS file share (as etcd storage) with cache=none in the mount options
What happened?
I had etcd:3.4.15-debian-10-r43 running and mounted a CIFS file share with the mount options below. Please note cache=none:
rw,relatime,vers=3.1.1,cache=none,username=myname,uid=0,noforceuid,gid=1001,forcegid,addr=1.10.90.14,file_mode=0777,dir_mode=0777,soft,persistenthandles,nounix,serverino,mapposix,mfsymlinks,rsize=1048576,wsize=1048576,bsize=1048576,echo_interval=60,actimeo=30
etcd is failing to come up with the panic below.
Per the discussion in https://github.com/etcd-io/etcd/issues/9670, I tried cache=none for CIFS. With the same mount options, if I change it to cache=strict, everything works fine.
What did you expect to happen?
etcd must not panic.
How can we reproduce it (as minimally and precisely as possible)?
It's very straightforward with the given mount options.
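If you want to take etcd out of the picture, a minimal sketch like the following exercises bbolt (etcd's storage engine) directly on the share. The mount point /mnt/cifs and the file name repro.db are assumptions for illustration:

```go
// Minimal sketch: open a bbolt database on the CIFS mount and write a
// bucket, to see whether the failure reproduces without etcd.
// The mount point /mnt/cifs is an assumption; substitute your share.
package main

import (
	"log"
	"path/filepath"

	bolt "go.etcd.io/bbolt"
)

func main() {
	db, err := bolt.Open(filepath.Join("/mnt/cifs", "repro.db"), 0o600, nil)
	if err != nil {
		log.Fatalf("open: %v", err)
	}
	defer db.Close()

	if err := db.Update(func(tx *bolt.Tx) error {
		b, err := tx.CreateBucketIfNotExists([]byte("keys"))
		if err != nil {
			return err
		}
		return b.Put([]byte("k"), []byte("v"))
	}); err != nil {
		log.Fatalf("update: %v", err)
	}
	log.Println("write succeeded")
}
```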
Anything else we need to know?
No response
Etcd version (please run commands below)
$ etcdctl version
etcdctl version: 3.4.15
API version: 3.4
Etcd configuration (command line flags or environment variables)
No response
Etcd debug information (please run commands below, feel free to obfuscate the IP address or FQDN in the output)
No response
Relevant log output
In the screenshot attached to the original issue (the panic output is not transcribed here).
About this issue
- Original URL
- State: closed
- Created 2 years ago
- Comments: 23 (7 by maintainers)
Just FYI
https://bugzilla.kernel.org/show_bug.cgi?id=216301
Thank you @ahrtr for sharing. I think that confirms it.
I am planning to close the issue, since the WAL covers the main risk (data loss) that comes with trying to disable the cache. I will wait for a week and then close it. Please let me know if anyone thinks I need to keep this open.
It's the same.
@hasethuraman @tjungblu could you try to reproduce this issue with etcd 3.5.4?
hey @hasethuraman,
I’ve been looking into this a bit today, since I have a Samba share at home and wanted to debug bbolt a little. I don’t have a fix for you just yet, but I want to write down what I’ve found so far.
The panic can be easily reproduced by running this simple test case on the share: https://github.com/etcd-io/bbolt/blob/master/manydbs_test.go#L33-L60
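For reference, here is a condensed sketch of the pattern that test exercises (not the test's exact code; the directory, counts, and key names are illustrative): many small bbolt databases opened and written concurrently, with the directory placed on the CIFS share.

```go
// Condensed illustration of the linked test's pattern: create many
// bbolt databases concurrently in one directory. Point dir at the
// CIFS share to reproduce. Counts and names are illustrative.
package main

import (
	"fmt"
	"log"
	"path/filepath"
	"sync"

	bolt "go.etcd.io/bbolt"
)

func main() {
	dir := "/mnt/cifs" // assumption: the share's mount point
	const n = 32

	var wg sync.WaitGroup
	errc := make(chan error, n)
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			db, err := bolt.Open(filepath.Join(dir, fmt.Sprintf("db-%d", i)), 0o600, nil)
			if err != nil {
				errc <- err
				return
			}
			defer db.Close()
			errc <- db.Update(func(tx *bolt.Tx) error {
				b, err := tx.CreateBucketIfNotExists([]byte("data"))
				if err != nil {
					return err
				}
				return b.Put([]byte("k"), []byte("v"))
			})
		}(i)
	}
	wg.Wait()
	close(errc)
	for err := range errc {
		if err != nil {
			log.Fatal(err)
		}
	}
	log.Println("all writes succeeded")
}
```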
The panic is caused by a bucket that is not found after it was created, in this method: https://github.com/etcd-io/bbolt/blob/master/bucket.go#L165-L183
The cursor seek does not return the existing bucket for the root node, so it creates a new one with the same page id, which causes the panic on the freelist. A toy model of that sequence is sketched below.
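To make that sequence concrete, here is a toy model, deliberately not bbolt's actual code: a stale seek lets two bucket values claim the same page id, and the second free of that page trips the freelist panic.

```go
// Toy model (not bbolt code) of the failure mode described above.
package main

import "fmt"

// freelist mimics bbolt's freelist behaviour of panicking when the
// same page is freed twice ("page already freed").
type freelist struct{ freed map[int]bool }

func (f *freelist) free(pgid int) {
	if f.freed[pgid] {
		panic(fmt.Sprintf("page %d already freed", pgid))
	}
	f.freed[pgid] = true
}

func main() {
	fl := &freelist{freed: map[int]bool{}}

	// Because the stale seek missed the existing bucket, two bucket
	// values ended up claiming the same page id.
	owners := []int{3, 3}

	// When both buckets are later rewritten, each frees "its" page;
	// the second free panics, which is the panic etcd surfaces.
	for _, pgid := range owners {
		fl.free(pgid)
	}
}
```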