seaweedfs: Corrupted Files in Cluster

Describe the bug

Files copied to FUSE filer mount are being corrupted.

System Setup

  • OS version: Debian GNU/Linux 11 (bullseye)
  • output of weed version: version 8000GB 3.11 d4ef06cdcf320f8b8b17279586e0738894869eff linux amd64
  • filer.toml:

[filer.options]
# with http DELETE, by default the filer would check whether a folder is empty.
# recursive_delete will delete all sub folders and files, similar to "rm -Rf"
recursive_delete = true

[leveldb3]
# similar to leveldb2.
# each bucket has its own meta store.
enabled = true
dir = "/opt/seaweedfs/REDACTED/filer/filerldb3"                    # directory to store level db files

Expected behavior

Files copied via FUSE mount should match md5sum like any other mount.

Additional context

fstab entry:

fuse /mnt/swfs/localshared fuse.weed filer='REDACTED:8889,REDACTED:8889,REDACTED:8889',filer.path=/,nofail 0 0

root:~# md5sum /var/lib/pve/local-btrfs/images/526/vm-526-disk-0/disk.raw
fba3db35d8d631a8014bbf89dbb3e9ca  /var/lib/pve/local-btrfs/images/526/vm-526-disk-0/disk.raw
root:~# cp /var/lib/pve/local-btrfs/images/526/vm-526-disk-0/disk.raw /mnt/swfs/localshared/
root:~# md5sum /mnt/swfs/localshared/disk.raw 
f8555d7c8bc4e0081dde769255ddc815  /mnt/swfs/localshared/disk.raw
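
To narrow down where the copy diverges, rather than only learning that the whole-file md5sum differs, the two files can be hashed chunk by chunk. The following is a minimal Python sketch and not part of the original report: the function names chunk_digests and differing_offsets and the 64 MiB chunk size are assumptions chosen for illustration.

```python
import hashlib
from itertools import zip_longest

# Assumed chunk size for this sketch; any size that fits in memory works.
CHUNK_SIZE = 64 * 1024 * 1024


def chunk_digests(path, chunk_size=CHUNK_SIZE):
    """Yield (offset, md5 hex digest) for each chunk of the file at path."""
    with open(path, "rb") as f:
        offset = 0
        while True:
            data = f.read(chunk_size)
            if not data:
                break
            yield offset, hashlib.md5(data).hexdigest()
            offset += len(data)


def differing_offsets(src, dst, chunk_size=CHUNK_SIZE):
    """Return the starting byte offsets of chunks that differ between src and dst.

    A chunk that exists in only one of the files also counts as a difference.
    """
    mismatches = []
    pairs = zip_longest(chunk_digests(src, chunk_size),
                        chunk_digests(dst, chunk_size))
    for index, (a, b) in enumerate(pairs):
        if a != b:
            mismatches.append(index * chunk_size)
    return mismatches
```

Calling differing_offsets("/var/lib/pve/local-btrfs/images/526/vm-526-disk-0/disk.raw", "/mnt/swfs/localshared/disk.raw") would list the starting offsets of the chunks that differ; an empty list means the two files match chunk for chunk.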

About this issue

  • State: closed
  • Created 2 years ago
  • Comments: 40 (24 by maintainers)

Most upvoted comments

@xdadrm I think that is the right path. Looks like my filers are out of sync. Going to try backing up and retrying. Not sure why or how they got out of sync.

That would explain why the problem was being very elusive and hard to reproduce.

I’ve done extensive testing of the leveldb filer stores, and I’ve always had eventual consistency issues at higher request rates, caused by the stores getting out of sync. (In this case, they were backlogged due to the high request rate.)

The solution for me was to use a single filer store, MySQL in my case, and have multiple filer instances in front of it. With this configuration, all filer instances are stateless as they use a shared filer store.

@chrislusf Reading through the docs, I couldn’t find a definitive way to manually sync them. The filer docs state that adding and removing filers should just work. Any recommendations or guidance?

Just a thought.

I have experienced ‘changing checksums’ as well while running multiple filers, a few versions back. It appeared to be the filers’ metadata getting out of sync. Fixes for metadata sync are mentioned in recent versions, but I haven’t found the courage to try again [yet] 😃

For me the issue went away when I switched to a single filer. Might be worthwhile to switch to a single filer and check if checksums stabilize then.
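
One way to check whether checksums stabilize is to hash the same file several times and count how many distinct digests come back. Below is a minimal Python sketch, assuming hypothetical helper names md5_of and distinct_digests. Repeated reads on one client may be served from a local cache, so running it from a different client may be more telling.

```python
import hashlib


def md5_of(path, block_size=1024 * 1024):
    """Return the md5 hex digest of the file at path, read in blocks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(block_size), b""):
            digest.update(block)
    return digest.hexdigest()


def distinct_digests(path, reads=5):
    """Hash the file `reads` times and return the set of digests seen."""
    return {md5_of(path) for _ in range(reads)}
```

A result with exactly one digest suggests the reads are stable; more than one means the same path returned different content between reads.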