rclone: webdav: Add support for dCache checksums/digests/hashes

The associated forum post URL from https://forum.rclone.org

What is your current rclone version (output from rclone version)?

What problem are you trying to solve?

Support for the hash/checksum/digest methods for files stored in the dCache storage system (and probably others).

End-to-end checksumming is a much-wanted feature when dealing with large amounts of data.

How do you think rclone should be changed to solve that?

rclone webdav has some checksum support according to https://rclone.org/overview/ but this seems to be specific to Nextcloud/Owncloud.

The WebDAV chapter of the dCache User Guide documents the various ways dCache supports checksums via webdav:

  1. Content-MD5 header
  2. RFC 3230 digests
  3. WebDAV checksum properties

Items 1) and 2) are really on the HTTP level, i.e. they affect file transfers.

Item 1) is likely easy to implement, but being limited to MD5 seems like a dead end today. I’d recommend ignoring it.

Item 2), however, would be generic and RFC-based. On the other hand, I don’t know whether any other storage systems support it (i.e. in practice it might still be dCache-specific). The RFC is https://datatracker.ietf.org/doc/html/rfc3230, supplemented by https://www.iana.org/assignments/http-dig-alg/http-dig-alg.xhtml for the current set of digest/hash/checksum algorithm values.
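To make item 2) a bit more concrete, here is a minimal Go sketch of how a value of the RFC 3230 Digest response header could be turned into the hex strings rclone works with. The function name parseDigest is hypothetical and not part of rclone. The sketch assumes that md5, sha, sha-256 and sha-512 values arrive base64-encoded (as RFC 3230 specifies for MD5 and SHA) and that anything else, such as adler32, is already hex; that last assumption should be checked against the IANA registry and a real dCache response.

```go
package main

import (
	"encoding/base64"
	"encoding/hex"
	"fmt"
	"strings"
)

// parseDigest (hypothetical helper) splits a digest list such as
// "adler32=00000001,md5=1B2M2Y8AsgTpgAmY7PhCfg==" into a map from
// lower-cased algorithm name to a lower-case hex string.
func parseDigest(header string) (map[string]string, error) {
	out := make(map[string]string)
	for _, item := range strings.Split(header, ",") {
		item = strings.TrimSpace(item)
		if item == "" {
			continue
		}
		// Cut at the first "=" only: base64 values may end in "=" padding.
		alg, val, ok := strings.Cut(item, "=")
		if !ok {
			return nil, fmt.Errorf("malformed digest item %q", item)
		}
		alg = strings.ToLower(strings.TrimSpace(alg))
		switch alg {
		case "md5", "sha", "sha-256", "sha-512":
			// Assumption: these algorithms are sent base64-encoded.
			raw, err := base64.StdEncoding.DecodeString(val)
			if err != nil {
				return nil, fmt.Errorf("digest %s: %w", alg, err)
			}
			val = hex.EncodeToString(raw)
		default:
			// Assumption: everything else (e.g. adler32) is already hex.
			val = strings.ToLower(val)
		}
		out[alg] = val
	}
	return out, nil
}
```

Splitting on commas is safe here because neither base64 nor hex values contain a comma.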

Item 3) is needed to be able to list the checksums of files. It’s likely the closest to the ownCloud/Nextcloud implementation, with small differences in what’s returned.
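As a rough illustration of item 3), the following Go sketch builds a PROPFIND request for a checksum property and pulls the raw value out of the multistatus reply. It assumes the property is called Checksums and lives in the namespace http://www.dcache.org/2013/webdav; both are my recollection of the dCache User Guide and need to be verified there. The names newPropfind and checksumsByHref are hypothetical, not rclone functions.

```go
package main

import (
	"encoding/xml"
	"net/http"
	"strings"
)

// Request body asking for a single property.
// Assumption: property name and namespace as recalled from the dCache User Guide.
const propfindBody = `<?xml version="1.0" encoding="utf-8"?>
<D:propfind xmlns:D="DAV:" xmlns:dc="http://www.dcache.org/2013/webdav">
  <D:prop><dc:Checksums/></D:prop>
</D:propfind>`

// Nested structs are used because the elements live in two namespaces.
type multistatus struct {
	XMLName   xml.Name   `xml:"DAV: multistatus"`
	Responses []response `xml:"DAV: response"`
}

type response struct {
	Href     string     `xml:"DAV: href"`
	Propstat []propstat `xml:"DAV: propstat"`
}

type propstat struct {
	Status string `xml:"DAV: status"`
	Prop   prop   `xml:"DAV: prop"`
}

type prop struct {
	Checksums string `xml:"http://www.dcache.org/2013/webdav Checksums"`
}

// newPropfind (hypothetical helper) prepares a PROPFIND for url that
// covers the resource itself and its direct children (Depth: 1).
func newPropfind(url string) (*http.Request, error) {
	req, err := http.NewRequest("PROPFIND", url, strings.NewReader(propfindBody))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Depth", "1")
	req.Header.Set("Content-Type", "application/xml")
	return req, nil
}

// checksumsByHref (hypothetical helper) maps each href in a multistatus
// reply to the raw checksum property value it reported successfully.
func checksumsByHref(body []byte) (map[string]string, error) {
	var ms multistatus
	if err := xml.Unmarshal(body, &ms); err != nil {
		return nil, err
	}
	out := make(map[string]string)
	for _, r := range ms.Responses {
		for _, ps := range r.Propstat {
			// Skip propstat blocks that do not report success.
			if !strings.Contains(ps.Status, " 200 ") {
				continue
			}
			if v := strings.TrimSpace(ps.Prop.Checksums); v != "" {
				out[r.Href] = v
			}
		}
	}
	return out, nil
}
```

The value is returned unparsed on purpose, since its exact format is one of the "small differences" that would have to be handled separately from the ownCloud/Nextcloud code path.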

A note on checksum types: dCache has been around for a long time, and ADLER32 was adequate for ages (and it’s still the default). File sizes have grown though, and eventually dCache got MD5 and now also SHA(-1)/SHA-256/SHA-512 checksum support. I wouldn’t really expect ADLER32 to be implemented in rclone, even though support for it pops up here and there. But supporting the types already in rclone would be nice.

I’m no Go programmer, otherwise I would likely take a stab at this myself. However, in order to find someone who can take on the task: what would your proposed way of solving this be? What can be reused from what’s already implemented (and how can code duplication be avoided)? Would 3) require creating a specific dcache type remote? Can 2) be of use for purposes other than dCache and thus gain merit for implementation?

About this issue

  • State: open
  • Created 2 years ago
  • Reactions: 2
  • Comments: 16 (5 by maintainers)

Most upvoted comments

On GET you only have the stored checksums to choose from (see the example directory I posted earlier). You get one type back, but in the request you can prioritize which one you want. For example, Want-Digest: adler32;q=0.1,md5;q=0.2,sha;q=0.3,sha-256;q=0.4,sha-512 would give the strongest available of the (currently) supported types.
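For what it’s worth, generating such a header from a preference list is simple. Below is a small Go sketch; the function name wantDigest is hypothetical, and the scheme (q-values rising in steps of 0.1, with the most preferred algorithm left at the default q of 1) is just one way to reproduce the example above.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// wantDigest (hypothetical helper) builds a Want-Digest header value from
// algorithm names ordered from least to most preferred. Each entry but the
// last gets q=0.1, 0.2, ...; the last carries no q-value, which defaults to 1.
func wantDigest(weakestFirst []string) (string, error) {
	n := len(weakestFirst)
	if n == 0 || n > 10 {
		// Steps of 0.1 only leave room for nine entries below q=1.
		return "", errors.New("need between 1 and 10 algorithms")
	}
	parts := make([]string, n)
	for i, alg := range weakestFirst {
		if i == n-1 {
			parts[i] = alg
		} else {
			parts[i] = fmt.Sprintf("%s;q=0.%d", alg, i+1)
		}
	}
	return strings.Join(parts, ","), nil
}
```

Calling wantDigest with adler32, md5, sha, sha-256 and sha-512 in that order yields exactly the header value quoted above.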

In your example it seems to return two checksums: the adler32 one and the other one you set. Could it return all of the checksums? That would be the easiest to deal with in rclone.

It will only return the ones that are stored. In practice I expect this to vary depending on system setup and user groups.

You could configure the system to always compute and store all checksum types, but that would make file uploads quite expensive CPU-wise, so the feasibility would vary between projects.

Alas, with Want-Digest you’ll only get one even if you list multiple with the same priority.

Good luck! The Go tour is very good for experienced programmers. I see C in your GitHub repos so you won’t find Go hard - I think of it as C without the hard parts. The main thing that will catch you out is that the declarations are backwards. After you’ve used Go for a while you’ll wonder why the C declarations are so weird!

Thanks, I had missed that one. Go is not the best name for googling info 😃