dvc: Pushing artifacts via WebDAV results in a 411 Length Required response
Bug Report
I am trying to connect to a remote via WebDAV. I can correctly set up the user and password along with the URL, but when I try to push the artifacts I get a 411 Length Required response. How can I solve the missing header problem?
Please provide information about your setup
DVC version: 1.9.0 (brew)
Platform: Python 3.9.0 on macOS-10.15.7-x86_64-i386-64bit
Supports: azure, gdrive, gs, http, https, s3, ssh, oss, webdav, webdavs
Cache types: reflink, hardlink, symlink
Repo: dvc, git
About this issue
- Original URL
- State: closed
- Created 4 years ago
- Comments: 40 (34 by maintainers)
Commits related to this issue
- remote: avoid chunking on webdav. Fixes #4796 — committed to LucaButera/dvc by deleted user 4 years ago
- Merge branch 'master' of github.com:iterative/dvc into 4504-conf-job-limit * 'master' of github.com:iterative/dvc: dag: add --outs option (#4739) Add test server and tests for webdav (#4827) Si... — committed to I159/dvc by I159 4 years ago
- remote: avoid chunking on webdav. Fixes #4796 (#4828) * remote: avoid chunking on webdav. Fixes #4796 * remote: avoid chunking on webdav. Fixes #4796 * remote: avoid chunking on webdav. Fixes #... — committed to cameronraysmith/dvc by LucaButera 4 years ago
@efiop @LucaButera Can we try to figure out, whether it is really (only) the chunked upload and not something else?
@LucaButera If you have a copy of the dvc repository and some time to try something: it should be quite easy to change the `_upload` method of the `WebDAVTree` to use the `upload_file` method, which iirc does no chunking of the file. https://github.com/iterative/dvc/blob/master/dvc/tree/webdav.py#L243
You would have to change the last line `self._client.upload_to(buff=chunks(), remote_path=to_info.path)` to `self._client.upload_file(local_path=from_file, remote_path=to_info.path)`.
If this modification lets you upload files, we can be pretty sure it is the chunking or a bug in the webdavclient `upload_to` method. Note that this will disable the progressbar, so it might seem as if it is hanging…
I assume you have no valid dvc cache at the remote yet (as uploading does not work at all)? So you cannot check whether downloading is working?
Before trying to upload the file, the parent directories should be created, e.g. `datasets/a7`. Could you please check whether this was successful?
@LucaButera, to see if the chunking upload is the issue, you could also try sending a curl request with chunking upload:
Also, check without that header. If the files are uploaded successfully in both cases, something’s wrong with the library. If only the request without the header succeeds, chunking upload might have been forbidden on the server entirely.
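To make the difference between the two kinds of request concrete, here is a minimal Python sketch of what a chunked PUT and a PUT with Content-Length look like on the wire. It is not the curl command mentioned above; `build_put_request`, the host `example.invalid` and the path are placeholders made up for illustration. A server that insists on knowing the body size up front would typically answer the first form with 411 Length Required.

```python
from typing import Iterable


def build_put_request(host: str, path: str, chunks: Iterable[bytes], chunked: bool) -> bytes:
    """Serialize an HTTP/1.1 PUT request, either chunked or with Content-Length.

    Hypothetical helper for illustration only; it does not send anything.
    """
    head = [f"PUT {path} HTTP/1.1", f"Host: {host}"]
    if chunked:
        # The total body length is never announced; each chunk carries its own
        # size as a hexadecimal prefix.
        head.append("Transfer-Encoding: chunked")
        body = b"".join(
            f"{len(c):x}".encode("ascii") + b"\r\n" + c + b"\r\n" for c in chunks if c
        )
        body += b"0\r\n\r\n"  # a zero-length chunk terminates the body
    else:
        # The whole body must be known before the headers are written.
        body = b"".join(chunks)
        head.append(f"Content-Length: {len(body)}")
    return "\r\n".join(head).encode("ascii") + b"\r\n\r\n" + body


# Placeholder host and path, not taken from the original report.
parts = [b"hello ", b"world"]
chunked_request = build_put_request("example.invalid", "/datasets/a7/file", parts, chunked=True)
plain_request = build_put_request("example.invalid", "/datasets/a7/file", parts, chunked=False)
print(chunked_request.decode("ascii"))
print(plain_request.decode("ascii"))
```

Only the second request contains a Content-Length header, which is the header the 411 response complains about.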
@LucaButera, It’d be great if you could make a PR. Thanks. Check contributing-guide for setup.
Maybe, no need of the config, but we can decide that on the PR discussion.
@skshetry it would be wonderful to have a simple solution like that.
On the other hand a more reliable solution like the one of the “assembly on pull” seems also a nice feature in the long run.
I have never contributed to open source projects but I am willing to help if needed, as I think DVC is really a much needed tool.
I’m also facing a similar but slightly different issue with “Nextcloud + mod_fcgi” (which is a bug in httpd2), in which files are uploaded empty.
The original issue might be due to that bug (not fixed yet) or this bug, which was only fixed 2 years ago (OP’s server is `2.4.18`, whereas the recent one is `2.4.46`).
Sabredav’s wiki has a good insight into these bugs:
So, the best thing to do is either drop “chunked” requests on PUT or introduce config to disable it.
@efiop, as the webdavclient3 uses streaming upload, we can still support progress bars:
Look here for the change: https://github.com/iterative/dvc/blob/f827d641d5c2f58944e49d2f6537a9ff09e447e1/dvc/tree/webdav.py#L224
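As a rough illustration of how progress can be tracked without chunked transfer encoding, the sketch below wraps a file object so that every `read()` reports the bytes consumed so far. This is not the code from the linked change; `ProgressReader` and its callback signature are hypothetical, and whether a given HTTP client uses `len()` of the body to set Content-Length is an assumption to verify for the client in use.

```python
import io
from typing import BinaryIO, Callable


class ProgressReader:
    """Wrap a binary file and report how many bytes have been read so far.

    Hypothetical helper for illustration; not part of DVC or webdavclient3.
    """

    def __init__(self, fobj: BinaryIO, total: int, callback: Callable[[int, int], None]):
        self._fobj = fobj
        self._total = total
        self._done = 0
        self._callback = callback

    def __len__(self) -> int:
        # A client that inspects len() of the body can announce the size up
        # front instead of falling back to chunked encoding (assumption).
        return self._total

    def read(self, size: int = -1) -> bytes:
        data = self._fobj.read(size)
        if data:
            self._done += len(data)
            self._callback(self._done, self._total)
        return data


# Simulate a client streaming the body in 4-byte reads.
payload = b"x" * 10
seen = []
reader = ProgressReader(io.BytesIO(payload), len(payload), lambda done, total: seen.append((done, total)))
while reader.read(4):
    pass
print(seen)
```

In a real upload the callback would update a progress bar rather than append to a list.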
The Owncloud Chunking (NG) might be too slow for our use case, as it needs to create a separate request for each chunk (and then send a “MOVE” that joins all the chunks, which is again expensive). So, unless we change our upload strategy to parallelize chunk uploads rather than file uploads, we will make it 3-4x slower, just for the sake of having a progress bar. And it seems it’s possible to have a progress bar without it. Not to mention, it’s not part of the WebDAV standard and is unsupported outside of Nextcloud and Owncloud.
I don’t think there is any way around timeout errors, especially if we talk about PHP based WebDAV servers (they have a set `max_execution_time`). The Owncloud Chunking NG exists because of this very reason.
Though, we could just chunk and upload and then assemble it during `pull`. I think this is what rclone chunker does.
For closing this issue, we could just disable chunking upload via a config or by default.
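The “chunk on upload, assemble on pull” idea could look roughly like the sketch below. It only works on local files and says nothing about how rclone chunker or DVC would actually name or store the parts; `split_file`, `assemble` and the `.partNNNNN` suffix are made up for illustration.

```python
import tempfile
from pathlib import Path
from typing import List


def split_file(src: Path, out_dir: Path, part_size: int) -> List[Path]:
    """Write src into numbered part files of at most part_size bytes each."""
    parts = []
    with src.open("rb") as fobj:
        index = 0
        while True:
            data = fobj.read(part_size)
            if not data:
                break
            # Zero-padded index so that sorting by name restores the order.
            part = out_dir / f"{src.name}.part{index:05d}"
            part.write_bytes(data)
            parts.append(part)
            index += 1
    return parts


def assemble(parts: List[Path], dest: Path) -> None:
    """Concatenate the part files, in name order, into dest."""
    with dest.open("wb") as out:
        for part in sorted(parts):
            out.write(part.read_bytes())


with tempfile.TemporaryDirectory() as tmp:
    tmp_dir = Path(tmp)
    original = tmp_dir / "data.bin"
    original.write_bytes(bytes(range(256)) * 10)  # 2560 bytes of sample data
    parts = split_file(original, tmp_dir, part_size=1000)
    restored = tmp_dir / "restored.bin"
    assemble(parts, restored)
    part_count = len(parts)
    round_trip_ok = restored.read_bytes() == original.read_bytes()

print(part_count, round_trip_ok)
```

Each part would then be uploaded as its own small request, which keeps every request short enough to stay under a per-request timeout.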
Uff, yes, did not even think about this yet… You probably do not want to adjust the timeout config depending on your expected file size, so chunked transmission is the only solution to avoid timeouts per request.
Then let’s think about implementing something like `dvc remote modify <remote> chunked_upload false` (I think `true` should be the default). Maybe `chunked_transfer` or just `chunked` would be a better name, as this might apply to download as well?
@LucaButera, did you try @iksnagreb’s suggestion? If that works, we could provide a config for disabling it.
If that didn’t work, I am afraid there’s no other easy solution than to contact the provider. Nextcloud/Owncloud does support a non-standard WebDAV extension for chunking upload for these kinds of situations, but it’s unlikely we are going to support it.
Hm, I do not think this is possible right now, at least for the WebDAV remote. It should be possible to implement an option to enable non-chunked upload. The problem I see is that this would also disable the progressbar (without chunking, we cannot count progress…), which is not obvious and might confuse users. @efiop Are there options for disabling chunking for other remotes? If yes, how do these handle that problem?
I think selecting chunked/non-chunked upload could be a configuration option (if we can find a way to handle this conveniently); there are probably other cloud providers disallowing chunked upload as well…
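A configuration switch of this kind might be wired up along the following lines. This is only a sketch under assumptions: the `chunked` flag, the `upload` function and `FakeClient` are hypothetical, and only the two client calls (`upload_to` with `buff`/`remote_path`, `upload_file` with `local_path`/`remote_path`) mirror the ones quoted earlier in the thread.

```python
from typing import Iterator, List, Tuple


class FakeClient:
    """Stand-in that records which upload method was called (not the real library)."""

    def __init__(self) -> None:
        self.calls: List[Tuple[str, str]] = []

    def upload_to(self, buff, remote_path: str) -> None:
        self.calls.append(("upload_to", remote_path))

    def upload_file(self, local_path: str, remote_path: str) -> None:
        self.calls.append(("upload_file", remote_path))


def upload(client, from_file: str, remote_path: str, chunked: bool = True,
           chunk_size: int = 1024 * 1024) -> None:
    """Pick the upload path based on a hypothetical per-remote `chunked` option."""
    if chunked:
        def chunks() -> Iterator[bytes]:
            # Lazily yield the file in pieces; the client streams them out.
            with open(from_file, "rb") as fobj:
                while True:
                    data = fobj.read(chunk_size)
                    if not data:
                        break
                    yield data

        client.upload_to(buff=chunks(), remote_path=remote_path)
    else:
        # Hand the whole file to the client so it can announce the size up front.
        client.upload_file(local_path=from_file, remote_path=remote_path)


# The fake client never reads the data, so the local path is never opened.
client = FakeClient()
upload(client, "local/file", "datasets/a7/file", chunked=True)
upload(client, "local/file", "datasets/a7/file", chunked=False)
print(client.calls)
```

Defaulting `chunked` to `True` follows the suggestion above; the opposite default would only flip the keyword's default value.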
The server is a Switch Drive, which is a cloud storage provider based on ownCloud. I would assume the WebDAV server is the same as ownCloud, but I don’t have further info