onedrive-api-docs: Files are corrupted sometimes when uploaded with multipart uploads
Category
- Question
- Documentation issue
- Bug
Expected or Desired Behavior
Multipart uploads complete without corrupting data
Observed Behavior
Sometimes (maybe one time in 20) a multipart upload of a 128 MiB file is corrupted.
I know the files are corrupted because the SHA1 that OneDrive reports differs from the local one.
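The check described above can be sketched as follows. The helper names `local_sha1` and `sha1_matches` are hypothetical and are not part of rclone or the OneDrive API. The sketch assumes the remote digest has already been obtained as a hex string (for example from `rclone sha1sum`), and it compares case-insensitively in case the two sides report hex digits in different case.

```shell
#!/bin/bash
# Hypothetical helpers for illustration; not part of rclone.

# Print the lowercase hex SHA1 of a local file.
local_sha1() {
    sha1sum < "$1" | awk '{ print $1 }'
}

# Succeed only if two hex digests are equal, ignoring case.
sha1_matches() {
    local a b
    a=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
    b=$(printf '%s' "$2" | tr '[:upper:]' '[:lower:]')
    [ "$a" = "$b" ]
}

# Example use (assumes an rclone remote named TestOneDrive, as in the script
# further down):
#   remote=$(rclone sha1sum "TestOneDrive:thrashfiles/$name" | awk '{ print $1 }')
#   sha1_matches "$(local_sha1 "$name")" "$remote" || echo "SHA1 mismatch on $name"
```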
Downloading the file from the web interface reveals that a portion of it has been replaced with what appears to be base64 encoded data.
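Because the test files are random data, a long run of characters from the base64 alphabet is very unlikely to occur by chance, so a downloaded copy can be scanned for such runs. This is only a heuristic sketch: the function name `find_base64_runs` and the default threshold of 256 characters are assumptions, and it relies on GNU grep's `-a`, `-o` and `-b` options.

```shell
#!/bin/bash
# Hypothetical helper for illustration; the threshold is an arbitrary assumption.

# Print "offset length" (0-based byte offset) for every run of at least
# min_len consecutive base64-alphabet characters in a file.
find_base64_runs() {
    local file=$1 min_len=${2:-256}
    LC_ALL=C grep -a -o -b -E "[A-Za-z0-9+/]{${min_len},}" "$file" |
        awk -F: '{ print $1, length($2) }'
}
```

Running `find_base64_runs downloaded.bin` on a file fetched from the web interface would print nothing for an intact random file and one line per suspicious run otherwise.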
Attached is a log of an upload that produced a mismatched SHA1. It includes a full dump of the HTTP headers but omits the uploaded data, which is rather large.
For comparison here is a log which was successful
I’ve diffed those logs extensively and I can’t see anything significantly different between them other than IDs and different onedrive endpoints.
Here is what the corrupted data looks like: corruption.txt
This just overwrites a part of the file.
I don’t seem to be able to attach the original and the corrupted file (as downloaded from onedrive web interface) for some reason - I have these and can share them with you if you want.
Here is a hex diff of the original and the corrupted file showing the insertion of the base64 data and the position: hexdiff.diff.txt
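For anyone who wants to locate the damaged region in their own files, the position can be found without a full hex diff. The following is a minimal sketch using the standard `cmp -l` listing. The function name `diff_byte_range` is hypothetical, and it assumes both the original and the downloaded copy are available locally.

```shell
#!/bin/bash
# Hypothetical helper for illustration.

# Print the 1-based offsets of the first and last differing bytes of two
# files, or nothing if they are identical over their common length.
diff_byte_range() {
    cmp -l "$1" "$2" 2>/dev/null |
        awk 'NR == 1 { first = $1 } { last = $1 } END { if (NR) print first, last }'
}
```

The two offsets bound the corrupted region, which can then be inspected with a hex viewer.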
Steps to Reproduce
I reproduced this with rclone, uploading 128 MiB files using this script. I haven’t tried to reproduce it with a different tool, so this could conceivably be a bug in rclone. I think that is unlikely, however: rclone users did not previously see this bug, and they now see it with the same version of rclone, which indicates that a change in OneDrive is the cause rather than rclone. I also can think of no plausible mechanism by which rclone would write several kilobytes of base64-encoded data into the file.
Update: this has been reproduced uploading files with the OneDrive web interface too (see comments below), so it likely affects all OneDrive users.
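#!/bin/bash
# Upload random 128 MiB files in a loop and log any round where rclone exits with an error.
size=134217728
destination=TestOneDrive:thrashfiles/
for round in $(seq 100); do
    name="test-${round}-${RANDOM}${RANDOM}${RANDOM}.bin"
    echo
    echo "--------------- $(date -Is) - round $round - $name ------------------"
    echo
    # Write slightly more than $size bytes of random data, then trim to exactly $size.
    dd if=/dev/urandom of="$name" bs=1M count=$((size / 1048576 + 1))
    truncate -s "$size" "$name"
    sha1sum "$name"
    # Limit rclone to a single attempt so a failed upload is not masked by a retry.
    rclone --low-level-retries 1 --retries 1 -vv --dump responses copy "${name}" "${destination}"
    error=$?
    if [ $error -ne 0 ]; then
        # Keep the local file so it can be compared with the uploaded copy.
        echo "ERROR $error on $name"
    else
        rm "$name"
        rclone -v deletefile "${destination}${name}"
    fi
    sleep 5
done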
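To turn a captured run of the script above into a failure rate, its output can be summarised. This sketch assumes the script's output was saved to a file and relies only on the two line formats the script itself prints (the round header and the `ERROR ... on ...` line). The function name `summarise_rounds` is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper for illustration; reads the script's output on stdin.

# Count round headers and ERROR lines and print a one-line summary.
summarise_rounds() {
    awk '
        /^-+ .* - round [0-9]+ - / { rounds++ }
        /^ERROR [0-9]+ on /        { failures++ }
        END { printf "%d failures in %d rounds\n", failures, rounds }
    '
}

# Example use (run.log is an assumed file name):
#   summarise_rounds < run.log
```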
This was originally reported on the rclone forum: https://forum.rclone.org/t/onedrive-sync-never-completes-with-sha-1-corrupted-transfer-with-latest-rclone/29884/
About this issue
- State: closed
- Created 2 years ago
- Comments: 33 (6 by maintainers)
My overnight upload finished this morning with zero errors.
That’s the first time it’s had zero errors in a long time.
First, I want to personally thank you all for the very detailed investigation you performed. Getting issues with this level of detail helps our own investigations immensely. I cannot offer any more details other than we’re still investigating on our side, and I’ll update this issue once there’s something to share.
@ificator presumably there are thousands or millions of corrupted files people have uploaded to OneDrive over the period the problem was active. Will Microsoft be issuing a statement and/or contacting affected users?
I have just repeated the test I did earlier, comprising 100 files sized between 128M and 256M. All files were uploaded via the web interface, with no corrupt files.
So I have not been able to reproduce the problem so far.
I will initiate another larger transfer of MP4 files using rclone now. This is a transfer that I’ve not been able to complete for the last few days, so I will see if this now works and report back.