rclone: upstream XML decoding bug.. command rclone info has broken s3 remote

Output of rclone version

rclone v1.48.0

  • os/arch: linux/amd64
  • go version: go1.12.6

Ubuntu 18.04.2 LTS

Describe the issue

My setup is an S3-compatible (Scaleway) remote >> Crypt.

After unmounting the crypt remote “encloud” using “fusermount -u”, I manually cleared the VFS cache folder and remounted with the --cache-db-purge option. After that I was unable to list the directory or perform any other action on that remote, even when mounting the underlying remote directly (bypassing crypt) to see the encrypted files. The object storage crashed. Scaleway support had to apply a hotfix for this bug on their end and explained the issue comprehensively.

Scaleway team replicated the same situation on Amazon S3 and got the same error from rclone.

Mounting string: rclone mount encloud: /mnt/s3 --allow-other --contimeout 10m --max-read-ahead 64M --log-file "/var/www/rclone.log" --log-level NOTICE --fast-list --transfers 16 --buffer-size 64M --multi-thread-cutoff 64M --multi-thread-streams 16 --poll-interval 0 --vfs-cache-mode full --dir-cache-time 2160h --vfs-cache-max-age 720h --vfs-cache-max-size 10G --vfs-read-chunk-size 64M --vfs-read-chunk-size-limit 128M --retries 200 --retries-sleep 1s --no-gzip-encoding --stats 24h --stats-one-line --gid 33 --uid 33 --umask 007 --daemon

Here is the rclone config:

`[cloud]
type = s3
provider = Other
access_key_id = ***
secret_access_key = ***
region = nl-ams
endpoint = s3.nl-ams.scw.cloud
acl = private
bucket_acl = private
upload_cutoff = 256M
chunk_size = 256M
upload_concurrency = 8
force_path_style = false

[encloud]
type = crypt
remote = cloud:/album/cloud
filename_encryption = standard
directory_name_encryption = true
password = ***
password2 = ***`

Scaleway team's response to the issue:

Hello,

First of all, thank you for your report. We did have an encoding bug on our end, which is now fixed. The problem you are now having is on rclone’s side, and there is nothing we can do about it.

Rclone did upload objects with control characters in their names. For example:

data/-position-left-05

Which is:

00000000  64 61 74 61 2f 05 2d 70  6f 73 69 74 69 6f 6e 2d  |data/.-position-|
00000010  6c 65 66 74 2d 30 35 0a                           |left-05.|

Notice the 0x05 just before the ‘-’.

Now, the first time you reported the bug to us, the gateway was crashing on listing, because our underlying encoding library was panicking on characters such as this one. We now have the same behavior as Amazon, which is to include the characters in the XML: “&#x05”.

However, in this specific case, the 1.0 XML parsers are not compatible with such characters, that is why rclone is failing on mount (The same bug is happening to our console right now, which is why you cannot list via the web interface).
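The parser behavior described here can be reproduced with a few lines of Python. This is only a sketch: the `LISTING` document is a hypothetical, trimmed-down stand-in for a real listing response (a real one carries a namespace and more fields), and Python's standard-library parser stands in for the Go XML decoder that rclone actually uses.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified listing response. The key is the one from the
# report, with the 0x05 byte written as a numeric character reference.
LISTING = (
    b'<?xml version="1.0" encoding="UTF-8"?>\n'
    b"<ListBucketResult><Contents>"
    b"<Key>data/&#x05;-position-left-05</Key>"
    b"</Contents></ListBucketResult>"
)


def try_parse(document: bytes) -> str:
    """Return 'ok' if the document parses, else the parser's error message."""
    try:
        ET.fromstring(document)
    except ET.ParseError as err:
        return f"parse error: {err}"
    return "ok"


# U+0005 is not a legal XML 1.0 character, even as a character reference,
# so a conforming parser rejects the whole document.
print(try_parse(LISTING))
```

The failure affects the entire response, not just the one key, which is consistent with the whole directory listing becoming unusable.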

Amazon specifies the ?encoding-type=url parameter for such cases. From Amazon’s documentation[1]:

Param: encoding-type
Description: Requests Amazon S3 to encode the response and specifies the
encoding method to use.

An object key can contain any Unicode character. However, XML 1.0 parsers
cannot parse some characters, such as characters with an ASCII value from 0 to
10. For characters that are not supported in XML 1.0, you can add this
parameter to request that Amazon S3 encode the keys in the response.

Type: String
Default: None
Valid value: url
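To illustrate what the parameter changes, here is a minimal Python sketch of the client side: the key arrives percent-encoded, the XML parses cleanly, and the client decodes the key afterwards. The response body is a hypothetical simplification, and using `unquote` (rather than `unquote_plus`) is an assumption that should be checked against how a given provider encodes spaces.

```python
import xml.etree.ElementTree as ET
from urllib.parse import quote, unquote

raw_key = "data/\x05-position-left-05"

# What a server is expected to send for this key when the request carries
# encoding-type=url: the control character becomes the ASCII text "%05".
encoded_key = quote(raw_key, safe="/")

# Hypothetical, simplified response body (no namespace, one object).
listing = (
    "<ListBucketResult><EncodingType>url</EncodingType>"
    f"<Contents><Key>{encoded_key}</Key></Contents></ListBucketResult>"
)

# The document now contains only characters that XML 1.0 allows ...
root = ET.fromstring(listing)

# ... and the client restores the original key after parsing.
decoded = [unquote(key.text) for key in root.iter("Key")]
print(encoded_key, decoded == [raw_key])
```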

We did test on the Amazon S3 gateway with those object names, and rclone fails the same way. This is why I believe the rclone s3 crypt implementation is broken in this way, since it uploads special object names without setting the encoding type to url on listing.

In the meantime, I suggest you use another tool to access your data. Another workaround would be to list your files with aws s3 ls (which uses encoding-type=url) and spot the files that have control characters in their names. You can attempt to delete them in order to be able to mount your bucket with rclone.
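Spotting the affected keys by eye is error-prone because control characters are usually invisible in a terminal. A small Python helper along these lines could flag them; it assumes the keys have already been collected into a list of decoded strings by some other tool, and the function name and sample keys are hypothetical.

```python
import re

# Characters in the Basic Multilingual Plane that XML 1.0 does not allow:
# everything below U+0020 except tab, line feed and carriage return,
# plus U+FFFE and U+FFFF.
XML10_ILLEGAL = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f\ufffe\uffff]")


def keys_with_illegal_chars(keys):
    """Return the keys that would break an XML 1.0 listing."""
    return [key for key in keys if XML10_ILLEGAL.search(key)]


# Hypothetical sample: one key from the report and two harmless ones.
sample = ["data/\x05-position-left-05", "data/ok.txt", "tab\tname"]

for key in keys_with_illegal_chars(sample):
    # repr() makes the otherwise invisible control character visible.
    print(repr(key))
```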

In the hope I have answered all your questions,

[1] https://docs.aws.amazon.com/AmazonS3/latest/API/v2-RESTBucketGET.html

Cordialement / Best regards,

Pierre-Antoine PAGANELLI

Customer Success Specialist Advanced

Rclone logged error before hotfix at scaleway: 2019/07/09 17:34:17 ERROR : /: Dir.Stat error: InternalError: We encountered an internal error. Please try again. status code: 500, request id: tx795027a68d2d416f8d704-005d24b3f8, host id: tx795027a68d2d416f8d704-005d24b3f8

Rclone logged error after hotfix at scaleway: 2019/07/11 19:55:10 ERROR : /: Dir.Stat error: SerializationError: failed to decode REST XML response status code: 200, request id: txc7bdb342b42a43faa97f8-005d2777fd caused by: XML syntax error on line 2: illegal character code U+0001

This caused a lot of problems for us because this setup is running in a production environment. So far I haven't tried to interact with the storage directly through Scaleway's native CLI; I just restored the backup to a new bucket.

This doesn’t look production ready. Any thoughts?

About this issue

  • State: closed
  • Created 5 years ago
  • Comments: 25 (13 by maintainers)

Most upvoted comments

I’m stupid for not thinking about this earlier, but it appears that you ran the rclone info command on your s3 backend. This command is specifically for testing which characters a backend supports, and the file -position-left-05 was definitely created by this command. While unfortunate, it is somewhat expected for this command to break certain backends, so this isn’t really a bug. However, the XML decoding bug in the AWS library still stands.

Thx! I tested it and I like the improvements. Unfortunately, IBM still hasn’t enabled the encoding-type param on their side so I’ll have to keep working with them to resolve that.

Thanks again for making this update! I’ll be sure to grab v1.50 when it becomes available.

Regards, John




From: Nick Craig-Wood notifications@github.com Sent: Tuesday, October 15, 2019 11:56 AM To: rclone/rclone rclone@noreply.github.com Cc: John Reeve jreeve@myintervals.com; Mention mention@noreply.github.com Subject: Re: [rclone/rclone] upstream XML decoding bug… command rclone info has broken s3 remote (#3345)

You’ll need the latest beta (https://beta.rclone.org) for that. This is shortly to become v1.50.


Thank you for your assistance in troubleshooting this issue! I’m glad we were able to isolate the problem. I have started a ticket with IBM and I’ll keep you posted as to what they say. I’m glad to hear that this will help with your strategy. And I like your idea of doing a retry on the XML Syntax error.