google-cloud-node: Slow Storage uploads and high CPU usage

I’m seeing very slow uploads when sending large numbers of medium-sized files to Storage: roughly 10x the time it takes to upload the same files to S3. I also see excessive CPU usage during the uploads.

This Gist is a repro: a side-by-side comparison that uploads the same set of 750 medium-sized random text files to S3 using knox and to GCS using gcloud-node.

gsutil can upload the same number of files to GCS just as quickly as anything uploads them to S3, so I’m pretty sure it’s not the service itself.

I’ve tried limiting the request module’s default maxSockets to 5 or 10 for the request pool but that didn’t seem to help. I have a hunch it’s the sheer number of outstanding requests or streams that’s causing node to spin its wheels and that maybe some form of global queue could fix it, but I haven’t been able to validate that yet.
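To make the "global queue" idea concrete, here is a minimal sketch of a concurrency limiter that keeps only a fixed number of uploads in flight at once. It is not part of gcloud-node or knox; `runWithLimit` is a hypothetical helper, and `worker` stands in for whatever function uploads a single file and returns a Promise. Whether this actually removes the CPU spike is unverified.

```javascript
// Run `worker(item, index)` for every item, keeping at most `limit` calls
// in flight at any time. `worker` must return a Promise.
// Resolves with the results in the same order as `items`.
function runWithLimit(items, limit, worker) {
  const results = new Array(items.length);
  let next = 0; // index of the next item nobody has claimed yet

  // Each lane repeatedly claims the next unclaimed item until none remain.
  async function lane() {
    while (next < items.length) {
      const index = next++;
      results[index] = await worker(items[index], index);
    }
  }

  // Never start more lanes than there are items (but always at least one).
  const laneCount = Math.max(1, Math.min(limit, items.length));
  const lanes = Array.from({ length: laneCount }, () => lane());
  return Promise.all(lanes).then(() => results);
}
```

With something like `runWithLimit(paths, 10, uploadOne)`, where `uploadOne` is your own per-file upload function, no more than 10 streams or requests exist at once, regardless of how many files are queued.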

Any help getting this into a usable state would be much appreciated.

About this issue

  • State: closed
  • Created 9 years ago
  • Comments: 22 (18 by maintainers)

Most upvoted comments

I’m consistently getting about 60 seconds for the uploads when using the googleapis library (at most 10 uploads in parallel). Surprisingly, I’m seeing slightly better performance (between 55 and 60 seconds) from the gcloud-node library, but only when resumable is false. If resumable is true, it’s extremely slow and actually makes my computer’s fan spin up, which never happens. CPU is at 99%. The entire upload took a whopping 487 seconds… 😦 Knox ran in 42-46 seconds.

I may not have a fix or a way to drastically improve performance, but that doesn’t mean I think this is fixed. Turning off validation did give a significant gain (it was 5x faster on my setup without it), but that’s still half the speed of the S3 uploads. Whatever we did (and it wasn’t just me, or my setup), we couldn’t match how fast knox worked, even though the underlying GCS infrastructure could easily match S3 from the locations we tested using gsutil. Turning resumable on or off didn’t seem to make much difference for me, if any.

Given all this, and given that knox manages to keep its ‘customisability and idiomatic-ness’ without sacrificing any speed, that seems to me a pretty big indicator that there are some significant inefficiencies in this client.

Turning off validation is a very temporary solution that fits the timescale I have to work with for my project, but I really think this should be fixed before any ‘Storage Stable’ milestone. In my eyes, there’s no way that a 10x performance hit is acceptable to anyone switching from S3.

Okay, so I won’t bother messing with that. I did try uploading up to 10 files at a time in parallel and got better results: it uploaded in 46.42s with default settings. It also seems that resumable: false greatly speeds things up on my machine, and frankly I don’t think resumable on a createWriteStream makes any sense, so let’s get rid of that. If users want resumable uploads, they will have to use upload.
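For anyone who wants to try the same settings, this is a rough sketch of passing resumable: false and validation: false to createWriteStream. `uploadFast` is a hypothetical helper, and `bucket` is assumed to be a gcloud-node Bucket object created elsewhere. Depending on the release, the stream may signal completion with a different event than `finish`, so check the docs for your version.

```javascript
const fs = require('fs');

// The two options discussed in this thread: skip the resumable-upload
// handshake and skip checksum validation. Both trade safety for speed.
const FAST_WRITE_OPTIONS = { resumable: false, validation: false };

// Hypothetical helper: stream one local file into `bucket` under `destName`.
// Assumes `bucket.file(name).createWriteStream(options)` returns a writable
// stream, as in gcloud-node.
function uploadFast(bucket, localPath, destName) {
  return new Promise((resolve, reject) => {
    fs.createReadStream(localPath)
      .on('error', reject) // local read failures
      .pipe(bucket.file(destName).createWriteStream(FAST_WRITE_OPTIONS))
      .on('error', reject) // upload failures
      .on('finish', () => resolve(destName));
  });
}
```

Note that with validation off, a corrupted upload will not be detected by the client, so this is only reasonable if you can verify the data some other way.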

I ran it on my machine and didn’t see a huge spike in CPU (~20% max), but uploading with default settings (validation and resumable both on) was certainly more time-consuming. I’m uploading to S3 region us-east-1 and I’m located in Ottawa (mid-east Canada). S3 seems really sporadic in performance, as you can see from the fluctuation in the numbers with no change in settings. GCS seems a little more stable in that regard. Here are my numbers:

With default settings:

Run 1:
Finished uploading to s3 in 47.792 seconds
Finished uploading to GCS in 91.161 seconds

Run 2:
Finished uploading to s3 in 41.566 seconds
Finished uploading to GCS in 92.455 seconds

Run 3:
Finished uploading to s3 in 47.247 seconds
Finished uploading to GCS in 93.156 seconds

Without validation or resumable GCS:

Run 1:
Finished uploading to s3 in 39.286 seconds
Finished uploading to GCS in 54.476 seconds

Run 2:
Finished uploading to s3 in 57.732 seconds
Finished uploading to GCS in 56.977 seconds

Run 3:
Finished uploading to s3 in 42.917 seconds
Finished uploading to GCS in 52.861 seconds
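To put those six runs side by side, a small script can average each configuration and compute how much slower GCS was than S3. The timings are copied from the runs above; the variable and function names are made up for this illustration, and three runs per setting is a small sample.

```javascript
// Timings in seconds, copied from the six runs above.
const runs = {
  defaults: {
    s3: [47.792, 41.566, 47.247],
    gcs: [91.161, 92.455, 93.156],
  },
  noValidationNoResumable: {
    s3: [39.286, 57.732, 42.917],
    gcs: [54.476, 56.977, 52.861],
  },
};

const mean = (xs) => xs.reduce((sum, x) => sum + x, 0) / xs.length;

// Average each service's runs and report GCS time as a multiple of S3 time.
function summarize(setting) {
  const s3 = mean(runs[setting].s3);
  const gcs = mean(runs[setting].gcs);
  return { s3, gcs, ratio: gcs / s3 };
}

for (const setting of Object.keys(runs)) {
  const { s3, gcs, ratio } = summarize(setting);
  console.log(
    `${setting}: s3 ${s3.toFixed(1)}s, GCS ${gcs.toFixed(1)}s, GCS/s3 ${ratio.toFixed(2)}x`
  );
}
// defaults: s3 45.5s, GCS 92.3s, GCS/s3 2.03x
// noValidationNoResumable: s3 46.6s, GCS 54.8s, GCS/s3 1.17x
```

On these numbers, GCS with default settings takes about twice as long as S3, and with validation and resumable off the gap shrinks to roughly 17%.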

I haven’t tried gsutil yet, nor have I tried moving the hashing to run at the end.

Should we consider turning validation to false by default?