google-cloud-node: Slow Storage uploads and high CPU usage
I’m seeing very slow uploads when sending large numbers of medium-sized files to Storage, on the order of 10x the time it takes to upload the same files to S3. I also see excessive CPU usage during this time.

This Gist is a repro in the form of a side-by-side comparison: uploading the same set of 750 medium-sized random text files to S3 using `knox` and to GCS using `gcloud-node`. `gsutil` can upload the same number of files just as quickly as anything to S3, so I’m pretty sure it’s not the service itself.
I’ve tried limiting the `request` module’s default `maxSockets` to 5 or 10 for the request pool, but that didn’t seem to help. My hunch is that the sheer number of outstanding requests or streams is causing Node to spin its wheels, and that some form of global queue could fix it, but I haven’t been able to validate that yet.

Any help getting this into a usable state would be much appreciated.
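For reference, a minimal sketch of the socket-capping attempt described above, assuming the `request` package mentioned there plus Node’s built-in agents; the limit of 10 is only illustrative:

```js
var http = require('http');
var https = require('https');
var request = require('request');

// Cap the global agents so no more than 10 sockets stay open per host.
http.globalAgent.maxSockets = 10;
https.globalAgent.maxSockets = 10;

// Or give `request` its own bounded pool for every call made through it.
var limitedRequest = request.defaults({ pool: { maxSockets: 10 } });
```

As noted above, capping the pool didn’t seem to help, which is part of why I suspect the number of outstanding streams rather than raw socket count.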
About this issue
- State: closed
- Created 9 years ago
- Comments: 22 (18 by maintainers)
Commits related to this issue
- build(test): recursively find test files; fail on unsupported dependency versions (#397) Source-Author: Megan Potter <57276408+feywind@users.noreply.github.com> Source-Date: Fri Sep 11 18:47:00 2020... — committed to googleapis/google-cloud-node by yoshi-automation 4 years ago
- fix: do not modify options object, use defaultScopes (#397) Regenerated the library using [gapic-generator-typescript](https://github.com/googleapis/gapic-generator-typescript) v1.2.1. — committed to googleapis/google-cloud-node by alexander-fenster 4 years ago
- docs: document version support goals (#397) — committed to googleapis/google-cloud-node by bcoe 4 years ago
- build: track flaky tests for "nightly", add new secrets for tagging (#397) This PR was generated using Autosynth. :rainbow: Synth log will be available here: https://source.cloud.google.com/results/... — committed to googleapis/google-cloud-node by yoshi-automation 4 years ago
- build(nodejs): correct artifact name for npm (#397) * build(nodejs): correct artifact name for npm PiperOrigin-RevId: 396640130 Source-Link: https://github.com/googleapis/googleapis/commit/c532... — committed to googleapis/google-cloud-node by gcf-owl-bot[bot] 3 years ago
- chore(deps): update dependency linkinator to v4 (#397) [![Mend Renovate](https://app.renovatebot.com/images/banner.svg)](https://renovatebot.com) This PR contains the following updates: | Package |... — committed to googleapis/google-cloud-node by renovate-bot 2 years ago
- chore: update npm scripts and synth.py (#397) Update npm scripts: add clean, prelint, prefix; make sure that lint and fix are set properly. Use post-process feature of synthtool. — committed to googleapis/google-cloud-node by alexander-fenster 4 years ago
- build: use bazel build (#397) — committed to googleapis/google-cloud-node by alexander-fenster 4 years ago
- build: add type for ClusterManagerClient (#397) — committed to googleapis/google-cloud-node by yoshi-automation 4 years ago
- fix(deps): google-gax v2.24.1 (#397) — committed to googleapis/google-cloud-node by bcoe 3 years ago
- chore: release 2.2.6 (#398) :robot: I have created a release \*beep\* \*boop\* --- ### [2.2.6](https://www.github.com/googleapis/nodejs-scheduler/compare/v2.2.5...v2.2.6) (2021-08-17) ### Bug Fixes... — committed to googleapis/google-cloud-node by release-please[bot] 3 years ago
- chore(main): release 4.1.1 (#397) * chore(main): release 4.1.1 * 🦉 Updates from OwlBot post-processor See https://github.com/googleapis/repo-automation-bots/blob/main/packages/owl-bot/README.m... — committed to googleapis/google-cloud-node by release-please[bot] 2 years ago
- chore: update v2.12.0 gapic-generator-typescript (#397) - [ ] Regenerate this pull request now. Committer: @summer-ji-eng PiperOrigin-RevId: 424244721 Source-Link: https://github.com/googleapis/goo... — committed to googleapis/google-cloud-node by gcf-owl-bot[bot] 2 years ago
- build(node): add KOKORO_BUILD_ARTIFACTS_SUBDIR to env (#397) This PR was generated using Autosynth. :rainbow: Synth log will be available here: https://source.cloud.google.com/results/invocations/32... — committed to googleapis/google-cloud-node by yoshi-automation 4 years ago
I’m getting a consistent 60 seconds for uploads when using the `googleapis` library (10 uploads max in parallel). Surprisingly, I’m seeing slightly better performance (between 55 and 60 second uploads) from the `gcloud-node` library, but only when resumable is false. If resumable is true, it’s extremely slow and actually makes my computer’s fan spin up, which never happens; CPU sits at 99%. The entire upload took a whopping 487 seconds… 😦 Knox ran in 42-46 seconds.

I may not have a fix or a way to drastically improve performance, but that doesn’t mean I think this is fixed. Turning off validation did produce some significant performance gains; it was 5x faster on my setup without it, but that’s still half the speed of the S3 uploads. Whatever we did (and it wasn’t just me, or my setup), we couldn’t match how fast knox worked (turning resumable on/off didn’t seem to make much difference to me, if any), even though the underlying GCS infrastructure could easily match S3 from the locations we tested using gsutil.
Given all this, and how knox managed to keep the ‘customisability and idiomatic-ness’ without sacrificing any speed, that seems to me a pretty big indicator that there are some significant inefficiencies in this client.

Turning off validation is a very temporary solution to fit the timescale I have to work with for my project, but I really think this should be fixed before any ‘Storage Stable’ milestone; in my eyes there’s no way a 10x performance hit is acceptable to anyone switching from S3.
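For anyone following along, here is roughly what the validation-off workaround looks like; a minimal sketch assuming the `gcloud-node` write-stream API, with the project, bucket and file paths as placeholders:

```js
var fs = require('fs');
var gcloud = require('gcloud');

// Placeholder credentials and names - substitute your own.
var storage = gcloud.storage({ projectId: 'my-project', keyFilename: '/path/to/key.json' });
var bucket = storage.bucket('my-bucket');

fs.createReadStream('local/file.txt')
  .pipe(bucket.file('remote/file.txt').createWriteStream({
    resumable: false,  // skip the extra resumable-session round trips per file
    validation: false  // skip client-side MD5/CRC32C hashing of the stream
  }))
  .on('error', console.error)
  .on('finish', function () {
    // Older releases of the client documented a 'complete' event instead.
    console.log('uploaded');
  });
```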
Okay, so I won’t bother messing with that. I did try uploading up to 10 files at a time in parallel and got better results: it uploaded in 46.42s with default settings (see the limiter sketch after the numbers below). It also seems `resumable: false` greatly speeds things up on my machine, and frankly I don’t think resumable on a `createWriteStream` makes any sense, so let’s get rid of that. If they want resumable, they will have to use `upload`.

I ran on my machine and didn’t see a huge spike in CPU (~20% max), but uploading with default settings (validation and resumable on) certainly took more time. I’m running against S3 region us-east-1 and I’m located in Ottawa (eastern Canada). S3 seems really sporadic with performance, as you can see by the fluctuation in the numbers with no change in settings; GCS seems a little more stable in that regard. Here are my numbers:
With default settings:

| Run | S3 (seconds) | GCS (seconds) |
| --- | --- | --- |
| 1 | 47.792 | 91.161 |
| 2 | 41.566 | 92.455 |
| 3 | 47.247 | 93.156 |

Without validation or resumable GCS:

| Run | S3 (seconds) | GCS (seconds) |
| --- | --- | --- |
| 1 | 39.286 | 54.476 |
| 2 | 57.732 | 56.977 |
| 3 | 42.917 | 52.861 |
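The ‘up to 10 at a time’ runs above rely on some form of concurrency limit; a minimal hand-rolled sketch of that kind of limiter is below, where `uploadOne(file, callback)` is a placeholder for whichever client call is being timed, not the exact harness used for the numbers:

```js
// Upload at most `limit` files concurrently, calling `done(err)` once all
// uploads finish or as soon as the first error occurs.
function uploadAll(files, limit, uploadOne, done) {
  if (files.length === 0) return done(null);

  var next = 0;     // index of the next file to start
  var active = 0;   // uploads currently in flight
  var failed = false;

  function launch() {
    while (active < limit && next < files.length) {
      active++;
      uploadOne(files[next++], function (err) {
        active--;
        if (failed) return;
        if (err) { failed = true; return done(err); }
        if (next >= files.length && active === 0) return done(null);
        launch();
      });
    }
  }

  launch();
}
```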
I haven’t tried with `gsutil` yet, or moving hashing to run at the end.

Should we consider turning validation to false by default?
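A sketch of the ‘move hashing to the end’ idea mentioned above, assuming uploads ran with `validation: false` and the object’s server-reported `md5Hash` (base64 in its metadata) is compared against a locally computed digest afterwards; the helper name and paths are illustrative:

```js
var crypto = require('crypto');
var fs = require('fs');

// Compare a local file's MD5 with the MD5 that GCS reports for the uploaded object.
function verifyAfterUpload(file, localPath, callback) {
  var hash = crypto.createHash('md5');
  fs.createReadStream(localPath)
    .on('error', callback)
    .on('data', function (chunk) { hash.update(chunk); })
    .on('end', function () {
      var localMd5 = hash.digest('base64'); // GCS metadata.md5Hash is base64-encoded
      file.getMetadata(function (err, metadata) {
        if (err) return callback(err);
        callback(null, metadata.md5Hash === localMd5);
      });
    });
}
```

This trades per-chunk hashing during the upload for one extra metadata request per file afterwards.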