google-cloud-ruby: Google::Cloud::Storage is about twice as slow as gsutil cp for downloading files
Hi, I ran the following benchmark from a GKE container:
#!/usr/bin/env ruby
require 'benchmark'
require 'google/cloud/storage'

# Run the block and print its wall-clock duration.
def measure(&block)
  duration = Benchmark.realtime(&block)
  puts "took: #{duration.round(1)}s"
end

puts "--- Generate 500MB file"
puts system('dd if=/dev/urandom of=/tmp/random-500M.bin bs=1048576 count=500')

url = "gs://<my-test-bucket>/test/random-500M.bin.#{rand}"

puts "--- Upload with gsutil"
measure do
  system("gsutil cp /tmp/random-500M.bin #{url}")
end

puts "--- Download with gsutil"
measure do
  system("gsutil cp #{url} /tmp/random-500M.bin.#{rand}")
end

puts "--- Upload with GCS-ruby"
bucket = Google::Cloud::Storage.new(
  project: '<my-test-project>',
  keyfile: ENV['GOOGLE_APPLICATION_CREDENTIALS'],
  timeout: 5,
  retries: 0,
).bucket('<my-test-bucket>', skip_lookup: true)
bucket = Buildkite::FSCache.send(:bucket)
file_name = "test/random-500M.bin.#{rand}"
measure do
  puts bucket.create_file('/tmp/random-500M.bin', file_name)
end

puts "--- Download with GCS-ruby"
measure do
  puts bucket.file(file_name).download("/tmp/random-500M.bin.#{rand}")
end
In short, it uploads and then downloads a random 500MB file, with the following results:
- gsutil cp upload: 7.1s
- gsutil cp download: 6.1s
- G::C::S upload: 4.8s
- G::C::S download: 13.9s
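To put those numbers in perspective, here is a small sketch that converts them into effective throughput. It only uses the timings listed above and the 500 MiB file size implied by the dd invocation; the constant and method names are mine, chosen for illustration.

```ruby
# Timings copied from the results above; 500 MiB comes from dd (bs=1048576 count=500).
SIZE_MIB = 500.0

TIMINGS = {
  'gsutil cp upload'   => 7.1,
  'gsutil cp download' => 6.1,
  'G::C::S upload'     => 4.8,
  'G::C::S download'   => 13.9,
}.freeze

# Convert a wall-clock duration into an effective throughput in MiB/s.
def throughput(seconds, size_mib = SIZE_MIB)
  (size_mib / seconds).round(1)
end

TIMINGS.each do |label, seconds|
  puts format('%-20s %5.1fs  %6.1f MiB/s', label, seconds, throughput(seconds))
end

# Ratio of the Ruby download time to the gsutil download time.
slowdown = (TIMINGS['G::C::S download'] / TIMINGS['gsutil cp download']).round(1)
puts "G::C::S download takes #{slowdown}x as long as gsutil cp download"
```

On this single run the Ruby download works out to roughly 36 MiB/s against roughly 82 MiB/s for gsutil.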
There is obviously a bit of variance between runs, but G::C::S#download is consistently about twice as slow as the rest.
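Since single runs vary, one way to make the comparison more robust is to repeat each step and report the minimum, median, and maximum. This is only a sketch: `sample`, `median`, and `report` are hypothetical helpers, not part of the benchmark above, and the `sleep` call is a placeholder for the real gsutil or download call.

```ruby
require 'benchmark'

# Run the block `runs` times and return the sorted wall-clock durations in seconds.
def sample(runs: 5, &block)
  Array.new(runs) { Benchmark.realtime(&block) }.sort
end

# Median of an already-sorted array of durations.
def median(sorted)
  mid = sorted.size / 2
  sorted.size.odd? ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2.0
end

# Print a one-line summary for the block and return the sorted durations.
def report(label, runs: 5, &block)
  durations = sample(runs: runs, &block)
  puts format('%s: min %.2fs, median %.2fs, max %.2fs',
              label, durations.first, median(durations), durations.last)
  durations
end

# Placeholder workload; swap in the gsutil or bucket.file(...).download call.
report('sleep 0.01', runs: 3) { sleep 0.01 }
```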
Also note that gsutil doesn’t use crcmod here (it’s printing warnings about it).
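Because gsutil is skipping its CRC check here, one thing worth ruling out is how much time client-side hashing could add on the Ruby side. The sketch below just times an MD5 pass over a local file with the standard library; it does not touch GCS. `time_md5` is a hypothetical helper, and the 8 MiB temp file is a stand-in so the snippet runs anywhere; pointing it at /tmp/random-500M.bin would give the figure for the real file. If hashing turned out to be significant, the `verify:` option of `download` (for example `verify: :none`), as I understand the client's API, would be the thing to try next.

```ruby
require 'benchmark'
require 'digest'
require 'tempfile'

# Time how long it takes to MD5 a file, reading it in 1 MiB chunks.
# Returns [hexdigest, seconds].
def time_md5(path, chunk_size: 1024 * 1024)
  digest = Digest::MD5.new
  seconds = Benchmark.realtime do
    File.open(path, 'rb') do |io|
      while (chunk = io.read(chunk_size))
        digest.update(chunk)
      end
    end
  end
  [digest.hexdigest, seconds]
end

# Stand-in input: 8 MiB of random bytes written to a temp file.
Tempfile.create('md5-sample') do |file|
  file.binmode
  file.write(Random.new.bytes(8 * 1024 * 1024))
  file.flush
  hexdigest, seconds = time_md5(file.path)
  puts "md5 #{hexdigest} computed in #{seconds.round(3)}s"
end
```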
I tried looking at gsutil’s source to see what it’s doing differently, but it’s a bit like looking for a needle in a haystack.
Any ideas?
About this issue
- State: closed
- Created 7 years ago
- Comments: 29 (27 by maintainers)
I submitted a PR to httpclient: https://github.com/nahi/httpclient/pull/383
[context: I work with @casperisfine]
I think you can ignore that line, @quartzmo. It looks like leftover cruft from extracting this code from the project we’re using it in. The result is the same as the code immediately above: it returns a Google::Cloud::Storage::Bucket object.