google-cloud-ruby: Google::Cloud::Storage is about twice as slow as gsutil cp for downloading files

Hi, I ran the following benchmark from a GKE container:

#!/usr/bin/env ruby

require 'benchmark'
require 'google/cloud/storage'

def measure(&block)
  duration = Benchmark.realtime(&block)
  puts "took: #{duration.round(1)}s"
end

puts "--- Generate 500MB file"
puts system('dd if=/dev/urandom of=/tmp/random-500M.bin bs=1048576 count=500')

url = "gs://<my-test-bucket>/test/random-500M.bin.#{rand}"
puts "--- Upload with gsutil"
measure do
  system("gsutil cp /tmp/random-500M.bin #{url}")
end


puts "--- Download with gsutil"
measure do
  system("gsutil cp #{url} /tmp/random-500M.bin.#{rand}")
end

puts "--- Upload with GCS-ruby"

bucket = Google::Cloud::Storage.new(
  project: '<my-test-project>',
  keyfile: ENV['GOOGLE_APPLICATION_CREDENTIALS'],
  timeout: 5,
  retries: 0,
).bucket('<my-test-bucket>', skip_lookup: true)

# bucket = Buildkite::FSCache.send(:bucket)  # leftover from the original project; not needed here

file_name = "test/random-500M.bin.#{rand}"
measure do
  puts bucket.create_file('/tmp/random-500M.bin', file_name)
end

puts "--- Download with GCS-ruby"
measure do
  puts bucket.file(file_name).download("/tmp/random-500M.bin.#{rand}")
end

In short: upload and then download a random 500 MB file, with the following results:

  • gsutil cp upload 7.1s
  • gsutil cp download 6.1s
  • G::C::S upload 4.8s
  • G::C::S download 13.9s
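Converting the reported timings into throughput makes the gap easier to compare. This is a small sketch using only the numbers from the list above. Because dd wrote 500 blocks of 1048576 bytes, the size is treated as 500 MiB. `throughput_mib_s` is a hypothetical helper, not part of the original script.

```ruby
# Reported timings (seconds) for the 500 MiB test file, copied from the list above.
FILE_SIZE_MIB = 500.0

TIMINGS = {
  "gsutil cp upload"   => 7.1,
  "gsutil cp download" => 6.1,
  "G::C::S upload"     => 4.8,
  "G::C::S download"   => 13.9,
}.freeze

# Convert a wall-clock duration into throughput (MiB/s), rounded to one decimal.
def throughput_mib_s(seconds, size_mib = FILE_SIZE_MIB)
  (size_mib / seconds).round(1)
end

TIMINGS.each do |label, seconds|
  puts format("%-20s %5.1fs  %6.1f MiB/s", label, seconds, throughput_mib_s(seconds))
end

# How many times slower the Ruby download is than gsutil's download.
ratio = (TIMINGS["G::C::S download"] / TIMINGS["gsutil cp download"]).round(2)
puts "download slowdown: #{ratio}x"
```

With these figures the Ruby download works out to about 36 MiB/s against about 82 MiB/s for gsutil, a slowdown of roughly 2.3x.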

There is obviously some variance between runs, but G::C::S#download is consistently about twice as slow as the rest.
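Repeating each measurement and reporting the median gives a number that is less affected by run-to-run variance than a single timing. This is a minimal sketch. `median_realtime` is a hypothetical helper that could replace the script's single-shot `measure`.

```ruby
require 'benchmark'

# Run the block `runs` times and return the median wall-clock duration in seconds.
# The median is less sensitive to a single slow or fast outlier than one run.
def median_realtime(runs: 5)
  raise ArgumentError, "runs must be positive" unless runs.positive?

  durations = Array.new(runs) { Benchmark.realtime { yield } }.sort
  mid = durations.size / 2
  durations.size.odd? ? durations[mid] : (durations[mid - 1] + durations[mid]) / 2.0
end
```

For example, `median_realtime(runs: 3) { bucket.file(file_name).download("/tmp/random-500M.bin.#{rand}") }` would time the Ruby download three times, each into a fresh destination file, and return the middle value.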

Also note that gsutil doesn’t use crcmod here (it’s printing warnings about it).
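Since gsutil skips CRC checking here, one difference worth ruling out is integrity verification on the Ruby side. My understanding is that `Google::Cloud::Storage::File#download` verifies an MD5 digest by default and accepts `verify: :none` (as well as `:crc32c` and `:all`). Treat that as an assumption to check against the installed gem version. The sketch below times one download per mode. `time_download_modes` is a hypothetical helper, and it accepts any object whose `#download` takes `(path, verify:)`.

```ruby
require 'benchmark'

# Time one download per checksum mode for the same remote file.
# `file` is assumed to be a Google::Cloud::Storage::File (e.g. bucket.file(file_name)).
# Returns a hash of mode => seconds, rounded to one decimal.
def time_download_modes(file, modes: %i[md5 none], dest_prefix: "/tmp/random-500M.bin")
  modes.to_h do |mode|
    # Use a fresh destination path per mode so runs don't overwrite each other.
    path = "#{dest_prefix}.#{mode}.#{rand}"
    [mode, Benchmark.realtime { file.download(path, verify: mode) }.round(1)]
  end
end
```

Usage would be `p time_download_modes(bucket.file(file_name))`. If the `:none` timing is close to gsutil's, checksumming accounts for the gap. If it is not, the difference lies elsewhere.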

I tried looking at gsutil’s source to see what it does differently, but it’s a bit like looking for a needle in a haystack.

Any ideas?

cc @DazWorrall @wvanbergen

About this issue

  • State: closed
  • Created 7 years ago
  • Comments: 29 (27 by maintainers)

Most upvoted comments

[context: I work with @casperisfine]

I think you can ignore that line (`bucket = Buildkite::FSCache.send(:bucket)`), @quartzmo. It looks like leftover cruft from extracting this code from the project we’re using it in. It does the same thing as the code immediately above: it returns a Google::Cloud::Storage::Bucket object.