google-cloud-ruby: Broken pipe when uploading files (cloud storage)

We use GCS to store backups and for the last two days multiple servers have thrown a ‘Broken pipe’ exception when performing their backups.

I’ve tried uploading a similar size file on my machine locally but it throws the same exception so I don’t think it’s a network related issue.

Strangely, I can upload a 100MB file but trying to upload a 1.29GB file or larger causes the broken pipe.

Here’s a test script I’m using locally:

#!/usr/bin/ruby

require 'logger'
require 'google/cloud/storage'

logger = Logger.new($stderr)
logger.level = Logger::DEBUG

Google::Apis.logger = logger

storage = Google::Cloud::Storage.new(
	timeout: 600,
	project: 'nialto-services',
	keyfile: File.expand_path('~/gcs/nialto-services-a6dbccf5fe51.json')
)

bucket = storage.bucket('nialto-testing')

local_path  = File.expand_path('~/gcs/20181010080000_example.com.1.dar')
remote_path = File.join('files', 'example.com', File.basename(local_path))

p bucket.create_file(local_path, remote_path)

And here’s the output of uploading a 1.29GB file:

D, [2018-10-11T11:45:46.610028 #8992] DEBUG -- : Sending HTTP get https://www.googleapis.com/storage/v1/b/bucket-name?
D, [2018-10-11T11:45:46.984530 #8992] DEBUG -- : 200
D, [2018-10-11T11:45:46.984900 #8992] DEBUG -- : #<HTTP::Message:0x00007fb8122e0190 @http_header=#<HTTP::Message::Headers:0x00007fb8122e0168 @http_version="1.1", @body_size=0, @chunked=false, @request_method="GET", @request_uri=#<Addressable::URI:0x3fdc09170ce4 URI:https://www.googleapis.com/storage/v1/b/bucket-name?>, @request_query=nil, @request_absolute_uri=nil, @status_code=200, @reason_phrase="OK", @body_type=nil, @body_charset=nil, @body_date=nil, @body_encoding=#<Encoding:UTF-8>, @is_request=false, @header_item=[["X-GUploader-UploadID", "xxxx"], ["ETag", "CAI="], ["Vary", "Origin"], ["Vary", "X-Origin"], ["Content-Type", "application/json; charset=UTF-8"], ["Expires", "Thu, 11 Oct 2018 10:45:46 GMT"], ["Date", "Thu, 11 Oct 2018 10:45:46 GMT"], ["Cache-Control", "private, max-age=0, must-revalidate, no-transform"], ["Content-Length", "370"], ["Server", "UploadServer"], ["Alt-Svc", "quic=\":443\"; ma=2592000; v=\"44,43,39,35\""]], @dumped=false>, @peer_cert=#<OpenSSL::X509::Certificate: subject=#<OpenSSL::X509::Name CN=*.googleapis.com,O=Google LLC,L=Mountain View,ST=California,C=US>, issuer=#<OpenSSL::X509::Name CN=Google Internet Authority G3,O=Google Trust Services,C=US>, serial=#<OpenSSL::BN:0x00007fb810a92e30>, not_before=2018-09-18 12:34:00 UTC, not_after=2018-12-11 12:34:00 UTC>, @http_body=#<HTTP::Message::Body:0x00007fb81233fcd0 @body="{\n \"kind\": \"storage#bucket\",\n \"id\": \"bucket-name\",\n \"selfLink\": \"https://www.googleapis.com/storage/v1/b/bucket-name\",\n \"projectNumber\": \"000000000000\",\n \"name\": \"bucket-name\",\n \"timeCreated\": \"2018-10-11T10:01:50.330Z\",\n \"updated\": \"2018-10-11T10:05:04.537Z\",\n \"metageneration\": \"2\",\n \"location\": \"EUROPE-WEST1\",\n \"storageClass\": \"REGIONAL\",\n \"etag\": \"CAI=\"\n}\n", @size=0, @positions=nil, @chunk_size=nil>, @previous=nil>
D, [2018-10-11T11:45:46.986936 #8992] DEBUG -- : Success - #<Google::Apis::StorageV1::Bucket:0x00007fb810a926b0
 @etag="CAI=",
 @id="bucket-name",
 @kind="storage#bucket",
 @location="EUROPE-WEST1",
 @metageneration=2,
 @name="bucket-name",
 @project_number=000000000000,
 @self_link="https://www.googleapis.com/storage/v1/b/bucket-name",
 @storage_class="REGIONAL",
 @time_created=
  #<DateTime: 2018-10-11T10:01:50+00:00 ((2458403j,36110s,330000000n),+0s,2299161j)>,
 @updated=
  #<DateTime: 2018-10-11T10:05:04+00:00 ((2458403j,36304s,537000000n),+0s,2299161j)>>

D, [2018-10-11T11:45:47.006798 #8992] DEBUG -- : Sending upload start command to https://www.googleapis.com/upload/storage/v1/b/bucket-name/o?name=files%2Fexample.com%2F20181010080000_example.com.1.dar
D, [2018-10-11T11:45:47.089064 #8992] DEBUG -- : Upload status active
D, [2018-10-11T11:45:47.089145 #8992] DEBUG -- : Sending upload command to https://www.googleapis.com/upload/storage/v1/b/bucket-name/o?name=files%2Fexample.com%2F20181010080000_example.com.1.dar&upload_id=xxxx&upload_protocol=resumable
D, [2018-10-11T11:45:52.167122 #8992] DEBUG -- : Error - #<HTTPClient::KeepAliveDisconnected: HTTPClient::KeepAliveDisconnected: Broken pipe>

Traceback (most recent call last):
	28: from Untitled.rb:22:in `<main>'
	27: from /usr/local/lib/ruby/gems/2.5.0/gems/google-cloud-storage-1.15.0/lib/google/cloud/storage/bucket.rb:1142:in `create_file'
	26: from /usr/local/lib/ruby/gems/2.5.0/gems/google-cloud-storage-1.15.0/lib/google/cloud/storage/service.rb:309:in `insert_file'
	25: from /usr/local/lib/ruby/gems/2.5.0/gems/google-cloud-storage-1.15.0/lib/google/cloud/storage/service.rb:568:in `execute'
	24: from /usr/local/lib/ruby/gems/2.5.0/gems/google-cloud-storage-1.15.0/lib/google/cloud/storage/service.rb:310:in `block in insert_file'
	23: from /usr/local/lib/ruby/gems/2.5.0/gems/google-api-client-0.24.3/generated/google/apis/storage_v1/service.rb:1898:in `insert_object'
	22: from /usr/local/lib/ruby/gems/2.5.0/gems/google-api-client-0.24.3/lib/google/apis/core/base_service.rb:360:in `execute_or_queue_command'
	21: from /usr/local/lib/ruby/gems/2.5.0/gems/google-api-client-0.24.3/lib/google/apis/core/http_command.rb:93:in `execute'
	20: from /usr/local/lib/ruby/gems/2.5.0/gems/retriable-3.1.2/lib/retriable.rb:56:in `retriable'
	19: from /usr/local/lib/ruby/gems/2.5.0/gems/retriable-3.1.2/lib/retriable.rb:56:in `times'
	18: from /usr/local/lib/ruby/gems/2.5.0/gems/retriable-3.1.2/lib/retriable.rb:61:in `block in retriable'
	17: from /usr/local/lib/ruby/gems/2.5.0/gems/google-api-client-0.24.3/lib/google/apis/core/http_command.rb:101:in `block in execute'
	16: from /usr/local/lib/ruby/gems/2.5.0/gems/retriable-3.1.2/lib/retriable.rb:56:in `retriable'
	15: from /usr/local/lib/ruby/gems/2.5.0/gems/retriable-3.1.2/lib/retriable.rb:56:in `times'
	14: from /usr/local/lib/ruby/gems/2.5.0/gems/retriable-3.1.2/lib/retriable.rb:61:in `block in retriable'
	13: from /usr/local/lib/ruby/gems/2.5.0/gems/google-api-client-0.24.3/lib/google/apis/core/http_command.rb:104:in `block (2 levels) in execute'
	12: from /usr/local/lib/ruby/gems/2.5.0/gems/google-api-client-0.24.3/lib/google/apis/core/upload.rb:254:in `execute_once'
	11: from /usr/local/lib/ruby/gems/2.5.0/gems/google-api-client-0.24.3/lib/google/apis/core/upload.rb:228:in `send_upload_command'
	10: from /usr/local/lib/ruby/gems/2.5.0/gems/httpclient-2.8.3/lib/httpclient.rb:765:in `post'
	 9: from /usr/local/lib/ruby/gems/2.5.0/gems/httpclient-2.8.3/lib/httpclient.rb:854:in `request'
	 8: from /usr/local/lib/ruby/gems/2.5.0/gems/httpclient-2.8.3/lib/httpclient.rb:1104:in `follow_redirect'
	 7: from /usr/local/lib/ruby/gems/2.5.0/gems/httpclient-2.8.3/lib/httpclient.rb:1014:in `do_request'
	 6: from /usr/local/lib/ruby/gems/2.5.0/gems/httpclient-2.8.3/lib/httpclient.rb:1131:in `protect_keep_alive_disconnected'
	 5: from /usr/local/lib/ruby/gems/2.5.0/gems/httpclient-2.8.3/lib/httpclient.rb:1138:in `rescue in protect_keep_alive_disconnected'
	 4: from /usr/local/lib/ruby/gems/2.5.0/gems/httpclient-2.8.3/lib/httpclient.rb:1019:in `block in do_request'
	 3: from /usr/local/lib/ruby/gems/2.5.0/gems/httpclient-2.8.3/lib/httpclient.rb:1242:in `do_get_block'
	 2: from /usr/local/lib/ruby/gems/2.5.0/gems/httpclient-2.8.3/lib/httpclient/session.rb:177:in `query'
	 1: from /usr/local/lib/ruby/gems/2.5.0/gems/httpclient-2.8.3/lib/httpclient/session.rb:514:in `query'
/usr/local/lib/ruby/gems/2.5.0/gems/httpclient-2.8.3/lib/httpclient/session.rb:524:in `rescue in query': HTTPClient::KeepAliveDisconnected: Broken pipe (HTTPClient::KeepAliveDisconnected)

Any ideas what’s happening here?

Thanks 😀

About this issue

  • Original URL
  • State: closed
  • Created 6 years ago
  • Comments: 18 (10 by maintainers)

Commits related to this issue

Most upvoted comments

We heard from the GCS team last night. They identified a possible culprit and deployed a fix. My repro script that was raising is now working as well.

We also received confirmation that large uploads of a known size should be sent all at once, and not in chunks. So the current behavior is correct.

It looks like this has been fixed so I will close this issue now. A huge thanks to everyone who participated. Let us know if you have any questions about this.

Here is my repro script using google-api-client instead of google-cloud-storage:

#!/usr/bin/ruby

def print_error e
  p "!"*42
  p e
  p e.instance_variables.map { |attribute| { attribute => e.instance_variable_get(attribute) } }.inspect
  p "!"*42

  print_error e.cause unless e.cause.nil?
end

require "logger"

logger = Logger.new($stderr)
logger.level = Logger::DEBUG

require "google/apis"

Google::Apis.logger = logger

require "googleauth"

project = ENV["GOOGLE_CLOUD_PROJECT"] || "<project-id>"
keyfile = ENV["GOOGLE_CLOUD_KEYFILE"] || "<keyfile-json-path>"

scope = "https://www.googleapis.com/auth/devstorage.full_control"

authorizer = Google::Auth::ServiceAccountCredentials.make_creds(
  json_key_io: File.open(keyfile),
  scope: scope
)
authorizer.fetch_access_token!

require "google/apis/storage_v1"

storage = Google::Apis::StorageV1::StorageService.new
storage.client_options.open_timeout_sec = 600
storage.client_options.read_timeout_sec = 600
storage.client_options.send_timeout_sec = 600
storage.authorization = authorizer

bucket_name = "<bucket-name>"

# local_path  = File.expand_path "~/Downloads/alpine-standard-3.8.1-x86_64.iso"
local_path  = File.expand_path "~/Downloads/ubuntu-18.04.1-desktop-amd64.iso"
remote_path = File.join "files", "example.org", File.basename(local_path)

begin
  uploaded_file = storage.insert_object bucket_name, upload_source: local_path, name: remote_path
  p uploaded_file
rescue => e
  print_error e
end

Apologies for being so quiet, I’ve been actively looking at this and don’t have much to share just yet. I have been able to reproduce this, however. Last time I actively worked on uploads we successfully uploaded large files, 5GB+, so this does seem to be a recent change in behavior. I don’t know if this is intentional or not.

The upload implementation has changed a couple of times previously, and we are now wholly reliant on google-api-client for performing uploads. That means my understanding of how uploads are to be managed is out of date. Here is what I can say so far: I am seeing GCS return the X-Goog-Upload-Chunk-Granularity header now, and I don’t remember it doing that before. It is possible its been doing that for a long time however, as I’ve not been working on uploads for a while. I don’t know what this header is supposed to indicate to the client, if anything at all, but I think it means that the client should chunk the upload. I don’t see chunking implemented for Resumable Uploads in the google-api-client code. I’ve got a spike that chunks uploads by calculating the multiple of the provided chunk granularity closest to 1GB. I’ve been able to upload a 1.9GB file using those changes. So that seems to be a viable option to fix this, depending on what we hear from folks at Google.

Thanks for your patience. Hopefully we will get this resolved soon.