nodejs-storage: Storage: lots of "socket hang up" errors

From @ovaris on September 21, 2017 8:31

Environment details

  • OS:
  • Node.js version: 8.5.0
  • npm version: 5.3.0
  • google-cloud-node/storage version: 1.2.1

I have a utility Node.js script that checks the existence of a few thousand files in Cloud Storage. I run the script locally, not in a Cloud environment. I’m executing the checks (bucket.file(fileName).exists()) in batches of 20, so not all checks are fired concurrently. I’m seeing lots of these errors when running the script:
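
For reference, a minimal sketch of that kind of batched check (illustrative only, not the original script; the bucket name, file list, and helper name are assumptions):

const storage = require('@google-cloud/storage');
const gcs = storage();
const bucket = gcs.bucket('my-bucket'); // bucket name is illustrative

// Check existence in batches of 20 so only a bounded number of
// requests is in flight at any one time (sketch only).
async function checkExists(fileNames, batchSize = 20) {
  const results = [];
  for (let i = 0; i < fileNames.length; i += batchSize) {
    const batch = fileNames.slice(i, i + batchSize);
    const flags = await Promise.all(
      batch.map(name => bucket.file(name).exists().then(([exists]) => exists))
    );
    results.push(...flags);
  }
  return results;
}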

{ Error: socket hang up
    at TLSSocket.onHangUp (_tls_wrap.js:1140:19)
    at Object.onceWrapper (events.js:314:30)
    at emitNone (events.js:110:20)
    at TLSSocket.emit (events.js:207:7)
    at endReadableNT (_stream_readable.js:1059:12)
    at _combinedTickCallback (internal/process/next_tick.js:138:11)
    at process._tickCallback (internal/process/next_tick.js:180:9)
  code: 'ECONNRESET',
  path: null,
  host: 'accounts.google.com',
  port: 443,
  localAddress: undefined }

and these:

{ Error: read ECONNRESET
    at _errnoException (util.js:1026:11)
    at TLSWrap.onread (net.js:606:25) code: 'ECONNRESET', errno: 'ECONNRESET', syscall: 'read' }

aaand these:

{ Error: socket hang up
    at createHangUpError (_http_client.js:345:15)
    at TLSSocket.socketOnEnd (_http_client.js:437:23)
    at emitNone (events.js:110:20)
    at TLSSocket.emit (events.js:207:7)
    at endReadableNT (_stream_readable.js:1059:12)
    at _combinedTickCallback (internal/process/next_tick.js:138:11)
    at process._tickCallback (internal/process/next_tick.js:180:9) code: 'ECONNRESET' }

I have added this fix (suggested here: https://github.com/GoogleCloudPlatform/google-cloud-node/issues/2254):

const storage = require('@google-cloud/storage');

const gcs = storage();
// https://github.com/GoogleCloudPlatform/google-cloud-node/issues/2254
// Setting forever = false disables the keep-alive agent, so each request
// opens a fresh connection instead of reusing a possibly stale socket.
gcs.interceptors.push({
    request: function(reqOpts) {
        reqOpts.forever = false;
        return reqOpts;
    }
});

I have tried to reduce the check batch size, but it didn’t have any effect.

Copied from original issue: GoogleCloudPlatform/google-cloud-node#2623

About this issue

  • State: closed
  • Created 7 years ago
  • Reactions: 6
  • Comments: 42 (18 by maintainers)

Most upvoted comments

Still getting this issue with the latest version. Any workaround?

Just an update for anyone else still waiting: v2.3.3 still suffers from the FetchError: network timeout bug I mentioned above. v2.1.0 is the latest version that even has a chance of working; it fails for me about 30% of the time, but the later versions fail 100% of the time.

I upgraded to @google-cloud/storage v2.2.0 and while I no longer see the ECONNRESET issue, I now get the following when writing files to Cloud Storage from Cloud Functions:

FetchError: network timeout at: https://www.googleapis.com/upload/storage/v1/b/my-bucket-name/o?uploadType=multipart&name=lookup_tmp%2Fdatafiles%2F20181031%2Fhit_data.tsv

It seems to happen 100% of the time, whereas the old ECONNRESET error occurred maybe 50% of the time. The files I’m writing are large-ish, around 2 GB: I am reading a compressed .tar.gz file and writing the individual entries out to Cloud Storage.

Is there some way to change the timeout settings? Make it wait longer before timing out? Any other ideas? I’m glad that I can now see (I think?) the actual error, instead of the inscrutable ECONNRESET, but I’m not sure how to deal with the fact that it occurs 100% of the time, making my previous “retry until it finally works” strategy worthless.
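
For context, a pipeline like the one described might look roughly like this (a sketch only; tar-stream and zlib are assumptions, since the original comment does not say how the archive is read, and the file, bucket, and path names are illustrative):

const fs = require('fs');
const zlib = require('zlib');
const tar = require('tar-stream'); // assumed; any streaming tar extractor works
const {Storage} = require('@google-cloud/storage');

const bucket = new Storage().bucket('my-bucket');
const extract = tar.extract();

extract.on('entry', (header, stream, next) => {
  // Stream each archive entry straight to Cloud Storage, then move on.
  stream
    .pipe(bucket.file(`lookup_tmp/datafiles/${header.name}`).createWriteStream())
    .on('finish', next)
    .on('error', console.error);
});

fs.createReadStream('hit_data.tar.gz')
  .pipe(zlib.createGunzip())
  .pipe(extract);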

This is a stupid user error and can be closed. Of course the code above fires ALL requests at the same time.

@kinwa91 @stephenplusplus this one has been going on for a while now, and I’m concerned about the whole downgrading to 2.1 thing. Can y’all prioritize an investigation for this tomorrow?

@micahwedemeyer @stephenplusplus yeah, I’m still seeing two issues on master (3e5a196) with all the latest dependencies:

  1. Streamed uploads are still being retried, even though retries for streamed uploads were supposed to be disabled by the earlier fix. See more info below.

  2. node-fetch’s default 60 second timeout seems to mean that the entire request and response must be completed within 60 seconds. See https://github.com/bitinn/node-fetch/issues/446. I think this is a regression introduced by c2c1382a2d11d271c5ef8b58c263d72db88ca4d8 in nodejs-storage@2.2.0.

Repro:

const {Storage} = require("."); // nodejs-storage checked out locally
const client = new Storage({projectId: "myproject"});
const {Readable} = require("stream");
const ws = client.bucket("zb-dev").file("test").createWriteStream({resumable: false});
// A deliberately slow readable: a chunk arrives only every 15 seconds and the
// stream never ends, so the request cannot complete within node-fetch's
// 60 second timeout.
const body = new Readable({
	read() {
		console.log(new Date(), "read request");
		setTimeout(() => { console.log(new Date(), "pushing"); this.push("info"); }, 15000);
	}
});
ws.on("error", console.error);
body.pipe(ws);

And add logging statements to node_modules/node-fetch/lib/index.js where the timeout is set and cleared (around line 1336):

2018-11-02T02:08:44.817Z 'read request'
2018-11-02T02:08:59.821Z 'pushing'
2018-11-02T02:09:00.014Z 'Setting timeout for' 60000
2018-11-02T02:10:00.015Z 'timeout' // attempt #1 timed out
2018-11-02T02:10:00.815Z 'Setting timeout for' 60000 // retrying
...
2018-11-02T02:11:00.818Z 'timeout' // attempt #2 timed out
2018-11-02T02:11:01.600Z 'Setting timeout for' 60000 // retrying
...
2018-11-02T02:12:01.602Z 'timeout' // attempt #3 timed out
 FetchError: network timeout // 3 strikes

We experience the issue with GET requests and the PR doesn’t seem to address this case.

I believe a fix has been found (thanks, @zbjornson!), and a PR has been sent here: https://github.com/googleapis/nodejs-common/pull/268

Here is the code producing the error:

const readline = require('readline');
const {Storage} = require('@google-cloud/storage');

const storage = new Storage({projectId: config.projectId});
const bucket = storage.bucket(config.bucketName);
const remoteFile = bucket.file('events.txt');

// Read the object line by line via a streaming download.
const lineReader = readline.createInterface({input: remoteFile.createReadStream()});

lineReader.on('line', (line) => { /* do stuff */ });

The code runs on GCE:

  • OS: Linux 4.9.0-8-amd64 SMP Debian 4.9.110-3+deb9u3 (2018-08-19) x86_64 GNU/Linux
  • node: v8.11.4

It reliably times out after 60 seconds of streaming; only a single stream is open at a time.

Let me know if you need any further information.

I ran another test today with 7 files and they all successfully streamed to GCS. Great work.

Tomorrow came earlier than expected: v2.4.2 is out now! Please update and report back any lingering issues.

@avishnyak, are you running this in the cloud somewhere or locally? Also, if there are any other details you think would make a difference, please share.

Thanks!

We are running in a k8s environment on GCP. We are getting the same issue from all pods and across different language stacks (Node and Ruby).