nodejs-storage: Storage: lots of "socket hang up" errors

From @ovaris on September 21, 2017 8:31

Environment details

  • OS:
  • Node.js version: 8.5.0
  • npm version: 5.3.0
  • google-cloud-node/storage version: 1.2.1

I have a utility Node.js script that checks the existence of a few thousand files in Cloud Storage. I run the script locally, not in a Cloud environment. I’m executing the checks (bucket.file(fileName).exists()) in batches of 20, so not all checks are fired concurrently. I’m seeing lots of these errors when running the script:
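
For reference, a minimal sketch of that kind of batched check (illustrative only, not the original script; the bucket name, file list, and helper name are assumptions):

const storage = require('@google-cloud/storage');
const gcs = storage();
const bucket = gcs.bucket('my-bucket'); // bucket name is illustrative

// Check existence in batches of 20 so only a bounded number of
// requests is in flight at any one time (sketch only).
async function checkExists(fileNames, batchSize = 20) {
  const results = [];
  for (let i = 0; i < fileNames.length; i += batchSize) {
    const batch = fileNames.slice(i, i + batchSize);
    const flags = await Promise.all(
      batch.map(name => bucket.file(name).exists().then(([exists]) => exists))
    );
    results.push(...flags);
  }
  return results;
}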

{ Error: socket hang up
    at TLSSocket.onHangUp (_tls_wrap.js:1140:19)
    at Object.onceWrapper (events.js:314:30)
    at emitNone (events.js:110:20)
    at TLSSocket.emit (events.js:207:7)
    at endReadableNT (_stream_readable.js:1059:12)
    at _combinedTickCallback (internal/process/next_tick.js:138:11)
    at process._tickCallback (internal/process/next_tick.js:180:9)
  code: 'ECONNRESET',
  path: null,
  host: 'accounts.google.com',
  port: 443,
  localAddress: undefined }

and these:

{ Error: read ECONNRESET
    at _errnoException (util.js:1026:11)
    at TLSWrap.onread (net.js:606:25) code: 'ECONNRESET', errno: 'ECONNRESET', syscall: 'read' }

aaand these:

{ Error: socket hang up
    at createHangUpError (_http_client.js:345:15)
    at TLSSocket.socketOnEnd (_http_client.js:437:23)
    at emitNone (events.js:110:20)
    at TLSSocket.emit (events.js:207:7)
    at endReadableNT (_stream_readable.js:1059:12)
    at _combinedTickCallback (internal/process/next_tick.js:138:11)
    at process._tickCallback (internal/process/next_tick.js:180:9) code: 'ECONNRESET' }

I have added this fix (suggested here: https://github.com/GoogleCloudPlatform/google-cloud-node/issues/2254):

const storage = require('@google-cloud/storage');

const gcs = storage();
// https://github.com/GoogleCloudPlatform/google-cloud-node/issues/2254
// Setting forever = false disables the keep-alive agent, so each request
// opens a fresh connection instead of reusing a possibly stale socket.
gcs.interceptors.push({
    request: function(reqOpts) {
        reqOpts.forever = false;
        return reqOpts;
    }
});

I have tried to reduce the check batch size, but it didn’t have any effect.

Copied from original issue: GoogleCloudPlatform/google-cloud-node#2623

About this issue

  • State: closed
  • Created 7 years ago
  • Reactions: 6
  • Comments: 42 (18 by maintainers)

Most upvoted comments

Still getting this issue with the latest version. Any workaround?

Just an update for anyone else still waiting: v2.3.3 still suffers from the FetchError: network timeout bug I mentioned above. v2.1.0 is the latest version that even has a chance of working; it fails for me about 30% of the time, but the later versions fail 100% of the time.

I upgraded to @google-cloud/storage v2.2.0 and while I no longer see the ECONNRESET issue, I now get the following when writing files to Cloud Storage from Cloud Functions:

FetchError: network timeout at: https://www.googleapis.com/upload/storage/v1/b/my-bucket-name/o?uploadType=multipart&name=lookup_tmp%2Fdatafiles%2F20181031%2Fhit_data.tsv

It seems to happen 100% of the time, whereas the old ECONNRESET error occurred maybe 50% of the time. The files I’m writing are large-ish, around 2 GB: I am reading a compressed .tar.gz file and writing the individual entries out to Cloud Storage.

Is there some way to change the timeout settings? Make it wait longer before timing out? Any other ideas? I’m glad that I can now see (I think?) the actual error, instead of the inscrutable ECONNRESET, but I’m not sure how to deal with the fact that it occurs 100% of the time, making my previous “retry until it finally works” strategy worthless.
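
For context, a pipeline like the one described might look roughly like this (a sketch only; tar-stream and zlib are assumptions, since the original comment does not say how the archive is read, and the file, bucket, and path names are illustrative):

const fs = require('fs');
const zlib = require('zlib');
const tar = require('tar-stream'); // assumed; any streaming tar extractor works
const {Storage} = require('@google-cloud/storage');

const bucket = new Storage().bucket('my-bucket');
const extract = tar.extract();

extract.on('entry', (header, stream, next) => {
  // Stream each archive entry straight to Cloud Storage, then move on.
  stream
    .pipe(bucket.file(`lookup_tmp/datafiles/${header.name}`).createWriteStream())
    .on('finish', next)
    .on('error', console.error);
});

fs.createReadStream('hit_data.tar.gz')
  .pipe(zlib.createGunzip())
  .pipe(extract);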

This is a stupid user error and can be closed. Of course the code above fires ALL requests at the same time.

@kinwa91 @stephenplusplus this one has been going on for a while now, and I’m concerned about the whole downgrading to 2.1 thing. Can y’all prioritize an investigation for this tomorrow?

@micahwedemeyer @stephenplusplus yeah, I’m still seeing two issues on master (3e5a196) with all the latest dependencies:

  1. Streamed uploads are still being retried, even though retries for streamed uploads were supposed to be disabled by the earlier fix. See more info below.

  2. node-fetch’s default 60 second timeout seems to mean that the entire request and response must be completed within 60 seconds. See https://github.com/bitinn/node-fetch/issues/446. I think this is a regression introduced by c2c1382a2d11d271c5ef8b58c263d72db88ca4d8 in nodejs-storage@2.2.0.

Repro:

const {Storage} = require("."); // nodejs-storage checked out locally
const client = new Storage({projectId: "myproject"});
const {Readable} = require("stream");
const ws = client.bucket("zb-dev").file("test").createWriteStream({resumable: false});
// A deliberately slow readable: a chunk arrives only every 15 seconds and the
// stream never ends, so the request cannot complete within node-fetch's
// 60 second timeout.
const body = new Readable({
	read() {
		console.log(new Date(), "read request");
		setTimeout(() => { console.log(new Date(), "pushing"); this.push("info"); }, 15000);
	}
});
ws.on("error", console.error);
body.pipe(ws);

And add logging statements to node_modules/node-fetch/lib/index.js where the timeout is set and cleared (around line 1336):

2018-11-02T02:08:44.817Z 'read request'
2018-11-02T02:08:59.821Z 'pushing'
2018-11-02T02:09:00.014Z 'Setting timeout for' 60000
2018-11-02T02:10:00.015Z 'timeout' // attempt #1 timed out
2018-11-02T02:10:00.815Z 'Setting timeout for' 60000 // retrying
...
2018-11-02T02:11:00.818Z 'timeout' // attempt #2 timed out
2018-11-02T02:11:01.600Z 'Setting timeout for' 60000 // retrying
...
2018-11-02T02:12:01.602Z 'timeout' // attempt #3 timed out
 FetchError: network timeout // 3 strikes

We experience the issue with GET requests and the PR doesn’t seem to address this case.

I believe a fix has been found (thanks, @zbjornson!), and a PR has been sent here: https://github.com/googleapis/nodejs-common/pull/268

Here is the code producing the error:

const readline = require('readline');
const {Storage} = require('@google-cloud/storage');

const storage = new Storage({projectId: config.projectId});
const bucket = storage.bucket(config.bucketName);
const remoteFile = bucket.file('events.txt');

// Read the object line by line via a streaming download.
const lineReader = readline.createInterface({input: remoteFile.createReadStream()});

lineReader.on('line', (line) => { /* do stuff */ });

The code runs on GCE:

  • OS: Linux 4.9.0-8-amd64 SMP Debian 4.9.110-3+deb9u3 (2018-08-19) x86_64 GNU/Linux
  • node: v8.11.4

It reliably times out after 60 seconds of streaming; only a single stream is open at a time.

Let me know if you need any further information.

I ran another test today with 7 files and they all successfully streamed to GCS. Great work.

Tomorrow came earlier than expected: v2.4.2 is out now! Please update and report back any lingering issues.

@avishnyak, are you running this in the cloud somewhere or locally? Also, if there are any other details you think would make a difference, please share.

Thanks!

We are running in a k8s environment on GCP. We are getting the same issue from all pods and across different language stacks (Node and Ruby).