aws-sdk-js: s3.getObject().createReadStream() fails when piped to a slow Writable stream

We have hit an issue where the stream returned by createReadStream() appears to end prematurely, before all of the data has been delivered to downstream listeners, and without emitting an error.

I have a repro case that I spent some time building.

A few caveats:

  1. You need a large source file. I have reproduced this with a 1.8GB uncompressed file on S3, and also with a 271MB compressed file by adding zlib to the mix (see the gunzip variant after the script). Same results in both cases.
  2. You need to let it run for a while. With an uncompressed file I typically see the issue at around 150MB parsed, as long as the internet connection reading from S3 is fast; on a slow connection it takes longer.

The repro script:
'use strict';

// Currently using 2.1.50
var AWS = require('aws-sdk');
var util = require('util');
var stream = require('stream');

var s3 = new AWS.S3({accessKeyId: '<AWSACCESSKEYID>', secretAccessKey: '<AWSSECRETACCESSKEY>', region: '<AWSREGION>'});

// Test bucket to get data from
var testBucket = "<bucket>";
// Large file
//   I have tested with 1.8GB Uncompressed
//   I have also tested with 271MB compressed by adding zlib.createGunzip() in
//      pipe() stream below
var testPath = "<path>";

var _chunkSize = 1000000;

function SlowStream(options) {
    // Keep track of how many bytes have passed through
    this.bytesParsed = 0;
    // Keep track of how many _chunkSize chunks we have seen
    this.chunkNumber = 0;
    options = options || {};
    options.objectMode = true;
    stream.Writable.call(this, options);

    // Output info when done about how many megabytes were parsed
    this.once('finish', function() {
        console.info('SlowStream finish');
        console.info('Megabytes Parsed: ' + this.bytesParsed / (1024 * 1024));
    });
}

util.inherits(SlowStream, stream.Writable);
SlowStream.prototype._write = function _write(row, enc, done) {
    // Track bytes
    this.bytesParsed += row.length;

    // Output bytes seen
    console.info('Mb Parsed: ' + this.bytesParsed / (1024 * 1024));

    // Every 1 MB pause for 5 seconds to simulate writing to a slow
    // target and create the need for back pressure
    if(Math.floor(this.bytesParsed / _chunkSize) > this.chunkNumber) {
        this.chunkNumber = Math.floor(this.bytesParsed / _chunkSize);
        console.info('paused');
        setTimeout(done, 5000);
    }
    else {
        setImmediate(done);
    }
};

// Get stream to read from S3
var readStream = s3.getObject({Bucket: testBucket, Key: testPath}).createReadStream();
// Create slow writable stream
var slowStream = new SlowStream();
// Indicate done
slowStream.once('finish', function() {
    console.info('Done');
    process.exit(0);
});

// Do the piping
readStream.pipe(slowStream);
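
For the compressed-file variant mentioned in the caveats, the only change is inserting zlib.createGunzip() into the pipeline; a minimal sketch (it assumes var zlib = require('zlib'); at the top of the script):

// Gunzip variant used for the 271MB compressed file
readStream.pipe(zlib.createGunzip()).pipe(slowStream);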

Ultimately I need to get to a place where this reads the whole file. If I swap s3.getObject().createReadStream() out for a plain filesystem read stream, everything works fine; that just doesn't help, given that the data we need lives in S3.
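
For comparison, a minimal sketch of the local-file version that does run to completion (the path is a placeholder):

// Same SlowStream, fed from a local file instead of S3 -- this completes fine
var fs = require('fs');
fs.createReadStream('/path/to/large/local/file').pipe(new SlowStream());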

About this issue

  • State: closed
  • Created 9 years ago
  • Reactions: 4
  • Comments: 37 (6 by maintainers)

Most upvoted comments

@chrisradek Why close this, when people are still having this issue? There's no resolution, and making multiple getObject requests in small parts is not workable, as it results in multiple streams, which become unmanageable when piping to a process or another file.

Any update on this? @chrisradek ?

We're seeing this on a regular basis when downloading files from S3 to Lambda. The files are ~100MB.

@tilfin Hilarious! I built something similar:

https://www.npmjs.com/package/s3-stream-download

I was waiting to attach it to this thread until we had been using it for a week or so. It has fixed the issue for us. I will check out yours also…

It seems that it was my fault. I replaced the code

reader.on("end", () => {
        console.log(`Stored as: '${outFileName}'`);
        callback();
    });

with

writer.on("finish", () => {
        console.log(`Stored as: '${outFileName}'`);
        callback();
    });

and my script processed 5120/46536 objects without any errors.

RESOLVED
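
For anyone skimming the fix above: the readable's 'end' fires when the source has emitted its last chunk, while the writable's 'finish' fires only after all buffered data has been flushed, so waiting on 'finish' is the correct completion signal. A self-contained sketch of that pattern (function name, fs usage, and error handling are my own additions, not the commenter's actual code):

var fs = require('fs');

// Download an S3 object to disk, calling back only once the writable has
// flushed everything ('finish'), not when the readable stops emitting.
function downloadToFile(s3, bucket, key, outFileName, callback) {
    var reader = s3.getObject({Bucket: bucket, Key: key}).createReadStream();
    var writer = fs.createWriteStream(outFileName);

    reader.pipe(writer);

    writer.on('finish', function() {
        console.log("Stored as: '" + outFileName + "'");
        callback();
    });
    writer.on('error', callback);
    reader.on('error', callback);
}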

@chriskinsman I made a module recently that might be helpful for you. https://github.com/tilfin/s3-block-read-stream
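
Neither module's exact API is reproduced here, but the approach both take is broadly the same: fetch the object in fixed-size byte ranges via getObject's Range parameter and expose the pieces as a single Readable, so downstream code still pipes one stream. A rough sketch against the plain SDK (class name, block size, and error handling are my own; s3 is the AWS.S3 client from the repro script, and Bucket/Key are placeholders):

'use strict';

var stream = require('stream');
var util = require('util');

// Readable that pulls an S3 object down in fixed-size ranges and emits
// them in order, so the consumer sees one continuous byte stream.
function RangedS3Stream(s3, params, blockSize) {
    stream.Readable.call(this);
    this.s3 = s3;
    this.params = params;                    // {Bucket: ..., Key: ...}
    this.blockSize = blockSize || 5 * 1024 * 1024;
    this.offset = 0;
    this.totalSize = null;                   // learned from the first response
}
util.inherits(RangedS3Stream, stream.Readable);

// Node does not call _read again until push() runs, so ranges are
// requested strictly one at a time.
RangedS3Stream.prototype._read = function() {
    var self = this;
    if (self.totalSize !== null && self.offset >= self.totalSize) {
        return self.push(null);              // all ranges fetched
    }
    var range = 'bytes=' + self.offset + '-' + (self.offset + self.blockSize - 1);
    self.s3.getObject({
        Bucket: self.params.Bucket,
        Key: self.params.Key,
        Range: range
    }, function(err, data) {
        if (err) { return self.emit('error', err); }
        // ContentRange looks like "bytes 0-4999999/1800000000"
        self.totalSize = parseInt(data.ContentRange.split('/')[1], 10);
        self.offset += data.Body.length;
        self.push(data.Body);
    });
};

// Usage (placeholders):
//   new RangedS3Stream(s3, {Bucket: '<bucket>', Key: '<path>'}).pipe(slowStream);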