aws-sdk-js: s3.getObject().createReadStream() fails when piped to a slow Writable stream
We have hit an issue where the stream returned by createReadStream() appears to fail silently: it stops without emitting an error before all of the data has been delivered to downstream listeners.
I have a repro case that I spent some time building.
A few caveats:
- You need a large source file. I have done this with a 1.8GB uncompressed file on S3. I have also added zlib to the mix and reproduced it with a 271MB compressed file. Same results.
- You need to let it run a while. With an uncompressed file I typically see the issue around 150MB parsed, as long as the internet connection reading the file off S3 is fast. If the connection is slow, it takes longer.
'use strict';
// Currently using 2.1.50
var AWS = require('aws-sdk');
var util = require('util');
var stream = require('stream');
var s3 = new AWS.S3({accessKeyId: '<AWSACCESSKEYID>', secretAccessKey: '<AWSSECRETACCESSKEY>', region: '<AWSREGION>'});
// Test bucket to get data from
var testBucket = "<bucket>";
// Large file
// I have tested with 1.8GB Uncompressed
// I have also tested with 271MB compressed by adding zlib.createGunzip() in
// pipe() stream below
var testPath = "<path>";
var _chunkSize = 1000000;
function SlowStream(options) {
  // Keep track of how many bytes have passed through
  this.bytesParsed = 0;
  // Keep track of how many _chunkSize chunks we have seen
  this.chunkNumber = 0;

  options = options || {};
  options.objectMode = true;
  stream.Writable.call(this, options);

  // Output info when done about how many megabytes were parsed
  this.once('finish', function() {
    console.info('SlowStream finish');
    console.info('Megabytes Parsed: ' + this.bytesParsed / (1024 * 1024));
  });
}
util.inherits(SlowStream, stream.Writable);

SlowStream.prototype._write = function _write(row, enc, done) {
  // Track bytes
  this.bytesParsed += row.length;
  // Output bytes seen
  console.info('Mb Parsed: ' + this.bytesParsed / (1024 * 1024));

  // Every 1 MB pause for 5 seconds to simulate writing to a slow
  // target and create the need for back pressure
  if (Math.floor(this.bytesParsed / _chunkSize) > this.chunkNumber) {
    this.chunkNumber = Math.floor(this.bytesParsed / _chunkSize);
    console.info('paused');
    setTimeout(done, 5000);
  }
  else {
    setImmediate(done);
  }
};
// Get stream to read from S3
var readStream = s3.getObject({Bucket: testBucket, Key: testPath}).createReadStream();
// Create slow writable stream
var slowStream = new SlowStream();
// Indicate done
slowStream.once('finish', function() {
  console.info('Done');
  process.exit(0);
});
// Do the piping
readStream.pipe(slowStream);
Ultimately I need to get to a place where this reads the whole file. If I swap s3.getObject().createReadStream() out for a local file stream, everything works fine, but that doesn't help given that we are using S3 for a reason.
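For comparison, a minimal sketch of that swap, assuming a hypothetical local copy of the same large file; the SlowStream defined above is unchanged:

var fs = require('fs');

// Hypothetical local copy of the same large file (path is illustrative)
var localTestPath = '/tmp/large-test-file';

// Same SlowStream as in the repro; only the source of the readable stream changes
var readStream = fs.createReadStream(localTestPath);
var slowStream = new SlowStream();

slowStream.once('finish', function() {
  console.info('Done');
  process.exit(0);
});

readStream.pipe(slowStream);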
About this issue
- State: closed
- Created 9 years ago
- Reactions: 4
- Comments: 37 (6 by maintainers)
@chrisradek Why close this, when people are still having this issue? There's no resolution for this, and making multiple getObject requests in small parts is not workable, as it results in multiple streams, which becomes unmanageable when piping to a process or another file.

Any update on this? @chrisradek?
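To make "multiple getObject requests in small parts" concrete, a rough sketch of that ranged-read approach, reusing s3, testBucket, and testPath from the repro above (the part size is illustrative):

// Illustrative only: fetch the object in fixed-size byte ranges with separate
// getObject calls. Each call returns its own response/stream, which is what
// makes this approach awkward to feed into a single downstream pipe.
var partSize = 5 * 1024 * 1024; // hypothetical 5MB parts

function getRange(start, end, callback) {
  s3.getObject({
    Bucket: testBucket,
    Key: testPath,
    Range: 'bytes=' + start + '-' + end
  }, callback);
}

// Example: fetch just the first part
getRange(0, partSize - 1, function(err, data) {
  if (err) return console.error(err);
  console.info('Got ' + data.Body.length + ' bytes');
});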
We're seeing this on a regular basis when downloading files from S3 to Lambda. The files are ~100MB.
@tilfin Hilarious! I built something similar:
https://www.npmjs.com/package/s3-stream-download
I was waiting to attach it to this thread until we had been using it for a week or so. It has fixed the issue for us. I will check out yours also…
It seems that it was my fault. I replaced the code
with
and my script processed 5120/46536 objects without any errors
RESOLVED
@chriskinsman I made a module recently that might be helpful for you. https://github.com/tilfin/s3-block-read-stream
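For reference, a minimal sketch of the general idea behind these block-read workarounds (not the actual implementation of either module): a Readable that pulls the object one ranged getObject at a time, so a slow consumer only delays the next request instead of stalling a single long-lived HTTP response. RangedS3Stream and its default block size are hypothetical:

// Sketch only: read an S3 object in ranges behind a single Readable stream.
var stream = require('stream');
var util = require('util');

function RangedS3Stream(s3, params, blockSize) {
  stream.Readable.call(this);
  this.s3 = s3;
  this.params = params;
  this.blockSize = blockSize || 5 * 1024 * 1024; // hypothetical default
  this.offset = 0;
  this.ended = false;
  this.fetching = false;
}
util.inherits(RangedS3Stream, stream.Readable);

RangedS3Stream.prototype._read = function() {
  if (this.ended || this.fetching) return;
  this.fetching = true;

  var start = this.offset;
  var end = start + this.blockSize - 1;
  var self = this;

  this.s3.getObject({
    Bucket: this.params.Bucket,
    Key: this.params.Key,
    Range: 'bytes=' + start + '-' + end
  }, function(err, data) {
    self.fetching = false;
    if (err) {
      // S3 answers with InvalidRange once we request past the end of the object
      if (err.code === 'InvalidRange') {
        self.ended = true;
        return self.push(null);
      }
      return self.emit('error', err);
    }
    self.offset += data.Body.length;
    // A short block means this was the final range
    if (data.Body.length < self.blockSize) self.ended = true;
    self.push(data.Body);
    if (self.ended) self.push(null);
  });
};

// Usage with the repro above:
// new RangedS3Stream(s3, {Bucket: testBucket, Key: testPath}).pipe(new SlowStream());

Because each range is a separate, short-lived request, back pressure from the slow writer only postpones the next getObject call rather than leaving one socket idle for minutes.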