request: Error: You cannot pipe after data has been emitted from the response.

I am encountering an issue where I will get the “You cannot pipe after data has been emitted from the response.” error when trying to pipe to a WriteStream. This only occurs when there has been a delay between when the request() call was made, and when the readStream.pipe(writeStream) call is made.

Node version: v0.10.24
Request version: 2.34.0

I created a github repo with the source that reproduces the problem: https://github.com/joe-spanning/request-error

Code is also below for convenience:

var fs = require('fs'),
  request = require('request');

var readStream = request({
  url: 'https://gist.githubusercontent.com/joe-spanning/6902070/raw/8967b351aec744a2bb51c16cb847c44636cf53d9/pipepromise.js'
});

// wait for 5 seconds, then pipe
setTimeout(function() {

  var writeStream = readStream.pipe(fs.createWriteStream('test.js'));

  writeStream.on('finish', function() {
    console.log('all done!');
  });

  writeStream.on('error', function(err) {
    console.error(err);
  });

}, 5000);

You might need to play with the timeout value to reproduce the error. Longer timeouts increase the probability that the error will occur.

/request-error/node_modules/request/request.js:1299
      throw new Error("You cannot pipe after data has been emitted from the re
            ^
Error: You cannot pipe after data has been emitted from the response.
    at Request.pipe (/request-error/node_modules/request/request.js:1299:13)
    at null._onTimeout (/request-error/pipe-error.js:16:32)
    at Timer.listOnTimeout [as ontimeout] (timers.js:110:15)

About this issue

  • State: closed
  • Created 10 years ago
  • Comments: 30 (8 by maintainers)

Most upvoted comments

@joepie91 Taking the original example at the top and piping immediately to a PassThrough runs without problems. Maybe I’m misunderstanding the issue?

var fs = require('fs'),
  request = require('request');

var readStream = request({
  url: 'https://gist.githubusercontent.com/joe-spanning/6902070/raw/8967b351aec744a2bb51c16cb847c44636cf53d9/pipepromise.js'
})
.pipe(require('stream').PassThrough());  // this line is the only modification

// wait for 5 seconds, then pipe
setTimeout(function() {

  var writeStream = readStream.pipe(fs.createWriteStream('test.js'));

  writeStream.on('finish', function() {
    console.log('all done!');
  });

  writeStream.on('error', function(err) {
    console.error(err);
  });

}, 5000);

@aldeed I have forked the repository and applied your fix: https://github.com/joepie91/request. Depending on the degree to which this (original) repository is maintained in the future, I might merge more fixes into there. I’m not sure yet.

I’d been debugging this issue for a few hours, but was unable to find a solution until I read this thread. Some more technical background on why this problem occurs follows. It’s a bit simplified, to make it easier to follow along.

request offers a ‘streaming API’ in the sense that you can .pipe the original request object to a WritableStream. However, since this request object is not the response stream itself, request ‘shims’ the data event - it creates a listener for the data event on the response stream, and then re-emits this event on itself, thus being able to have the request object act as a stream itself.

There’s only one problem: as soon as you attach to the data event of a stream, it starts flowing.

Now this is normally not an issue; Node.js works with ‘ticks’ - consider them ‘cycles’ in the interpreter. The simplified explanation is that one block of code will consume a ‘tick’, and no events will be checked or fired until the next tick. This means that the following hypothetical code will work fine:

stream = getSomeRequestStream();
stream.pipe(target);

Both lines are executed in the same tick, and the first data event can’t possibly occur until the next tick - that’s how the interpreter works internally. This means that you have a guarantee that the .pipe is processed before the first data event fires, and everything will work as expected.

Now, what @joe-spanning attempted to do was use an asynchronous construct, a timeout in this particular example. The stream initialization and .pipe no longer exist in the same block of code.

Asynchronous constructs follow the same rule as events - they can’t possibly execute until the next tick. However, by the time your asynchronous construct executes, the event cycle has finished and the first data event has already fired - your first chunk of data comes through in the gap between the stream initialization and your .pipe. request detects that a data event has already fired, and tells you that you can no longer .pipe the stream, since it has already started flowing.

This same problem occurs for anything that happens at least one tick later than the stream initialization.

The solution by @aldeed fixes this by checking if there are any listeners attached to the request object, and if not, putting the chunk of data back in the original stream and pausing that original stream - thus stopping it from flowing. Once a listener is attached, it then resumes the original stream, and this time there will be a data event handler listening.

The way the data event shim works in request is actually a design flaw. It does not take into account the possibility that stream initialization and piping may not occur in the same tick, and that Node.js is an inherently asynchronous platform (thus, these kinds of situations are to be expected). While this fix does work, and should work reliably, a better solution would be to defer the binding to the data event on the original stream until the first listener is attached.

One last note: if you define a callback in your request initialization, this fix will not work.

Internally, if a callback is supplied, request will attach a data event handler to itself, read out the body, and supply the body to your callback. The way around this is to attach to the response event instead (and not specify a callback). This event will (reliably) fire as soon as a response is received and the request is ready for streaming.

@aldeed I’ve been trying to fix this for hours, with all my attempts to actually fix it failing, and I can now finally continue working on my thing. I owe you a beer 😃

I know this issue is old, but in my case (similar to @dogancelik, piping a response conditionally based on the response headers), I was able to do the following:

const req = request.get(URL);
req.on('response', (response) => {
  response.pause();
  // Do stuff with response.statusCode etc.
  setTimeout(() => {
    response.pipe(process.stdout);
  }, 2000);
});

i.e., pausing the response stream (request apparently sets the stream to flowing mode automatically?) and piping the response object itself (not the req).

I’d welcome feedback from users more knowledgeable than me about this.

I have the same problem except, I get this error when I try to .pipe in .on('response'):

req = request(options)
req.on 'response', (response) ->
  # get info from headers here
  stream = createWriteStream file
  req.pipe stream

Nah, doesn’t work for me. Due to the lack of activity on this issue, I’m just going to stop using request and go back to the Node.js http module.

Is there an update on this?

The quickest fix might be to pipe immediately to a stream.PassThrough() object. You should be able to pipe from the PassThrough object at any later time without issues. See: http://nodejs.org/api/stream.html#stream_class_stream_passthrough

Update from my side, not sure why I didn’t post this before.

Due to this issue not being resolved in a timely manner, and after running across a number of other frustrating bugs in request, a while ago I wrote my own HTTP client library that deals correctly with this use case: bhttp. It also solves the Streams2 support issues.

I don’t know what other implications it might have, but I am able to solve this issue by altering request.js as follows:

Add:

var superOn = Request.prototype.on;
Request.prototype.on = function (eventName) {
  if (eventName === "data") {
    this.resume()
  }
  superOn.apply(this, arguments)
}

And change the dataStream.on("data" call to this:

dataStream.on("data", function (chunk) {
  var emitted = self.emit("data", chunk)
  if (emitted) {
    self._destdata = true
  } else {
    // pause the source stream until we pipe it
    dataStream.pause()
    dataStream.unshift(chunk)
  }
})

The basic issue (I think) is that onResponse calls resume() and attaches a “data” listener, both of which start the data flowing, but we might not be ready to pipe it yet. So my solution is: if we try to emit data but we don’t have any listeners yet, we pause the underlying read stream until a data listener is attached.

Would be good if someone else could test this solution and verify that it works.

@julien-c Good call, this indeed seems to be because binding a response listener causes it to switch into “flowing mode” automatically. I can verify that it’s happening in both request@2.81.0 and request@2.83.0.

Not sure if this is intended or not (my understanding was that this isn’t how it was supposed to work). For now, I’ll plan on using the workaround of pausing the stream (I’ll try to remember to post back here if I find out anything new along the way).