puma: Sinatra streaming broken in Puma 6.0.0
See https://github.com/sinatra/sinatra/issues/1832 for more details, but this simple app https://github.com/sinatra/sinatra/blob/v3.0.2/examples/stream.ru works fine with Puma 5.6.5, while with 6.0.0 it waits until all the sleeps have finished before the response is returned (this is more noticeable if you increase the sleep time in the example).
As noted in https://github.com/sinatra/sinatra/issues/1832#issuecomment-1291175026 the example runs as expected in falcon (v0.42.3).
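For readers who don't want to open the linked example, the shape of the failing setup can be sketched with a plain Rack app. This is an illustrative reduction (not the linked `stream.ru`): an enumerable body whose `#each` yields chunks over time, which is the kind of body Sinatra's streaming helper produces.

```ruby
# Illustrative reduction of the bug report: an enumerable Rack body whose
# #each yields chunks over time. A streaming server should flush each
# chunk as it is yielded; the report is that Puma 6.0.0 buffered the
# whole body before writing the response.
class SlowBody
  def each
    3.times do |i|
      sleep 0.01             # stand-in for work between chunks
      yield "chunk #{i}\n"   # each yielded chunk should reach the client now
    end
  end
end

# Minimal Rack app returning the streaming body.
app = ->(_env) { [200, { "content-type" => "text/plain" }, SlowBody.new] }

status, _headers, body = app.call({})
chunks = []
body.each { |chunk| chunks << chunk }
```

Running this app under Puma 5.6.5 vs 6.0.0 and watching with `curl -N` shows the difference in when chunks arrive.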
About this issue
- State: closed
- Created 2 years ago
- Comments: 29 (26 by maintainers)
Commits related to this issue
- CI: Test with Puma 6 but allow it to fail Actually, test the master branch of Puma like we do with Rack. Pin regular jobs to Puma 5.x due to the streaming bug: https://github.com/puma/puma/issues/30... — committed to dentarg/sinatra by dentarg 2 years ago
- fix: puma 6 break streaming see https://github.com/puma/puma/issues/3000 — committed to betagouv/datapass by rdubigny 2 years ago
- chore: use latest puma version that fixes https://github.com/puma/puma/issues/3000 — committed to betagouv/datapass by rdubigny 2 years ago
We will be working on updates to Rails pretty soon.
Hello, I just want to give a bit of feedback here. As I tried upgrading to Puma 6, I noticed it broke streaming in my Rails app too. I'm using my render-later library, which uses classic Rails controller streaming (plus a couple of hacks to actually get it to work, as it's mostly broken by default). The problem is easy to reproduce since I already wrote a spec: running the render-later specs (`rake`) fails with Puma 6, though they work fine with Puma 5.
I already had a look at #3004 and tried with that branch (`MSP-Greg:00-size-to-first-byte`), and I can confirm that using this version (`gem 'puma', github: 'MSP-Greg/puma', branch: '00-size-to-first-byte'`) seems to fix the problem for my use-case ✔️
Honestly, I did not spend much time investigating the changes and how the PR fixes this; I can wait for 6.0.1 ☺ But as it seems my use-case was not Puma 6.0.0 ready, if you need my opinion on some decisions do not hesitate (I can spend more time investigating if needed too). Thank you all for your work 🙇
@jarthod Thank you for testing PR #3004.
It’s an update to #2896, which attempted to determine properties of the response body and adjust how the response body was assembled/transmitted. That came out of running benchmarks using various body type/size combinations and seeing where Puma’s performance could be improved.
#2896 was a bit too aggressive and didn’t account for streamed bodies that were enumerations. #3004 has added more tests and CI jobs, so hopefully it’s ready for production…
I don’t like the idea of closing a bug with a config setting. You shouldn’t need to configure Puma to make chunked responses work correctly. Am I reading this wrong?
Ok thanks for the summary! So I shall wait for this to be merged then.
And if I understand correctly, with Rack 3 the preferred way to stream is now “streaming bodies” (`call`able) instead of enumerable bodies. But as this was just implemented and requires Rack 3, I suppose this won’t change in Rails for a while?

As in a streaming body responds to `call`? I think the current answer is:
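For context, the two body styles being contrasted can be sketched as follows. This assumes Rack 3 semantics; `FakeStream` is an illustrative stand-in for the stream object a server would provide, not part of Rack itself.

```ruby
# Enumerable body (classic style): the server calls #each and
# receives chunks one at a time.
enumerable_body = ["Hello, ", "World!\n"]

# Streaming body (Rack 3 style): responds to #call and is handed a
# stream object it writes to directly.
streaming_body = lambda do |stream|
  stream.write("Hello, ")
  stream.write("World!\n")
  stream.close
end

# FakeStream is a made-up stand-in for the server-provided stream.
class FakeStream
  attr_reader :data

  def initialize
    @data = +""
  end

  def write(chunk)
    @data << chunk
  end

  def close; end
end

buffered = +""
enumerable_body.each { |chunk| buffered << chunk }

stream = FakeStream.new
streaming_body.call(stream)
```

Both deliver the same bytes; the difference is who drives the iteration, which is what matters for buffering decisions.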
Just to make sure all are aware, `to_a` and `to_ary` are similar, but different. `Enumerable` has a `to_a` method; `Array` has a `to_ary` method (which returns `self`), and also a `to_a` method because `Enumerable` is in its ancestor chain. That, along with the ‘text’ based iteration, was the reason for my PR in Rack…
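That distinction is easy to verify in `irb`; a small self-contained demonstration:

```ruby
# Array defines to_ary (returning self); Enumerable only provides to_a.
arr  = [1, 2, 3]
enum = arr.each   # an Enumerator, which includes Enumerable

arr.respond_to?(:to_ary)    # => true
arr.to_ary.equal?(arr)      # => true, to_ary returns self

enum.respond_to?(:to_a)     # => true, via Enumerable
enum.respond_to?(:to_ary)   # => false, so it is not "array-like"
```

This is why `to_ary` is a useful probe for “fully materialized, array-like” bodies, while `to_a` tells you almost nothing.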
As discussed, buffering is critical to the latency/throughput trade offs of an application.
I believe that when streaming, you should not be buffering - it becomes an application concern to decide the throughput and latency trade off.
I model this in Falcon using the concept of `ready?`. If a body is ready, reading from it won’t block.

A user stream can have a small internal buffer. If there is a chunk in the buffer, then `ready?` would be true (because a chunk is immediately available). If the user stream is empty (the application has not generated a chunk), `ready?` is false (reading would block until a chunk is available).

For an `Array` and other enumerable bodies (those that respond to `to_ary`) we should always consider it to be `ready?`. This is one of the reasons for explicitly modelling enumerable bodies vs streaming bodies. If you recall, `Rack::ETag` was not able to distinguish between these two cases and ended up breaking streaming on a fairly regular basis.

At the output layer, you can take advantage of `ready?` to avoid latency issues. For example, if the response body is always ready, we don’t need to flush and can let TCP continue the buffering: https://github.com/socketry/protocol-http1/blob/e6a9235102986a7a5462aea251f2fc9cdc00d65b/lib/protocol/http1/connection.rb#L288. You can search for other instances of `ready?` in that code to see how the buffering strategy works.

The actual implementation of `ready?` is non-trivial for Rack 2 but fairly straightforward in Rack 3. For an enumerable body: https://github.com/socketry/protocol-rack/blob/73705f364ba009eb1bbabbad88de1e627cb37c09/lib/protocol/rack/body/enumerable.rb#L50-L53 (now that I look at it, I realise the first check for `Array` is redundant :p). For a streaming body the default is used (from inheritance): https://github.com/socketry/protocol-http/blob/5f9c3d4fcda5bf89e8e033d8e1ff210f0dc8342d/lib/protocol/http/body/readable.rb#L47-L52, which is `false`.

The implication of this, as outlined above, is that streaming responses will map output chunks to TCP packets, and if the user does a poor job of buffering the output, that’s on them (according to the implementation/design). You could obviously introduce your own buffering for streaming responses, but the trade-off would be latency, and I think, all things considered, that is the wrong trade-off.
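As a rough sketch of that model (illustrative only, not Falcon’s actual classes):

```ruby
# A user stream with a small internal buffer: ready? is true exactly
# when a chunk is immediately available, i.e. a read would not block.
class BufferedUserStream
  def initialize
    @chunks = []
  end

  def push(chunk)     # the application produced a chunk
    @chunks << chunk
  end

  def ready?
    !@chunks.empty?
  end

  def read
    @chunks.shift
  end
end

# An enumerable body is always ready: every chunk already exists.
class EnumerableBody
  def initialize(chunks)
    @chunks = chunks
  end

  def ready?
    true
  end

  def each(&block)
    @chunks.each(&block)
  end
end

stream = BufferedUserStream.new
was_ready = stream.ready?     # false: nothing buffered, a read would block
stream.push("chunk")
now_ready = stream.ready?     # true: a chunk is immediately available
```

The output layer can then flush eagerly when `ready?` is false (latency matters) and let TCP coalesce when it is true (throughput matters).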
If you want to improve the buffering in this case, I’d advise you to check `TCP_NODELAY` and `TCP_CORK`.
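For example, disabling Nagle’s algorithm on a socket in Ruby looks like this (`TCP_CORK` is the Linux-only counterpart for the opposite strategy, so it is omitted here for portability; the loopback connection exists only to demonstrate the option):

```ruby
require "socket"

# Open a throwaway loopback connection just to show the socket option.
server = TCPServer.new("127.0.0.1", 0)   # port 0 = pick an ephemeral port
client = TCPSocket.new("127.0.0.1", server.addr[1])

# TCP_NODELAY disables Nagle's algorithm so small streamed chunks are
# sent immediately instead of being coalesced into larger packets.
client.setsockopt(Socket::IPPROTO_TCP, Socket::TCP_NODELAY, 1)
nodelay = client.getsockopt(Socket::IPPROTO_TCP, Socket::TCP_NODELAY).int

client.close
server.close
```

Servers that prioritize streaming latency typically set `TCP_NODELAY` on accepted sockets; `TCP_CORK` instead holds partial frames back until a full packet (or an uncork) is available.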