puma: Sinatra streaming broken in Puma 6.0.0

See https://github.com/sinatra/sinatra/issues/1832 for more details, but in short: this simple app https://github.com/sinatra/sinatra/blob/v3.0.2/examples/stream.ru works fine with Puma 5.6.5, but with 6.0.0 it waits until all sleeps have finished before the response is returned (this is more noticeable if you increase the sleep time in the example).

As noted in https://github.com/sinatra/sinatra/issues/1832#issuecomment-1291175026 the example runs as expected in falcon (v0.42.3).
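For reference, the linked stream.ru boils down to roughly the following (a sketch from memory, not the exact file — see the URL above for the real code):

```ruby
# config.ru — run with `rackup` under different Puma versions to compare.
require 'sinatra/base'

class App < Sinatra::Base
  get '/' do
    content_type :txt
    stream do |out|
      out << "chunk 1\n"
      sleep 1            # under Puma 5.6.5 each chunk is delivered as it
      out << "chunk 2\n" # is written; under Puma 6.0.0 the whole body
      sleep 1            # arrives only after all sleeps finish
      out << "chunk 3\n"
    end
  end
end

run App
```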

About this issue

  • State: closed
  • Created 2 years ago
  • Comments: 29 (26 by maintainers)

Most upvoted comments

We will be working on updates to Rails pretty soon.

Hello, I just want to give a bit of feedback here. When I tried upgrading to puma 6, I noticed it broke streaming in my Rails app too. I’m using my render-later library, which relies on classic Rails controller streaming (plus a couple of hacks to actually get it working, as it’s mostly broken by default). The problem is easy to reproduce since I already wrote a spec for it, so running the render-later specs (rake) with puma 6 gives:

[render-later] (master) > rake
.......F
Failure:
RenderLater::EngineTest#test_streaming_works [/home/bigbourin/work/render-later/test/render-later/engine_test.rb:33]:
Not enough time between first and last chunk.
Expected 0.00048107299999999853 to be > 1.

It works fine with puma 5, though. But:

I already had a look at #3004 and tried that branch (MSP-Greg:00-size-to-first-byte). I can confirm that using it (gem 'puma', github: 'MSP-Greg/puma', branch: '00-size-to-first-byte') fixes the problem for my use case ✔️:

[render-later] (master) > rake
..........
Finished in 2.366557s, 4.2255 runs/s, 8.0285 assertions/s.
10 runs, 19 assertions, 0 failures, 0 errors, 0 skips

Honestly, I did not spend much time investigating the changes or how the PR fixes this; I can wait for 6.0.1 ☺ But since my use case apparently was not puma 6.0.0 ready, do not hesitate if you need my opinion on any decisions (I can spend more time investigating if needed, too).

Thank you all for your work 🙇

@jarthod Thank you for testing PR #3004.

It’s an update to #2896, which attempted to determine properties of the response body and adjust how the body was assembled and transmitted. That work came out of running benchmarks with various body type/size combinations to see where Puma’s performance could be improved.

#2896 was a bit too aggressive and didn’t account for streamed bodies that were enumerations. #3004 has added more tests and CI jobs, so hopefully it’s ready for production…

I don’t like the idea of closing a bug with a config setting. You shouldn’t need to configure Puma to make chunked responses work correctly. Am I reading this wrong?

Ok, thanks for the summary! So I shall wait for this to be merged then.

And if I understand correctly, with Rack 3 the preferred way to stream is now “streaming bodies” (callable) instead of enumerable bodies, but as this was only just implemented and requires Rack 3, I suppose it won’t change in Rails for a while?

As in a streaming body responds to call?

I think the current answer is:

  • Enumerable bodies can be buffered but don’t need to be.
  • Streaming bodies shouldn’t be buffered.
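Concretely, the two body styles can be sketched like this in plain Ruby (illustrative only; in a real Rack 3 server the streaming body is handed the server's stream object):

```ruby
# Enumerable body: responds to #each, so the server MAY buffer it —
# every chunk already exists in memory.
enumerable_body = ["chunk 1\n", "chunk 2\n"]

# Streaming body (Rack 3 style): responds to #call and writes chunks to
# the stream it is given; the server should NOT buffer these.
streaming_body = proc do |stream|
  stream.write("chunk 1\n")
  stream.write("chunk 2\n")
ensure
  stream.close
end
```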

Just to make sure everyone is aware: to_a and to_ary are similar, but different. Enumerable has a to_a method; Array has a to_ary method (which returns self), and also a to_a method because Enumerable is in its ancestor chain.
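A quick illustration of the difference:

```ruby
arr  = [1, 2, 3]
enum = arr.each           # an Enumerator, which includes Enumerable

arr.respond_to?(:to_ary)  # true; Array#to_ary returns self
arr.respond_to?(:to_a)    # true
enum.respond_to?(:to_a)   # true, defined by Enumerable
enum.respond_to?(:to_ary) # false — a generic Enumerable has no to_ary
```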

for a file, it might have internally blocking reads

That, along with the ‘text’ based iteration was the reason for my PR in Rack…

As discussed, buffering is critical to the latency/throughput trade-offs of an application.

I believe that when streaming, you should not be buffering — deciding the throughput/latency trade-off becomes an application concern.

I model this in Falcon using the concept of ready?. If a body is ready, reading from it won’t block.

A user stream can have a small internal buffer. If there is a chunk in the buffer, then ready? would be true (because a chunk is immediately available). If the user stream is empty (the application has not generated a chunk), ready? is false (reading would block until a chunk is available).

For an Array and other enumerable bodies (those responding to to_ary) we should always consider them to be ready?. This is one of the reasons for explicitly modelling enumerable bodies vs streaming bodies. If you recall, Rack::ETag was not able to distinguish between these two cases and ended up breaking streaming on a fairly regular basis.
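The ready? model described above can be sketched like this (class names here are illustrative, not Falcon's actual API):

```ruby
# A user stream with a small internal buffer: ready? is true exactly
# when a chunk is immediately available, i.e. reading won't block.
class BufferedUserStream
  def initialize
    @chunks = []
  end

  def push(chunk)      # the application produces a chunk
    @chunks << chunk
  end

  def ready?
    !@chunks.empty?
  end

  def read             # non-blocking when ready? is true
    @chunks.shift
  end
end

# An enumerable body is always ready: every chunk already exists.
class EnumerableBody
  def initialize(chunks)
    @chunks = chunks
  end

  def ready?
    true
  end

  def each(&block)
    @chunks.each(&block)
  end
end
```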

At the output layer, you can take advantage of ready? to avoid latency issues. For example, if the response body is always ready, we don’t need to flush and can let TCP continue the buffering: https://github.com/socketry/protocol-http1/blob/e6a9235102986a7a5462aea251f2fc9cdc00d65b/lib/protocol/http1/connection.rb#L288. You can search for other instances of ready? in that code to see how the buffering strategy works.

The actual implementation of ready? is non-trivial for Rack 2 but fairly straightforward in Rack 3.

For an enumerable body: https://github.com/socketry/protocol-rack/blob/73705f364ba009eb1bbabbad88de1e627cb37c09/lib/protocol/rack/body/enumerable.rb#L50-L53 (now that I look at it, I realise the first check for Array is redundant :p).

For streaming body the default is used (from inheritance): https://github.com/socketry/protocol-http/blob/5f9c3d4fcda5bf89e8e033d8e1ff210f0dc8342d/lib/protocol/http/body/readable.rb#L47-L52 which is false.

The implication, as outlined above, is that streaming responses map output chunks to TCP packets, and if the user does a poor job of buffering the output, that’s on them (by design). You could obviously introduce your own buffering for streaming responses, but the cost would be latency, and I think, all things considered, that is the wrong trade-off.

If you want to improve the buffering in this case, I’d advise you to check TCP_NODELAY and TCP_CORK.
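For example, TCP_NODELAY can be toggled per-socket from Ruby (a sketch; TCP_CORK is Linux-only and is manipulated the same way via Socket::TCP_CORK where available):

```ruby
require 'socket'

# Disable Nagle's algorithm on a socket so small streamed chunks are
# sent immediately rather than coalesced into larger TCP packets.
server = TCPServer.new('127.0.0.1', 0)   # port 0 = pick any free port
client = TCPSocket.new('127.0.0.1', server.addr[1])
client.setsockopt(Socket::IPPROTO_TCP, Socket::TCP_NODELAY, 1)

# Read the option back to confirm it took effect.
nodelay = client.getsockopt(Socket::IPPROTO_TCP, Socket::TCP_NODELAY).int

client.close
server.close
```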