puma: Puma 5.0.0 HTTP connection error: #

After upgrading to puma 5.0.0 I am seeing HTTP connection error: #<Puma::ConnectionError: Connection error detected during read> error in server logs directly after server starts. This happens just on Heroku. Regardless that app is working without any issues…

2020-09-17T17:58:33.045281+00:00 app[web.1]: Puma starting in single mode...
2020-09-17T17:58:33.045331+00:00 app[web.1]: * Version 5.0.0 (ruby 2.7.1-p83), codename: Spoony Bard
2020-09-17T17:58:33.045332+00:00 app[web.1]: * Min threads: 10, max threads: 10
2020-09-17T17:58:33.045332+00:00 app[web.1]: * Environment: production
2020-09-17T17:58:33.684588+00:00 app[web.1]: * Listening on http://0.0.0.0:31428
2020-09-17T17:58:33.684657+00:00 app[web.1]: Use Ctrl-C to stop
2020-09-17T17:58:34.163946+00:00 app[web.1]: 2020-09-17 17:58:34 +0000 HTTP connection error: #<Puma::ConnectionError: Connection error detected during read>
2020-09-17T17:58:35.162918+00:00 heroku[web.1]: State changed from starting to up

About this issue

  • Original URL
  • State: closed
  • Created 4 years ago
  • Reactions: 9
  • Comments: 38 (25 by maintainers)

Commits related to this issue

Most upvoted comments

Okay - I’ll be merging #2384 and releasing a 5.0.1 tomorrow morning.

Good to know this was never really an issue and we just added extra logging in the wrong place. Thanks for the deep dig @cjlarose.

Ok this looks like it might be the relevant code in wrk

https://github.com/wg/wrk/blob/0896020a2a28b84b1150e9b60933b746fe1dc761/src/wrk.c#L96-L100

That, I think calls the lua function wrk.resolve, which calls wrk.connect or script_wrk_connect

https://github.com/wg/wrk/blob/7594a95186ebdfa7cb35477a8a811f84e2a31b62/src/wrk.lua#L12-L20

https://github.com/wg/wrk/blob/7594a95186ebdfa7cb35477a8a811f84e2a31b62/src/script.c#L466-L475

I think wrk just creates an initial TCP connection to make sure it can connect the the server before it begins its main testing phase. If that first connection fails, it’ll print something out to the console and exit.

I imagine Heroku is doing something similar to make sure your app is up and ready to accept connections, but I haven’t come across documentation that makes that explicit. Edit: Heroku tries to ensure that you application binds to the expected $PORT within 60 seconds https://devcenter.heroku.com/articles/error-codes#r10-boot-timeout The error we’re seeing might be related to that check.

In short, #2384 seems like the appropriate solution. Puma already correctly detects when a connection is established and immediately closed. Puma also already closes down the connection gracefully from a TCP perspective. This behavior is not really exceptional and users don’t need to be notified.

If I switch to puma 4.3.6 and add the same logging I see an empty request being read and an EOFError:

nil
#<EOFError: end of file reached>
nil
#<EOFError: end of file reached>

So it’s the same error (and two of them) but it’s just not being logged. AFAIK it looks like this behavior is reasonable. Or at least it’s not causing problems.

Several of yall have said you’ve said stuff like

been noticing increased timeouts. Matches exactly this issue so seems more than just logging.

If that’s the case, then either this wrk example doesn’t reproduce the same failure mode, or the timeouts are unrelated.

TLDR; From my investigation it looks like the behavior of puma hasn’t changed here between 4.x and 5.x, but logging did.

I used the above minimal repo to manually bisect which puma commit between v4.3.6 and v5.0.0 first results in the error being output in a local environment with wrk, and it’s 265f638d2beb9ee60d1af5e5c03ae876b0c3e4d8 (though the error format is different):

=> Booting Puma
=> Rails 6.0.3.3 application starting in development
=> Run `rails server --help` for more startup options
Puma starting in single mode...
* Version 4.3.3 (ruby 2.6.6-p146), codename: Mysterious Traveller
* Min threads: 5, max threads: 5
* Environment: development
* Listening on tcp://0.0.0.0:3000
Use Ctrl-C to stop
2020-09-21 12:40:06 -0400 HTTP connection error: #<Puma::ConnectionError: Connection error detected during read>
Handling request {   } ()
/Users/ryan/.rbenv/versions/2.6.6/lib/ruby/gems/2.6.0/bundler/gems/puma-265f638d2beb/lib/puma/client.rb:159:in `rescue in try_to_finish'
/Users/ryan/.rbenv/versions/2.6.6/lib/ruby/gems/2.6.0/bundler/gems/puma-265f638d2beb/lib/puma/client.rb:154:in `try_to_finish'
/Users/ryan/.rbenv/versions/2.6.6/lib/ruby/gems/2.6.0/bundler/gems/puma-265f638d2beb/lib/puma/client.rb:239:in `eagerly_finish'
/Users/ryan/.rbenv/versions/2.6.6/lib/ruby/gems/2.6.0/bundler/gems/puma-265f638d2beb/lib/puma/server.rb:199:in `block in run'
/Users/ryan/.rbenv/versions/2.6.6/lib/ruby/gems/2.6.0/bundler/gems/puma-265f638d2beb/lib/puma/thread_pool.rb:137:in `block in spawn_thread'

I’ve updated the Gemfile in the puma_connection_error repo to use this commit.

Since the error is probably still happening before this commit, and just isn’t logged, it would probably be trickier to bisect that.

So my best guess (at the moment) is that wrk initializes a socket and connects with puma before sending over any data, and this raises an EOFError since there’s no data yet.

I started a puma server bundle exec ruby -Ilib bin/puma test/rackup/hello.ru and started up wrk wrk -t1 -c1 -d1 http://localhost:9292 while performing a packet capture with Wireshark and the evidence from Wireshark appears to corroborate your guess.

When wrk starts up, it establishes a connection with the puma server normally, but then sends a FIN to the puma server (it starts to tear down the TCP connection) before sending any data. I don’t really know why wrk does this quite yet and I’m still looking into it, but figured I should share what I found out so far.

Screen Shot 2020-09-25 at 14 08 14

If you have a different issue than the connection problem in the OP, please open a new issue. Let’s not turn this into a “all 5.0 issues” megathread.

I noted this behavior change introduced by #2250 as a potential regression in https://github.com/puma/puma/pull/2279#issuecomment-642322710. There are three code blocks that catch ConnectionError, and until #2250 they all swallowed the error without logging them. Note the existing comment in one of these code blocks which explains why: https://github.com/puma/puma/blob/3cb00a6b6838bca98af380200c066fb3246fda9a/lib/puma/reactor.rb#L229-L234

5.0.2 is out with the fix update ur deps!

Looks like #959 contains the original reason this was ignored.

A co-worker recommends checking out a tool called mitmproxy for seeing what exactly wrk is sending. I think this is it https://mitmproxy.org/ I’ve not had a chance to use it though.

@schneems It is not only wrk that is causing this. From the 100 lines heroku logs give me 46 about Connection error detected during read for my app on Heroku (https://help.heroku.com/tickets/920245). I haven’t tried to understand this more yet. I can add that the Heroku app has a proxy (nginx) in front of it (users are accessing it via the proxy, the proxy is run on another server). Maybe that makes it more common somehow. I can see the error happening even when there aren’t any requests happening.

So IMO it would be better just to swallow EOFError here. WDYT?

I’m okay with that. But I would feel better if we knew why wrk was causing this. I.e. is it an empty socket connection? Is it trying to make some http 5.0 request that we don’t support ? If we figure that out then we can add a test for it. The only way I was able to raise the same error was the empty netcat request. I’m unclear if that’s the same think wrk is doing or why it’s doing it twice per test-run.

If people are seeing this in prod it makes me wonder what the conditions that cause it are. Are browsers or people sending empty requests or is something else going on? There might be a case here that we should be handling different. If someone wants to go through wrk to try to figure out why the request is getting an EOF, it’s open source (though written in C).

I confirm that I’m seeing a failure with the repro app

  • Works (no error)
# works
gem 'puma', git: 'https://github.com/puma/puma.git', ref: 'c606471b5ae436a165ee2f083a35635c90a29b93'
  • Fails (with error)
# fails
 gem 'puma', git: 'https://github.com/puma/puma.git', ref: '265f638d2beb9ee60d1af5e5c03ae876b0c3e4d8' 

For convenience here are commits in a historical view that contains both of those: https://github.com/puma/puma/commits/master?after=7d00e1dc3c71b77ea171f503e6c9be24836314eb+174&branch=master

cc @alexeevit any idea what might be at play here?

@schneems I created a ticket for my app, https://help.heroku.com/tickets/920245, if it can help in any way

I can also reproduce(-ish) it locally in macOS. It happens more often on Heroku than I see locally, but I haven’t tried replicating the same traffic locally (but the production app does not have a heavy load). The code for my app is public: https://github.com/Starkast/wikimum

The steps I did in macOS to start the app (using https://github.com/Starkast/wikimum/commit/0bc59331ebcedfca482e1b54211788edf9ccb0dd)

createdb wikimum-puma-2371
DATABASE_URL=postgres://localhost/wikimum-puma-2371 bundle e rake db:migrate
DATABASE_URL=postgres://localhost/wikimum-puma-2371 bundle e puma -C config/puma.rb

After Puma has booted, and when the wrk command is started, I can see one line of 2020-09-21 17:30:16 +0200 HTTP connection error: #<Puma::ConnectionError: Connection error detected during read> before the request logging starts. Full log here. I’ve tested the same thing using Puma 4.3.6 and the error isn’t logged then.

There are no timeouts from wrk:

$ wrk http://127.0.0.1:3000
Running 10s test @ http://127.0.0.1:3000
  2 threads and 10 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency    49.27ms   19.55ms 231.59ms   87.16%
    Req/Sec   105.14     20.24   160.00     74.49%
  2081 requests in 10.05s, 5.65MB read
Requests/sec:    206.99
Transfer/sec:    575.28KB

@schneems

  1. All my reproductions are happening locally, our app doesn’t run on Heroku (it runs on k8s). I can deploy the app to a staging environment and see if I have timeouts their if it helps.

  2. I tried creating a minimal reproduction app but can’t, this why I published all info I can give - from logs to trial to reproduction on simple rails app and puma hello.ru example.

If you can give me any guidance on how to give better visibility what’s going on I’ll be happy to help.

This might be that more logging was added in 5.0.0 and that Puma has always operated in this way.