puma: Puma 5.0.0 HTTP connection error: #<Puma::ConnectionError: Connection error detected during read>
After upgrading to puma 5.0.0 I am seeing an HTTP connection error: #<Puma::ConnectionError: Connection error detected during read> error in the server logs directly after the server starts. This happens only on Heroku. Despite the error, the app is working without any issues…
2020-09-17T17:58:33.045281+00:00 app[web.1]: Puma starting in single mode...
2020-09-17T17:58:33.045331+00:00 app[web.1]: * Version 5.0.0 (ruby 2.7.1-p83), codename: Spoony Bard
2020-09-17T17:58:33.045332+00:00 app[web.1]: * Min threads: 10, max threads: 10
2020-09-17T17:58:33.045332+00:00 app[web.1]: * Environment: production
2020-09-17T17:58:33.684588+00:00 app[web.1]: * Listening on http://0.0.0.0:31428
2020-09-17T17:58:33.684657+00:00 app[web.1]: Use Ctrl-C to stop
2020-09-17T17:58:34.163946+00:00 app[web.1]: 2020-09-17 17:58:34 +0000 HTTP connection error: #<Puma::ConnectionError: Connection error detected during read>
2020-09-17T17:58:35.162918+00:00 heroku[web.1]: State changed from starting to up
About this issue
- State: closed
- Created 4 years ago
- Reactions: 9
- Comments: 38 (25 by maintainers)
Commits related to this issue
- [close #2371] Don't report EOFError — committed to puma/puma by schneems 4 years ago
- Failing test for #2371 The EOF can be triggered by opening a connection, not writing to it, then closing the connection. When this happens we can detect if an error was logged or not by inspecting th... — committed to puma/puma by schneems 4 years ago
- [close #2371] Do not log EOFError This is a continuation from #2382. One of the ways that an EOFError can be triggered is by opening a connection, not writing to it, then closing it (there may be ot... — committed to puma/puma by schneems 4 years ago
Okay - I’ll be merging #2384 and releasing a 5.0.1 tomorrow morning.
Good to know this was never really an issue and we just added extra logging in the wrong place. Thanks for the deep dig @cjlarose.
Ok, this looks like it might be the relevant code in wrk:
https://github.com/wg/wrk/blob/0896020a2a28b84b1150e9b60933b746fe1dc761/src/wrk.c#L96-L100
That, I think, calls the lua function wrk.resolve, which calls wrk.connect or script_wrk_connect:
https://github.com/wg/wrk/blob/7594a95186ebdfa7cb35477a8a811f84e2a31b62/src/wrk.lua#L12-L20
https://github.com/wg/wrk/blob/7594a95186ebdfa7cb35477a8a811f84e2a31b62/src/script.c#L466-L475
I think wrk just creates an initial TCP connection to make sure it can connect to the server before it begins its main testing phase. If that first connection fails, it'll print something out to the console and exit.
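For illustration, here is a minimal Ruby sketch (not wrk's actual code) of that kind of pre-flight check, assuming a Puma server listening locally on port 9292 as in the repro steps below: the client opens a TCP connection and closes it without writing anything, which is enough to make Puma 5.0.0 log the error above.

```ruby
# Hypothetical sketch of a connectability check like wrk's: open a TCP
# connection to the server and close it without sending any HTTP data.
require "socket"

host = "localhost"
port = 9292 # assumed local Puma, as in the repro steps in this thread

socket = TCPSocket.new(host, port) # TCP handshake succeeds => server is up
socket.close                       # FIN is sent before any request bytes
puts "connected and closed without sending a request"
```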
I imagine Heroku is doing something similar to make sure your app is up and ready to accept connections, but I haven't come across documentation that makes that explicit. Edit: Heroku tries to ensure that your application binds to the expected $PORT within 60 seconds: https://devcenter.heroku.com/articles/error-codes#r10-boot-timeout The error we're seeing might be related to that check.

In short, #2384 seems like the appropriate solution. Puma already correctly detects when a connection is established and immediately closed. Puma also already closes down the connection gracefully from a TCP perspective. This behavior is not really exceptional and users don't need to be notified.
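As a rough illustration of what such a boot check could look like (this is an assumption about the mechanism, not Heroku's actual implementation), note that a probe only needs the TCP handshake to succeed, so every successful probe is another connection that is closed without sending data:

```ruby
# Hedged sketch of a boot/readiness probe: retry a bare TCP connection to
# $PORT until it succeeds or a deadline passes. The probe never sends an
# HTTP request; it just connects and closes.
require "socket"

port     = Integer(ENV.fetch("PORT", "9292"))
deadline = Time.now + 60 # mirrors the documented 60-second boot window

begin
  TCPSocket.new("127.0.0.1", port).close # probe, then close immediately
  puts "app is accepting connections on port #{port}"
rescue Errno::ECONNREFUSED, Errno::ETIMEDOUT
  raise "boot timeout" if Time.now > deadline
  sleep 1
  retry
end
```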
If I switch to puma 4.3.6 and add the same logging, I see an empty request being read and an EOFError:
So it's the same error (and two of them), but it's just not being logged. AFAIK this behavior is reasonable, or at least it's not causing problems.
Several of y'all have said stuff like
If that's the case, then either this wrk example doesn't reproduce the same failure mode, or the timeouts are unrelated.

TL;DR: From my investigation it looks like the behavior of puma hasn't changed here between 4.x and 5.x, but the logging did.
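For anyone who wants to see the underlying mechanism without Puma in the picture, here is a small self-contained Ruby sketch (plain sockets, not Puma internals): a client that connects and closes without writing causes the server's first read to raise EOFError, which matches the "empty request plus EOFError" observation above.

```ruby
# Minimal demo: a peer that connects and closes without writing makes the
# server's first read raise EOFError.
require "socket"

server = TCPServer.new(9393) # arbitrary local port for the demo
client = Thread.new { TCPSocket.new("127.0.0.1", 9393).close }

conn = server.accept
begin
  conn.readpartial(1024)
rescue EOFError => e
  puts "read raised #{e.class}: peer closed before sending any data"
ensure
  conn.close
  server.close
  client.join
end
```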
I used the above minimal repo to manually bisect which puma commit between v4.3.6 and v5.0.0 first results in the error being output in a local environment with wrk, and it's 265f638d2beb9ee60d1af5e5c03ae876b0c3e4d8 (though the error format is different). I've updated the Gemfile in the puma_connection_error repo to use this commit.

Since the error is probably still happening before this commit, and just isn't logged, it would probably be trickier to bisect that.
I started a puma server with bundle exec ruby -Ilib bin/puma test/rackup/hello.ru and started up wrk with wrk -t1 -c1 -d1 http://localhost:9292 while performing a packet capture with Wireshark, and the evidence from Wireshark appears to corroborate your guess.

When wrk starts up, it establishes a connection with the puma server normally, but then sends a FIN to the puma server (it starts to tear down the TCP connection) before sending any data. I don't really know why wrk does this quite yet and I'm still looking into it, but figured I should share what I found out so far.

If you have a different issue than the connection problem in the OP, please open a new issue. Let's not turn this into an "all 5.0 issues" megathread.
I noted this behavior change introduced by #2250 as a potential regression in https://github.com/puma/puma/pull/2279#issuecomment-642322710. There are three code blocks that catch ConnectionError, and until #2250 they all swallowed the error without logging it. Note the existing comment in one of these code blocks, which explains why: https://github.com/puma/puma/blob/3cb00a6b6838bca98af380200c066fb3246fda9a/lib/puma/reactor.rb#L229-L234

5.0.2 is out with the fix, update your deps!
Looks like #959 contains the original reason this was ignored.
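As a concrete, hedged illustration of that policy, here is a simplified self-contained Ruby sketch (plain sockets, not Puma's actual reactor code): an EOFError from a peer that connected and hung up without sending anything is treated as routine and not logged, while other low-level socket errors are still reported.

```ruby
# Simplified illustration of "do not log EOFError": a quiet client hang-up
# is ignored, other low-level socket errors are still reported.
require "socket"

def handle(conn, log: $stderr)
  conn.readpartial(16_384) # a real server would go on to parse the request
rescue EOFError
  # Routine: the peer closed the connection before sending anything.
rescue SystemCallError => e
  log.puts "HTTP connection error: #{e.inspect}" # still worth reporting
ensure
  conn.close
end

server = TCPServer.new(9394)
Thread.new { TCPSocket.new("127.0.0.1", 9394).close } # empty connection

handle(server.accept) # returns quietly; nothing is logged
server.close
```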
A co-worker recommends checking out a tool called mitmproxy for seeing what exactly wrk is sending. I think this is it: https://mitmproxy.org/ I've not had a chance to use it though.

@schneems It is not only wrk that is causing this. Of the 100 lines heroku logs gives me, 46 are about Connection error detected during read for my app on Heroku (https://help.heroku.com/tickets/920245). I haven't tried to understand this more yet. I can add that the Heroku app has a proxy (nginx) in front of it (users are accessing it via the proxy, which runs on another server). Maybe that makes it more common somehow. I can see the error happening even when there aren't any requests happening.

I'm okay with that. But I would feel better if we knew why wrk was causing this. I.e. is it an empty socket connection? Is it trying to make some http 5.0 request that we don't support? If we figure that out then we can add a test for it. The only way I was able to raise the same error was the empty netcat request. I'm unclear if that's the same thing wrk is doing or why it's doing it twice per test-run.

If people are seeing this in prod it makes me wonder what the conditions that cause it are. Are browsers or people sending empty requests, or is something else going on? There might be a case here that we should be handling differently. If someone wants to go through wrk to try to figure out why the request is getting an EOF, it's open source (though written in C).
I confirm that I’m seeing a failure with the repro app
For convenience here are commits in a historical view that contains both of those: https://github.com/puma/puma/commits/master?after=7d00e1dc3c71b77ea171f503e6c9be24836314eb+174&branch=master
cc @alexeevit any idea what might be at play here?
@schneems I created a ticket for my app, https://help.heroku.com/tickets/920245, if it can help in any way
I can also reproduce(-ish) it locally on macOS. It happens more often on Heroku than I see locally, but I haven't tried replicating the same traffic locally (the production app does not have a heavy load, though). The code for my app is public: https://github.com/Starkast/wikimum
The steps I did on macOS to start the app (using https://github.com/Starkast/wikimum/commit/0bc59331ebcedfca482e1b54211788edf9ccb0dd)
After Puma has booted, and when the wrk command is started, I can see one line of 2020-09-21 17:30:16 +0200 HTTP connection error: #<Puma::ConnectionError: Connection error detected during read> before the request logging starts. Full log here. I've tested the same thing using Puma 4.3.6 and the error isn't logged then. There are no timeouts from wrk.

@schneems All my reproductions are happening locally, our app doesn't run on Heroku (it runs on k8s). I can deploy the app to a staging environment and see if I have timeouts there if it helps.
I tried creating a minimal reproduction app but can't, which is why I published all the info I can give, from logs to attempts at reproduction on a simple Rails app and the puma hello.ru example.
If you can give me any guidance on how to get better visibility into what's going on, I'll be happy to help.
It might just be that more logging was added in 5.0.0 and that Puma has always operated this way.