rails: 7.0.1: Activesupport instrumentor: nil can't be coerced into Float (TypeError)
I have a Rails 7.0.1 test application that when it tries to load in multiple images (more than 2) it can regularly fail to load one or more images: on my dev system it happens most of the time.
Each time it fails you get the same 500 error: nil can't be coerced into Float (TypeError)
from activesupport instrumenter. This happens on Rails 7.0 only and not on the almost identical 6.1 version.
Rails 7.0.1 - fails to load one of the images most of the time.
Rails 6.1 - never seen if fail
Stack trace
Code that throws the exception is here
Puma caught this error: nil can't be coerced into Float (TypeError)
/Users/me/.asdf/installs/ruby/3.0.3/lib/ruby/gems/3.0.0/gems/activesupport-7.0.1/lib/active_support/notifications/instrumenter.rb:133:in `-'
/Users/me/.asdf/installs/ruby/3.0.3/lib/ruby/gems/3.0.0/gems/activesupport-7.0.1/lib/active_support/notifications/instrumenter.rb:133:in `duration'
/Users/me/.asdf/installs/ruby/3.0.3/lib/ruby/gems/3.0.0/gems/activesupport-7.0.1/lib/active_support/core_ext/enumerable.rb:61:in `map'
/Users/me/.asdf/installs/ruby/3.0.3/lib/ruby/gems/3.0.0/gems/activesupport-7.0.1/lib/active_support/core_ext/enumerable.rb:61:in `sum'
/Users/me/.asdf/installs/ruby/3.0.3/lib/ruby/gems/3.0.0/gems/activesupport-7.0.1/lib/active_support/core_ext/enumerable.rb:298:in `sum'
/Users/me/.asdf/installs/ruby/3.0.3/lib/ruby/gems/3.0.0/gems/actionpack-7.0.1/lib/action_dispatch/middleware/server_timing.rb:26:in `block in call'
/Users/me/.asdf/installs/ruby/3.0.3/lib/ruby/gems/3.0.0/gems/actionpack-7.0.1/lib/action_dispatch/middleware/server_timing.rb:25:in `each'
/Users/me/.asdf/installs/ruby/3.0.3/lib/ruby/gems/3.0.0/gems/actionpack-7.0.1/lib/action_dispatch/middleware/server_timing.rb:25:in `map'
/Users/me/.asdf/installs/ruby/3.0.3/lib/ruby/gems/3.0.0/gems/actionpack-7.0.1/lib/action_dispatch/middleware/server_timing.rb:25:in `call'
/Users/me/.asdf/installs/ruby/3.0.3/lib/ruby/gems/3.0.0/gems/actionpack-7.0.1/lib/action_dispatch/middleware/executor.rb:14:in `call'
/Users/me/.asdf/installs/ruby/3.0.3/lib/ruby/gems/3.0.0/gems/actionpack-7.0.1/lib/action_dispatch/middleware/static.rb:23:in `call'
/Users/me/.asdf/installs/ruby/3.0.3/lib/ruby/gems/3.0.0/gems/rack-2.2.3/lib/rack/sendfile.rb:110:in `call'
/Users/me/.asdf/installs/ruby/3.0.3/lib/ruby/gems/3.0.0/gems/actionpack-7.0.1/lib/action_dispatch/middleware/host_authorization.rb:137:in `call'
/Users/me/.asdf/installs/ruby/3.0.3/lib/ruby/gems/3.0.0/gems/railties-7.0.1/lib/rails/engine.rb:530:in `call'
/Users/me/.asdf/installs/ruby/3.0.3/lib/ruby/gems/3.0.0/gems/puma-5.5.2/lib/puma/configuration.rb:249:in `call'
/Users/me/.asdf/installs/ruby/3.0.3/lib/ruby/gems/3.0.0/gems/puma-5.5.2/lib/puma/request.rb:77:in `block in handle_request'
/Users/me/.asdf/installs/ruby/3.0.3/lib/ruby/gems/3.0.0/gems/puma-5.5.2/lib/puma/thread_pool.rb:340:in `with_force_shutdown'
/Users/me/.asdf/installs/ruby/3.0.3/lib/ruby/gems/3.0.0/gems/puma-5.5.2/lib/puma/request.rb:76:in `handle_request'
/Users/me/.asdf/installs/ruby/3.0.3/lib/ruby/gems/3.0.0/gems/puma-5.5.2/lib/puma/server.rb:447:in `process_client'
/Users/me/.asdf/installs/ruby/3.0.3/lib/ruby/gems/3.0.0/gems/puma-5.5.2/lib/puma/thread_pool.rb:147:in `block in spawn_thread'
Steps to reproduce
I have a couple of apps one which works based on 6.1 and another that doesn’t work based on 7.0.1
- Clone repo
- cd into rails directory
- bin/setup (standard setup except it runs db:seed)
- Repeat the following steps until you notice an image not loading
- bin/rails server
- browser opens localhost:3000 and keep “Network” tab open
- check if the 10 images all loaded
- close rails server and back to step i (It’s important to stop server as normally once images loads they keep loading)
- Check network tab for the image that failed to load and look at the response
Browser Network tab showing exceptions
There’s very little code changes in the example applications. Basically:
- new application
- scaffold product
- install active_storage
- product has_one_attached image
- seed products with images
- Start running the server following the steps to reproduce …
I’ve also tried:
- With Rails 7.0 displaying static images from public - loads it fine.
- All the browsers show the same issue
- Rails 6.1 version works with or without spring but with spring every now and then it could hang until you did a
spring stop
. - Larger images 300 x 300 also failed in similar manner and not noticeably more often.
Expected behavior
It should load the images every time.
Actual behavior
Most of the time in the test application it fails
System configuration
Rails 7.0.1
Ruby 3.0.3
About this issue
- Original URL
- State: closed
- Created 2 years ago
- Comments: 23 (14 by maintainers)
Links to this issue
Commits related to this issue
- Use finish_with_state inside Rack::Logger This ensures that the finish instrumenter is sent to the same subscribers that the initial request was sent to. This solves an issue where in a threaded env... — committed to jhawthorn/rails by jhawthorn 2 years ago
- rails: disable Server-Timing header in production. This produces random `nil can't be coerced into Float (TypeError)` errors in production because of a Rails bug. This may also be the cause of random... — committed to danbooru/danbooru by evazion 2 years ago
Thank you all for the reproductions, they really helped 😁.
I think this is the combination of several issues: some limitations with the existing code and a new change which caused the combination to fail.
The change leading to this issue appearing in 7.0 is the
ActionDispatch::ServerTiming
middleware. Specifically the dynamic subscription being done before and after each request. Doing a subscription like this has never really been safe throughout the history of AS::Notifications, and the docs advise against it (though the language should be stronger, it’s not just a performance issue). I also don’t think this subscription works as intended if it wasn’t for limitations of notifications. The topics are subscribed to globally so if we are serving requests in parallel this receives instrumentation from all active Threads. Additionally nested notifications even from the same request are going to be reported incorrectly. We probably need to revisit this implementation.Why aren’t dynamic subscriptions more of an issue? Usually we’re able to avoid the issues that can cause by using finish_with_state which ensures we’re iterating over the same set of listeners. This is the path taken by
Notifications.instrument
and similar friendly user-facing APIs. However there’s one place (that I found) where we’re not: Rack::Logger.Because
Rack::Logger
usesfinish
directly it gets a new list of subscribers. If there had been subscriptions added in between the start and finish, we’ll have too many subscribers (as mentioned above) and will be popping more than we should off the stack, leading to thenil
s. Fortunately, we can solve this relatively simply by usingfinish_with_state
. I’ve opened #44454, I’d love help verifying that this fixes the issues 🙇♂️.I think this issue is good indication that we should be deprecating the “stateless”
start
/finish
and ensure everything is usingstart
/finish_with_state
(or ideally just the block form ofinstrument
).Finally, I do think we should improve notifications to better handle this. Popping values off of a stack for each individual subscriber is slow and error prone. I spiked on avoiding all stack operations for the stateful case, which I’ll open a PR for soon, but even in the stateless case if we had one stack item per-event rather than per event-subscriber this issue bug have been worked around.
TLDR: For anyone running into this the best immediate fix is to disable server_timing in
development.rb
. #44454 should solve the issue, and I hope to backport it to 7-0-stable. We should find another way to implementServerTiming
. Notifications could also use improvement.This was fixed in #44454 and backported to 7-0-stable, but just hasn’t been released yet. In the meantime you can disable the ServerTiming middleware (or point your
Gemfile
at7-0-stable
) and the issue should go away.