excon: Resolv.rb issues - threads hanging

We’re noticing issues with our highly concurrent Sidekiq jobs after the bump to Excon v0.80.1 this weekend. Threads are hanging in the resolv.rb code. We’re still investigating, but wanted to post a notice here in case others experience similar issues. Below is an example of the stack trace associated with one of our “hung” threads. Any thoughts or suggestions on how or why this might be happening would be appreciated.

Current workaround: downgrade back to v0.79.0.
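
For anyone who wants to apply the same workaround, a minimal sketch of the pin, assuming the app manages its gems with Bundler (the /usr/local/bundle paths in the trace suggest it does). The comment wording is ours; only the version number comes from this report.

```ruby
# Gemfile (fragment)
# Assumption: Excon is declared directly in the Gemfile. If it only comes in
# as a transitive dependency, adding this line still constrains the version.
gem "excon", "0.79.0" # last version we ran without seeing the hung threads
```

After editing the Gemfile, `bundle update excon` should move the lockfile back to the pinned version.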

ruby 2.6.7p197 (2021-04-05 revision 67941) [x86_64-linux]
Sidekiq 6.1.3

Thanks!

Apr 26 07:20:33  | pid=213 tid=go5pkq979 WARN: Thread TID-go5tnitml processor
Apr 26 07:20:33  | pid=213 tid=go5pkq979 WARN: /usr/local/lib/ruby/2.6.0/resolv.rb:617:in `synchronize'
Apr 26 07:20:33  | /usr/local/lib/ruby/2.6.0/resolv.rb:617:in `allocate_request_id'
Apr 26 07:20:33  | /usr/local/lib/ruby/2.6.0/resolv.rb:837:in `sender'
Apr 26 07:20:33  | /usr/local/lib/ruby/2.6.0/resolv.rb:522:in `block in fetch_resource'
Apr 26 07:20:33  | /usr/local/lib/ruby/2.6.0/resolv.rb:1120:in `block (3 levels) in resolv'
Apr 26 07:20:33  | /usr/local/lib/ruby/2.6.0/resolv.rb:1118:in `each'
Apr 26 07:20:33  | /usr/local/lib/ruby/2.6.0/resolv.rb:1118:in `block (2 levels) in resolv'
Apr 26 07:20:33  | /usr/local/lib/ruby/2.6.0/resolv.rb:1117:in `each'
Apr 26 07:20:33  | /usr/local/lib/ruby/2.6.0/resolv.rb:1117:in `block in resolv'
Apr 26 07:20:33  | /usr/local/lib/ruby/2.6.0/resolv.rb:1115:in `each'
Apr 26 07:20:33  | /usr/local/lib/ruby/2.6.0/resolv.rb:1115:in `resolv'
Apr 26 07:20:33  | /usr/local/lib/ruby/2.6.0/resolv.rb:517:in `fetch_resource'
Apr 26 07:20:33  | /usr/local/lib/ruby/2.6.0/resolv.rb:507:in `each_resource'
Apr 26 07:20:33  | /usr/local/lib/ruby/2.6.0/resolv.rb:404:in `each_address'
Apr 26 07:20:33  | /usr/local/lib/ruby/2.6.0/resolv.rb:116:in `block in each_address'
Apr 26 07:20:33  | /usr/local/lib/ruby/2.6.0/resolv.rb:115:in `each'
Apr 26 07:20:33  | /usr/local/lib/ruby/2.6.0/resolv.rb:115:in `each_address'
Apr 26 07:20:33  | /usr/local/lib/ruby/2.6.0/resolv.rb:58:in `each_address'
Apr 26 07:20:33  | /usr/local/bundle/gems/excon-0.80.1/lib/excon/socket.rb:110:in `connect'
Apr 26 07:20:33  | /usr/local/bundle/gems/excon-0.80.1/lib/excon/ssl_socket.rb:166:in `connect'
Apr 26 07:20:33  | /usr/local/bundle/gems/excon-0.80.1/lib/excon/socket.rb:49:in `initialize'
Apr 26 07:20:33  | /usr/local/bundle/gems/excon-0.80.1/lib/excon/ssl_socket.rb:10:in `initialize'
Apr 26 07:20:33  | /usr/local/bundle/gems/excon-0.80.1/lib/excon/connection.rb:471:in `new'
Apr 26 07:20:33  | /usr/local/bundle/gems/excon-0.80.1/lib/excon/connection.rb:471:in `socket'
Apr 26 07:20:33  | /usr/local/bundle/gems/excon-0.80.1/lib/excon/connection.rb:118:in `request_call'
Apr 26 07:20:33  | /usr/local/bundle/gems/excon-0.80.1/lib/excon/middlewares/decompress.rb:12:in `request_call'
Apr 26 07:20:33  | /usr/local/bundle/gems/excon-0.80.1/lib/excon/middlewares/mock.rb:57:in `request_call'
Apr 26 07:20:33  | /usr/local/bundle/gems/excon-0.80.1/lib/excon/middlewares/instrumentor.rb:34:in `request_call'
Apr 26 07:20:33  | /usr/local/bundle/gems/excon-0.80.1/lib/excon/middlewares/idempotent.rb:19:in `request_call'
Apr 26 07:20:33  | /usr/local/bundle/gems/excon-0.80.1/lib/excon/middlewares/base.rb:22:in `request_call'
Apr 26 07:20:33  | /usr/local/bundle/gems/ddtrace-0.42.0/lib/ddtrace/contrib/excon/middleware.rb:41:in `request_call'
Apr 26 07:20:33  | /usr/local/bundle/gems/excon-0.80.1/lib/excon/middlewares/base.rb:22:in `request_call'
Apr 26 07:20:33  | /usr/local/bundle/gems/excon-0.80.1/lib/excon/connection.rb:283:in `request'
Apr 26 07:20:33  | /usr/local/bundle/gems/excon-0.80.1/lib/excon/connection.rb:369:in `get'
...
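
The trace above looks like the kind of per-thread dump Sidekiq logs when it receives a TTIN signal. For processes where that is not available, here is a rough sketch of collecting the same information in plain Ruby. The method name `dump_thread_backtraces` and the output format are made up for illustration; only `Thread.list`, `Thread#backtrace`, `Thread#status` and `Thread#name` are standard Ruby.

```ruby
# Hypothetical helper: returns a text dump of each thread's current backtrace.
# Thread#backtrace returns nil for a dead thread, so that case gets a placeholder.
def dump_thread_backtraces(threads = Thread.list)
  threads.map do |thread|
    header = "Thread #{thread.object_id} status=#{thread.status.inspect} name=#{thread.name.inspect}"
    frames = thread.backtrace || ["<no backtrace available>"]
    ([header] + frames).join("\n")
  end.join("\n\n")
end

# Usage: a thread blocked on a held mutex shows up parked in `synchronize`,
# which is the same frame at the top of the hung Sidekiq thread.
lock = Mutex.new
lock.lock
blocked = Thread.new { lock.synchronize { :reached_only_after_unlock } }
blocked.name = "blocked-worker"
sleep 0.01 until blocked.status == "sleep" # wait until it is actually blocked

puts dump_thread_backtraces([blocked])

lock.unlock
blocked.join
```

Taking a few dumps several seconds apart can help tell a thread that is genuinely stuck in `synchronize` from one that merely happened to be there at the moment of the snapshot.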

About this issue

  • State: closed
  • Created 3 years ago
  • Reactions: 1
  • Comments: 15 (8 by maintainers)

Most upvoted comments

@stevenharman awesome, thanks for the update!

I’m going to go ahead and close this now as, to the best of my knowledge, these upstream fixes should remove the problem.

We’ve seen similar hung threads, resulting in timeouts, in the same code:

File /app/vendor/ruby-2.7.3/lib/ruby/2.7.0/resolv.rb line 622 in synchronize

We are running Ruby 2.7.3 with whichever version of resolv ships with that Ruby.