faraday: Instrumentation middleware cannot work with async requests

Reference: https://github.com/lostisland/faraday-retry/issues/2
Adapter: Typhoeus

Problem Statement

The current implementation is:

def call(env)
  @instrumenter.instrument(@name, env) do
    @app.call(env)
  end
end

Let’s define the connection and subscribe to the default event:

faraday = Faraday.new('https://httpstat.us') do |builder|
  builder.request :instrumentation
  builder.adapter :typhoeus
end

ActiveSupport::Notifications.subscribe('request.faraday') do |_, start, finish, _, env|
  puts format(
    '[%<host>s] %<method>s %<request_uri>s (%<status>s) [%<start>i -> %<finish>i = %<duration>.3f]',
    host: env[:url].host,
    method: env[:method].to_s.upcase,
    request_uri: env[:url].request_uri,
    status: env[:status],
    start: start.to_i,
    finish: finish.to_i,
    duration: finish - start
  )
end

Consider the following sync request that works perfectly fine:

faraday.get('200?sleep=100')

# [httpstat.us] GET /200?sleep=100 (200) [1642094481 -> 1642094482 = 0.393]

Now, let’s make async (in_parallel) requests:

faraday.in_parallel(Typhoeus::Hydra.new(max_concurrency: 1)) do
  9.times { |i| faraday.get("/200?sleep=#{(i + 1) * 100}") }
end

# [httpstat.us] GET /200?sleep=100 () [1642094450 -> 1642094450 = 0.000]
# [httpstat.us] GET /200?sleep=200 () [1642094450 -> 1642094450 = 0.000]
# [httpstat.us] GET /200?sleep=300 () [1642094450 -> 1642094450 = 0.000]
# [httpstat.us] GET /200?sleep=400 () [1642094450 -> 1642094450 = 0.000]
# [httpstat.us] GET /200?sleep=500 () [1642094450 -> 1642094450 = 0.000]
# [httpstat.us] GET /200?sleep=600 () [1642094450 -> 1642094450 = 0.000]
# [httpstat.us] GET /200?sleep=700 () [1642094450 -> 1642094450 = 0.001]
# [httpstat.us] GET /200?sleep=800 () [1642094450 -> 1642094450 = 0.000]
# [httpstat.us] GET /200?sleep=900 () [1642094450 -> 1642094450 = 0.000]

Notice that the status is empty and the duration is effectively zero: at this point the requests have only been added to the Hydra queue and will be performed later, so the instrumented block returns before any response exists.
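To make the failure mode concrete, here is a contrived plain-Ruby sketch (no Faraday involved, all names invented for illustration): when the instrumented block merely enqueues a job, a block-based instrumenter measures the enqueue time, not the request time.

```ruby
# Plain-Ruby illustration: the block only enqueues work, so the
# measured duration covers the enqueue, not the actual request.
queue = []

instrument = lambda do |&block|
  start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  block.call
  Process.clock_gettime(Process::CLOCK_MONOTONIC) - start
end

# The block returns as soon as the job is queued.
duration = instrument.call { queue << proc { sleep(0.1); :response } }

queue.each(&:call) # the actual 0.1s of work happens here, after measuring

puts format('measured %.3fs for a 0.1s request', duration)
```

The reported duration is near zero even though the "request" itself sleeps for 0.1 seconds, which mirrors the Hydra output above.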

Flawed Solution

Since the notification is published before the response is complete, we should delay it until a response actually exists. Here is an updated implementation:

def call(request_env)
  @instrumenter.start(@name, request_env)

  @app.call(request_env).on_complete do |response_env|
    @instrumenter.finish(@name, response_env)
  end
end
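The deferred-finish mechanism can be sketched in plain Ruby (the class and callback names below are illustrative stand-ins, not Faraday's internals): the finish event fires only when an on_complete-style callback runs, after the queued work is performed.

```ruby
# Illustrative stand-in for a response whose completion callbacks
# run later, once the parallel queue is flushed.
FakeResponse = Struct.new(:callbacks) do
  def on_complete(&block)
    callbacks << block
    self
  end

  def complete!
    callbacks.each(&:call)
  end
end

events = []
instrumenter_start  = -> { events << :start }
instrumenter_finish = -> { events << :finish }

app = -> { FakeResponse.new([]) } # "calling the app" only enqueues

instrumenter_start.call
response = app.call.on_complete { instrumenter_finish.call }

events << :performed # the queue is actually run here
response.complete!

p events # => [:start, :performed, :finish]
```

The start event still fires at enqueue time, but the finish event now lands after the real work, which is exactly the behaviour (and the remaining start-time inaccuracy) seen in the output below.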

Let’s make the requests:

faraday.in_parallel(Typhoeus::Hydra.new(max_concurrency: 1)) do
  9.times { |i| faraday.get("/200?sleep=#{(i + 1) * 100}") }
end

# [httpstat.us] GET /200?sleep=100 (200) [1642095297 -> 1642095299 = 1.593]
# [httpstat.us] GET /200?sleep=200 (200) [1642095297 -> 1642095299 = 2.053]
# [httpstat.us] GET /200?sleep=300 (200) [1642095297 -> 1642095300 = 2.726]
# [httpstat.us] GET /200?sleep=400 (200) [1642095297 -> 1642095301 = 3.406]
# [httpstat.us] GET /200?sleep=500 (200) [1642095297 -> 1642095302 = 4.356]
# [httpstat.us] GET /200?sleep=600 (200) [1642095297 -> 1642095303 = 5.369]
# [httpstat.us] GET /200?sleep=700 (200) [1642095297 -> 1642095304 = 6.334]
# [httpstat.us] GET /200?sleep=800 (200) [1642095297 -> 1642095305 = 7.431]
# [httpstat.us] GET /200?sleep=900 (200) [1642095297 -> 1642095306 = 8.577]

We have fixed the response issue. The durations may look wrong, but technically they are not: we are now measuring the total time from submitting the requests until the nth request finishes, rather than the time each request itself took.

There is a bigger problem, consider the following requests:

faraday.in_parallel(Typhoeus::Hydra.new(max_concurrency: 1)) do
  9.times do |i|
    faraday.get("/200?sleep=#{(i + 1) * 100}")
    sleep(1)
  end
end

# [httpstat.us] GET /200?sleep=100 (200) [1642095933 -> 1642095935 = 2.261]
# [httpstat.us] GET /200?sleep=200 (200) [1642095932 -> 1642095935 = 3.712]
# [httpstat.us] GET /200?sleep=300 (200) [1642095931 -> 1642095936 = 5.259]
# [httpstat.us] GET /200?sleep=400 (200) [1642095930 -> 1642095937 = 7.818]
# [httpstat.us] GET /200?sleep=500 (200) [1642095929 -> 1642095938 = 9.668]
# [httpstat.us] GET /200?sleep=600 (200) [1642095928 -> 1642095939 = 11.511]
# [httpstat.us] GET /200?sleep=700 (200) [1642095927 -> 1642095940 = 13.456]
# [httpstat.us] GET /200?sleep=800 (200) [1642095926 -> 1642095941 = 15.497]
# [httpstat.us] GET /200?sleep=900 (200) [1642095925 -> 1642095942 = 17.645]

Note that the start times are in reverse order because ActiveSupport::Notifications::Fanout maintains a time stack: each finish event pops the most recently pushed start time, not the one belonging to its own request.
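A simplified sketch of the LIFO pairing (this models the behaviour, not Fanout's actual code) shows why the starts come out reversed:

```ruby
# Simplified model of a per-subscriber time stack: starts are pushed,
# and each finish pops the most recent start, even if it belongs to a
# different request.
time_stack = []

start_times = { a: 100, b: 101, c: 102 }
start_times.each_value { |t| time_stack.push(t) }

# Requests finish in submission order, but pop pairs each finish with
# the *last* pushed start, so request :a gets :c's start time.
finishes = { a: 110, b: 111, c: 112 }
pairs = finishes.map { |name, finish| [name, time_stack.pop, finish] }

pairs.each { |name, start, finish| puts "#{name}: #{start} -> #{finish}" }
# => a: 102 -> 110, b: 101 -> 111, c: 100 -> 112 (starts reversed)
```

With nine overlapping in-flight requests, this mispairing produces exactly the descending start timestamps in the output above.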

Conclusion

We cannot easily determine the exact start time of each request, and building a mechanism across all the adapters to recover those start times would be tedious and error-prone. A better solution is to let adapters expose additional response information that may contain timing and other details.

About this issue

  • State: open
  • Created 2 years ago
  • Comments: 15 (7 by maintainers)

Most upvoted comments

Thanks @BuonOmo, I believe that’s the same informations @avmnu-sng mentioned in their comment earlier, and it sure looks like a lot of information! The only missing thing there is the integration with a notification system like ActiveSupport::Notifications, but that’s something the middleware could provide.

I spent some more time thinking about this, and I believe the following step-by-step approach should work. It’s mostly based on things that @avmnu-sng already raised, but my brain needed a little extra help to put them together, so I hope this list will help others see them clearly as well:

  1. We change the instrumentation middleware to use the start/stop interface. This could “replace” the existing block behaviour or, to be backwards-compatible, enrich it after a) checking if the instrumenter supports it; or b) checking if we’re running in_parallel. This will make things better, but will also introduce some inconsistencies as explained in the ticket description, which we can document and point out
  2. We introduce a new field to the response object that allows adapters to provide additional information about the request. I don’t really like informations 😄, but it will basically contain that info. Maybe instrumentation or something like that.
  3. We update the instrumentation middleware once again so that, in the on_complete callback, it checks for this new field presence, and if found it uses that to enrich the notification payload, then it fires it.
  4. Lastly, we go through each adapter (capable of parallel requests) and we add support for this new field. As I said this will potentially take some time, but it seems like we already know how to do it for Typhoeus.
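Steps 2 and 3 of the plan above can be sketched as follows. Everything here is hypothetical: the field name (instrumentation) and its contents are illustrative placeholders, not an agreed Faraday API.

```ruby
# Hypothetical sketch of steps 2-3: the adapter attaches timing info
# to the response env under an agreed key, and the middleware merges
# it into the notification payload before firing.
response_env = {
  url: 'https://httpstat.us/200',
  # Field name and structure are placeholders for whatever the
  # adapters would actually expose (e.g. libcurl timings via Ethon).
  instrumentation: { start_time: 1_642_095_297.1, total_time: 0.4 }
}

payload = { url: response_env[:url] }
extra = response_env[:instrumentation]
payload.merge!(extra) if extra # enrich only when the adapter provided it

p payload
```

Adapters that cannot provide timings simply leave the field unset, and the middleware falls back to the (less accurate) start/finish timestamps it records itself.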

@BuonOmo @avmnu-sng what do you think about the plan above? Any feedback would be appreciated. Also, @BuonOmo, I appreciate that parallelism support has not been our main concern, mostly due to the Faraday core team’s inexperience on the topic, but I believe it could become a big focus now that v2.0 is out and we plan our road to v3.0. It would be really useful to have a comprehensive list of current issues so that we can explore and tackle them as necessary. May I suggest we start a new Discussion on the topic so we can coordinate there? How does that sound?