# faraday: Instrumentation middleware cannot work with async requests

Reference: https://github.com/lostisland/faraday-retry/issues/2
Adapter: Typhoeus
## Problem Statement
The current implementation is:

```ruby
def call(env)
  @instrumenter.instrument(@name, env) do
    @app.call(env)
  end
end
```
Let’s define the connection and subscribe to the default event:

```ruby
faraday = Faraday.new('https://httpstat.us') do |builder|
  builder.request :instrumentation
  builder.adapter :typhoeus
end

ActiveSupport::Notifications.subscribe('request.faraday') do |_, start, finish, _, env|
  puts format(
    '[%<host>s] %<method>s %<request_uri>s (%<status>s) [%<start>i -> %<finish>i = %<duration>.3f]',
    host: env[:url].host,
    method: env[:method].to_s.upcase,
    request_uri: env[:url].request_uri,
    status: env[:status],
    start: start.to_i,
    finish: finish.to_i,
    duration: finish - start
  )
end
```
Consider the following sync request that works perfectly fine:

```ruby
faraday.get('200?sleep=100')
# [httpstat.us] GET /200?sleep=100 (200) [1642094481 -> 1642094482 = 0.393]
```
Now, let’s make async (`in_parallel`) requests:

```ruby
faraday.in_parallel(Typhoeus::Hydra.new(max_concurrency: 1)) do
  9.times { |i| faraday.get("/200?sleep=#{(i + 1) * 100}") }
end
# [httpstat.us] GET /200?sleep=100 () [1642094450 -> 1642094450 = 0.000]
# [httpstat.us] GET /200?sleep=200 () [1642094450 -> 1642094450 = 0.000]
# [httpstat.us] GET /200?sleep=300 () [1642094450 -> 1642094450 = 0.000]
# [httpstat.us] GET /200?sleep=400 () [1642094450 -> 1642094450 = 0.000]
# [httpstat.us] GET /200?sleep=500 () [1642094450 -> 1642094450 = 0.000]
# [httpstat.us] GET /200?sleep=600 () [1642094450 -> 1642094450 = 0.000]
# [httpstat.us] GET /200?sleep=700 () [1642094450 -> 1642094450 = 0.001]
# [httpstat.us] GET /200?sleep=800 () [1642094450 -> 1642094450 = 0.000]
# [httpstat.us] GET /200?sleep=900 () [1642094450 -> 1642094450 = 0.000]
```
Notice that there is no response status and essentially zero duration: inside the `in_parallel` block the requests are only added to the Hydra queue, and they are performed later, after the instrumented block has already returned.
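To make the queueing behaviour concrete, here is a minimal Typhoeus-only illustration (no Faraday involved) of why the instrumented block measures an almost-zero duration:

```ruby
require 'typhoeus'

hydra = Typhoeus::Hydra.new
request = Typhoeus::Request.new('https://httpstat.us/200?sleep=100')

started = Time.now
hydra.queue(request)    # returns immediately; nothing has been sent yet
puts Time.now - started # ~0.000 -- all the instrument block gets to see

hydra.run               # the actual HTTP round trip happens here
puts Time.now - started # the real duration
```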
## Flawed Solution

Since the notification is published before the response is completed, we should try to delay the notification until there is a response. Here is an updated implementation:
```ruby
def call(request_env)
  @instrumenter.start(@name, request_env)
  @app.call(request_env).on_complete do |response_env|
    @instrumenter.finish(@name, response_env)
  end
end
```
Let’s make the requests:

```ruby
faraday.in_parallel(Typhoeus::Hydra.new(max_concurrency: 1)) do
  9.times { |i| faraday.get("/200?sleep=#{(i + 1) * 100}") }
end
# [httpstat.us] GET /200?sleep=100 (200) [1642095297 -> 1642095299 = 1.593]
# [httpstat.us] GET /200?sleep=200 (200) [1642095297 -> 1642095299 = 2.053]
# [httpstat.us] GET /200?sleep=300 (200) [1642095297 -> 1642095300 = 2.726]
# [httpstat.us] GET /200?sleep=400 (200) [1642095297 -> 1642095301 = 3.406]
# [httpstat.us] GET /200?sleep=500 (200) [1642095297 -> 1642095302 = 4.356]
# [httpstat.us] GET /200?sleep=600 (200) [1642095297 -> 1642095303 = 5.369]
# [httpstat.us] GET /200?sleep=700 (200) [1642095297 -> 1642095304 = 6.334]
# [httpstat.us] GET /200?sleep=800 (200) [1642095297 -> 1642095305 = 7.431]
# [httpstat.us] GET /200?sleep=900 (200) [1642095297 -> 1642095306 = 8.577]
```
We have fixed the missing-response issue, but now the durations look wrong. Not exactly wrong, though: since every request is queued up front and then completed one by one, we are now measuring, for the nth request, the total time elapsed since all the requests were submitted, so the reported durations accumulate across the queue.
There is a bigger problem. Consider the following requests:
```ruby
faraday.in_parallel(Typhoeus::Hydra.new(max_concurrency: 1)) do
  9.times do |i|
    faraday.get("/200?sleep=#{(i + 1) * 100}")
    sleep(1)
  end
end
# [httpstat.us] GET /200?sleep=100 (200) [1642095933 -> 1642095935 = 2.261]
# [httpstat.us] GET /200?sleep=200 (200) [1642095932 -> 1642095935 = 3.712]
# [httpstat.us] GET /200?sleep=300 (200) [1642095931 -> 1642095936 = 5.259]
# [httpstat.us] GET /200?sleep=400 (200) [1642095930 -> 1642095937 = 7.818]
# [httpstat.us] GET /200?sleep=500 (200) [1642095929 -> 1642095938 = 9.668]
# [httpstat.us] GET /200?sleep=600 (200) [1642095928 -> 1642095939 = 11.511]
# [httpstat.us] GET /200?sleep=700 (200) [1642095927 -> 1642095940 = 13.456]
# [httpstat.us] GET /200?sleep=800 (200) [1642095926 -> 1642095941 = 15.497]
# [httpstat.us] GET /200?sleep=900 (200) [1642095925 -> 1642095942 = 17.645]
```
Note that the start times are in reverse order. `ActiveSupport::Notifications::Fanout` keeps start timestamps on a per-thread time stack, so each `finish` call pops the most recently pushed start time: the first request to complete gets paired with the start time of the last request that was queued.
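A minimal standalone demonstration of that stack behaviour, assuming only `activesupport` (the `:id` payload key is just a label for this example):

```ruby
require 'active_support/notifications'

ActiveSupport::Notifications.subscribe('demo') do |_, start, finish, _, payload|
  puts format('request %<id>d took %<duration>.1fs', id: payload[:id], duration: finish - start)
end

instrumenter = ActiveSupport::Notifications.instrumenter
instrumenter.start('demo', id: 1)  # pushes t0 onto the per-thread time stack
sleep(1)
instrumenter.start('demo', id: 2)  # pushes t0 + 1
sleep(1)
instrumenter.finish('demo', id: 1) # pops t0 + 1 => "request 1 took 1.0s" (really 2s)
instrumenter.finish('demo', id: 2) # pops t0     => "request 2 took 2.0s" (really 1s)
```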
## Conclusion

We cannot easily figure out the exact start time of each request, and building a mechanism across all the adapters to obtain the start times would be a tedious and error-prone process. A better solution is to provide a way to access additional response information from the adapters, which may contain timing and other details.
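As a data point, Typhoeus already collects this kind of timing information through Ethon. A small standalone example using reader methods that exist on `Typhoeus::Response` (note that Faraday does not currently expose the raw adapter response, which is exactly the gap this issue describes):

```ruby
require 'typhoeus'

response = Typhoeus.get('https://httpstat.us/200?sleep=100')
puts response.namelookup_time    # DNS resolution time
puts response.connect_time       # TCP connection time
puts response.starttransfer_time # time until the first byte arrived
puts response.total_time         # total transfer time
```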
## About this issue
- State: open
- Created 2 years ago
- Comments: 15 (7 by maintainers)
## Comments

Thanks @BuonOmo, I believe that’s the same `informations` @avmnu-sng mentioned in their comment earlier, and it sure looks like a lot of information! The only missing thing there is the integration with a notification system like `ActiveSupport::Notifications`, but that’s something the middleware could provide.

I spent some more time thinking about this, and I believe the following step-by-step approach should work. It’s mostly based on things that @avmnu-sng already raised, but my brain needed a little extra help to put them together, so I hope this list will help others see them clearly as well:
1. Update the `instrumentation` middleware to use the `start`/`stop` interface. This could “replace” the existing block behaviour or, to be backwards-compatible, enrich it after a) checking if the instrumenter supports it; or b) checking if we’re running `in_parallel`. This will make things better, but will also introduce some inconsistencies as explained in the ticket description, which we can document and point out.
2. Agree on a new env field that adapters can use to expose this additional information. I’d avoid calling it `informations` 😄, but it will basically contain that info. Maybe `instrumentation` or something like that.
3. Update the `instrumentation` middleware once again so that, in the `on_complete` callback, it checks for the presence of this new field, and if found it uses that to enrich the notification payload, then it fires it (a rough sketch follows below).

@BuonOmo @avmnu-sng what do you think about the plan above? Any feedback would be appreciated. Also, @BuonOmo, I appreciate that parallelism support has not been our main concern, mostly due to the Faraday core team’s inexperience on the topic, but I believe it could become a big focus now that v2.0 is out and we plan our road to v3.0. It would be really useful to have a comprehensive list of current issues so that we can explore and tackle them as necessary. May I suggest we start a new Discussion on the topic so we can coordinate there? How does that sound?
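For illustration only, here is a rough sketch of what steps 1 and 3 combined could look like. Everything about it is hedged: the `:instrumentation` env key and the middleware name are placeholders (agreeing on the real name and shape is exactly step 2), and the backwards-compatibility checks from step 1 are omitted.

```ruby
require 'faraday'
require 'active_support/notifications'

# Hypothetical middleware, not part of Faraday: fires the notification from
# the on_complete callback so it carries the real response.
class AsyncAwareInstrumentation < Faraday::Middleware
  def initialize(app, name: 'request.faraday')
    super(app)
    @name = name
  end

  def call(request_env)
    instrumenter = ActiveSupport::Notifications.instrumenter
    instrumenter.start(@name, request_env)
    @app.call(request_env).on_complete do |response_env|
      # Step 3: if the adapter populated the hypothetical :instrumentation
      # field, it already lives in the env and therefore in the notification
      # payload, so subscribers can read accurate timings from there instead
      # of relying on the (stack-based, unreliable) start/finish timestamps.
      instrumenter.finish(@name, response_env)
    end
  end
end

Faraday::Request.register_middleware(async_instrumentation: AsyncAwareInstrumentation)
```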