drogon: Running slow function without blocking requests

I tried googling this and searching past Issues, but I couldn’t really find my answer even though some Issues were somewhat similar. Also, I am very new to C++ async, futures, etc. I have only ever used std::thread, so this might be a really dumb problem I’m having.

I have a drogon server running with 16 threads; my CPU is an i9 with 8 cores (MacBook Pro). I have a route “/test” that runs a very slow function and then returns “OK”. For now the function is just:

std::this_thread::sleep_for(std::chrono::seconds(10));

Now, if I request this route 16 times, the following happens:

  • 1st request: 10 seconds (as expected)
  • 2nd request: 20 seconds (seems like the first request is blocking the next ones?)
  • 3rd request: 30 seconds (same as above)
  • 4th request: 40 seconds (same as above)
  • 5th-10th request: 50 seconds (suddenly, after 4 requests, 6 requests run at the same time)
  • 11th-16th request: 60 seconds (again, 6 requests at a time)

I don’t understand why the first 4 requests run one at a time. And once the requests do run simultaneously, why are only 6 running at a time when the server is supposedly using 16 threads (albeit with just 8 cores)?

I’m not sure if I’m supposed to run the “really slow function” on a new thread, using std::async with std::future, or something else.

About this issue

  • State: closed
  • Created a year ago
  • Comments: 22 (9 by maintainers)

Most upvoted comments

@VladlenPopolitov I don’t quite agree. The goto statement doesn’t come with thread switching, but when a coroutine resumes its execution, it’s possible (though not guaranteed) that it continues in another thread. When a coroutine’s sleep statement is executed, it doesn’t block the current thread. Thus coroutines are indeed asynchronous, which is why, IMHO, they should be referred to as “flattened callbacks.”

The asynchronous programming paradigm aims to reduce idle waiting in threads, enabling all threads to operate efficiently; this allows a small number of threads to handle a large volume of concurrent requests. For compute-intensive tasks, therefore, asynchronous solutions do not offer significant advantages, and executing them directly within the current IO thread is sufficient. If there’s concern about impacting the response latency of simple APIs, these tasks can also be placed in a thread pool for execution. At the same time, proper flow control should be implemented (please refer to the Hodor plugin) to prevent excessive requests from keeping the CPU under prolonged high load.

Note that compute-intensive tasks do not include timers, database queries, Redis queries, requests to other services, and the like. Such tasks can improve throughput by yielding the current thread during IO waits, through either callbacks or coroutines.
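To make the “place it in a thread pool” advice concrete, here is a minimal callback-style sketch (no coroutines). The pool name workerPool, its size, and the route are mine; trantor::ConcurrentTaskQueue and runTaskInQueue are the same APIs used in the next comment, and Drogon’s response callback is, as far as I know, safe to invoke from a non-IO thread:

#include <chrono>
#include <thread>
#include <drogon/drogon.h>
#include <trantor/utils/ConcurrentTaskQueue.h>

using namespace drogon;

// a small pool dedicated to blocking work, separate from the IO threads
static trantor::ConcurrentTaskQueue workerPool(4, "blocking work");

int main()
{
    app().registerHandler(
        "/test",
        [](const HttpRequestPtr &,
           std::function<void(const HttpResponsePtr &)> &&callback) {
            // hand the slow work to the pool; the IO thread returns to the
            // event loop immediately and keeps serving other requests
            workerPool.runTaskInQueue([callback = std::move(callback)]() {
                std::this_thread::sleep_for(std::chrono::seconds(10));
                auto resp = HttpResponse::newHttpResponse();
                resp->setBody("OK");
                callback(resp);
            });
        },
        {Get});
    app().addListener("0.0.0.0", 8080).run();
}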

@ErikTheBerik trantor::ConcurrentTaskQueue looks like an excellent option; it is a ready-made solution. I wrote some test code. Declarations and definitions:

#include <functional>
#include <coroutine>
#include <trantor/utils/ConcurrentTaskQueue.h>
#include <drogon/utils/coroutine.h> // CallbackAwaiter
using drogon::CallbackAwaiter;

// define thread pool
trantor::ConcurrentTaskQueue globalThreadPool(10, "h2load test");

// declare awaitable for coroutines (my definitions to make co_await work from a coroutine)
using calcIntensive = std::function<void()>;

struct [[nodiscard]] ExecuteAwaiter : public CallbackAwaiter<void>
{
    explicit ExecuteAwaiter(calcIntensive func)
        : callAndResume_{std::move(func)}
    { }
    void await_suspend(std::coroutine_handle<> handle)
    {
        auto taskToQueue = [this, handle]() {
            try
            {
                // run the intensive work on the pool thread, then resume
                this->callAndResume_();
                handle.resume();
            }
            catch (...)
            {
                // store the exception so the coroutine rethrows it on resume
                setException(std::current_exception());
                handle.resume();
            }
        };
        globalThreadPool.runTaskInQueue(taskToQueue);
    }

  private:
    calcIntensive callAndResume_;
};

// function to run a lambda on the thread pool
ExecuteAwaiter executeIntensiveFunction(std::function<void()> func)
{
    return ExecuteAwaiter{std::move(func)};
}

// function to run an empty lambda so that execution continues on a thread
// from the thread pool
ExecuteAwaiter switchToThreadPull()
{
    return ExecuteAwaiter{[]() {}};
}

Controller code: one version runs the intensive work inside a lambda, and one version switches control to another thread from the thread pool (based on the hello-world example):

    Task<void> sleepHello(const HttpRequestPtr req,
                          std::function<void(const HttpResponsePtr &)> callback)
    {
        // the current thread returns to the main loop;
        // the coroutine continues execution on a thread from the thread pool
        co_await switchToThreadPull();
        using namespace std::chrono_literals;
        // time-consuming work here
        std::this_thread::sleep_for(1000ms);

        auto resp = HttpResponse::newHttpResponse();
        resp->setBody(
            "Hi there, this is another hello from the sleepHello Controller");
        callback(resp);
        co_return;
    }

    Task<void> sleep2Hello(const HttpRequestPtr req,
                           std::function<void(const HttpResponsePtr &)> callback)
    {
        int someVariable{};
        // the current thread returns to the main loop;
        // the coroutine waits for the lambda to finish in the thread pool
        // and then continues execution on that same pool thread
        co_await executeIntensiveFunction([someVariable]() {
            using namespace std::chrono_literals;
            std::this_thread::sleep_for(1000ms);
        });

        auto resp = HttpResponse::newHttpResponse();
        resp->setBody(
            "Hi there, this is another hello from the sleep2Hello Controller");
        callback(resp);
        co_return;
    }
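For completeness, this is how such coroutine handlers are typically wired up: they live in an HttpController and are registered with the usual method-list macros. The class name SleepCtrl and the paths below are made up for illustration; the handler bodies are the ones shown above:

#include <drogon/HttpController.h>

class SleepCtrl : public drogon::HttpController<SleepCtrl>
{
  public:
    METHOD_LIST_BEGIN
    // map each coroutine handler to a route
    ADD_METHOD_TO(SleepCtrl::sleepHello, "/sleep1", drogon::Get);
    ADD_METHOD_TO(SleepCtrl::sleep2Hello, "/sleep2", drogon::Get);
    METHOD_LIST_END

    drogon::Task<void> sleepHello(
        const drogon::HttpRequestPtr req,
        std::function<void(const drogon::HttpResponsePtr &)> callback);
    drogon::Task<void> sleep2Hello(
        const drogon::HttpRequestPtr req,
        std::function<void(const drogon::HttpResponsePtr &)> callback);
};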

@biospb The load measurements are:

Requests per second:    4.78 [#/sec] (mean)
Time per request:       2093.243 [ms] (mean)
Time per request:       209.324 [ms] (mean, across all concurrent requests)
Transfer rate:          1.03 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        3    4   0.6      4       5
Processing:  1017 1022  15.6   1017    1067
Waiting:     1017 1022  15.4   1017    1066
Total:       1021 1027  15.1   1021    1069

Percentage of the requests served within a certain time (ms)
  50%   1021
  66%   1022
  75%   1024
  80%   1024
  90%   1069
  95%   1069
  98%   1069
  99%   1069
 100%   1069 (longest request)

Almost all of the code is from Drogon; I based the awaiter on Drogon’s code as well.

@biospb If you insert sleep_for in the controller code, you block the framework’s main loop, and with it all controllers that depend on that thread, instead of merely delaying execution in your own controller. If you need to pause in your controller for some reason, use runAfter(delaySec, function) and specify the delay time in seconds. Example here:

app().registerHandler(
    "/hellodelay",
    [](const HttpRequestPtr &,
       std::function<void(const HttpResponsePtr &)> &&callback) {
        // run after 1.0 sec
        drogon::app().getLoop()->runAfter(1.0, [callback]() {
            auto resp = HttpResponse::newHttpResponse();
            resp->setBody("delay and Hello, world");
            callback(resp);
        });
        return;
    },
    {Get});
app().registerHandler(
    "/hellosleep",
    [](const HttpRequestPtr &,
       std::function<void(const HttpResponsePtr &)> &&callback) {
        using namespace std::chrono_literals;
        // not recommended: this blocks the event loop instead of pausing the controller
        std::this_thread::sleep_for(1000ms);
        auto resp = HttpResponse::newHttpResponse();
        resp->setBody("sleep and Hello, World!");
        callback(resp);
    },
    {Get});
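A coroutine-style equivalent of the /hellodelay handler is also possible. The sketch below assumes drogon::sleepCoro and coroutine support in registerHandler (a lambda returning Task<HttpResponsePtr>), both available in recent Drogon releases; treat it as illustrative rather than tested:

app().registerHandler(
    "/hellodelaycoro",
    [](HttpRequestPtr) -> Task<HttpResponsePtr> {
        // suspends the coroutine for one second without blocking the IO thread
        co_await sleepCoro(app().getLoop(), std::chrono::seconds(1));
        auto resp = HttpResponse::newHttpResponse();
        resp->setBody("coroutine delay and Hello, world");
        co_return resp;
    },
    {Get});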

The output of the hellodelay handler load test (ab -c 100 -n 100) is:

Requests per second:    46.02 [#/sec] (mean)
Time per request:       2173.052 [ms] (mean)
Time per request:       21.731 [ms] (mean, across all concurrent requests)
Transfer rate:          8.04 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        5    7   1.7      6      20
Processing:  1027 1120   9.4   1121    1122
Waiting:     1022 1119   9.9   1121    1122
Total:       1047 1126   8.1   1127    1128

Percentage of the requests served within a certain time (ms)
  50%   1127
  66%   1127
  75%   1128
  80%   1128
  90%   1128
  95%   1128
  98%   1128
  99%   1128
 100%   1128 (longest request)

The 1127 ms figure includes the delay, the framework overhead, and the granularity of the timer (it does not mean that the overhead is 127 ms; the delay timer probably has a granularity of 100 ms).

Frankly speaking, if you want to assess the delay of the frameworks themselves, you have to run both of them with minimal user code; otherwise you are evaluating the delays of the user code.

I hope this resolves your doubts.

Probably it can also be achieved by running the work in separate threads instead of coroutines, or by using thread pools as was suggested somewhere above. Let’s wait for other suggestions.