actix-web: Actix-web hangs forever if request processing takes a long time.

The code (for actix-web-0.7, I didn’t try reproducing on 1.0 yet):

use actix_web::{HttpRequest, HttpResponse};

use std::sync::{Arc, Mutex};

struct Unit;
struct SlowStuff {}

impl SlowStuff {
    pub fn new() -> SlowStuff {
        // Some delay here seems crucial; I couldn't reproduce without it.
        // It does not need to be thread::sleep; e.g. reading and parsing
        // a big file works too.
        std::thread::sleep(std::time::Duration::from_secs(2));
        SlowStuff {}
    }
}

struct State {
    unit: Arc<Mutex<Unit>>,
    // wrapping the slow stuff into Arc<Mutex<>> can be done but doesn't matter
    slow_stuff: SlowStuff,
}

fn index(req: &HttpRequest<State>) -> HttpResponse {
    println!("got request to: {:?}", req.uri());

    {
        // Taking the lock is moderately important.
        // I think I've reproduced this several times without taking the
        // lock, but it's very inconsistent.
        let unit: &Unit = &req.state().unit.lock().unwrap();
        // This delay is important; I couldn't reproduce without it.
        // It does not need to be thread::sleep; e.g. reading and parsing
        // a big file works too.
        std::thread::sleep(std::time::Duration::from_millis(1000));
        println!("got unit: {:?}", unit as *const Unit);
    }

    let res = format!("hello");
    let resp = HttpResponse::Ok().body(res);
    println!("made a response");

    resp
}

fn main() {
    actix_web::server::new(move || {
        actix_web::App::with_state(State {
            unit: Arc::new(Mutex::new(Unit)),
            slow_stuff: SlowStuff::new(),
        })
        .resource("/", |r| r.f(index))
    })
    // worker count does not matter, but I assume 1 is easier to debug?
    .workers(1)
    .bind("127.0.0.1:8081")
    .unwrap()
    .run();
}

Go to http://localhost:8081/ and see the page load fine. If you refresh the page several times in quick succession (I'm not sure how many refreshes it takes; I just tap F5 rapidly), the page gets stuck in an eternal “loading” state, and further attempts to access the endpoint (also from other browser tabs) never get a response.
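
For anyone who would rather not mash F5 by hand, here is a rough client-side sketch (not part of the original report; it just fires a handful of overlapping plain-HTTP requests at the address from the bind() call above, so it may or may not hit exactly the same code path as a browser with keep-alive connections):

use std::io::{Read, Write};
use std::net::TcpStream;
use std::thread;

fn main() {
    // Spawn several threads that all hit the endpoint at roughly the same time,
    // mimicking rapid refreshes from the browser.
    let handles: Vec<_> = (0..8)
        .map(|i| {
            thread::spawn(move || {
                let mut stream =
                    TcpStream::connect("127.0.0.1:8081").expect("connect failed");
                stream
                    .write_all(b"GET / HTTP/1.1\r\nHost: localhost\r\nConnection: close\r\n\r\n")
                    .expect("write failed");
                let mut response = String::new();
                // If the server has wedged itself, this read never completes.
                stream.read_to_string(&mut response).expect("read failed");
                println!("request {} got {} bytes back", i, response.len());
            })
        })
        .collect();

    for handle in handles {
        handle.join().unwrap();
    }
}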

Stack trace as reported by Process Explorer in the frozen state (identical for the main thread and the two other threads):

ntoskrnl.exe!KiCpuId+0xaa
ntoskrnl.exe!ExAcquireSpinLockExclusiveAtDpcLevel+0x2c2
ntoskrnl.exe!KeWaitForSingleObject+0x1a3
ntoskrnl.exe!ExReleaseResourceForThreadLite+0x60
ntoskrnl.exe!KeInsertQueueApc+0x725
ntoskrnl.exe!ExAcquireSpinLockExclusiveAtDpcLevel+0x4cd
ntoskrnl.exe!KeRemoveQueueEx+0x323
ntoskrnl.exe!PsAssignImpersonationToken+0x13c
ntoskrnl.exe!RtlPinAtomInAtomTable+0x1338
ntoskrnl.exe!longjmp+0x5b93
ntdll.dll!ZwRemoveIoCompletionEx+0xa
KERNELBASE.dll!GetQueuedCompletionStatusEx+0xdf
kernel32.dll!GetQueuedCompletionStatusEx+0x19
actix-hang.exe!miow::iocp::CompletionPort::get_many+0x215

Windows 7, Firefox browser.

P.S.: I cannot reproduce it on Linux, so this is probably Windows-specific…

About this issue

  • State: closed
  • Created 5 years ago
  • Comments: 21 (7 by maintainers)

Most upvoted comments

I have delved into this a bit. The callstack is the following:

  • actix_server::accept::Accept::process_cmd (https://github.com/actix/actix-net/blob/server-v0.5.1/actix-server/src/accept.rs#L303)
  • mio::poll::Poll::register (https://github.com/tokio-rs/mio/blob/v0.6.19/src/poll.rs#L790)
  • mio::sys::windows::selector::ReadyBinding::register_socket (https://github.com/tokio-rs/mio/blob/v0.6.19/src/sys/windows/selector.rs#L361)
  • mio::sys::windows::selector::Binding::register_socket (https://github.com/tokio-rs/mio/blob/v0.6.19/src/sys/windows/selector.rs#L200)
  • miow::iocp::CompletionPort::add_socket (https://github.com/alexcrichton/miow/blob/0.2.1/src/iocp.rs#L74)
  • miow::iocp::CompletionPort::_add (https://github.com/alexcrichton/miow/blob/0.2.1/src/iocp.rs#L80)
  • winapi::CreateIoCompletionPort

This is not exposed in the docs for some reason, but there is this interesting comment inside mio::register:

        /*
         * Undefined behavior:
         * - Reusing a token with a different `Evented` without deregistering
         * (or closing) the original `Evented`.
         */

Following this all the way down to the CreateIoCompletionPort MSDN entry, we see it mention, in passing: "This completion key should be unique for each file handle, and it accompanies the file handle throughout the internal completion queuing process." And, evidently, violating this requirement is treated as passing an incorrect parameter, which is fair, I guess.

When I add a diagnostic print, I see that CreateIoCompletionPort is indeed called with the same non-null parameters multiple times. And this makes sense, since in the trace logs below you can see (the number in brackets is said to be the thread number):

06:52:28 [TRACE] (6) mio::sys::windows::selector: [C:\Users\moxian\.cargo\registry\src\github.com-1ecc6299db9ec823\mio-0.6.19\src\sys\windows\selector.rs:359] register Token(100) Readable
<...>
06:52:28 [TRACE] (12) mio::sys::windows::selector: [C:\Users\moxian\.cargo\registry\src\github.com-1ecc6299db9ec823\mio-0.6.19\src\sys\windows\selector.rs:359] register Token(100) Readable
<...>
06:52:30 [TRACE] (12) mio::sys::windows::selector: [C:\Users\moxian\.cargo\registry\src\github.com-1ecc6299db9ec823\mio-0.6.19\src\sys\windows\selector.rs:359] register Token(100) Readable
06:52:30 [ERROR] Can not resume socket accept process: The parameter is incorrect. (os error 87)
<...>
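
To make that registration pattern concrete, here is a minimal sketch of what the logs above suggest is happening. This is my own illustration against the mio 0.6 API, not code taken from actix-server, and whether the second call actually errors depends on the platform; on Windows it should bottom out in a second CreateIoCompletionPort call for a socket that is already associated with the port.

use mio::net::TcpListener;
use mio::{Poll, PollOpt, Ready, Token};

fn main() -> std::io::Result<()> {
    let poll = Poll::new()?;
    let addr = "127.0.0.1:8081".parse().unwrap();
    let listener = TcpListener::bind(&addr)?;

    // First registration: on Windows this associates the socket with the
    // IOCP under completion key / Token(100).
    poll.register(&listener, Token(100), Ready::readable(), PollOpt::edge())?;

    // Second registration of the *same* socket with the *same* token, without
    // an intervening deregister. This mirrors the repeated
    // "register Token(100) Readable" lines in the trace, and is where
    // "The parameter is incorrect. (os error 87)" is expected to surface.
    poll.register(&listener, Token(100), Ready::readable(), PollOpt::edge())?;

    Ok(())
}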

Why is the token the same? I'm not sure… I tried reading the actix-server code but didn't get far, unfortunately.

Hope this helps!..

(This is probably not needed, but just in case: here is another trace log with a tiny bit more logging, and slightly different line numbers as a result: listener-1561709109-trace.log)

@moxian thanks for the investigation, that is helpful. I'll try to come up with some fixes.

I was able to see this issue using React's useEffect without passing [] as the dependency array. I have a Rust/Actix app; in my routes:

.route(
    "/api/v1/users{_:/?}",
    web::post().to(users_controller::index),
)

All my user controller does is call reqwest (this is essentially a proxy HTTP call):

   .post("http://someotherwebservice:3333/api/legacy_users")
   .json(&request_body)
   .send()
   .await
   .expect("error")
   .json()
   .await
   .expect("error");
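
Pieced together, a complete handler along those lines might look roughly like this. This is a sketch only, assuming actix-web 2, actix-rt 1, reqwest 0.10 and a serde_json::Value body; the upstream URL and route come from the snippets above, while the handler signature, server scaffolding, and error messages are my own guesses.

use actix_web::{web, App, HttpResponse, HttpServer};
use serde_json::Value;

async fn index(request_body: web::Json<Value>) -> HttpResponse {
    // Forward the incoming JSON body to the legacy service and relay its reply.
    // Using the async reqwest client; the blocking client should not be called
    // from inside an async handler, since it would stall the runtime thread.
    let upstream: Value = reqwest::Client::new()
        .post("http://someotherwebservice:3333/api/legacy_users")
        .json(&request_body.into_inner())
        .send()
        .await
        .expect("upstream request failed")
        .json()
        .await
        .expect("upstream returned invalid JSON");

    HttpResponse::Ok().json(upstream)
}

#[actix_rt::main]
async fn main() -> std::io::Result<()> {
    HttpServer::new(|| {
        App::new().route("/api/v1/users{_:/?}", web::post().to(index))
    })
    .bind("127.0.0.1:8080")?
    .run()
    .await
}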

When I call this from the React client, it responds successfully with data the first time, but the second time it completely hangs Actix:

React.useEffect(() => {
    fetch('http://localhost_actix_port/api/v1/users');
});

But when I add [] to the React.useEffect call, it doesn't block the backend. This is strange behavior, because basically any app can mount a DDoS attack against any backend that proxies HTTP requests through an endpoint in this way:

React.useEffect(() => {
    fetch('http://localhost_actix_port/api/v1/users');
}, []); // <- passing the dependency array avoids the infinite request loop

actix-cors = "0.2.0"
actix-web = "2"
actix-service = "1.0.5"
actix-rt = "1"
reqwest = { version = "0.10", features = ["blocking", "json"] }

I can create an example with that part of the app if necessary.