warp: Why is warp so slow? (compared to nginx)

Hi.

I have been testing warp and nginx with minimal content (just a “hello world” message) and noticed that warp is much slower, but I’m not sure whether the slowness comes from warp itself or from something I’m doing wrong.

The nginx config and content:

nginx -v

nginx version: nginx/1.16.1
cat /etc/nginx/nginx.conf

worker_processes auto;
worker_cpu_affinity auto;
events {
    worker_connections  10000;
}
http {
    access_log off;
    keepalive_timeout 65;
    server {
        listen 8080 default_server;
...
cat /usr/share/nginx/html/index.html
Hello World
curl -v http://localhost:8080
*   Trying ::1:8080...
* TCP_NODELAY set
* connect to ::1 port 8080 failed: Connection refused
*   Trying 127.0.0.1:8080...
* TCP_NODELAY set
* Connected to localhost (127.0.0.1) port 8080 (#0)
> GET / HTTP/1.1
> Host: localhost:8080
> User-Agent: curl/7.65.3
> Accept: */*
> 
* Mark bundle as not supporting multiuse
< HTTP/1.1 200 OK
< Server: nginx/1.16.1
< Date: Thu, 30 Apr 2020 22:43:42 GMT
< Content-Type: text/html
< Content-Length: 12
< Last-Modified: Thu, 30 Apr 2020 21:16:45 GMT
< Connection: keep-alive
< ETag: "5eab403d-c"
< Accept-Ranges: bytes
< 
Hello World
* Connection #0 to host localhost left intact

The warp stuff:

rustc --version
rustc 1.43.0-nightly (2890b37b8 2020-03-06)
cat examples/hello.rs 
#![deny(warnings)]
use warp::Filter;

#[tokio::main]
async fn main() {
    let routes = warp::any().map(|| "Hello World");
    warp::serve(routes).run(([0, 0, 0, 0], 8080)).await;
}
cargo build --release

sudo ./target/release/examples/hello
# using sudo just to get all kernel configurations
curl -v http://localhost:8080
*   Trying ::1:8080...
* TCP_NODELAY set
* connect to ::1 port 8080 failed: Connection refused
*   Trying 127.0.0.1:8080...
* TCP_NODELAY set
* Connected to localhost (127.0.0.1) port 8080 (#0)
> GET / HTTP/1.1
> Host: localhost:8080
> User-Agent: curl/7.65.3
> Accept: */*
> 
* Mark bundle as not supporting multiuse
< HTTP/1.1 200 OK
< content-type: text/plain; charset=utf-8
< content-length: 11
< date: Thu, 30 Apr 2020 22:43:13 GMT
< 
* Connection #0 to host localhost left intact

Environment

lscpu

Architecture:        x86_64
CPU op-mode(s):      32-bit, 64-bit
Byte Order:          Little Endian
Address sizes:       43 bits physical, 48 bits virtual
CPU(s):              8
On-line CPU(s) list: 0-7
Thread(s) per core:  2
Core(s) per socket:  4
Socket(s):           1
NUMA node(s):        1
Vendor ID:           AuthenticAMD
CPU family:          23
Model:               24
Model name:          AMD Ryzen 7 3700U with Radeon Vega Mobile Gfx
Stepping:            1
CPU MHz:             1295.685
CPU max MHz:         2300.0000
CPU min MHz:         1400.0000
BogoMIPS:            4591.23
Virtualization:      AMD-V
L1d cache:           32K
L1i cache:           64K
L2 cache:            512K
L3 cache:            4096K
NUMA node0 CPU(s):   0-7
Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb hw_pstate sme ssbd sev ibpb vmmcall fsgsbase bmi1 avx2 smep bmi2 rdseed adx smap clflushopt sha_ni xsaveopt xsavec xgetbv1 xsaves clzero irperf xsaveerptr arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic v_vmsave_vmload vgif overflow_recov succor smca
lsb_release -a

LSB Version: core-4.1-amd64:core-4.1-noarch
Distributor ID: Fedora
Description: Fedora release 30 (Thirty)
Release: 30
Codename: Thirty

Finally, the tests using wrk!

wrk results for nginx (average of three spaced-out runs)

./wrk -t10 -c1000 -d10s --latency http://127.0.0.1:8080/
Running 10s test @ http://127.0.0.1:8080/
  10 threads and 1000 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency    14.48ms   16.27ms 285.61ms   90.53%
    Req/Sec     8.12k     1.59k   15.99k    78.23%
  Latency Distribution
     50%   10.07ms
     75%   16.66ms
     90%   29.94ms
     99%   72.74ms
  807992 requests in 10.10s, 190.29MB read
Requests/sec:  79976.39
Transfer/sec:     18.84MB

wrk results for warp (average of three spaced-out runs)

./wrk -t10 -c1000 -d10s --latency http://127.0.0.1:8080/
Running 10s test @ http://127.0.0.1:8080/
  10 threads and 1000 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency    32.43ms    2.92ms  41.16ms   70.49%
    Req/Sec     3.08k   325.14     3.97k    72.20%
  Latency Distribution
     50%   32.07ms
     75%   34.20ms
     90%   36.97ms
     99%   39.35ms
  306502 requests in 10.06s, 37.41MB read
Requests/sec:  30474.16
Transfer/sec:      3.72MB

(using wrk compiled from source on Fedora 30 with Clang 8.0)

Notice warp is roughly 2.6 times slower than nginx here (about 30k vs. 80k requests/sec).

P.S. 1: I ran more tests using ApacheBench and JMeter with five client machines connected to an external server remotely (with some limits due to internet bandwidth), serving larger content (around 150 kB), and warp was slower again. 😕

P.S. 2: Note the nginx.conf at the top of this message. How can those settings be applied in warp? (especially worker_processes, worker_cpu_affinity and worker_connections)

Why?

I don’t know if there is any specific configuration to increase warp’s throughput. If there is, I’d appreciate hearing about it, and I’ll retest to see if the results improve.

TIA for any help!

About this issue

  • Original URL
  • State: closed
  • Created 4 years ago
  • Reactions: 8
  • Comments: 24 (5 by maintainers)

Most upvoted comments

Hi guys. Finally, more tests are done, and warp achieved good performance after applying the suggested configurations. 😃 However, actix is the new winner, as the attached logs show. 😅

The content used was:

<!DOCTYPE html>
<html lang="en">
  <head>
    <meta charset="UTF-8" />
    <meta name="viewport" content="width=device-width, initial-scale=1.0" />
    <title>Hello world benchmark</title>
  </head>
  <body>
    This is a static content to check the performance of the following HTTP
    servers:
    <ul>
      <li>actix-http</li>
      <li>deno</li>
      <li>microhttpd</li>
      <li>nginx</li>
      <li>nodejs</li>
      <li>warp</li>
    </ul>
  </body>
</html>

and the respective codes:

actix:

use std::io;

use actix_http::{HttpService, Response};
use actix_server::Server;
use futures_util::future;

#[actix_rt::main]
async fn main() -> io::Result<()> {
    Server::build()
        .bind("hello-world", "0.0.0.0:8080", || {
            HttpService::build()
                .client_timeout(1000)
                .client_disconnect(1000)
                .finish(|_req| {
                    let mut res = Response::Ok();
                    future::ok::<_, ()>(res.body(
                        r#"<!DOCTYPE html>
<html lang="en">
  <head>
    <meta charset="UTF-8" />
    <meta name="viewport" content="width=device-width, initial-scale=1.0" />
    <title>Hello world benchmark</title>
  </head>
  <body>
    This is a static content to check the performance of the following HTTP
    servers:
    <ul>
      <li>actix-http</li>
      <li>deno</li>
      <li>microhttpd</li>
      <li>nginx</li>
      <li>nodejs</li>
      <li>warp</li>
    </ul>
  </body>
</html>"#,
                    ))
                })
                .tcp()
        })?
        .run()
        .await
}

deno:

import { serve } from "https://deno.land/std/http/server.ts";
const s = serve({ port: 8080 });
console.log("http://corin.ga:8080/");
for await (const req of s) {
  req.respond({
    body: `<!DOCTYPE html>
<html lang="en">
  <head>
    <meta charset="UTF-8" />
    <meta name="viewport" content="width=device-width, initial-scale=1.0" />
    <title>Hello world benchmark</title>
  </head>
  <body>
    This is a static content to check the performance of the following HTTP
    servers:
    <ul>
      <li>actix-http</li>
      <li>deno</li>
      <li>microhttpd</li>
      <li>nginx</li>
      <li>nodejs</li>
      <li>warp</li>
    </ul>
  </body>
</html>` });
}

mhd/sagui:

#include <stdio.h>
#include <string.h> /* strlen */
#include <unistd.h> /* sysconf */
#include <microhttpd.h>

#define PAGE                                                                   \
  "<!DOCTYPE html>\n\
<html lang=\"en\">\n\
  <head>\n\
    <meta charset=\"UTF-8\" />\n\
    <meta name=\"viewport\" content=\"width=device-width, initial-scale=1.0\" />\n\
    <title>Hello world benchmark</title>\n\
  </head>\n\
  <body>\n\
    This is a static content to check the performance of the following HTTP\n\
    servers:\n\
    <ul>\n\
      <li>MHD</li>\n\
      <li>nginx</li>\n\
    </ul>\n\
  </body>\n\
</html>"

static enum MHD_Result ahc_echo(void *cls, struct MHD_Connection *con,
                                const char *url, const char *method,
                                const char *version, const char *upload_data,
                                size_t *upload_data_size, void **ptr) {
  struct MHD_Response *res;
  enum MHD_Result ret;
  if ((void *)1 != *ptr) {
    *ptr = (void *)1;
    return MHD_YES;
  }
  *ptr = NULL;
  res = MHD_create_response_from_buffer(strlen(PAGE), (void *)PAGE,
                                        MHD_RESPMEM_PERSISTENT);
  ret = MHD_queue_response(con, MHD_HTTP_OK, res);
  MHD_destroy_response(res);
  return ret;
}

int main() {
  struct MHD_Daemon *d;
  d = MHD_start_daemon(
      MHD_USE_EPOLL_INTERNAL_THREAD | MHD_SUPPRESS_DATE_NO_CLOCK |
          MHD_USE_EPOLL_TURBO,
      8080, NULL, NULL, &ahc_echo, NULL, MHD_OPTION_CONNECTION_TIMEOUT,
      (unsigned int)120, MHD_OPTION_THREAD_POOL_SIZE,
      (unsigned int)sysconf(_SC_NPROCESSORS_ONLN), MHD_OPTION_CONNECTION_LIMIT,
      (unsigned int)10000, MHD_OPTION_END);
  getchar();
  MHD_stop_daemon(d);
  return 0;
}

nginx (serving index.html with the content above):

worker_processes auto;
worker_cpu_affinity auto;
events {
    worker_connections  10000;
}
http {
    access_log off;
    keepalive_timeout 65;
    server {
        listen 8080 default_server;
...

nodejs:

const http = require("http");

const hostname = "0.0.0.0";
const port = 8080;

const server = http.createServer((req, res) => {
  res.statusCode = 200;
  res.setHeader("Content-Type", "text/plain");
  res.end(`<!DOCTYPE html>
<html lang="en">
  <head>
    <meta charset="UTF-8" />
    <meta name="viewport" content="width=device-width, initial-scale=1.0" />
    <title>Hello world benchmark</title>
  </head>
  <body>
    This is a static content to check the performance of the following HTTP
    servers:
    <ul>
      <li>actix-http</li>
      <li>deno</li>
      <li>nginx</li>
      <li>nodejs</li>
      <li>sagui</li>
      <li>warp</li>
    </ul>
  </body>
</html>`);
});

server.listen(port, hostname, () => {
  console.log(`Server running at http://${hostname}:${port}/`);
});

warp:

#![deny(warnings)]
use warp::Filter;

#[tokio::main(max_threads = 10_000)]
async fn main() {
    let routes = warp::any().map(|| r#"<!DOCTYPE html>
<html lang="en">
  <head>
    <meta charset="UTF-8" />
    <meta name="viewport" content="width=device-width, initial-scale=1.0" />
    <title>Hello world benchmark</title>
  </head>
  <body>
    This is a static content to check the performance of the following HTTP
    servers:
    <ul>
      <li>actix-http</li>
      <li>deno</li>
      <li>microhttpd</li>
      <li>nginx</li>
      <li>nodejs</li>
      <li>warp</li>
    </ul>
  </body>
</html>"#);
    warp::serve(routes).run(([0, 0, 0, 0], 8080)).await;
}

the runner was:

#!/bin/sh

set -e

wrk -t10 -c1000 -d10s --latency http://corin.ga:8080/ > "wrk-$1.log"

All logs attached below:

wrk-actix.log wrk-deno.log wrk-mhd.log wrk-nginx.log wrk-node.log wrk-warp.log


Machine:

$ lscpu
Architecture:                    x86_64
CPU op-mode(s):                  32-bit, 64-bit
Byte Order:                      Little Endian
Address sizes:                   43 bits physical, 48 bits virtual
CPU(s):                          8
On-line CPU(s) list:             0-7
Thread(s) per core:              2
Core(s) per socket:              4
Socket(s):                       1
NUMA node(s):                    1
Vendor ID:                       AuthenticAMD
CPU family:                      23
Model:                           24
Model name:                      AMD Ryzen 7 3700U with Radeon Vega Mobile Gfx
Stepping:                        1
Frequency boost:                 enabled
CPU MHz:                         1130.695
CPU max MHz:                     2300.0000
CPU min MHz:                     1400.0000
BogoMIPS:                        4591.63
Virtualization:                  AMD-V
L1d cache:                       128 KiB
L1i cache:                       256 KiB
L2 cache:                        2 MiB
L3 cache:                        4 MiB
NUMA node0 CPU(s):               0-7
Vulnerability Itlb multihit:     Not affected
Vulnerability L1tf:              Not affected
Vulnerability Mds:               Not affected
Vulnerability Meltdown:          Not affected
Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl and seccomp
Vulnerability Spectre v1:        Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2:        Mitigation; Full AMD retpoline, IBPB conditional, STIBP disabled, RSB filling
Vulnerability Srbds:             Not affected
Vulnerability Tsx async abort:   Not affected
Flags:                           fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb hw_pstate sme ssbd sev ibpb vmmcall fsgsbase bmi1 avx2 smep bmi2 rdseed adx smap clflushopt sha_ni xsaveopt xsavec xgetbv1 xsaves clzero irperf xsaveerptr arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic v_vmsave_vmload vgif overflow_recov succor smca

OS:

$ cat /etc/fedora-release
Fedora release 32 (Thirty Two)

Testing locally now, I get about 2× the performance of nginx with warp and tokio 1.0.

Why? Because nginx is battle-hardened and optimized to the teeth.

That said, some possible follow ups:

  1. Is it hyper? warp is on top of hyper, so if hyper is slow then we need to improve hyper.
  2. dtrace + flamegraph can show which function is the hottest
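As a concrete starting point for item 2 on Linux (a sketch, assuming the cargo-flamegraph subcommand and perf are available on your system; it drives perf under the hood):

```shell
# Install the flamegraph cargo subcommand once.
cargo install flamegraph

# Profile the release build of the hello example while wrk applies load
# from another terminal; this writes flamegraph.svg in the current directory.
cargo flamegraph --example hello
```

Opening the resulting flamegraph.svg in a browser shows which frames (hyper parsing, tokio scheduling, syscalls) dominate the samples.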

To answer your P.S. 2:

  1. worker_processes has no correspondence, as tokio doesn’t do multi-process.
  2. worker_cpu_affinity is not a concern, because the tokio scheduler always pins a thread to a CPU; tokio distributes work across threads via work-stealing.
  3. worker_connections roughly corresponds to max_threads.

You can configure tokio by:

#[tokio::main(max_threads = 10_000)]
async fn main() {
    println!("Hello world");
}

Maybe you’ve already done this, but try adding the full feature flag to the tokio dependency in your Cargo.toml, as follows:

tokio = { version = "0.2", features = ["full"] }

Without it, tokio will not use multiple threads, IIRC. It massively increased my performance.

For me, Actix had a massive memory leak issue. After trying to solve it without any success, I joined the warp fan club.

And after trying to solve it without any success, I joined the warp fan club.

actix-web 3.0 is released now; it fixed the memory leak.

@aslamplr it would be awesome to have warp in the benchmarks as well. Probably worth a new feature issue?

I think a lot of people hit this because the example in warp’s README only enables the macros feature of tokio, which means you end up with a single-threaded runtime. Almost everyone probably wants to also enable the rt-threaded feature (or just go for full, as above).
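In Cargo.toml terms, that means (a sketch for tokio 0.2; in tokio 1.0 this feature was renamed rt-multi-thread):

```toml
# Narrower alternative to "full": just macro support
# plus the multi-threaded runtime.
tokio = { version = "0.2", features = ["macros", "rt-threaded"] }
```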

warp-rust is missing from the benchmarks!

It’s there. Check the other test types.

Hi guys, thanks for the answers. I’m going to prepare a new testing environment and apply all the suggested configurations. In the new tests I’ll add other projects too, like Actix, Rocket, etc., and I’ll be back with feedback soon…

@joseluisq, you solved the problem! 👏👏👏

Now I get the same benchmark results with warp and actix.

Sent as new PR: https://github.com/seanmonstar/warp/pull/786.

Thank you very much! 🙂

@joseluisq: I didn’t know this project existed, I’m going to test it. Thank you! 😃

@apiraino:

Rest APIs? Websocket?

Only the raw HTTP server.

Rust tool to run benchmarks?

Because I would like to use Rust instead of Go, since Rust is more optimized.

raw HTTP calls can also be done with ab.

The wrk tool uses less memory than ab in massive tests.

P.S.: I would like to use a Rust tool instead of go-wrk, but I can’t find one. 🙁

@silvioprog Did you try this one https://github.com/tsenart/vegeta?

actix-web 3.0 is released now; it fixed the memory leak.

The major version bump: https://github.com/actix/actix-web/issues/1554