fastapi: Very poor performance does not align with marketing

I wanted to check the temperature of this project, so I ran a quick, very simple benchmark with wrk and the default example:

from fastapi import FastAPI

app = FastAPI()


@app.get("/")
def read_root():
    return {"Hello": "World"}

Everything default with wrk, regular Ubuntu Linux, Python 3.8.2, latest FastAPI as of now.

wrk http://localhost:8000

Uvicorn with logging disabled (obviously), as per the README:

python3 -m uvicorn fast:app --log-level critical

I get very poor performance, way worse than Node.js and really, really far from Golang:

Running 10s test @ http://localhost:8000
  2 threads and 10 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     1.83ms  365.59us   3.90ms   75.34%
    Req/Sec     2.74k   116.21     2.98k    65.00%
  54447 requests in 10.00s, 7.37MB read
Requests/sec:   5442.89
Transfer/sec:    754.78KB

This machine can do 400k req/sec on one single thread using other software, so 5k is not at all fast. Even Node.js does 20-30k on this machine, so this does not align at all with the README:

The key features are:

Fast: Very high performance, on par with NodeJS and Go (thanks to Starlette and Pydantic). One of the fastest Python frameworks available.

Where do you post benchmarks? How did you come to that conclusion? I cannot see that you have posted any benchmarks at all.

Please fix the marketing; it is not at all true.

About this issue

  • Original URL
  • State: closed
  • Created 4 years ago
  • Reactions: 60
  • Comments: 37 (6 by maintainers)

Most upvoted comments

There seem to be two intertwined discussions here that I think we can address separately.

The NodeJS and Go comparison

There is definitely contention around the phrase “on par with NodeJS and Go” in the documentation. I believe the purpose of that phrase was to be encouraging, so that people will try out the framework for their purpose instead of just assuming “it’s Python, it’ll be too slow”. However, the phrase can clearly also spawn anger and be off-putting, which would be the opposite of what we’re trying to achieve here.

I believe that if the comparison is causing bad feelings toward FastAPI, it should simply be removed. We can claim FastAPI is fast without specifically calling out other languages (which almost always leads to defensiveness). Obviously this is up to @tiangolo and we’ll need his input here when he gets to this issue.

FastAPI’s Performance

If you ask “is it fast” about anything, there will be evidence both for and against. I think the point of linking to TechEmpower instead of listing numbers directly is so that people can explore on their own and see if FastAPI makes sense for their workloads. However, we may be able to do a better job of guiding people about what is “fast” about FastAPI.

For the numbers I’m about to share, I’m using TechEmpower’s “Round 19”, looking only at the “micro” classification (which is what FastAPI falls under) for “go”, “javascript”, “typescript”, and “python”. I don’t use Go or NodeJS in production, so I’m picking some popular frameworks which appear under this micro category to compare: “ExpressJS” (javascript), “NestJS” (typescript), and “Gin” (golang). I don’t know how their feature sets compare to FastAPI.

Plain Text

I believe this is what most of the comparisons above me are using. FastAPI is much slower than Nest/Express, which are much slower than Gin. Exactly what people are saying above. If your primary workload is serving plain text, go with Go.

Data Updates

Requests must fetch data from a database, update, and commit it back, then serialize and return the result to the caller. Here FastAPI is much faster than NestJS/Express which are much faster than Gin.

Fortunes

This test uses an ORM and HTML templating. Here all the frameworks are very close to each other but, in order from fastest to slowest, were Gin, NestJS, FastAPI, Express.

Multiple Queries

This is just fetching multiple rows from the database and serializing the results. Here, FastAPI slightly edges out Gin. Express and NestJS are much slower in this test.

Single query

Single row is fetched and serialized. Gin is much faster than the rest which are, in order, FastAPI, NestJS, and Express.

JSON serialization

No database activity, just serializing some JSON. Gin blows away the competition. Express, then Nest, then FastAPI follow.

So the general theme of all the tests combined seems to be if you’re working with large amounts of data from the database, FastAPI is the fastest of the bunch. The less database activity (I/O bound), the further FastAPI falls and Gin rises. The real takeaway here is that the answer to “is it fast” is always “it depends”. However, we can probably do a better job of pointing out FastAPI’s database strengths in the sections talking about speed.

@alexhultman If you are not happy about the different DB choices of TechEmpower, you can probably raise an issue there (e.g. https://github.com/TechEmpower/FrameworkBenchmarks/issues/2845 - that repo is open to contributions), or pick another comprehensive benchmark you prefer, which we can all benefit from when choosing a framework.

Also please be reminded that so far everyone replying to you in this thread is only a community member; we are not maintainers of FastAPI. If you want to know who wrote that claim, please use git blame. Please be kind to the people who are trying to have a discussion here.

Okay, so taking the source you gave me (entirely disregarding my own test), I can read the following:

Rank  Framework  Req/sec     % of top
268   fastapi      159,445    2.2%
199   uvicorn      382,930    5.2%
124   nodejs       884,444   12.0%
 27   fasthttp   5,962,266   81.2%

Which is in very stark contrast with the README:

Very high performance, on par with NodeJS and Go

https://www.collinsdictionary.com/dictionary/english/on-a-par-with

2.2% is not “on par with” 12%. It’s like comparing wine with light beer - they are entirely disjoint; you cannot possibly claim light beer gets you as hammered as wine.

And the golang thing… jeeeez!

if you’re working with large amounts of data from the database, FastAPI is the fastest of the bunch […] we can probably do a better job of pointing out FastAPI’s database strengths in the sections talking about speed.

Here is a short lesson in critical thinking:

  • The top 65 entries in “Data Updates” are PostgreSQL, so it’s fair to say PostgreSQL is top in that particular test.
  • The top 125 entries in “Data Updates” are either PostgreSQL or MongoDB.
  • The FastAPI entry in “Data Updates” happens to use PostgreSQL in that particular test.
  • The Gin entry happens to use MySQL, a database that is first seen at position 126.
  • FastAPI is seen at position 77.

But yes, I guess we should attribute this victory to FastAPI. Because the fact it used PostgreSQL in a test that clearly favors PostgreSQL has nothing at all to do with the outcome. Nothing at all 🎶 😉

And the fact that FastAPI scores last in every single test that does not involve the variability of database selection is just random coincidence. 🎵 🎹

I must agree with @alexhultman that the performance claims are misleading… I learned it the hard way too. Performance is not really what it claims to be.

Taking another example, the one just serving a chunk of text:

  • 5962k req/sec for fasthttp
  • 884k req/sec for NodeJS
  • 159k req/sec for FastAPI

To boldly state as the first feature “Fast: Very high performance, on par with NodeJS and Go” is, well… I guess I don’t have to say it. It leads to disappointment down the road when you discover the truth.

Probably it would be better to just keep “Among the fastest Python frameworks available” and emphasize the other good features.

@alexhultman Your point might be valid, but I think you might be oversimplifying your tests here. Benchmarks are a tricky thing, but it’s important to know what it is that you are comparing.

FastAPI is a web application framework that provides quite a bit more than just an application server.

So if you are comparing FastAPI to, say, Node.js, then the test should be done against a web application framework such as NestJS or similar.

Same thing with Golang: the comparison should be against Revel or something like that.

In @tiangolo’s documentation on benchmarks you can read:

If you didn’t use FastAPI and used Starlette directly (or another tool, like Sanic, Flask, Responder, etc) you would have to implement all the data validation and serialization yourself. So, your final application would still have the same overhead as if it was built using FastAPI. And in many cases, this data validation and serialization is the biggest amount of code written in applications.

https://fastapi.tiangolo.com/benchmarks/

I believe that when the developers say:

Very high performance, on par with NodeJS and Go

They mean a full application on Golang or NodeJS (on some framework) vs a Full application on FastAPI.

I don’t know how you did the benchmarks, but from TechEmpower benchmarks, this is the result.

In a real world scenario like Data Updates, 1-20 Queries, etc. FastAPI is much faster.

Framework JSON 1-query 20-query Fortunes Updates Plaintext
fastapi 171,055 66,185 13,022 52,080 5,926 159,445
express 246,627 57,588 4,261 44,166 2,075 369,533

OK, I came a bit further than a “hello world” application, and the conclusions here are correct. Performance cannot even be compared with Node.js or .NET; it’s much slower. But to be honest, I think it’s a Python problem, not the framework itself.

I think this issue has gone as deep as it goes already, nothing can be said that hasn’t already been. Alright, thank you and have a nice day.

For the caching: if you benchmark with a cache solution enabled, then you are testing your cache strategy rather than what FastAPI can handle.

I just ran all-defaults comparison FastAPI vs ExpressJS:

$ wrk http://localhost:8000
Running 10s test @ http://localhost:8000
  2 threads and 10 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     3.80ms    0.86ms  21.87ms   94.90%
    Req/Sec     1.33k   145.34     1.44k    90.00%
  26495 requests in 10.01s, 3.59MB read
Requests/sec:   2647.61
Transfer/sec:    367.16KB

$ wrk http://localhost:3000
Running 10s test @ http://localhost:3000
  2 threads and 10 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     1.43ms  567.31us  16.51ms   91.17%
    Req/Sec     3.58k   583.68     4.10k    88.00%
  71280 requests in 10.00s, 16.25MB read
Requests/sec:   7125.65
Transfer/sec:      1.62MB

I love the syntax and ease of use of FastAPI, but it’s disappointing to see misleading claims about its speed. 367 KB/s is NOT “on par” with 1,620 KB/s; that’s roughly 4x the throughput of "Fast"API.

but it is about twice as fast as Flask:

$ wrk http://localhost:5000
Running 10s test @ http://localhost:5000
  2 threads and 10 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     7.47ms    2.38ms  27.67ms   82.41%
    Req/Sec   580.47    142.46   848.00     58.50%
  11568 requests in 10.01s, 1.82MB read
Requests/sec:   1155.08
Transfer/sec:    186.12KB

To be honest, it’s disappointing that @tiangolo or anyone from the FastAPI team isn’t commenting on this. They still have “on par with Nodejs and Golang” on the homepage of their website. If there is a use case they’re aware of where FastAPI is on par with Golang and NodeJS, they should share it, since that would be very useful information. It’s sad, because FastAPI is a really fun framework to work with and it has a lot of merits, but the potential dishonesty in their marketing is disheartening.

FastAPI has a single maintainer, and there were comments on this issue from active users (at the time they posted, at least). If you consider those active users as the team, then we have replied.

In any case, @tiangolo will see this at some point.

I acknowledge that this thread is closed; however, I wanted to add some extra information to aid in this comparison. An important detail I think a lot of these benchmarks miss is proper configuration of your libraries when using FastAPI. So, without trying to sound too opinionated, here are a couple of things that I hope provide a better comparison…

  1. Using the --workers N flag with FastAPI while not using concurrency with Express/Node is NOT apples-to-apples. The simplest approach is to just compare with a single process each.
  2. Proper configuration of the underlying libraries used with FastAPI is crucial to increased performance. When used with uvicorn[standard], which leverages uvloop and httptools (a C-based HTTP parser derived from the Node.js one), the JSON performance of FastAPI is on par with Express/Node for an equal number of processes (see the launcher sketch after this list).
  3. And here is the opinionated part… You shouldn’t be choosing between Python and Node because of performance. As a generality, you should choose the language that makes the most sense or that allows you to hire the right developers and build the fastest. To the extent that a particular language makes the most sense… Are you building an analytics/data/ML project? You are highly likely choosing Python over Node regardless of performance. Are you writing a full-stack GraphQL app that spends almost all of its time waiting on a database to return data? Then you are probably writing that in Node.
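
For points 1 and 2, here is a minimal launcher sketch; the "app:app" import string is an assumption (adjust it to your module), and the loop/http arguments simply pin what uvicorn[standard] would normally auto-select, while a single worker keeps the comparison with a single Node process fair:

# run.py - minimal sketch: one process, uvloop + httptools pinned explicitly
import uvicorn

if __name__ == "__main__":
    uvicorn.run(
        "app:app",             # assumed module:attribute path for the FastAPI app
        host="127.0.0.1",
        port=8000,
        loop="uvloop",         # event loop implementation from uvicorn[standard]
        http="httptools",      # C-based HTTP parser from uvicorn[standard]
        workers=1,             # one process, matching a single Node process
        log_level="critical",  # keep access logging out of the measurement
    )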

Okay onto the benchmarks… All of this was run on a M1 Max MBP

Python

Install the libraries

pip install numpy fastapi 'uvicorn[standard]' orjson
# app.py

from fastapi import FastAPI
from fastapi.responses import ORJSONResponse
import numpy as np

app = FastAPI(default_response_class=ORJSONResponse)

@app.get("/json")
async def json_test():
    return {"message": "Hello World"}

@app.get("/async/math/numpy")
async def async_math_test(length: int = 1000):
    return {"sum": int(np.sum(np.arange(1, length + 1) ** 2))}

@app.get("/sync/math/numpy")
def sync_math_test(length: int = 1000):
    return {"sum": int(np.sum(np.arange(1, length + 1) ** 2))}

@app.get("/async/math/python")
async def async_math_test(length: int = 1000):
    return {"sum": sum((i ** 2 for i in range(1, length + 1)))}

@app.get("/sync/math/python")
def sync_math_test(length: int = 1000):
    return {"sum": sum((i ** 2 for i in range(1, length + 1)))}

Run the app

uvicorn app:app --port 8000 --log-level critical

Express/NodeJS

Install the libraries

yarn add express lodash

Note that I used TypeScript, so you will want to add the @types as well if you feel like it. I used lodash for the math because its array functions outperform the native implementations in my testing.

// app.ts

import express from 'express';
import _ from 'lodash';

const app = express()
const port = 3000

app.get('/json', (req, res) => {
  res.send({message: 'Hello World'})
})

app.get('/math', (req, res) => {
  const length = req.query.length ? parseInt(req.query.length as string) : 1000
  res.send({sum: _.reduce(_.range(0, length), (p, c) => p + c ** 2)})
})


app.listen(port, () => {
  console.log(`Example app listening on port ${port}`)
})

Run the app

ts-node app.ts

Results

JSON

Python

❯ wrk -c 128 -t 20  'http://localhost:8000/json'
Running 10s test @ http://localhost:8000/json
  20 threads and 128 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     4.39ms    1.85ms  21.03ms   72.43%
    Req/Sec     0.94k   159.70     1.81k    69.33%
  187586 requests in 10.10s, 26.83MB read
Requests/sec:  18570.58
Transfer/sec:      2.66MB

Express/NodeJS

❯ wrk -c 128 -t 20  'http://localhost:3000/json'
Running 10s test @ http://localhost:3000/json
  20 threads and 128 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     5.85ms    1.03ms  26.99ms   90.99%
    Req/Sec     1.03k    85.65     1.57k    91.35%
  205779 requests in 10.02s, 51.02MB read
Requests/sec:  20531.14
Transfer/sec:      5.09MB

Math

When performing mathematical operations we have a few options… With Python it is popular to use numpy for fast array math. All results are in req/sec for a given array length. The differentiation between sync/async endpoints for Python is certainly interesting and challenges the notion that using sync endpoints with CPU-bound code helps performance. This is especially the case when using numpy. Noticeably, the pure Python approach has horrendous performance and should generally be avoided, but we already knew that. As for the numpy vs Node comparison, they are very close at the small/medium array sizes, and then numpy starts to take the lead, to the tune of 3x as many req/sec (a standalone sanity check follows the table below).

length sync_python sync_numpy async_python async_numpy node
1k 3135.64 7769.69 3835.22 18144.57 18563.20
10k 446.75 4807.13 462.21 13585.08 12099.67
100k 43.40 3049.65 36.03 6756.62 2327.96
1000k 2.77 900.41 0.59 620.19 273.55
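
The numpy-vs-pure-Python gap in the table can also be sanity-checked outside the web stack with a quick timeit comparison; this is just a minimal sketch, with an arbitrary array length and iteration count:

# bench_math.py - standalone check of the sum-of-squares gap, no web framework involved
import timeit

import numpy as np

LENGTH = 100_000

def pure_python() -> int:
    return sum(i ** 2 for i in range(1, LENGTH + 1))

def with_numpy() -> int:
    return int(np.sum(np.arange(1, LENGTH + 1, dtype=np.int64) ** 2))

print("pure python:", timeit.timeit(pure_python, number=100), "s for 100 runs")
print("numpy:      ", timeit.timeit(with_numpy, number=100), "s for 100 runs")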

Database

Didn’t do anything here, but I have seen that asyncpg tends to outperform Node’s pg library; I haven’t run this test yet.

Just a different point of view on the performance-between-languages “issue”:

Especially for web development, I prefer to have a prototype up and running, with all the business (or “fun”) logic implemented, and all of it expressed in code that is a delight to read. From there I can scale (Docker & Kubernetes), improve bottlenecks (Celery & RabbitMQ), or even implement, with as minimal a codebase as possible, microservices written in other languages. That beats spending months in some other language as the basis, one whose syntax is over-verbose, whose libraries are more often than not unmaintained or malware-injected, whose tooling is immature or unpromising, and whose whole community is confused, unhelpful, or disoriented by “performance”, “super-secure”, “we-are-the-future” complexes.

At the end of the day a couple of containers with python (any framework) at backend & node (any framework) for client will do the trick for performance as well.

P.S. But yeah, the official claim is overblown and overly simplistic. A shame for such a noteworthy and well-composed framework.

I don’t know how you did the benchmarks, but from TechEmpower benchmarks, this is the result.

In a real world scenario like Data Updates, 1-20 Queries, etc. FastAPI is much faster.

Framework  JSON     1-query  20-query  Fortunes  Updates  Plaintext
fastapi    171,055  66,185   13,022    52,080    5,926    159,445
express    246,627  57,588   4,261     44,166    2,075    369,533

I did exactly what @alexhultman did: created a “hello world” application in both FastAPI and Express.js, using all defaults. I didn’t optimize anything, then ran the wrk commands as shown in my comment.

I also ran a bare uvicorn server (with hello world app):

$ wrk http://localhost:8000
Running 10s test @ http://localhost:8000
  2 threads and 10 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     2.70ms  246.37us   6.30ms   87.68%
    Req/Sec     1.86k    58.33     1.96k    81.50%
  37026 requests in 10.00s, 5.30MB read
Requests/sec:   3701.70
Transfer/sec:    542.27KB

And already it’s slower than Express.js.

I saw people pointing out the database tests. This is important because you will use DBs etc., but it isn’t testing the web framework itself! It is comparing Node.js pg vs Python asyncpg, or Node.js JSON vs Python json/orjson/ujson, etc.

FastAPI is not a server, so you can only measure its overhead over raw ASGI and measure the ASGI server’s performance.

For measuring overhead, the TechEmpower plaintext test is actually a cool tool. So let’s do it!

First, let’s see how some frameworks compare in general when run on common servers (uvicorn and meinheld).

[image: raw throughput of frameworks on common servers (uvicorn, meinheld)]

So in raw throughput, gunicorn+uvicorn is not doing great, and FastAPI using it will not be able to catch up with Go frameworks like Gin or Fiber, or even with the Express and Fastify Node.js frameworks. Only socketify.py with PyPy passes the throughput of Fiber in this case. So I made an ASGI server (still in development) and a WSGI server using socketify (aka uWS C++) as the base.

[image: throughput of ASGI servers and web frameworks]

So, testing ASGI servers and web frameworks: Emmett and Falcon using uvicorn are faster than FastAPI, and faster than Node.js Express (remember, Node.js Express is really slow).

Using socketify ASGI, FastAPI is able to catch up with Emmett and Falcon, but if Emmett and Falcon use socketify ASGI too, they will be faster, of course. Still not at the same level of performance as Fastify in Node.js, for example.

The raw ASGI test of socketify with CPython is faster than Fastify, but the overhead of the web frameworks just puts the performance down, and even raw ASGI in CPython is not able to catch up with the Gin Golang framework (which is not fast by Golang standards).

Using the same server as the base, you can see that Falcon has less overhead than Emmett, and Emmett has less overhead than FastAPI.

Socketify.py is not optimized for CPython yet but is optimized for PyPy, and you can see that using PyPy, FastAPI, Emmett, and Falcon pass Gin, but they come nowhere close to Fiber’s performance! That’s because the ASGI server itself is slower than Fiber!

So the claim:

Very high performance, on par with NodeJS and Go (thanks to Starlette and Pydantic).

Is just nonsense, because FastAPI is not a server, and the Uvicorn server is slower than the Node.js Express, Fastify, Gin, and Fiber SERVERS! The Socketify.py ASGI server is faster than the Express, Fastify, and Gin servers but slower than the Fiber server.

The question must always be: using server X with FastAPI’s overhead on top, can it be on par with NodeJS and Go? The answer is maybe, because uvicorn is slow compared to them, and socketify ASGI is on par with Express, Fastify, or Gin, but it is not on par with fasthttp, Fiber, or even with socketify.py itself without using ASGI!

ASGI just has a lot of overhead! and on top of it, FastAPI has a lot of overhead too!

You can claim that Falcon/Emmett is faster than FastAPI, and you can also claim that socketify.py is faster than some Node.js servers and Golang servers, but you cannot claim that FastAPI is faster than Node.js or Go, because that is comparing apples to oranges.

Can FastAPI claim to be fast? Yes! But it needs to be compared with web frameworks, not servers! Can FastAPI claim to be the fastest Python web framework in all scenarios? No, because other web frameworks like Emmett and Falcon have less overhead.

ASGI web frameworks in general need to reduce overhead. Most of the overhead is dynamic allocations and the asyncio event loop itself, and it can be mitigated a LOT by using factories to reuse Tasks/Futures and Request/Response objects instead of GCing them every time, and by using PyPy to be able to use stack allocations.

Raw throughput is very important for any big application because most production code should use in-memory caching for responses. That’s why, with socketify.py, you can use a sync route to check your in-memory cache and use res.run_async(coro) to go async only if you actually need to fetch the data. And that is why I will develop a caching tool that does not even touch Python or the Python GIL and just sends the response from the C++ world when cached.
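
To make the caching pattern concrete, here is a generic sketch of it in plain FastAPI; this is not socketify.py’s API, and fetch_data() is a hypothetical stand-in for a real database call:

# cache_sketch.py - check an in-memory cache first, go async only on a miss
import asyncio

from fastapi import FastAPI

app = FastAPI()
_cache: dict = {}  # naive process-local cache, no eviction; illustration only

async def fetch_data(key: str) -> dict:
    # Hypothetical slow backend call standing in for a real database query.
    await asyncio.sleep(0.01)
    return {"key": key, "value": key.upper()}

@app.get("/items/{key}")
async def read_item(key: str):
    cached = _cache.get(key)  # cheap in-memory lookup before any awaits
    if cached is not None:
        return cached
    result = await fetch_data(key)  # only hit the slow path on a cache miss
    _cache[key] = result
    return result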

You can claim:

High Performance, faster than Express.js, Fastify.js Node.JS Frameworks, and Gin Golang Framework (thanks to Socketify.py and PyPy).

It’s not “on par” because it’s faster with PyPy or slower with CPython.

If you want to see some WSGI numbers to compare overhead here is a chart:

[image: WSGI throughput comparison]

WSGI has less overhead because it’s not using asyncio and because it’s not doing a lot of unnecessary dynamic allocations. In fact, Socketify.py WSGI does more copying and work than ASGI, because it uses the ASGI-native API and converts the headers before calling the app itself; most of the slowdown is asyncio event loop overhead.

Django is just slow, so slow that even with socketify it can’t be faster than Express; only when PyPy optimizes it and reduces Django’s overhead can it be faster than Express.

Falcon and Flask can be faster than Gin with PyPy too, but take a closer look: Falcon + WSGI has really little overhead over raw WSGI, while Falcon + ASGI has a much bigger overhead over raw ASGI.

Of course, it is not faster than fasthttp or Fiber in Golang, but at least you can say that it is faster than SOME Golang frameworks.

And remember, PyPy is not only for compute-heavy workloads; it is also GREAT at removing unnecessary overhead.

  • socketify.py project: https://github.com/cirospaciari/socketify.py
  • How to run ASGI/WSGI with socketify: https://docs.socketify.dev/cli.html
  • TFB tools and benchmarks: https://github.com/TechEmpower/FrameworkBenchmarks

@introom if the endpoint is defined with `async def`, the router will use `await func()`; otherwise, the router will use `await run_in_threadpool(func)`.
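
In other words (a conceptual sketch only, not FastAPI’s actual source; it assumes Starlette’s run_in_threadpool helper):

# Conceptual dispatch: async endpoints run on the event loop,
# plain def endpoints are offloaded to a worker thread.
import inspect

from starlette.concurrency import run_in_threadpool

async def call_endpoint(func, **params):
    if inspect.iscoroutinefunction(func):
        return await func(**params)
    return await run_in_threadpool(func, **params)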

I guess the mystery is solved (kind of)!

FastAPI is even faster than NodeJS (even with a single worker). You just need to make the method async. Though I still don’t understand why that’s the case when there’s no I/O?

Here is the ab command for benchmarking:

ab -n 10000 -c 1000 http://localhost:8000/test

Here is the version without async keyword:

@app.get("/test")
def test():
    return {"message": "Hello World"}

Result: Requests per second: 2596.19 [#/sec] (mean)


And with async:

@app.get("/test")
async def test():
    return {"message": "Hello World"}

Result: Requests per second: 4902.09 [#/sec] (mean)


Express:

const app = express()
const port = 3000

app.get('/', (req, res) => {
  res.json({'ok': true})
})

Result: Requests per second: 4545.43 [#/sec] (mean)


All the above tests are done with a single worker. As a side note, just by adding --workers N without changing any code, FastAPI provides significantly higher performance (in my case 8 workers gave me 7698.08 [#/sec]!)

This is unbelievable! Did I miss anything?

This comment explain it: https://github.com/tiangolo/fastapi/issues/1664#issuecomment-653580642

I really don’t understand why people keep arguing here…

So the general theme of all the tests combined seems to be if you’re working with large amounts of data from the database, FastAPI is the fastest of the bunch. The less database activity (I/O bound), the further FastAPI falls and Gin rises. The real takeaway here is that the answer to “is it fast” is always “it depends”. However, we can probably do a better job of pointing out FastAPI’s database strengths in the sections talking about speed.

@andreixk please check the benchmarks that I sent; if you think they are inaccurate, please open an issue in TechEmpower’s GitHub repository.

By the way, nice article: https://www.travisluong.com/fastapi-vs-fastify-vs-spring-boot-vs-gin-benchmark/ When you use performant async libraries and correctly configure FastAPI to take advantage of the cores, it seems on par with Node/Express, which makes sense (a rough sketch of that kind of setup follows).
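
For reference, a rough sketch of the kind of setup that article describes: an async driver (asyncpg) behind a FastAPI endpoint, run with several uvicorn workers. The DSN and the items table are placeholders, not anything taken from the article:

# db_sketch.py - illustrative only; adjust the DSN and query to your own schema
import asyncpg
from fastapi import FastAPI

app = FastAPI()

@app.on_event("startup")
async def startup() -> None:
    # Hypothetical connection string; replace with your own database.
    app.state.pool = await asyncpg.create_pool("postgresql://user:pass@localhost/db")

@app.on_event("shutdown")
async def shutdown() -> None:
    await app.state.pool.close()

@app.get("/items/{item_id}")
async def read_item(item_id: int):
    row = await app.state.pool.fetchrow(
        "SELECT id, name FROM items WHERE id = $1", item_id
    )
    return dict(row) if row else {"detail": "not found"}

Run with something like: uvicorn db_sketch:app --workers 4 --log-level critical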

@mjsampson saying their marketing is dishonest is a bit harsh. Marketing tends to emphasize one’s strengths and benefits to encourage patronage of whatever one is offering. Based on a subset of the independent benchmarks, to which a link is provided, their claim of performance advantages over certain language runtimes isn’t misplaced. So for you to impugn the marketing statement, and by extension the character of the author(s), is, in my opinion, rather disheartening.

How are you running fastapi to ensure that your benchmark is valid?