azure-sdk-for-python: [Storage]: Severe performance degradation observed using v12.0 *.aio BlobClient

Team, on a current project we are seeing drastic performance degradation when attempting to use the *.aio libraries from v12.0 of azure-storage-blob. It's quite possible we are using the library incorrectly, but we have not been able to find any documentation suggesting so. The issue outlined below may in fact be a bug.

System Under Test

  • Repo: https://github.com/mjyeaney/CustomVisionSlidingWindowAPI
  • Linux App Service Plan
  • B2 App Service Plan (200 ACU, 3.5 GB RAM, A-series equiv), instance count = 1
  • Running Gunicorn -> Uvicorn (ASGI server) -> Quart (async REST framework)
  • Gunicorn worker count: CPU core count + 1 (see the configuration sketch after this list)
  • Using azure-storage-blob v12.0, *.aio methods
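
A minimal sketch of what this wiring might look like as a gunicorn.conf.py (the file name, bind address, and exact values here are assumptions, not taken from the repo):

# gunicorn.conf.py (hypothetical) - approximates the setup described above:
# CPU core count + 1 workers, each running Uvicorn's ASGI worker so that
# Quart's coroutines execute on an asyncio event loop in every worker process.
import multiprocessing

workers = multiprocessing.cpu_count() + 1
worker_class = "uvicorn.workers.UvicornWorker"
bind = "0.0.0.0:8000"  # assumed port; App Service maps traffic to it as configured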

Load Testing Environment

  • Located in same datacenter
  • Ds2v3 (2 CPU, 8GB RAM)
  • Tests run using Apache Bench (ab) utility

Tests

There are 2 endpoints that we're testing to ensure we don't have any unexpected regressions. The first is a simple JSON serialization endpoint implemented with async/await coroutines, and it does NOT use the Azure Storage SDK in any code path.
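
As a rough illustration (not the project's actual handler; the route path and payload are made up), the first endpoint is essentially a coroutine that builds and serializes a small JSON response with no outbound I/O:

from quart import Quart, jsonify

app = Quart(__name__)

@app.route("/api/status")
async def status():
    # Purely in-process work on this path: no network or disk I/O,
    # so throughput is bounded only by the event loop and JSON serialization.
    return jsonify({"status": "ok"})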

Running ab against this endpoint with n=1000, c=50, it handles ~450 - 550 req/sec (sample output below), showing that even on limited dev/test hardware it performs reasonably well:

Concurrency Level:      50
Time taken for tests:   1.842 seconds
Complete requests:      1000
Failed requests:        0
Total transferred:      330000 bytes
HTML transferred:       38000 bytes
Requests per second:    542.83 [#/sec] (mean)
Time per request:       92.110 [ms] (mean)
Time per request:       1.842 [ms] (mean, across all concurrent requests)
Transfer rate:          174.94 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        1    2   0.5      2       8
Processing:     4   89  57.0     80     318
Waiting:        4   89  57.0     80     318
Total:          5   91  57.1     82     321

Percentage of the requests served within a certain time (ms)
  50%     82
  66%    103
  75%    120
  80%    130
  90%    159
  95%    186
  98%    245
  99%    319
 100%    321 (longest request)

The second endpoint simply downloads a VERY small JSON blob (~76 bytes), using the following code:

...
# BlobClient here is the async variant from azure.storage.blob.aio;
# a new client instance is constructed on every call of this code path.
blob_client = BlobClient.from_connection_string(self.connection_string, "jobs", operation_id)
data_stream = await blob_client.download_blob()
raw_value = await data_stream.readall()
...

Using the same ab parameters (n=1000, c=50), we see a DRASTIC bottleneck, with throughput reduced to around 20 - 25 req/sec:

Concurrency Level:      50
Time taken for tests:   43.861 seconds
Complete requests:      1000
Failed requests:        0
Non-2xx responses:      1000
Total transferred:      431000 bytes
HTML transferred:       126000 bytes
Requests per second:    22.80 [#/sec] (mean)
Time per request:       2193.062 [ms] (mean)
Time per request:       43.861 [ms] (mean, across all concurrent requests)
Transfer rate:          9.60 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        1    2   0.4      2       8
Processing:   193 2171 1116.8   1969    5131
Waiting:      193 2171 1116.8   1969    5130
Total:        194 2173 1116.8   1970    5133

Percentage of the requests served within a certain time (ms)
  50%   1970
  66%   2414
  75%   2964
  80%   3193
  90%   3767
  95%   4524
  98%   4788
  99%   5130
 100%   5133 (longest request)

This does not at all match what the underlying storage account is reporting via metrics (max latency around 20 ms; call it 50 ms just to be safe). Additionally, the total transaction count reported is almost exactly the number of test requests (~1000), indicating that general-purpose storage account throttling is not occurring.

Given the very small size of these blobs, and the fact that the underlying pipeline transport should be using the aiohttp provider, it feels like something is either overwhelming the event loop or still making a blocking call unintentionally.
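
As a diagnostic, one thing we can try is pinning the transport explicitly to aiohttp so the async pipeline cannot silently fall back to a blocking implementation. A rough sketch with placeholder names (not the code currently deployed):

from azure.core.pipeline.transport import AioHttpTransport
from azure.storage.blob.aio import BlobClient

async def fetch_job(connection_string: str, operation_id: str) -> bytes:
    # Explicitly request the aiohttp-based async transport.
    blob_client = BlobClient.from_connection_string(
        connection_string, "jobs", operation_id,
        transport=AioHttpTransport(),
    )
    try:
        downloader = await blob_client.download_blob()
        return await downloader.readall()
    finally:
        await blob_client.close()

Separately, constructing a single BlobClient at application start and reusing it across requests would take per-request client construction (and any per-request connection setup) out of the picture.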

Would appreciate any insights that can be offered. While there will of course be overhead in calling storage (we don't expect the same performance as the first test), there is seemingly quite a bit of unaccounted-for time being spent somewhere. We definitely do not want to resort to implementing these calls by hand via Tornado etc. if we don't have to.

Thanks!

About this issue

  • Original URL
  • State: closed
  • Created 5 years ago
  • Comments: 18 (9 by maintainers)

Most upvoted comments

Perfect! I’ll test it tomorrow and post results here.

Thanks!

@mjyeaney We have just released the latest version of azure-core. Can you upgrade it using pip install azure-core==1.1.0 and see if the results improve? Thanks.

Local dev environment: Python 3.6.8, pip 9.0.1

Dockerfile: FROM python:3.6