jsonschema: performance regression from 3.2.0 to 4.0.1
Hi @Julian, first of all thanks for this library and all the hard work. We rely heavily on it and are delighted to see Draft 2020-12 support, as it is required for OpenAPI 3.1.0.
When updating, I noticed our test suite ran noticeably slower. I compiled a quick reproduction for you. On my machine I get a 5x slowdown with the new version. Is this performance hit to be expected due to the new capabilities or is it a regression?
```python
import jsonschema
import yaml
import json
import time

# https://raw.githubusercontent.com/tfranzel/drf-spectacular/master/tests/test_basic.yml
with open('tests/test_basic.yml') as fh:
    data = yaml.load(fh.read(), Loader=yaml.SafeLoader)

# https://raw.githubusercontent.com/tfranzel/drf-spectacular/master/drf_spectacular/validation/openapi3_schema.json
# which comes from:
# https://github.com/OAI/OpenAPI-Specification/blob/6d17b631fff35186c495b9e7d340222e19d60a71/schemas/v3.0/schema.json
with open('drf_spectacular/validation/openapi3_schema.json') as fh:
    openapi3_schema_spec = json.load(fh)

t_acc = 0
for i in range(500):
    t0 = time.time()
    jsonschema.validate(instance=data, schema=openapi3_schema_spec)
    t1 = time.time()
    t_acc += t1 - t0
print(f'{t_acc} sec')
```
```
✗ python --version; pip freeze | grep json; python test.py
Python 3.9.5
jsonschema==3.2.0
5.254251718521118 sec

✗ python --version; pip freeze | grep json; python test.py
Python 3.9.5
jsonschema==4.0.1
28.189855813980103 sec

✗ python --version; pip freeze | grep json; python test.py
Python 3.9.7
jsonschema==4.2.1
27.27832531929016 sec

✗ python --version; pip freeze | grep json; python test.py
Python 3.9.7
jsonschema==4.3.1
8.10183048248291 sec
```
EDIT: included measurement for 4.2.1 release
EDIT: included measurement for 4.3.1 release
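The reproduction calls `jsonschema.validate` inside the timing loop. As far as the jsonschema documentation describes it, that convenience function picks a validator class, checks the schema against its metaschema, and builds a fresh validator on every call, and the docs suggest creating a validator once when validating many instances against the same schema. A minimal sketch of that pattern, assuming jsonschema 3.2 or later (where `validator_for`, `check_schema` and `is_valid` exist); `make_validator` is a hypothetical helper name. Timing `make_validator` and `validator.validate` separately may help show whether a slowdown sits in schema checking or in instance validation. This is not the fix that eventually landed in 4.3.1.

```python
from jsonschema.exceptions import ValidationError
from jsonschema.validators import validator_for


def make_validator(schema):
    """Build a validator once so repeated validations skip re-checking the schema."""
    cls = validator_for(schema)  # chooses the draft from "$schema", else the library default
    cls.check_schema(schema)     # raises SchemaError if the schema itself is invalid
    return cls(schema)


if __name__ == "__main__":
    # Tiny stand-in schema; the report above uses the OpenAPI 3.0 schema instead.
    schema = {"type": "object", "required": ["openapi"]}
    validator = make_validator(schema)
    for instance in ({"openapi": "3.0.3"}, {}):
        try:
            validator.validate(instance)
            print("valid")
        except ValidationError as err:
            print("invalid:", err.message)
```

In the reproduction, the loop body would then become `validator.validate(data)` with `validator = make_validator(openapi3_schema_spec)` hoisted above the loop.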
About this issue
- State: closed
- Created 3 years ago
- Reactions: 9
- Comments: 18 (9 by maintainers)
Thanks all for the feedback and a big thanks again to @Stranger6667. Sounds like we can close this for now.
It seems to have improved a lot for the specific test case I used: 80 or so times faster, and only a few percent slower than 3.2.0.
jsonschema.__version__='4.3.1' completed in 0.83s
v4.3.1 is out, with great thanks to @Stranger6667 for putting in the time to make the fix. I haven't fully tested it myself against the examples above, but please do share feedback.
Great! Happy to help 😃
After some more adjustments, the execution time drops to 11.305032968521118 sec.
I'll submit a patch shortly.
It will take some digging, but there is indeed a "known" performance change in 4.0.0: https://github.com/Julian/jsonschema/blob/main/CHANGELOG.rst#v400
specifically:
If someone could confirm (or deny) whether that's the culprit, it'd be helpful, but yeah, this may need some investigating (which I may not have time for for at least a few days).
CC @skamensky as well, who I know was interested in doing some performance optimization: here's another benchmark we may want to adopt or use to drive any change.
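If the reproduction from the original report is adopted as a benchmark, a slightly more robust harness may be worth considering. A sketch, assuming only the standard library: `timeit` uses `time.perf_counter` by default, which has finer resolution than `time.time()`, and repeating the measurement and reporting the minimum per-call time tends to reduce noise from other processes. `bench` is a hypothetical helper, and the workload in the demo is a stand-in.

```python
import statistics
import timeit
from typing import Callable


def bench(fn: Callable[[], object], number: int = 100, repeat: int = 5) -> dict:
    """Time `fn`: `repeat` trials of `number` calls each, reported per call in seconds."""
    timer = timeit.Timer(fn)  # timeit accepts a zero-argument callable as stmt
    totals = timer.repeat(repeat=repeat, number=number)
    per_call = [total / number for total in totals]
    return {
        "min": min(per_call),  # usually the least noisy estimate
        "median": statistics.median(per_call),
        "trials": per_call,
    }


if __name__ == "__main__":
    # Stand-in workload; for the report above this could be
    # lambda: jsonschema.validate(instance=data, schema=openapi3_schema_spec)
    result = bench(lambda: sum(range(10_000)), number=50, repeat=3)
    print(f"min {result['min']:.6f} s/call, median {result['median']:.6f} s/call")
```

Comparing the `min` values across jsonschema versions on the same machine and Python version keeps the comparison closer to like-for-like than a single accumulated total.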
I noticed this too: the test suite for an application that uses jsonschema heavily now takes 60 minutes, versus 14 before.
Updated my measurements in the OP. Thank you guys for putting in the work! ❤️ It is a 3x improvement, but still a little bit slower than the 3.2 version. I think this is a manageable slowdown now, and from my side the ticket could be closed.
I wanted to have a quick look to see if I could spot something obviously wrong, but haven't succeeded. But I do have some nice flamecharts to share:
jsonschema.__version__='3.2.0' completed in 0.77s
jsonschema.__version__='4.2.0' completed in 46.77s
Made using
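The tool used for the flamecharts isn't named in the text. As a standard-library alternative for a call-level breakdown, a sketch using `cProfile` and `pstats`; `profile_top` is a hypothetical helper. Sorting by cumulative time surfaces the outer calls that dominate the run, roughly what a flamechart shows visually, so running it once per jsonschema version and comparing the reports may point at the functions whose cost grew.

```python
import cProfile
import io
import pstats


def profile_top(fn, limit=10, sort="cumulative"):
    """Run `fn` under cProfile and return the top `limit` entries as text."""
    profiler = cProfile.Profile()
    profiler.runcall(fn)  # profiles only this call
    buf = io.StringIO()
    stats = pstats.Stats(profiler, stream=buf)
    stats.sort_stats(sort).print_stats(limit)
    return buf.getvalue()


if __name__ == "__main__":
    # Stand-in workload; for the report above this could be
    # lambda: jsonschema.validate(instance=data, schema=openapi3_schema_spec)
    print(profile_top(lambda: sorted(range(100_000), key=lambda x: -x), limit=5))
```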
Another quick note here about the impact of this: in altair_viewer, with jsonschema<4.0, the test suite runs in 30 seconds. With jsonschema 4.0 or newer, the test suite times out after 6 hours. We fixed this in https://github.com/altair-viz/altair_viewer/pull/44 by pinning to jsonschema<4.0.
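Where a project carries a pin like `jsonschema<4.0`, a test suite can fail fast if a different version slips into the environment. A minimal sketch, assuming only the standard library (no `packaging` dependency); `release_tuple` and `below` are hypothetical helpers that compare only the numeric release segment and ignore pre-release or local suffixes, so they are not a full PEP 440 implementation.

```python
import re
from importlib.metadata import PackageNotFoundError, version


def release_tuple(v: str) -> tuple:
    """Parse the leading numeric release segment, e.g. "4.0.1" -> (4, 0, 1)."""
    match = re.match(r"\d+(?:\.\d+)*", v)
    if match is None:
        raise ValueError(f"unrecognised version string: {v!r}")
    return tuple(int(part) for part in match.group().split("."))


def below(v: str, upper: str) -> bool:
    """True if `v` is strictly less than `upper`, comparing release numbers only."""
    a, b = release_tuple(v), release_tuple(upper)
    width = max(len(a), len(b))
    # Pad with zeros so "4" compares equal to "4.0".
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a < b


if __name__ == "__main__":
    try:
        installed = version("jsonschema")
        print(installed, "satisfies <4.0:", below(installed, "4.0"))
    except PackageNotFoundError:
        print("jsonschema is not installed")
```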