simdjson: MSVC simdjson is slower than g++ on Windows
On the same machine and OS, simdjson compiled with g++ 7.5 under WSL parses at 2.6 GB/s, while simdjson compiled with MSVC 2019 parses at 1.0 GB/s. ClangCL parses at 1.4 GB/s, so part of the gap may come from link.exe rather than the compiler itself. My machine is Kaby Lake R (AVX2 but not AVX512).
After investigation, these seem to be the major contributors:
- 40%: @TrianglesPCT may be fixing some or all of the largest regression, caused by generic SIMD, by removing lambdas.
- 10%: We need to understand why that fix did not fully recover the performance we had before the regression. Either one of them could be the culprit, but it’s probably not anything in between.
- 10%: We need to understand why we lost another 10% to the stage 1 structural scanner refactor.
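To make the "removing lambdas" item concrete, here is a minimal sketch of the two coding styles being contrasted. It is not simdjson's actual code: `Block4`, `map`, `or_mask_generic`, and `or_mask_direct` are hypothetical names, and plain 64-bit integers stand in for the SIMD registers so the example compiles anywhere. The generic style only costs nothing when the compiler inlines the callable; whether a given compiler does so is the kind of thing the benchmarks here are probing.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a 64-byte block: each lane would be a SIMD
// register in real code. Plain integers keep the sketch portable.
struct Block4 {
    std::uint64_t lane[4];

    // Generic style: apply a callable to every lane. This is only "free"
    // if the optimizer inlines `op` at each call site.
    template <typename F>
    Block4 map(F op) const {
        return Block4{{op(lane[0]), op(lane[1]), op(lane[2]), op(lane[3])}};
    }
};

// Lambda-based version of "OR every lane with a mask".
inline Block4 or_mask_generic(const Block4& in, std::uint64_t mask) {
    return in.map([mask](std::uint64_t v) { return v | mask; });
}

// Hand-expanded version: same result, with no callable to see through.
inline Block4 or_mask_direct(const Block4& in, std::uint64_t mask) {
    return Block4{{in.lane[0] | mask, in.lane[1] | mask,
                   in.lane[2] | mask, in.lane[3] | mask}};
}

// Sanity helper: both styles must produce identical lanes.
inline void check_equivalent(const Block4& in, std::uint64_t mask) {
    const Block4 g = or_mask_generic(in, mask);
    const Block4 d = or_mask_direct(in, mask);
    for (int i = 0; i < 4; ++i) {
        assert(g.lane[i] == d.lane[i]);
    }
}
```

Both functions compute the same thing, so any speed difference between them comes from code generation rather than the algorithm. Comparing the disassembly of the two under each compiler would be one way to check whether the lambda is being inlined.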
Data
g++ 7.5.0 under WSL
jkeiser@JKEISER-THINKPAD:~/simdjson/build$ benchmark/parse ../jsonexamples/twitter.json
number of iterations 200
../jsonexamples/twitter.json
9867 blocks - 631515 bytes - 55263 structurals ( 8.8 %)
special blocks with: utf8 2284 ( 23.1 %) - escape 598 ( 6.1 %) - 0 structurals 1287 ( 13.0 %) - 1+ structurals 8581 ( 87.0 %) - 8+ structurals 3272 ( 33.2 %) - 16+ structurals 0 ( 0.0 %)
special block flips: utf8 1104 ( 11.2 %) - escape 642 ( 6.5 %) - 0 structurals 940 ( 9.5 %) - 1+ structurals 940 ( 9.5 %) - 8+ structurals 2593 ( 26.3 %) - 16+ structurals 0 ( 0.0 %)
All Stages
| Speed : 24.3210 ns per block ( 70.04%) - 0.3800 ns per byte - 4.3429 ns per structural - 2.631 GB/s
|- Stage 1
| Speed : 11.5728 ns per block ( 33.33%) - 0.1808 ns per byte - 2.0665 ns per structural - 5.530 GB/s
|- Stage 2
| Speed : 12.6267 ns per block ( 36.36%) - 0.1973 ns per byte - 2.2547 ns per structural - 5.068 GB/s
3181.7 documents parsed per second
VS 2019 (cl.exe 19.25.28614)
PS C:\Users\john\Source\simdjson\build> .\benchmark\Release\parse.exe ..\jsonexamples\twitter.json
number of iterations 200
..\jsonexamples\twitter.json
9867 blocks - 631515 bytes - 55263 structurals ( 8.8 %)
special blocks with: utf8 2284 ( 23.1 %) - escape 598 ( 6.1 %) - 0 structurals 1287 ( 13.0 %) - 1+ structurals 8581 ( 87.0 %) - 8+ structurals 3272 ( 33.2 %) - 16+ structurals 0 ( 0.0 %)
special block flips: utf8 1104 ( 11.2 %) - escape 642 ( 6.5 %) - 0 structurals 940 ( 9.5 %) - 1+ structurals 940 ( 9.5 %) - 8+ structurals 2593 ( 26.3 %) - 16+ structurals 0 ( 0.0 %)
All Stages
| Speed : 65.5249 ns per block ( 83.29%) - 1.0239 ns per byte - 11.7004 ns per structural - 0.977 GB/s
|- Allocation
| Speed : 2.8679 ns per block ( 3.65%) - 0.0448 ns per byte - 0.5121 ns per structural - 22.315 GB/s
|- Stage 1
| Speed : 32.2862 ns per block ( 41.04%) - 0.5045 ns per byte - 5.7652 ns per structural - 1.982 GB/s
|- Stage 2
| Speed : 29.4285 ns per block ( 37.41%) - 0.4598 ns per byte - 5.2549 ns per structural - 2.175 GB/s
1976.0 documents parsed per second
VS 2019 (cl.exe 19.25.28614) with /arch:AVX2
Compiling with /arch:AVX2 improved overall throughput by only about 8% (0.977 GB/s to 1.054 GB/s):
PS C:\Users\john\Source\simdjson\build> .\benchmark\Release\parse.exe ..\jsonexamples\twitter.json
number of iterations 200
..\jsonexamples\twitter.json
9867 blocks - 631515 bytes - 55263 structurals ( 8.8 %)
special blocks with: utf8 2284 ( 23.1 %) - escape 598 ( 6.1 %) - 0 structurals 1287 ( 13.0 %) - 1+ structurals 8581 ( 87.0 %) - 8+ structurals 3272 ( 33.2 %) - 16+ structurals 0 ( 0.0 %)
special block flips: utf8 1104 ( 11.2 %) - escape 642 ( 6.5 %) - 0 structurals 940 ( 9.5 %) - 1+ structurals 940 ( 9.5 %) - 8+ structurals 2593 ( 26.3 %) - 16+ structurals 0 ( 0.0 %)
All Stages
| Speed : 60.7013 ns per block ( 82.70%) - 0.9485 ns per byte - 10.8391 ns per structural - 1.054 GB/s
|- Allocation
| Speed : 2.4726 ns per block ( 3.37%) - 0.0386 ns per byte - 0.4415 ns per structural - 25.882 GB/s
|- Stage 1
| Speed : 27.1889 ns per block ( 37.04%) - 0.4249 ns per byte - 4.8550 ns per structural - 2.354 GB/s
|- Stage 2
| Speed : 29.8135 ns per block ( 40.62%) - 0.4659 ns per byte - 5.3236 ns per structural - 2.147 GB/s
2246.1 documents parsed per second
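As a cross-check on the three runs above, the GB/s column can be recomputed from the "ns per byte" column, since one byte per nanosecond is 1 GB/s (taking 1 GB as 10^9 bytes). The sketch below does that and compares the configurations. The function and constant names are made up for illustration; the numbers are the "All Stages" figures copied from the output above.

```cpp
#include <cassert>

// "All Stages" ns-per-byte figures copied from the three benchmark runs.
const double kGccNsPerByte      = 0.3800;  // g++ 7.5.0 under WSL
const double kMsvcNsPerByte     = 1.0239;  // VS 2019
const double kMsvcAvx2NsPerByte = 0.9485;  // VS 2019 with /arch:AVX2

// One byte per nanosecond equals 1 GB/s (1 GB = 1e9 bytes), so throughput
// in GB/s is the reciprocal of ns per byte.
inline double gb_per_second(double ns_per_byte) {
    assert(ns_per_byte > 0.0);
    return 1.0 / ns_per_byte;
}

// Throughput gain, in percent, when time per byte drops from
// `before_ns_per_byte` to `after_ns_per_byte`.
inline double throughput_gain_percent(double before_ns_per_byte,
                                      double after_ns_per_byte) {
    assert(after_ns_per_byte > 0.0);
    return (before_ns_per_byte / after_ns_per_byte - 1.0) * 100.0;
}

// How many times faster `fast` is than `slow`, both given in ns per byte.
inline double speed_ratio(double slow_ns_per_byte, double fast_ns_per_byte) {
    assert(fast_ns_per_byte > 0.0);
    return slow_ns_per_byte / fast_ns_per_byte;
}
```

With these inputs, `gb_per_second` lands within 0.001 of the reported 2.631, 0.977, and 1.054 GB/s. `throughput_gain_percent(kMsvcNsPerByte, kMsvcAvx2NsPerByte)` comes out just under 8%, and `speed_ratio(kMsvcNsPerByte, kGccNsPerByte)` is about 2.7.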
About this issue
- State: closed
- Created 4 years ago
- Comments: 103 (91 by maintainers)
@pps83 oh! No, I closed this and split it into two separate issues: #847 (MSVC vs. ClangCL) and #848 (ClangCL vs. WSL clang).
I tried to test my project, where I get the best results with simdjson (vs. other parsers such as rapidjson). With the MS compiler I get roughly 6.500s runtime (about 50% is spent in JSON parsing). With clang-cl I get roughly 6.050s. That is, for JSON parsing itself, assuming other things are equal, I get 3.500s vs 3.050s, or a 13% speedup with clang.
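The arithmetic in the comment above can be written out as a short sketch. The 3.500s and 3.050s figures imply that about 3.0s of each run is spent outside the JSON parser; that 3.0s value is inferred from the quoted numbers rather than stated directly, and the function names are hypothetical.

```cpp
#include <cassert>

// Time attributed to JSON parsing, given the total runtime and the
// (assumed constant) time spent outside the parser.
inline double json_seconds(double total_seconds, double other_seconds) {
    assert(total_seconds >= other_seconds);
    return total_seconds - other_seconds;
}

// Percentage of time saved going from `before_seconds` to `after_seconds`.
inline double percent_time_saved(double before_seconds, double after_seconds) {
    assert(before_seconds > 0.0);
    return (before_seconds - after_seconds) / before_seconds * 100.0;
}

// Assumption: non-JSON work takes about 3.0 s under both compilers.
const double kOtherSeconds = 3.0;
```

For example, `percent_time_saved(json_seconds(6.500, kOtherSeconds), json_seconds(6.050, kOtherSeconds))` gives roughly 12.9, which matches the quoted 13%. Measured over the whole program instead, the saving is about 6.9%.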