OpenBLAS: Optimization failing on AWS Graviton2 (neoverse-n1) machine
OpenBLAS matrix multiplication optimization appears to fail on an AWS EC2 ARM Graviton2 (neoverse-n1) system with the following Julia setup:
- OpenBLAS 0.3.9 (present on Julia master and v1.5-rc1)
- LLVM 10 (PR https://github.com/JuliaLang/julia/pull/35318) (may be irrelevant)
- Updated ARM CPU detection (PR https://github.com/JuliaLang/julia/pull/36485) (may be irrelevant)
julia> versioninfo()
Julia Version 1.6.0-DEV.341
Commit 8367e441ac* (2020-07-01 18:30 UTC)
Platform Info:
  OS: Linux (aarch64-linux-gnu)
  CPU: unknown
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-10.0.0 (ORCJIT, neoverse-n1)
Environment:
  JULIA_NUM_THREADS = 16
julia> LinearAlgebra.BLAS.openblas_get_config()
"OpenBLAS 0.3.9 NO_AFFINITY ARMV8 MAX_THREADS=32"
julia> using BenchmarkTools
julia> @btime x * x setup=(x=rand(Float32, 100, 100));
21.123 ms (2 allocations: 39.14 KiB)
For comparison, the same benchmark on an Intel Mac:
julia> @btime x * x setup=(x=rand(Float32, 100, 100));
18.161 μs (2 allocations: 39.14 KiB)
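To put the two timings in perspective, they can be converted to throughput. This is a rough sketch that assumes the usual estimate of about 2n^3 floating-point operations for an n×n by n×n product; the helper name `gflops` is just for illustration.

```julia
# Convert the two @btime results above into approximate throughput.
# A dense n×n * n×n product needs roughly 2n^3 flops
# (n^3 multiplies plus about n^3 adds; the exact count is 2n^3 - n^2).
n = 100
flops = 2 * n^3                      # 2_000_000 operations

t_graviton = 21.123e-3               # seconds, Graviton2 result above
t_mac      = 18.161e-6               # seconds, Intel Mac result above

# Hypothetical helper: operations per second, in units of 10^9.
gflops(t) = flops / t / 1e9

println("Graviton2: ", round(gflops(t_graviton), digits = 3), " GFLOP/s")
println("Mac:       ", round(gflops(t_mac), digits = 1), " GFLOP/s")
println("Slowdown:  ", round(Int, t_graviton / t_mac), "x")
```

Under this estimate the Graviton2 run reaches well under 1 GFLOP/s, more than a thousand times slower than the Mac, which is far beyond what a difference in hardware alone would explain.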
About this issue
- State: closed
- Created 4 years ago
- Comments: 23 (4 by maintainers)
Great. Setting OPENBLAS_CORETYPE=NEOVERSEN1 seems to enable the optimization, but performance still falls a bit short of the Intel Mac result, as you predicted.
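A minimal sketch of applying this workaround from Julia, under a few assumptions: OPENBLAS_CORETYPE is, as far as I understand, only honored by OpenBLAS builds with runtime core selection (DYNAMIC_ARCH), and it is read once when the library is loaded at Julia startup, so assigning `ENV["OPENBLAS_CORETYPE"]` inside an already running session is too late. The sketch therefore starts a child Julia process with the variable set (`addenv` needs Julia 1.6 or later) and uses a plain `@elapsed` loop as a rough stand-in for BenchmarkTools.

```julia
# Code run in the child process: report the variable it saw and a rough
# best-of-100 timing for the same 100x100 Float32 product as above.
child_code = """
using LinearAlgebra
x = rand(Float32, 100, 100)
x * x                                        # warm-up run (includes compilation)
t = minimum(@elapsed(x * x) for _ in 1:100)  # crude substitute for @btime
println(get(ENV, "OPENBLAS_CORETYPE", "unset"), " ", t)
"""

# Launch a fresh Julia with OPENBLAS_CORETYPE set only for that process,
# so OpenBLAS sees it when it is loaded at startup.
cmd = addenv(`$(Base.julia_cmd()) -e $child_code`, "OPENBLAS_CORETYPE" => "NEOVERSEN1")
out = read(cmd, String)
print(out)
```

On the Graviton2 instance, running the same child code without `addenv` gives a baseline to compare against; on other machines the variable should simply have no visible effect.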