OpenBLAS: Optimization failing on AWS Graviton2 (neoverse-n1) machine

OpenBLAS matrix multiplication appears to run without optimized kernels on an AWS EC2 ARM Graviton2 (neoverse-n1) system with the following Julia setup:

OpenBLAS 0.3.9 (present on Julia master and v1.5-rc1)
LLVM 10 (PR https://github.com/JuliaLang/julia/pull/35318) (may be irrelevant)
Updated ARM CPU detection (PR https://github.com/JuliaLang/julia/pull/36485) (may be irrelevant)

julia> versioninfo()
Julia Version 1.6.0-DEV.341
Commit 8367e441ac* (2020-07-01 18:30 UTC)
Platform Info:
  OS: Linux (aarch64-linux-gnu)
  CPU: unknown
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-10.0.0 (ORCJIT, neoverse-n1)
Environment:
  JULIA_NUM_THREADS = 16

julia> LinearAlgebra.BLAS.openblas_get_config()
"OpenBLAS 0.3.9 NO_AFFINITY ARMV8 MAX_THREADS=32"

julia> using BenchmarkTools
julia> @btime x * x setup=(x=rand(Float32, 100, 100));
  21.123 ms (2 allocations: 39.14 KiB)

Compared to an Intel Mac:

julia> @btime x * x setup=(x=rand(Float32, 100, 100));
  18.161 μs (2 allocations: 39.14 KiB)

About this issue

State: closed
Created 4 years ago
Comments: 23 (4 by maintainers)

Most upvoted comments

Great. Setting OPENBLAS_CORETYPE=NEOVERSEN1 seems to enable optimization, but the result falls a bit short of the Intel Mac result, as you predicted.

julia> @btime x * x setup=(x=rand(Float32, 100, 100));
  52.135 μs (2 allocations: 39.14 KiB)

IanButterworth on Jul 1, 2020
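
For anyone wanting to reproduce the workaround above, here is a minimal sketch in Julia. It assumes OPENBLAS_CORETYPE has to be in the environment before OpenBLAS is loaded, so it starts a fresh Julia process rather than changing ENV in the running session. The helper name `matmul_seconds` is hypothetical, and the timing uses Base's `@elapsed` instead of BenchmarkTools, so the numbers will not match the `@btime` results above exactly.

```julia
# Hypothetical helper (not a Julia or OpenBLAS API): time a 100x100 Float32
# matrix multiply in a child Julia process started with OPENBLAS_CORETYPE set.
function matmul_seconds(coretype::AbstractString)
    # Code executed by the child process.
    code = """
    x = rand(Float32, 100, 100)
    x * x  # warm-up call, so compilation time is not measured
    print(minimum(@elapsed(x * x) for _ in 1:100))
    """
    cmd = `$(Base.julia_cmd()) --startup-file=no -e $code`
    # withenv sets the variable only for the duration of the block; the child
    # process inherits it when it is spawned.
    out = withenv("OPENBLAS_CORETYPE" => coretype) do
        read(cmd, String)
    end
    return parse(Float64, out)
end

t = matmul_seconds("NEOVERSEN1")
println("fastest of 100 runs: ", t, " s")
```

Calling the helper with a different core type gives a rough side-by-side comparison on the same machine. Which core type names are accepted depends on how the bundled OpenBLAS was built.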