bitsandbytes: Building on Jetson AGX Xavier Development Kit fails

Hi,

I am trying to build bitsandbytes on an NVIDIA Jetson AGX Xavier Development Kit, but the build fails because emmintrin.h cannot be found:

/home/g/bitsandbytes# CUDA_VERSION=114 make cuda11x_nomatmul

ENVIRONMENT
============================
CUDA_VERSION: 114
============================
NVCC path: /usr/local/cuda/bin/nvcc
GPP path: /usr/bin/g++ VERSION: g++ (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0
CUDA_HOME: /usr/local/cuda
CONDA_PREFIX:
PATH: /usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin:/usr/local/cuda/bin/
LD_LIBRARY_PATH:
============================
/usr/local/cuda/bin/nvcc -gencode arch=compute_50,code=sm_50 -gencode arch=compute_52,code=sm_52 -gencode arch=compute_60,code=sm_60 -gencode arch=compute_61,code=sm_61 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_72,code=sm_72 -gencode arch=compute_75,code=sm_75 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_86,code=sm_86 -gencode arch=compute_35,code=sm_35 -gencode arch=compute_37,code=sm_37 -Xcompiler '-fPIC' --use_fast_math -Xptxas=-v -dc /home/g/bitsandbytes/csrc/ops.cu /home/g/bitsandbytes/csrc/kernels.cu -I /home/g/sse2neon -I /usr/local/cuda/include -I /home/g/bitsandbytes/csrc -I /include -I /home/g/bitsandbytes/include -L /usr/local/cuda/lib64 -lcudart -lcublas -lcublasLt -lcurand -lcusparse -L /lib --output-directory /home/g/bitsandbytes/build -D NO_CUBLASLT
nvcc warning : The 'compute_35', 'compute_37', 'compute_50', 'sm_35', 'sm_37' and 'sm_50' architectures are deprecated, and may be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning).
In file included from /home/g/bitsandbytes/include/BinSearch.h:5,
                 from /home/g/bitsandbytes/csrc/ops.cu:10:
/home/g/bitsandbytes/include/SIMD.h:32:10: fatal error: emmintrin.h: No such file or directory
   32 | #include <emmintrin.h>
      |          ^~~~~~~~~~~~~
compilation terminated.
make: *** [Makefile:83: cuda11x_nomatmul] Error 1

I did a bit of research and, not really knowing what I am doing, changed SIMD.h to include sse2neon.h instead of emmintrin.h. Now it fails again, catastrophically, this time not finding builtin types and functions:

/usr/lib/gcc/aarch64-linux-gnu/9/include/arm_neon.h(38): error: identifier "__Int8x8_t" is undefined
/usr/lib/gcc/aarch64-linux-gnu/9/include/arm_neon.h(39): error: identifier "__Int16x4_t" is undefined
/usr/lib/gcc/aarch64-linux-gnu/9/include/arm_neon.h(40): error: identifier "__Int32x2_t" is undefined
/usr/lib/gcc/aarch64-linux-gnu/9/include/arm_neon.h(41): error: identifier "__Int64x1_t" is undefined
/usr/lib/gcc/aarch64-linux-gnu/9/include/arm_neon.h(42): error: identifier "__Float16x4_t" is undefined
/usr/lib/gcc/aarch64-linux-gnu/9/include/arm_neon.h(43): error: identifier "__Float32x2_t" is undefined
/usr/lib/gcc/aarch64-linux-gnu/9/include/arm_neon.h(44): error: identifier "__Poly8x8_t" is undefined
/usr/lib/gcc/aarch64-linux-gnu/9/include/arm_neon.h(45): error: identifier "__Poly16x4_t" is undefined
/usr/lib/gcc/aarch64-linux-gnu/9/include/arm_neon.h(46): error: identifier "__Uint8x8_t" is undefined
/usr/lib/gcc/aarch64-linux-gnu/9/include/arm_neon.h(47): error: identifier "__Uint16x4_t" is undefined
/usr/lib/gcc/aarch64-linux-gnu/9/include/arm_neon.h(48): error: identifier "__Uint32x2_t" is undefined
/usr/lib/gcc/aarch64-linux-gnu/9/include/arm_neon.h(49): error: identifier "__Float64x1_t" is undefined
/usr/lib/gcc/aarch64-linux-gnu/9/include/arm_neon.h(50): error: identifier "__Uint64x1_t" is undefined
/usr/lib/gcc/aarch64-linux-gnu/9/include/arm_neon.h(51): error: identifier "__Int8x16_t" is undefined
/usr/lib/gcc/aarch64-linux-gnu/9/include/arm_neon.h(52): error: identifier "__Int16x8_t" is undefined
/usr/lib/gcc/aarch64-linux-gnu/9/include/arm_neon.h(53): error: identifier "__Int32x4_t" is undefined
/usr/lib/gcc/aarch64-linux-gnu/9/include/arm_neon.h(54): error: identifier "__Int64x2_t" is undefined
/usr/lib/gcc/aarch64-linux-gnu/9/include/arm_neon.h(55): error: identifier "__Float16x8_t" is undefined
/usr/lib/gcc/aarch64-linux-gnu/9/include/arm_neon.h(56): error: identifier "__Float32x4_t" is undefined
/usr/lib/gcc/aarch64-linux-gnu/9/include/arm_neon.h(57): error: identifier "__Float64x2_t" is undefined
/usr/lib/gcc/aarch64-linux-gnu/9/include/arm_neon.h(58): error: identifier "__Poly8x16_t" is undefined
/usr/lib/gcc/aarch64-linux-gnu/9/include/arm_neon.h(59): error: identifier "__Poly16x8_t" is undefined
/usr/lib/gcc/aarch64-linux-gnu/9/include/arm_neon.h(60): error: identifier "__Poly64x2_t" is undefined
/usr/lib/gcc/aarch64-linux-gnu/9/include/arm_neon.h(61): error: identifier "__Poly64x1_t" is undefined
/usr/lib/gcc/aarch64-linux-gnu/9/include/arm_neon.h(62): error: identifier "__Uint8x16_t" is undefined
/usr/lib/gcc/aarch64-linux-gnu/9/include/arm_neon.h(63): error: identifier "__Uint16x8_t" is undefined
/usr/lib/gcc/aarch64-linux-gnu/9/include/arm_neon.h(64): error: identifier "__Uint32x4_t" is undefined
/usr/lib/gcc/aarch64-linux-gnu/9/include/arm_neon.h(65): error: identifier "__Uint64x2_t" is undefined
/usr/lib/gcc/aarch64-linux-gnu/9/include/arm_neon.h(67): error: identifier "__Poly8_t" is undefined
/usr/lib/gcc/aarch64-linux-gnu/9/include/arm_neon.h(68): error: identifier "__Poly16_t" is undefined
/usr/lib/gcc/aarch64-linux-gnu/9/include/arm_neon.h(69): error: identifier "__Poly64_t" is undefined
/usr/lib/gcc/aarch64-linux-gnu/9/include/arm_neon.h(70): error: identifier "__Poly128_t" is undefined
/usr/lib/gcc/aarch64-linux-gnu/9/include/arm_neon.h(72): error: identifier "__fp16" is undefined
/usr/lib/gcc/aarch64-linux-gnu/9/include/arm_neon.h(795): error: identifier "__builtin_aarch64_saddlv8qi" is undefined
/usr/lib/gcc/aarch64-linux-gnu/9/include/arm_neon.h(802): error: identifier "__builtin_aarch64_saddlv4hi" is undefined
/usr/lib/gcc/aarch64-linux-gnu/9/include/arm_neon.h(809): error: identifier "__builtin_aarch64_saddlv2si" is undefined
/usr/lib/gcc/aarch64-linux-gnu/9/include/arm_neon.h(816): error: identifier "__builtin_aarch64_uaddlv8qi" is undefined
/usr/lib/gcc/aarch64-linux-gnu/9/include/arm_neon.h(824): error: identifier "__builtin_aarch64_uaddlv4hi" is undefined
/usr/lib/gcc/aarch64-linux-gnu/9/include/arm_neon.h(832): error: identifier "__builtin_aarch64_uaddlv2si" is undefined
/usr/lib/gcc/aarch64-linux-gnu/9/include/arm_neon.h(840): error: identifier "__builtin_aarch64_saddl2v16qi" is undefined
/usr/lib/gcc/aarch64-linux-gnu/9/include/arm_neon.h(847): error: identifier "__builtin_aarch64_saddl2v8hi" is undefined
/usr/lib/gcc/aarch64-linux-gnu/9/include/arm_neon.h(854): error: identifier "__builtin_aarch64_saddl2v4si" is undefined
/usr/lib/gcc/aarch64-linux-gnu/9/include/arm_neon.h(861): error: identifier "__builtin_aarch64_uaddl2v16qi" is undefined
/usr/lib/gcc/aarch64-linux-gnu/9/include/arm_neon.h(869): error: identifier "__builtin_aarch64_uaddl2v8hi" is undefined
/usr/lib/gcc/aarch64-linux-gnu/9/include/arm_neon.h(877): error: identifier "__builtin_aarch64_uaddl2v4si" is undefined
/usr/lib/gcc/aarch64-linux-gnu/9/include/arm_neon.h(885): error: identifier "__builtin_aarch64_saddwv8qi" is undefined
/usr/lib/gcc/aarch64-linux-gnu/9/include/arm_neon.h(892): error: identifier "__builtin_aarch64_saddwv4hi" is undefined
/usr/lib/gcc/aarch64-linux-gnu/9/include/arm_neon.h(899): error: identifier "__builtin_aarch64_saddwv2si" is undefined
/usr/lib/gcc/aarch64-linux-gnu/9/include/arm_neon.h(906): error: identifier "__builtin_aarch64_uaddwv8qi" is undefined
/usr/lib/gcc/aarch64-linux-gnu/9/include/arm_neon.h(914): error: identifier "__builtin_aarch64_uaddwv4hi" is undefined
/usr/lib/gcc/aarch64-linux-gnu/9/include/arm_neon.h(922): error: identifier "__builtin_aarch64_uaddwv2si" is undefined
/usr/lib/gcc/aarch64-linux-gnu/9/include/arm_neon.h(930): error: identifier "__builtin_aarch64_saddw2v16qi" is undefined
/usr/lib/gcc/aarch64-linux-gnu/9/include/arm_neon.h(937): error: identifier "__builtin_aarch64_saddw2v8hi" is undefined
/usr/lib/gcc/aarch64-linux-gnu/9/include/arm_neon.h(944): error: identifier "__builtin_aarch64_saddw2v4si" is undefined
/usr/lib/gcc/aarch64-linux-gnu/9/include/arm_neon.h(951): error: identifier "__builtin_aarch64_uaddw2v16qi" is undefined
/usr/lib/gcc/aarch64-linux-gnu/9/include/arm_neon.h(959): error: identifier "__builtin_aarch64_uaddw2v8hi" is undefined
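
For readers following along, the header swap described above can be sketched roughly as follows. This is only an illustration of selecting the intrinsics header by target architecture, not the actual contents of SIMD.h. It assumes sse2neon.h is reachable on the include path (the nvcc command above passes -I /home/g/sse2neon); SIMD_BACKEND_NAME and simd_backend() are hypothetical names added for the example. It does not by itself cure the errors above, which come from the .cu files pulling in arm_neon.h through this include.

```cpp
// Sketch: choose the SIMD header by target instead of assuming x86.
#if defined(__SSE2__) || defined(_M_X64)
#include <emmintrin.h>   // native SSE2 intrinsics on x86
#define SIMD_BACKEND_NAME "sse2"
#elif defined(__aarch64__) && defined(__has_include)
#if __has_include("sse2neon.h")
#include "sse2neon.h"    // third-party header mapping SSE intrinsics onto NEON
#define SIMD_BACKEND_NAME "sse2neon"
#endif
#endif

// Fallback when neither header is available (hypothetical, for the sketch only).
#ifndef SIMD_BACKEND_NAME
#define SIMD_BACKEND_NAME "scalar"
#endif

// Hypothetical helper: reports which branch the preprocessor took.
inline const char* simd_backend() { return SIMD_BACKEND_NAME; }
```

Printing simd_backend() from a small host-only program compiled with g++ is a quick way to confirm which branch is taken on the Jetson before involving nvcc.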

SETUP:

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Sun_Oct_23_22:16:07_PDT_2022
Cuda compilation tools, release 11.4, V11.4.315
Build cuda_11.4.r11.4/compiler.31964100_0

Flashed using JetPack 5.1 (Ubuntu 20.04)

R35 (release), REVISION: 2.1, GCID: 32413640, BOARD: t186ref, EABI: aarch64, DATE: Tue Jan 24 23:38:33 UTC 2023
Linux ubuntu 5.10.104-tegra #1 SMP PREEMPT Tue Jan 24 15:09:44 PST 2023 aarch64 aarch64 aarch64 GNU/Linux

full_output_nvcc-verbose.txt

Any help would be greatly appreciated, thank you!

About this issue

  • State: closed
  • Created a year ago
  • Reactions: 1
  • Comments: 50 (4 by maintainers)

Most upvoted comments

Support for Apple silicon #252 shows another AArch64 approach. It would be a good idea to merge these efforts.

Is the issue of no negative numbers related to this: pytorch/pytorch#52146?

You were right! I systematically replaced all chars with int8_t and it works now; the culprit was somewhere in kernels.cu. I will find out exactly which change did it and update the repository later.
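
A minimal sketch of why swapping char for int8_t can matter on this board: C++ leaves the signedness of plain char implementation-defined, and on AArch64 Linux it is unsigned by default, while on x86 Linux it is signed. A byte holding a negative value therefore reads back as a positive number through plain char on the Jetson. The helper names below are hypothetical and not taken from bitsandbytes.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical helpers: read one stored byte back as an integer.

// Plain char: signed on x86 Linux, unsigned on AArch64 Linux by default.
inline int as_plain_char(unsigned char raw) { return static_cast<char>(raw); }

// int8_t: a signed 8-bit type on every target.
inline int as_int8(unsigned char raw) { return static_cast<std::int8_t>(raw); }

// True when plain char can hold negative values on the current target.
inline bool plain_char_is_signed() { return std::numeric_limits<char>::is_signed; }

// Self-check: the byte 0xFF decodes to -1 via int8_t everywhere,
// but via plain char only where char is signed; otherwise it is 255.
inline void check_byte_decoding() {
    assert(as_int8(0xFF) == -1);
    assert(as_plain_char(0xFF) == (plain_char_is_signed() ? -1 : 255));
}
```

GCC also accepts -fsigned-char to force plain char to be signed, which may be an alternative to editing the sources; whether that is enough for this build was not tested here.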

Can you share the modified code? We are facing the same problem and could debug it together.

Sure, here's the fork: https://github.com/g588928812/bitsandbytes_jetsonX