tensorflow: Failed to compile 'tensorflow/lite/experimental/ruy/pack_avx512.cc'

System information

Have I written custom code - YES, but not in the failing part
OS Platform and Distribution (e.g., Linux Ubuntu 16.04): ubuntu 18.04
TensorFlow installed from (source or binary): source
TensorFlow version (use command below): master branch
Python version: 3.6
Bazel version (if compiling from source): 0.21
GCC/Compiler version (if compiling from source): gcc (Ubuntu 7.3.0-27ubuntu1~18.04) 7.3.0
CUDA/cuDNN version: N/A
GPU model and memory: N/A

tensorflow/lite/experimental/ruy/pack_avx512.cc: In function 'void ruy::{anonymous}::HalfPackFloatAvx512(const float*, const float*, int, int, int, float*, float*)':
tensorflow/lite/experimental/ruy/pack_avx512.cc:343:41: error: cannot convert '__m512 {aka __vector(16) float}' to '__m512i {aka __vector(8) long long int}' in assignment
         t0 = LoaduTwo(src_ptr0, src_ptr4);
                                         ^
tensorflow/lite/experimental/ruy/pack_avx512.cc:344:41: error: cannot convert '__m512 {aka __vector(16) float}' to '__m512i {aka __vector(8) long long int}' in assignment
         t1 = LoaduTwo(src_ptr1, src_ptr5);
                                         ^
tensorflow/lite/experimental/ruy/pack_avx512.cc:345:41: error: cannot convert '__m512 {aka __vector(16) float}' to '__m512i {aka __vector(8) long long int}' in assignment
         t2 = LoaduTwo(src_ptr2, src_ptr6);
                                         ^
tensorflow/lite/experimental/ruy/pack_avx512.cc:346:41: error: cannot convert '__m512 {aka __vector(16) float}' to '__m512i {aka __vector(8) long long int}' in assignment
         t3 = LoaduTwo(src_ptr3, src_ptr7);
                                         ^
tensorflow/lite/experimental/ruy/pack_avx512.cc:363:9: error: '_mm256_storeu_epi32' was not declared in this scope
         _mm256_storeu_epi32(packed_ptr + 0 * 16, _mm512_castsi512_si256(r0));
         ^~~~~~~~~~~~~~~~~~~
tensorflow/lite/experimental/ruy/pack_avx512.cc:363:9: note: suggested alternative: '_mm256_store_epi64'
         _mm256_storeu_epi32(packed_ptr + 0 * 16, _mm512_castsi512_si256(r0));
         ^~~~~~~~~~~~~~~~~~~
         _mm256_store_epi64
tensorflow/lite/experimental/ruy/pack_avx512.cc:382:55: error: cannot convert '__m512 {aka __vector(16) float}' to '__m512i {aka __vector(8) long long int}' in assignment
         t0 = MaskLoaduTwo(row_mask, src_ptr0, src_ptr4);
                                                       ^
tensorflow/lite/experimental/ruy/pack_avx512.cc:383:55: error: cannot convert '__m512 {aka __vector(16) float}' to '__m512i {aka __vector(8) long long int}' in assignment
         t1 = MaskLoaduTwo(row_mask, src_ptr1, src_ptr5);
                                                       ^
tensorflow/lite/experimental/ruy/pack_avx512.cc:384:55: error: cannot convert '__m512 {aka __vector(16) float}' to '__m512i {aka __vector(8) long long int}' in assignment
         t2 = MaskLoaduTwo(row_mask, src_ptr2, src_ptr6);
                                                       ^
tensorflow/lite/experimental/ruy/pack_avx512.cc:385:55: error: cannot convert '__m512 {aka __vector(16) float}' to '__m512i {aka __vector(8) long long int}' in assignment
         t3 = MaskLoaduTwo(row_mask, src_ptr3, src_ptr7);
                                                       ^
tensorflow/lite/experimental/ruy/pack_avx512.cc:402:9: error: '_mm256_storeu_epi32' was not declared in this scope
         _mm256_storeu_epi32(trailing_buf + 0 * 16, _mm512_castsi512_si256(r0));
         ^~~~~~~~~~~~~~~~~~~
tensorflow/lite/experimental/ruy/pack_avx512.cc:402:9: note: suggested alternative: '_mm256_store_epi64'
         _mm256_storeu_epi32(trailing_buf + 0 * 16, _mm512_castsi512_si256(r0));
         ^~~~~~~~~~~~~~~~~~~
         _mm256_store_epi64
tensorflow/lite/experimental/ruy/pack_avx512.cc: In function 'void ruy::Pack8bitAvx512(const int8_t*, int8_t, const int8_t*, int, int, int, int8_t*, int32_t*)':
tensorflow/lite/experimental/ruy/pack_avx512.cc:465:3: error: 'memset' was not declared in this scope
   memset(trailing_buf, 0, kTrailingBufSize * sizeof(std::int8_t));
   ^~~~~~
tensorflow/lite/experimental/ruy/pack_avx512.cc:465:3: note: suggested alternative: 'Offset'
   memset(trailing_buf, 0, kTrailingBufSize * sizeof(std::int8_t));
   ^~~~~~
   Offset
tensorflow/lite/experimental/ruy/pack_avx512.cc:500:5: error: 'memcpy' was not declared in this scope
     memcpy(packed_ptr + Layout::kCols * non_trailing_rows, trailing_buf,
     ^~~~~~
tensorflow/lite/experimental/ruy/pack_avx512.cc:500:5: note: suggested alternative: '_m_empty'
     memcpy(packed_ptr + Layout::kCols * non_trailing_rows, trailing_buf,
     ^~~~~~
     _m_empty
tensorflow/lite/experimental/ruy/pack_avx512.cc: In function 'void ruy::PackFloatAvx512(const float*, const float*, int, int, int, float*)':
tensorflow/lite/experimental/ruy/pack_avx512.cc:516:5: error: 'memset' was not declared in this scope
     memset(trailing_buf, 0, sizeof(trailing_buf));
     ^~~~~~
tensorflow/lite/experimental/ruy/pack_avx512.cc:516:5: note: suggested alternative: 'Offset'
     memset(trailing_buf, 0, sizeof(trailing_buf));
     ^~~~~~
     Offset
tensorflow/lite/experimental/ruy/pack_avx512.cc:524:5: error: 'memcpy' was not declared in this scope
     memcpy(packed_ptr + 16 * non_trailing_rows, trailing_buf,
     ^~~~~~
tensorflow/lite/experimental/ruy/pack_avx512.cc:524:5: note: suggested alternative: '_m_empty'
     memcpy(packed_ptr + 16 * non_trailing_rows, trailing_buf,
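The 'memset' and 'memcpy' errors at the end of this log are the usual symptom of a missing header: both functions are declared in <cstring>, and a file that only picks them up transitively through other headers can stop compiling when the toolchain changes. A minimal sketch, assuming that is the cause here. ZeroThenCopy and kBufSize are hypothetical names for illustration, not code from pack_avx512.cc:

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>  // declares std::memset and std::memcpy

// Hypothetical helper (not from pack_avx512.cc): zero a fixed-size scratch
// buffer, copy up to kBufSize bytes of `src` into it, then copy the whole
// buffer to `dst`. `dst` must have room for kBufSize bytes.
constexpr std::size_t kBufSize = 64;

inline void ZeroThenCopy(const std::int8_t* src, std::size_t n,
                         std::int8_t* dst) {
  std::int8_t trailing_buf[kBufSize];
  std::memset(trailing_buf, 0, kBufSize * sizeof(std::int8_t));
  if (n > kBufSize) n = kBufSize;  // clamp so the scratch buffer never overflows
  std::memcpy(trailing_buf, src, n);
  std::memcpy(dst, trailing_buf, kBufSize);
}
```

With the explicit include in place, the calls no longer depend on which other headers happen to pull in the C string functions.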

About this issue

State: closed
Created 5 years ago
Reactions: 4
Comments: 41 (17 by maintainers)

Commits related to this issue

Workaround Skylake bug as suggested by TF #31187 Ref https://github.com/tensorflow/tensorflow/issues/31187#issuecomment-523673269 — committed to Huawei-MRC-OSI/tensorflow by deleted user 5 years ago

Most upvoted comments

Master branch builds successfully. Maybe you guys just need to use the code in master for /tensorflow/lite/experimental/ruy/kernel.h in r2.0.

dbonner on Aug 24, 2019

I tried: CC=/usr/lib/llvm-9/bin/clang CXX=/usr/lib/llvm-9/bin/clang++ bazel build --config=opt --define=using_clang=true --define=using_cuda_clang=true --config=v2 --cxxopt="-D_GLIBCXX_USE_CXX11_ABI=0" //tensorflow/tools/pip_package:build_pip_package

This built for longer and seemed to get past the error we’re talking about here. However, due to incompatibilities with clang and gcc (I think) it eventually failed to build with an unrelated error.

You can get a successful build if you comment out the AVX-512 selection in ~/tensorflow/tensorflow/lite/experimental/ruy/platform.h, like this:

    // TODO(b/138433137) Select AVX-512 at runtime rather than via compile options.
    // #if defined(__AVX512F__) && defined(__AVX512DQ__) && defined(__AVX512CD__) && \
    //     defined(__AVX512BW__) && defined(__AVX512VL__)
    // #define RUY_DONOTUSEDIRECTLY_AVX512 1
    // #else
    #define RUY_DONOTUSEDIRECTLY_AVX512 0
    // #endif

i.e. always: #define RUY_DONOTUSEDIRECTLY_AVX512 0

Then the build completes successfully.
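The workaround above works because the AVX-512 path is chosen at compile time from the compiler's predefined feature macros (the TODO in platform.h says as much). A minimal sketch of that pattern, assuming GCC/Clang-style macros such as __AVX512F__. DEMO_AVX512, DEMO_FORCE_NO_AVX512 and DemoKernelPath are hypothetical names used only for illustration, not ruy's:

```cpp
#include <string>

// Hypothetical stand-in for RUY_DONOTUSEDIRECTLY_AVX512.
// Building with -DDEMO_FORCE_NO_AVX512 mimics the "always 0" workaround
// without editing the header.
#if !defined(DEMO_FORCE_NO_AVX512) && defined(__AVX512F__) && \
    defined(__AVX512DQ__) && defined(__AVX512CD__) &&          \
    defined(__AVX512BW__) && defined(__AVX512VL__)
#define DEMO_AVX512 1
#else
#define DEMO_AVX512 0
#endif

// Reports which code path this translation unit was compiled with.
inline std::string DemoKernelPath() {
#if DEMO_AVX512
  return "avx512";
#else
  return "generic";
#endif
}
```

Compiled without any -mavx512* or -march flags, the macros are undefined and the generic path is selected, which is the same effect as hard-coding the 0.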

I’m no C developer. I fear that what tensorflow’s kernel_avx512.cc is trying to accomplish with AVX-512 on a Skylake is not possible with the compatible gcc (version 7). This incompatibility (support for unmasked instructions) does not appear to have been fixed in the latest gcc source. I don’t think it is easy to compile all of Tensorflow with clang-9, as it is incompatible in other ways out of the box. My limited knowledge suggests the only 3 solutions are:

Re-writing gcc intrinsics support, i.e. submitting patches to the gcc git.
Giving up on AVX-512 (#define RUY_DONOTUSEDIRECTLY_AVX512 0).
Making a flag that you include in “bazel build” that would make the whole build compatible with clang-9. This would require re-writing areas of the code where clang-9 causes errors because it is not a straight-out replacement for gcc.

dbonner on Aug 21, 2019

Thanks. This seems to bring it to a case of

tensorflow/lite/experimental/ruy/kernel_avx512.cc: In function ‘void ruy::Kernel8bitAvx512(const ruy::KernelParams8bit<16, 16>&)’:
tensorflow/lite/experimental/ruy/kernel_avx512.cc:111:34: error: ‘_mm512_loadu_epi8’ was not declared in this scope
         const __m512i lhs_data = _mm512_loadu_epi8(lhs_ptr);
                                  ^~~~~~~~~~~~~~~~~

There are two possibilities: (a) Not compiling for Skylake architectures, or not using Clang.

See https://software.intel.com/sites/landingpage/IntrinsicsGuide/#expand=3373,3373,2422,2452,2455,2186,103,2186,2204,87,4008,3534,2197,2192,2201,5008,2201,6098,245,3171,2458,2201,3505,5205,4550,94,1548,1383,3533,3533,3505,2984,3021,3263,3518,3956,3992,4029,6042,1383,429,1192,3395&avx512techs=AVX512F,AVX512BW,AVX512CD,AVX512DQ,AVX512VL&techs=MMX,SSE,SSE2,SSE3,SSSE3,SSE4_1,SSE4_2,AVX,AVX2,FMA&text=_mm512_loadu_epi8 Note the requirement for AVX512BW support.

In this case the AVX-512 path should not be enabled. See https://github.com/tensorflow/tensorflow/blob/master/tensorflow/lite/experimental/ruy/platform.h#L81 and https://github.com/tensorflow/tensorflow/blob/master/tensorflow/lite/experimental/ruy/platform.h#L94

(b) Compiling for the Skylake architecture with Clang under Linux. This error should not occur. If it does, look at which header file is pulled in here: https://github.com/tensorflow/tensorflow/blob/master/tensorflow/lite/experimental/ruy/pack_avx512.cc#L28 Explore the tree of header files, which are required to have AVX512BW guards on a Skylake-enabled build setup.

If your code is up to date, and TFLite does what we expect, then you are hitting a bug in your setup. There is nothing the code can do if the immintrin.h file installed and used in a Skylake build does not have Skylake support.
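When the installed immintrin.h lacks one specific intrinsic, a type-agnostic equivalent is sometimes available. A sketch, assuming the goal is only an unaligned 512-bit load: _mm512_loadu_si512 is part of AVX512F and performs the same load as _mm512_loadu_epi8. Copy64Bytes is a hypothetical helper, the memcpy branch exists only so the sketch builds without AVX-512 flags, and this is not claimed to be the change that landed in TensorFlow:

```cpp
#include <cstdint>
#include <cstring>
#if defined(__AVX512F__)
#include <immintrin.h>
#endif

// Hypothetical helper: copy exactly 64 bytes from src to dst.
// With AVX512F enabled this is one unaligned 512-bit load and store;
// otherwise it falls back to std::memcpy.
inline void Copy64Bytes(const std::int8_t* src, std::int8_t* dst) {
#if defined(__AVX512F__)
  const __m512i v = _mm512_loadu_si512(src);  // same load as _mm512_loadu_epi8
  _mm512_storeu_si512(dst, v);
#else
  std::memcpy(dst, src, 64);
#endif
}
```

Both branches produce identical bytes in dst, so the choice of intrinsic affects only which headers and compiler versions can build the file.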

jalexstark on Aug 19, 2019

I just tried to build the r2.0 branch and have exactly the same error. I am running Ubuntu 18.04. My architecture is Skylake too (Intel 7820X CPU). My GPU is an RTX 2080 Ti. It’s a desktop computer, not a high-end workstation. I ran the following:

./configure
Accepted defaults initially
When asked for CUDA support, typed "y"
When asked for TensorRT support, typed "y"
When asked for compute capabilities, typed 7.5
Accepted defaults thereafter
bazel build --config=opt --config=cuda --config=v2 --cxxopt="-D_GLIBCXX_USE_CXX11_ABI=0" //tensorflow/tools/pip_package:build_pip_package

I get the exact same error. Also hoping for a fix or the ability to exclude tf lite. Would you even need tf lite on a desktop computer? 😃 Thanks

dbonner on Aug 10, 2019