tensorflow: Tensorflow crashes on build on Ubuntu 16.04 when building for skylake (avx512)
System information
- Have I written custom code (as opposed to using a stock example script provided in TensorFlow): no
- OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Linux Ubuntu 16.04
- TensorFlow installed from (source or binary): building from source
- TensorFlow version (use command below): 5ae244e
- Bazel version (if compiling from source): 0.4.5
- CUDA/cuDNN version: CUDA 8.0.61, cudnn 6.0.21 (tried also 5.1)
- GPU model and memory: 2x Tesla P100-PCIE-12GB
- Exact command to reproduce: building
- Additional information: Intel® Xeon® CPU E7-4860 v2 @ 2.60GHz, gcc version 5.4.1 20170519 (Ubuntu 5.4.1-11ubuntu2~16.04)
Describe the problem
On a regular rebuild of TensorFlow, the build fails with a series of errors of the form: argument of type "const void *" is incompatible with parameter of type "const something *"
Source code / logs
Crash log:
INFO: From Compiling tensorflow/core/kernels/scatter_functor_gpu.cu.cc:
/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512fintrin.h(9218): error: argument of type "const void *" is incompatible with parameter of type "const float *"
/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512fintrin.h(9229): error: argument of type "const void *" is incompatible with parameter of type "const float *"
/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512fintrin.h(9242): error: argument of type "const void *" is incompatible with parameter of type "const double *"
/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512fintrin.h(9253): error: argument of type "const void *" is incompatible with parameter of type "const double *"
/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512fintrin.h(9266): error: argument of type "const void *" is incompatible with parameter of type "const float *"
/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512fintrin.h(9277): error: argument of type "const void *" is incompatible with parameter of type "const float *"
/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512fintrin.h(9290): error: argument of type "const void *" is incompatible with parameter of type "const double *"
/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512fintrin.h(9301): error: argument of type "const void *" is incompatible with parameter of type "const double *"
/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512fintrin.h(9314): error: argument of type "const void *" is incompatible with parameter of type "const int *"
/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512fintrin.h(9325): error: argument of type "const void *" is incompatible with parameter of type "const int *"
/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512fintrin.h(9338): error: argument of type "const void *" is incompatible with parameter of type "const long long *"
/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512fintrin.h(9350): error: argument of type "const void *" is incompatible with parameter of type "const long long *"
/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512fintrin.h(9363): error: argument of type "const void *" is incompatible with parameter of type "const int *"
/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512fintrin.h(9374): error: argument of type "const void *" is incompatible with parameter of type "const int *"
/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512fintrin.h(9387): error: argument of type "const void *" is incompatible with parameter of type "const long long *"
/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512fintrin.h(9399): error: argument of type "const void *" is incompatible with parameter of type "const long long *"
/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512fintrin.h(9408): error: argument of type "void *" is incompatible with parameter of type "float *"
/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512fintrin.h(9417): error: argument of type "void *" is incompatible with parameter of type "float *"
/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512fintrin.h(9426): error: argument of type "void *" is incompatible with parameter of type "double *"
/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512fintrin.h(9435): error: argument of type "void *" is incompatible with parameter of type "double *"
/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512fintrin.h(9443): error: argument of type "void *" is incompatible with parameter of type "float *"
/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512fintrin.h(9452): error: argument of type "void *" is incompatible with parameter of type "float *"
/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512fintrin.h(9461): error: argument of type "void *" is incompatible with parameter of type "double *"
/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512fintrin.h(9470): error: argument of type "void *" is incompatible with parameter of type "double *"
/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512fintrin.h(9479): error: argument of type "void *" is incompatible with parameter of type "int *"
/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512fintrin.h(9488): error: argument of type "void *" is incompatible with parameter of type "int *"
/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512fintrin.h(9497): error: argument of type "void *" is incompatible with parameter of type "long long *"
/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512fintrin.h(9506): error: argument of type "void *" is incompatible with parameter of type "long long *"
/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512fintrin.h(9515): error: argument of type "void *" is incompatible with parameter of type "int *"
/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512fintrin.h(9524): error: argument of type "void *" is incompatible with parameter of type "int *"
/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512fintrin.h(9533): error: argument of type "void *" is incompatible with parameter of type "long long *"
/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512fintrin.h(9542): error: argument of type "void *" is incompatible with parameter of type "long long *"
/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512pfintrin.h(54): error: argument of type "const void *" is incompatible with parameter of type "const long long *"
/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512pfintrin.h(62): error: argument of type "const void *" is incompatible with parameter of type "const int *"
/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512pfintrin.h(70): error: argument of type "const void *" is incompatible with parameter of type "const long long *"
/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512pfintrin.h(78): error: argument of type "const void *" is incompatible with parameter of type "const int *"
/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512pfintrin.h(86): error: argument of type "void *" is incompatible with parameter of type "const long long *"
/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512pfintrin.h(95): error: argument of type "void *" is incompatible with parameter of type "const int *"
/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512pfintrin.h(104): error: argument of type "void *" is incompatible with parameter of type "const long long *"
/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512pfintrin.h(112): error: argument of type "void *" is incompatible with parameter of type "const int *"
/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512pfintrin.h(120): error: argument of type "void *" is incompatible with parameter of type "const long long *"
/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512pfintrin.h(129): error: argument of type "void *" is incompatible with parameter of type "const int *"
/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512pfintrin.h(138): error: argument of type "void *" is incompatible with parameter of type "const long long *"
/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512pfintrin.h(146): error: argument of type "void *" is incompatible with parameter of type "const int *"
/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512vlintrin.h(10223): error: argument of type "const void *" is incompatible with parameter of type "const float *"
/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512vlintrin.h(10235): error: argument of type "const void *" is incompatible with parameter of type "const float *"
/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512vlintrin.h(10247): error: argument of type "const void *" is incompatible with parameter of type "const double *"
/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512vlintrin.h(10259): error: argument of type "const void *" is incompatible with parameter of type "const double *"
/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512vlintrin.h(10271): error: argument of type "const void *" is incompatible with parameter of type "const float *"
/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512vlintrin.h(10283): error: argument of type "const void *" is incompatible with parameter of type "const float *"
/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512vlintrin.h(10295): error: argument of type "const void *" is incompatible with parameter of type "const double *"
/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512vlintrin.h(10307): error: argument of type "const void *" is incompatible with parameter of type "const double *"
/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512vlintrin.h(10319): error: argument of type "const void *" is incompatible with parameter of type "const int *"
/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512vlintrin.h(10331): error: argument of type "const void *" is incompatible with parameter of type "const int *"
/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512vlintrin.h(10343): error: argument of type "const void *" is incompatible with parameter of type "const long long *"
/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512vlintrin.h(10355): error: argument of type "const void *" is incompatible with parameter of type "const long long *"
/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512vlintrin.h(10367): error: argument of type "const void *" is incompatible with parameter of type "const int *"
/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512vlintrin.h(10379): error: argument of type "const void *" is incompatible with parameter of type "const int *"
/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512vlintrin.h(10391): error: argument of type "const void *" is incompatible with parameter of type "const long long *"
/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512vlintrin.h(10403): error: argument of type "const void *" is incompatible with parameter of type "const long long *"
/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512vlintrin.h(10413): error: argument of type "void *" is incompatible with parameter of type "float *"
/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512vlintrin.h(10424): error: argument of type "void *" is incompatible with parameter of type "float *"
/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512vlintrin.h(10433): error: argument of type "void *" is incompatible with parameter of type "float *"
/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512vlintrin.h(10444): error: argument of type "void *" is incompatible with parameter of type "float *"
/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512vlintrin.h(10453): error: argument of type "void *" is incompatible with parameter of type "double *"
/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512vlintrin.h(10464): error: argument of type "void *" is incompatible with parameter of type "double *"
/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512vlintrin.h(10473): error: argument of type "void *" is incompatible with parameter of type "double *"
/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512vlintrin.h(10484): error: argument of type "void *" is incompatible with parameter of type "double *"
/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512vlintrin.h(10493): error: argument of type "void *" is incompatible with parameter of type "float *"
/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512vlintrin.h(10504): error: argument of type "void *" is incompatible with parameter of type "float *"
/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512vlintrin.h(10513): error: argument of type "void *" is incompatible with parameter of type "float *"
/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512vlintrin.h(10524): error: argument of type "void *" is incompatible with parameter of type "float *"
/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512vlintrin.h(10533): error: argument of type "void *" is incompatible with parameter of type "double *"
/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512vlintrin.h(10544): error: argument of type "void *" is incompatible with parameter of type "double *"
/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512vlintrin.h(10553): error: argument of type "void *" is incompatible with parameter of type "double *"
/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512vlintrin.h(10564): error: argument of type "void *" is incompatible with parameter of type "double *"
/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512vlintrin.h(10573): error: argument of type "void *" is incompatible with parameter of type "int *"
/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512vlintrin.h(10584): error: argument of type "void *" is incompatible with parameter of type "int *"
/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512vlintrin.h(10593): error: argument of type "void *" is incompatible with parameter of type "int *"
/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512vlintrin.h(10604): error: argument of type "void *" is incompatible with parameter of type "int *"
/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512vlintrin.h(10613): error: argument of type "void *" is incompatible with parameter of type "long long *"
/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512vlintrin.h(10624): error: argument of type "void *" is incompatible with parameter of type "long long *"
/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512vlintrin.h(10633): error: argument of type "void *" is incompatible with parameter of type "long long *"
/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512vlintrin.h(10644): error: argument of type "void *" is incompatible with parameter of type "long long *"
/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512vlintrin.h(10653): error: argument of type "void *" is incompatible with parameter of type "int *"
/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512vlintrin.h(10664): error: argument of type "void *" is incompatible with parameter of type "int *"
/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512vlintrin.h(10673): error: argument of type "void *" is incompatible with parameter of type "int *"
/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512vlintrin.h(10684): error: argument of type "void *" is incompatible with parameter of type "int *"
/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512vlintrin.h(10693): error: argument of type "void *" is incompatible with parameter of type "long long *"
/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512vlintrin.h(10704): error: argument of type "void *" is incompatible with parameter of type "long long *"
/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512vlintrin.h(10713): error: argument of type "void *" is incompatible with parameter of type "long long *"
/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512vlintrin.h(10724): error: argument of type "void *" is incompatible with parameter of type "long long *"
92 errors detected in the compilation of "/tmp/tmpxft_00008f12_00000000-7_scatter_functor_gpu.cu.cpp1.ii".
ERROR: /scratch/chaimb/tensorflow/tensorflow/core/kernels/BUILD:1140:1: output 'tensorflow/core/kernels/_objs/scatter_functor_gpu/tensorflow/core/kernels/scatter_functor_gpu.cu.pic.o' was not created.
ERROR: /scratch/chaimb/tensorflow/tensorflow/core/kernels/BUILD:1140:1: not all outputs were created or valid.
Target //tensorflow/tools/pip_package:build_pip_package failed to build
INFO: Elapsed time: 147.888s, Critical Path: 69.54s
I’ve tried disabling most of the options (MKL, architecture optimizations, computability), but the failure happens even with a fully default configuration (except for CUDA and XLA).
About this issue
- State: closed
- Created 7 years ago
- Comments: 31 (13 by maintainers)
Commits related to this issue
- Switch to GCC 6 and CUDA 9.1 container because building with optimizations on CUDA 9.0 with GCC 5.5 fails due to some weird CUDA/GCC interaction. See also: this[1] Tensorflow issue and many other. [... — committed to ginkgo-project/ginkgo by tcojean 5 years ago
I think the problem here is that gcc-5.5 shipped with avx512*intrin.h headers that switched to using void* and const void* (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=76731) but without switching the builtins to do the same. This is why 5.4 works but 5.5 breaks. The tensorflow r1.4 build at least can be unbroken for 5.5 by locally rolling back the above change with e.g.:
This is a terrible idea (who knows what else it’s rolling back? are you really cherrypicking 3 random files out of an entire release? and so on) and nobody should do it, of course.
But maybe, if you’re trying to build tensorflow for GPU (so require nvcc, so require GCC < 6) inside an ubuntu:17.10 docker image (so don’t have an apt-get’able gcc-5.4 option), this might be useful.
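To illustrate the kind of mismatch described in this comment, here is a minimal C++ sketch. It is not the GCC header code or the nvcc builtin declarations. The names fake_builtin_gather and fake_intrinsic_gather are hypothetical stand-ins for "a builtin that still expects a typed pointer" and "a header wrapper that now takes const void*". In C++, a const void* does not convert implicitly to a const float*, which matches the shape of the errors in the log.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for a builtin that is still declared
// with a typed pointer parameter.
static float fake_builtin_gather(const float* base, std::size_t index) {
    return base[index];
}

// Hypothetical stand-in for a header wrapper whose parameter was
// changed to const void*.
static float fake_intrinsic_gather(const void* base, std::size_t index) {
    // Forwarding `base` unchanged would be rejected by a C++ compiler:
    //   return fake_builtin_gather(base, index);
    // because const void* does not implicitly convert to const float*.
    // An explicit cast makes the call well-formed.
    return fake_builtin_gather(static_cast<const float*>(base), index);
}
```

The commented-out call is the pattern the compiler complains about. The cast version only shows what "adding explicit casts" would look like in principle, assuming the pointer really does refer to float data.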
I had the same error with this system information:
- OS Platform and Distribution: Ubuntu 16.04
- TensorFlow installed from: source
- TensorFlow version: 1.8
- Python version: 3.6.4
- GCC version: 5.5.0
- CUDA/cuDNN version: CUDA 9.1/CuDNN 7.1.3
- GPU model and memory: 1080 Ti
I solved this problem by switching to gcc-4.9:
$ sudo apt-get install gcc-4.9 g++-4.9
$ sudo update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-4.9 50 --slave /usr/bin/g++ g++ /usr/bin/g++-4.9
Now you can switch to gcc-4.9 by using:
$ sudo update-alternatives --config gcc
Then run the following command:
$ bazel build --config=opt --config=cuda //tensorflow/tools/pip_package:build_pip_package
I have the exact same issue on Debian. Any update on this issue?
I believe #9296 is the same issue, and switching back to gcc-4.9 seems to solve the problem.
Here seems to be a combination supported by Nvidia.
I had a similar problem and my workaround was to downgrade to gcc/g++ 4.8; see this page for how to do it.
ubuntu 17.04 64 bit, CUDA 8.0.61, cudnn 6.0.21, tensorflow tag v1.1.0, bazel configured to use -msse4.1, -msse4.2, -mavx, -mavx2, -mfma, cuda compute capabilities 6.1, gpu is nvidia geforce gtx 1080 ti, cpu is amd ryzen 1700x
@gunan That’s what I run (different sets of options were tried):
@csoehnel Confirmed, using /usr/bin/gcc-4.9 is a valid workaround.
I have this problem as well. Has any progress been made?
Switching back to gcc-4.9 only fixes the issue because it doesn’t have the intrinsics for avx512 yet.
Something in the GPU code is including immintrin.h, which now includes the new avx512 instructions for hardware enablement.
gcc-5 5.4.1-11ubuntu2 is the version on my machine, and it doesn’t matter if I add/remove MKL, XLA or any other options it will compile fine. And yes I can compile with gcc-4.9 but it doesn’t even natively support skylake let alone AVX512 instructions.
My C++ is rusty and bazel is new to me, but it seems like some work on includes in tensorflow/core/platform/platform.h
Nvidia technically supports GCC 5.4, which would show this problem, and it is a problem with llvm 3.8.1 too, as it also includes these intrinsic types with an include of immintrin.h.
I wish I wasn’t so rusty or I would help to get access to these instructions, but from looking around, the real options seem to be either pulling the inclusion of the gcc intrinsics out of the cudacc code or adding explicit casts for the new types.
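As a rough sketch of the first option (keeping the host intrinsics out of code that nvcc compiles), something along these lines is one possible shape. This is not what TensorFlow does or did. The macro SKETCH_USE_HOST_SIMD and the function add4 are hypothetical names for illustration. The only assumptions about real tooling are that nvcc defines __CUDACC__ while compiling and that compilers targeting SSE define __SSE__.

```cpp
#include <cassert>

// Only pull in the host SIMD header when this translation unit is NOT
// being compiled by nvcc and the target actually supports SSE.
#if !defined(__CUDACC__) && defined(__SSE__)
#include <immintrin.h>
#define SKETCH_USE_HOST_SIMD 1  // hypothetical macro for this sketch
#else
#define SKETCH_USE_HOST_SIMD 0
#endif

// Adds four floats element-wise. Uses SSE on the host when available,
// and a plain scalar loop otherwise (for example under nvcc).
static void add4(const float* a, const float* b, float* out) {
#if SKETCH_USE_HOST_SIMD
    _mm_storeu_ps(out, _mm_add_ps(_mm_loadu_ps(a), _mm_loadu_ps(b)));
#else
    for (int i = 0; i < 4; ++i) {
        out[i] = a[i] + b[i];
    }
#endif
}
```

Whether a guard like this would be enough here depends on where immintrin.h is actually being included from, which this thread does not pin down.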