tensorflow: Tensorflow crashes on build on Ubuntu 16.04 when building for skylake (avx512)

System information

  • Have I written custom code (as opposed to using a stock example script provided in TensorFlow): no
  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Linux Ubuntu 16.04
  • TensorFlow installed from (source or binary): building from source
  • TensorFlow version (use command below): 5ae244e
  • Bazel version (if compiling from source): 0.4.5
  • CUDA/cuDNN version: CUDA 8.0.61, cudnn 6.0.21 (tried also 5.1)
  • GPU model and memory: 2x Tesla P100-PCIE-12GB
  • Exact command to reproduce: building
  • Additional information: Intel® Xeon® CPU E7-4860 v2 @ 2.60GHz, gcc version 5.4.1 20170519 (Ubuntu 5.4.1-11ubuntu2~16.04)

Describe the problem

On the regular rebuild of Tensorflow, the build crashes with bunch of error: argument of type "const void *" is incompatible with parameter of type "const something *"

Source code / logs

Crash log:

INFO: From Compiling tensorflow/core/kernels/scatter_functor_gpu.cu.cc:
/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512fintrin.h(9218): error: argument of type "const void *" is incompatible with parameter of type "const float *"

/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512fintrin.h(9229): error: argument of type "const void *" is incompatible with parameter of type "const float *"

/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512fintrin.h(9242): error: argument of type "const void *" is incompatible with parameter of type "const double *"

/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512fintrin.h(9253): error: argument of type "const void *" is incompatible with parameter of type "const double *"

/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512fintrin.h(9266): error: argument of type "const void *" is incompatible with parameter of type "const float *"

/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512fintrin.h(9277): error: argument of type "const void *" is incompatible with parameter of type "const float *"

/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512fintrin.h(9290): error: argument of type "const void *" is incompatible with parameter of type "const double *"

/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512fintrin.h(9301): error: argument of type "const void *" is incompatible with parameter of type "const double *"

/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512fintrin.h(9314): error: argument of type "const void *" is incompatible with parameter of type "const int *"

/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512fintrin.h(9325): error: argument of type "const void *" is incompatible with parameter of type "const int *"

/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512fintrin.h(9338): error: argument of type "const void *" is incompatible with parameter of type "const long long *"

/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512fintrin.h(9350): error: argument of type "const void *" is incompatible with parameter of type "const long long *"

/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512fintrin.h(9363): error: argument of type "const void *" is incompatible with parameter of type "const int *"

/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512fintrin.h(9374): error: argument of type "const void *" is incompatible with parameter of type "const int *"

/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512fintrin.h(9387): error: argument of type "const void *" is incompatible with parameter of type "const long long *"

/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512fintrin.h(9399): error: argument of type "const void *" is incompatible with parameter of type "const long long *"

/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512fintrin.h(9408): error: argument of type "void *" is incompatible with parameter of type "float *"

/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512fintrin.h(9417): error: argument of type "void *" is incompatible with parameter of type "float *"

/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512fintrin.h(9426): error: argument of type "void *" is incompatible with parameter of type "double *"

/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512fintrin.h(9435): error: argument of type "void *" is incompatible with parameter of type "double *"

/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512fintrin.h(9443): error: argument of type "void *" is incompatible with parameter of type "float *"

/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512fintrin.h(9452): error: argument of type "void *" is incompatible with parameter of type "float *"

/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512fintrin.h(9461): error: argument of type "void *" is incompatible with parameter of type "double *"

/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512fintrin.h(9470): error: argument of type "void *" is incompatible with parameter of type "double *"

/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512fintrin.h(9479): error: argument of type "void *" is incompatible with parameter of type "int *"

/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512fintrin.h(9488): error: argument of type "void *" is incompatible with parameter of type "int *"

/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512fintrin.h(9497): error: argument of type "void *" is incompatible with parameter of type "long long *"

/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512fintrin.h(9506): error: argument of type "void *" is incompatible with parameter of type "long long *"

/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512fintrin.h(9515): error: argument of type "void *" is incompatible with parameter of type "int *"

/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512fintrin.h(9524): error: argument of type "void *" is incompatible with parameter of type "int *"

/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512fintrin.h(9533): error: argument of type "void *" is incompatible with parameter of type "long long *"

/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512fintrin.h(9542): error: argument of type "void *" is incompatible with parameter of type "long long *"

/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512pfintrin.h(54): error: argument of type "const void *" is incompatible with parameter of type "const long long *"

/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512pfintrin.h(62): error: argument of type "const void *" is incompatible with parameter of type "const int *"

/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512pfintrin.h(70): error: argument of type "const void *" is incompatible with parameter of type "const long long *"

/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512pfintrin.h(78): error: argument of type "const void *" is incompatible with parameter of type "const int *"

/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512pfintrin.h(86): error: argument of type "void *" is incompatible with parameter of type "const long long *"

/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512pfintrin.h(95): error: argument of type "void *" is incompatible with parameter of type "const int *"

/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512pfintrin.h(104): error: argument of type "void *" is incompatible with parameter of type "const long long *"

/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512pfintrin.h(112): error: argument of type "void *" is incompatible with parameter of type "const int *"

/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512pfintrin.h(120): error: argument of type "void *" is incompatible with parameter of type "const long long *"

/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512pfintrin.h(129): error: argument of type "void *" is incompatible with parameter of type "const int *"

/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512pfintrin.h(138): error: argument of type "void *" is incompatible with parameter of type "const long long *"

/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512pfintrin.h(146): error: argument of type "void *" is incompatible with parameter of type "const int *"

/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512vlintrin.h(10223): error: argument of type "const void *" is incompatible with parameter of type "const float *"

/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512vlintrin.h(10235): error: argument of type "const void *" is incompatible with parameter of type "const float *"

/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512vlintrin.h(10247): error: argument of type "const void *" is incompatible with parameter of type "const double *"

/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512vlintrin.h(10259): error: argument of type "const void *" is incompatible with parameter of type "const double *"

/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512vlintrin.h(10271): error: argument of type "const void *" is incompatible with parameter of type "const float *"

/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512vlintrin.h(10283): error: argument of type "const void *" is incompatible with parameter of type "const float *"

/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512vlintrin.h(10295): error: argument of type "const void *" is incompatible with parameter of type "const double *"

/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512vlintrin.h(10307): error: argument of type "const void *" is incompatible with parameter of type "const double *"

/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512vlintrin.h(10319): error: argument of type "const void *" is incompatible with parameter of type "const int *"

/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512vlintrin.h(10331): error: argument of type "const void *" is incompatible with parameter of type "const int *"

/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512vlintrin.h(10343): error: argument of type "const void *" is incompatible with parameter of type "const long long *"

/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512vlintrin.h(10355): error: argument of type "const void *" is incompatible with parameter of type "const long long *"

/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512vlintrin.h(10367): error: argument of type "const void *" is incompatible with parameter of type "const int *"

/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512vlintrin.h(10379): error: argument of type "const void *" is incompatible with parameter of type "const int *"

/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512vlintrin.h(10391): error: argument of type "const void *" is incompatible with parameter of type "const long long *"

/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512vlintrin.h(10403): error: argument of type "const void *" is incompatible with parameter of type "const long long *"

/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512vlintrin.h(10413): error: argument of type "void *" is incompatible with parameter of type "float *"

/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512vlintrin.h(10424): error: argument of type "void *" is incompatible with parameter of type "float *"

/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512vlintrin.h(10433): error: argument of type "void *" is incompatible with parameter of type "float *"

/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512vlintrin.h(10444): error: argument of type "void *" is incompatible with parameter of type "float *"

/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512vlintrin.h(10453): error: argument of type "void *" is incompatible with parameter of type "double *"

/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512vlintrin.h(10464): error: argument of type "void *" is incompatible with parameter of type "double *"

/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512vlintrin.h(10473): error: argument of type "void *" is incompatible with parameter of type "double *"

/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512vlintrin.h(10484): error: argument of type "void *" is incompatible with parameter of type "double *"

/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512vlintrin.h(10493): error: argument of type "void *" is incompatible with parameter of type "float *"

/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512vlintrin.h(10504): error: argument of type "void *" is incompatible with parameter of type "float *"

/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512vlintrin.h(10513): error: argument of type "void *" is incompatible with parameter of type "float *"

/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512vlintrin.h(10524): error: argument of type "void *" is incompatible with parameter of type "float *"

/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512vlintrin.h(10533): error: argument of type "void *" is incompatible with parameter of type "double *"

/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512vlintrin.h(10544): error: argument of type "void *" is incompatible with parameter of type "double *"

/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512vlintrin.h(10553): error: argument of type "void *" is incompatible with parameter of type "double *"

/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512vlintrin.h(10564): error: argument of type "void *" is incompatible with parameter of type "double *"

/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512vlintrin.h(10573): error: argument of type "void *" is incompatible with parameter of type "int *"

/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512vlintrin.h(10584): error: argument of type "void *" is incompatible with parameter of type "int *"

/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512vlintrin.h(10593): error: argument of type "void *" is incompatible with parameter of type "int *"

/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512vlintrin.h(10604): error: argument of type "void *" is incompatible with parameter of type "int *"

/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512vlintrin.h(10613): error: argument of type "void *" is incompatible with parameter of type "long long *"

/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512vlintrin.h(10624): error: argument of type "void *" is incompatible with parameter of type "long long *"

/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512vlintrin.h(10633): error: argument of type "void *" is incompatible with parameter of type "long long *"

/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512vlintrin.h(10644): error: argument of type "void *" is incompatible with parameter of type "long long *"

/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512vlintrin.h(10653): error: argument of type "void *" is incompatible with parameter of type "int *"

/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512vlintrin.h(10664): error: argument of type "void *" is incompatible with parameter of type "int *"

/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512vlintrin.h(10673): error: argument of type "void *" is incompatible with parameter of type "int *"

/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512vlintrin.h(10684): error: argument of type "void *" is incompatible with parameter of type "int *"

/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512vlintrin.h(10693): error: argument of type "void *" is incompatible with parameter of type "long long *"

/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512vlintrin.h(10704): error: argument of type "void *" is incompatible with parameter of type "long long *"

/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512vlintrin.h(10713): error: argument of type "void *" is incompatible with parameter of type "long long *"

/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512vlintrin.h(10724): error: argument of type "void *" is incompatible with parameter of type "long long *"

92 errors detected in the compilation of "/tmp/tmpxft_00008f12_00000000-7_scatter_functor_gpu.cu.cpp1.ii".
ERROR: /scratch/chaimb/tensorflow/tensorflow/core/kernels/BUILD:1140:1: output 'tensorflow/core/kernels/_objs/scatter_functor_gpu/tensorflow/core/kernels/scatter_functor_gpu.cu.pic.o' was not created.
ERROR: /scratch/chaimb/tensorflow/tensorflow/core/kernels/BUILD:1140:1: not all outputs were created or valid.
Target //tensorflow/tools/pip_package:build_pip_package failed to build
INFO: Elapsed time: 147.888s, Critical Path: 69.54s

I’ve tried disabling most of the options (MKL, architecture optimizations, computability) but the crash happens even with full-default (except CUDA and XLA) configuration.

About this issue

  • Original URL
  • State: closed
  • Created 7 years ago
  • Comments: 31 (13 by maintainers)

Commits related to this issue

Most upvoted comments

I think the problem here is that gcc-5.5 shipped with avx512*intrin.h headers that switched to using void* and const void* (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=76731) but without switching the builtins to do the same. This is why 5.4 works but 5.5 breaks. The tensorflow r1.4 build at least can be unbroken for 5.5 by locally rolling back the above change with e.g.:

for f in avx512fintrin.h avx512pfintrin.h avx512vlintrin.h; do
   curl -H "User-Agent:Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/62.0.3202.94 Safari/537.36" -o $f "https://gcc.gnu.org/viewcvs/gcc/branches/gcc-5-branch/gcc/config/i386/${f}?view=co&revision=245536&content-type=text%2Fplain&pathrev=245536"
done && mv avx512*intrin.h  /usr/lib/gcc/x86_64-linux-gnu/5/include/

This is a terrible idea (who knows what else it’s rolling back? are you really cherrypicking 3 random files out of an entire release? and so on) and nobody should do it, of course.

But maybe, if you’re trying to build tensorflow for GPU (so require nvcc, so require GCC < 6) inside an ubuntu:17.10 docker image (so don’t have an apt-get’able gcc-5.4 option), this might be useful.

i had same error with System information: OS Platform and Distribution: Ubuntu 16.04 TensorFlow installed from: source TensorFlow version: 1.8 Python version: 3.6.4 GCC version: 5.5.0 CUDA/cuDNN version: CUDA 9.1/CuDNN 7.1.3 GPU model and memory: 1080 Ti

I solved this problem by switching to gcc-4.9: $ sudo apt-get install gcc-4.9 g+±4.9 $ sudo update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-4.9 50 --slave /usr/bin/g++ g++ /usr/bin/g+±4.9 Now you can switch to gcc-4.9 by using: $ sudo update-alternatives --config gcc

then run the following command: $ bazel build --config=opt --config=cuda //tensorflow/tools/pip_package:build_pip_package

I have the exact same issue on Debian. Any update on this issue?

I believe #9296 is the same issue, and switching back to gcc-4.9 seems to solve the problem.

Here seems to be a combination supported by Nvidia.

Distribution	Kernel	GCC	GLIBC
Ubuntu 16.04	4.4.0	5.3.1	2.23

I had a similar problem and my workaround was to downgrade to gcc/g++ 4.8, see this page for how2.

ubuntu 17.04 64 bit, CUDA 8.0.61, cudnn 6.0.21, tensorflow tag v1.1.0, bazel configured to use -msse4.1, -msse4.2, -mavx, -mavx2, -mfma, cuda compute capabilities 6.1, gpu is nvidia geforce gtx 1080 ti, cpu is amd ryzen 1700x

@gunan That’s what I run (different options set were tried):


chaimb@hpc1:/scratch/chaimb/tensorflow$ ./configure
Please specify the location of python. [Default is /usr/bin/python]: /usr/bin/python3
Found possible Python library paths:
  /usr/lib/python3/dist-packages
  /usr/local/lib/python3.5/dist-packages
  /usr/local/lib/python2.7/site-packages
Please input the desired Python library path to use.  Default is [/usr/lib/python3/dist-packages]

Using python library path: /usr/lib/python3/dist-packages
Do you wish to build TensorFlow with MKL support? [y/N] y
MKL support will be enabled for TensorFlow
Do you wish to download MKL LIB from the web? [Y/n] y
Please specify optimization flags to use during compilation when bazel option "--config=opt" is specified [Default is -march=native]:
Do you wish to use jemalloc as the malloc implementation? [Y/n]
jemalloc enabled
Do you wish to build TensorFlow with Google Cloud Platform support? [y/N]
No Google Cloud Platform support will be enabled for TensorFlow
Do you wish to build TensorFlow with Hadoop File System support? [y/N]
No Hadoop File System support will be enabled for TensorFlow
Do you wish to build TensorFlow with the XLA just-in-time compiler (experimental)? [y/N] y
XLA JIT support will be enabled for TensorFlow
Do you wish to build TensorFlow with VERBS support? [y/N]
No VERBS support will be enabled for TensorFlow
Do you wish to build TensorFlow with OpenCL support? [y/N]
No OpenCL support will be enabled for TensorFlow
Do you wish to build TensorFlow with CUDA support? [y/N] y
CUDA support will be enabled for TensorFlow
Do you want to use clang as CUDA compiler? [y/N]
nvcc will be used as CUDA compiler
Please specify the CUDA SDK version you want to use, e.g. 7.0. [Leave empty to use system default]:
Please specify the location where CUDA  toolkit is installed. Refer to README.md for more details. [Default is /usr/local/cuda]:
Please specify which gcc should be used by nvcc as the host compiler. [Default is /usr/bin/gcc]:
Please specify the cuDNN version you want to use. [Leave empty to use system default]:
Please specify the location where cuDNN  library is installed. Refer to README.md for more details. [Default is /usr/local/cuda]:
Please specify a list of comma-separated Cuda compute capabilities you want to build with.
You can find the compute capability of your device at: https://developer.nvidia.com/cuda-gpus.
Please note that each additional compute capability significantly increases your build time and binary size.
[Default is: "3.5,5.2"]: "6.0"
Do you wish to build TensorFlow with MPI support? [y/N]
MPI support will not be enabled for TensorFlow
WARNING: Output base '/home/chaimb/.cache/bazel/_bazel_chaimb/7e5d2835edc3e2079ed27c4c385315e6' is on NFS. This may lead to surprising failures and undetermined behavior.
.....................................
INFO: Starting clean (this may take a while). Consider using --async if the clean takes more than several minutes.
Configuration finished
chaimb@hpc1:/scratch/chaimb/tensorflow$ bazel build --verbose_failures --jobs 96 --config=opt --config=cuda //tensorflow/tools/pip_package:build_pip_package

@csoehnel Confirmed, using /usr/bin/gcc-4.9 is a valid workaround.

I have this problem as well, made any progress?

Switching back to gcc-4.9 only fixes the issue because it doesn’t have the intrinsic for avx512 yet.

Something in the GPU code is including immintrin.h which now includes the new avx512 instructions for hardware enablement.

gcc-5 5.4.1-11ubuntu2 is the version on my machine, and it doesn’t matter if I add/remove MKL, XLA or any other options it will compile fine. And yes I can compile with gcc-4.9 but it doesn’t even natively support skylake let alone AVX512 instructions.

My C++ is rusty and bazel is new to me, but it seems like some work on includes in tensorflow/core/platform/platform.h

Nvidia technically supports GCC 5.4 which would should this problem, and it is a problem with llvm 3.8.1 too as it also includes these intrinsic types with an include of immintrin.h.

I wish I wasn’t so rusty or I would help to get access to these instructions but from looking around the real options seem to be either pull out the inclusion of the gcc intrinsics from cudacc code or adding explicit casts for the new types.