tensorflow: Error trying to build for macOS with GPU support: "no toolchain corresponding to 'local_darwin' found for cpu 'darwin'"

System information

  • Have I written custom code (as opposed to using a stock example script provided in TensorFlow): N/A
  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): macOS ‘High Sierra’ Version 10.13.4 (17E202)
  • TensorFlow installed from (source or binary): source
  • TensorFlow version (use command below): N/A, attempting to compile at e365deab1333005c8aa186632f160c1bfd4485f8 with minimal local changes (see below)
  • Python version: 2.7.15
  • Bazel version (if compiling from source): 0.13.1-homebrew
  • GCC/Compiler version (if compiling from source): Apple LLVM version 8.1.0 (clang-802.0.42)
  • CUDA/cuDNN version: cuda_9.2.64_mac with cuda_9.2.64.1_mac, cudnn-9.2-osx-x64-v7.1
  • GPU model and memory: NVIDIA GeForce GT 750M with 2 GB device memory, CUDA Compute Capability 3.0
  • Exact command to reproduce:
./configure
bazel build --config=opt --config=cuda --save_temps --explain=explain.txt --verbose_explanations --verbose_failures --linkopt=-Wl,-rpath,/usr/local/cuda/lib //tensorflow/tools/pip_package:build_pip_package

Describe the problem

This is a recurrence of #9072, except that the solutions mentioned there (not using clang as the CUDA compiler, using CommandLineTools) do not resolve the problem.

To configure, I selected the following:

You have bazel 0.13.1-homebrew installed.
Please specify the location of python. [Default is /usr/local/opt/python@2/bin/python2.7]: 


Found possible Python library paths:
  /usr/local/Cellar/python@2/2.7.15/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages
Please input the desired Python library path to use.  Default is [/usr/local/Cellar/python@2/2.7.15/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages]

Do you wish to build TensorFlow with Google Cloud Platform support? [Y/n]:  
Google Cloud Platform support will be enabled for TensorFlow.

Do you wish to build TensorFlow with Hadoop File System support? [Y/n]: 
Hadoop File System support will be enabled for TensorFlow.

Do you wish to build TensorFlow with Amazon S3 File System support? [Y/n]: 
Amazon S3 File System support will be enabled for TensorFlow.

Do you wish to build TensorFlow with Apache Kafka Platform support? [Y/n]: 
Apache Kafka Platform support will be enabled for TensorFlow.

Do you wish to build TensorFlow with XLA JIT support? [y/N]: 
No XLA JIT support will be enabled for TensorFlow.

Do you wish to build TensorFlow with GDR support? [y/N]: 
No GDR support will be enabled for TensorFlow.

Do you wish to build TensorFlow with VERBS support? [y/N]: 
No VERBS support will be enabled for TensorFlow.

Do you wish to build TensorFlow with OpenCL SYCL support? [y/N]: 
No OpenCL SYCL support will be enabled for TensorFlow.

Do you wish to build TensorFlow with CUDA support? [y/N]: y
CUDA support will be enabled for TensorFlow.

Please specify the CUDA SDK version you want to use. [Leave empty to default to CUDA 9.0]: 9.2


Please specify the location where CUDA 9.2 toolkit is installed. Refer to README.md for more details. [Default is /usr/local/cuda]: 


Please specify the cuDNN version you want to use. [Leave empty to default to cuDNN 7.0]: 7.1.4


Please specify the location where cuDNN 7 library is installed. Refer to README.md for more details. [Default is /usr/local/cuda]:


Please specify a list of comma-separated Cuda compute capabilities you want to build with.
You can find the compute capability of your device at: https://developer.nvidia.com/cuda-gpus.
Please note that each additional compute capability significantly increases your build time and binary size. [Default is: 3.5,5.2]3.0


Do you want to use clang as CUDA compiler? [y/N]: 
nvcc will be used as CUDA compiler.

Please specify which gcc should be used by nvcc as the host compiler. [Default is /usr/bin/gcc]: 


Do you wish to build TensorFlow with MPI support? [y/N]: 
No MPI support will be enabled for TensorFlow.

Please specify optimization flags to use during compilation when bazel option "--config=opt" is specified [Default is -march=native]: 


Would you like to interactively configure ./WORKSPACE for Android builds? [y/N]: 
Not configuring the WORKSPACE for Android builds.

Preconfigured Bazel build configs. You can use any of the below by adding "--config=<>" to your build command. See tools/bazel.rc for more details.
	--config=mkl         	# Build with MKL support.
	--config=monolithic  	# Config for mostly static monolithic build.
Configuration finished

To build, I try:

bazel build --config=opt --config=cuda --save_temps --explain=explain.txt --verbose_explanations --verbose_failures --linkopt=-Wl,-rpath,/usr/local/cuda/lib //tensorflow/tools/pip_package:build_pip_package

However, this results in the error:

Starting local Bazel server and connecting to it...
............
WARNING: The following configs were expanded more than once: [cuda]. For repeatable flags, repeats are counted twice and may lead to unexpected behavior.
ERROR: Inconsistent crosstool configuration; no toolchain corresponding to 'local_darwin' found for cpu 'darwin'
INFO: Elapsed time: 0.903s
INFO: 0 processes.
FAILED: Build did NOT complete successfully (2 packages loaded)

The output of xcode-select -p is:

/Library/Developer/CommandLineTools

The output of /usr/bin/gcc --version is:

Configured with: --prefix=/Library/Developer/CommandLineTools/usr --with-gxx-include-dir=/usr/include/c++/4.2.1
Apple LLVM version 8.1.0 (clang-802.0.42)
Target: x86_64-apple-darwin17.5.0
Thread model: posix
InstalledDir: /Library/Developer/CommandLineTools/usr/bin

I am able to build & run the ‘deviceQuery’ CUDA SDK sample without issue.
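
For reference, a minimal stand-alone check along the same lines (a sketch, not the SDK sample itself; the file name is just illustrative) should behave the same when built with nvcc:

// devquery.cu -- minimal stand-in for the deviceQuery SDK sample:
// enumerate CUDA devices and print name, compute capability, and memory.
#include <cstdio>

int main() {
  int count = 0;
  cudaError_t err = cudaGetDeviceCount(&count);
  if (err != cudaSuccess) {
    std::printf("cudaGetDeviceCount failed: %s\n", cudaGetErrorString(err));
    return 1;
  }
  for (int i = 0; i < count; ++i) {
    cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop, i);
    std::printf("Device %d: %s, compute capability %d.%d, %zu MB\n", i,
                prop.name, prop.major, prop.minor,
                prop.totalGlobalMem / (1024 * 1024));
  }
  return 0;
}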

Source code / logs

My only local changes relative to e365deab1333005c8aa186632f160c1bfd4485f8 are:

diff --git a/tensorflow/core/kernels/concat_lib_gpu_impl.cu.cc b/tensorflow/core/kernels/concat_lib_gpu_impl.cu.cc
index a561d918bd..46c91b4511 100644
--- a/tensorflow/core/kernels/concat_lib_gpu_impl.cu.cc
+++ b/tensorflow/core/kernels/concat_lib_gpu_impl.cu.cc
@@ -69,7 +69,7 @@ __global__ void concat_variable_kernel(
   IntType num_inputs = input_ptr_data.size;
 
   // verbose declaration needed due to template
-  extern __shared__ __align__(sizeof(T)) unsigned char smem[];
+  extern __shared__ __align__(sizeof(T) > 16 ? sizeof(T) : 16) unsigned char smem[];
   IntType* smem_col_scan = reinterpret_cast<IntType*>(smem);
 
   if (useSmem) {
diff --git a/tensorflow/core/kernels/depthwise_conv_op_gpu.cu.cc b/tensorflow/core/kernels/depthwise_conv_op_gpu.cu.cc
index 5390222b3a..fcbd733614 100644
--- a/tensorflow/core/kernels/depthwise_conv_op_gpu.cu.cc
+++ b/tensorflow/core/kernels/depthwise_conv_op_gpu.cu.cc
@@ -172,7 +172,7 @@ __global__ __launch_bounds__(1024, 2) void DepthwiseConv2dGPUKernelNHWCSmall(
     const DepthwiseArgs args, const T* input, const T* filter, T* output) {
   assert(CanLaunchDepthwiseConv2dGPUSmall(args));
   // Holds block plus halo and filter data for blockDim.x depths.
-  extern __shared__ __align__(sizeof(T)) unsigned char shared_memory[];
+  extern __shared__ __align__(sizeof(T) > 16 ? sizeof(T) : 16) unsigned char shared_memory[];
   T* const shared_data = reinterpret_cast<T*>(shared_memory);
 
   const int num_batches = args.batch;
@@ -452,7 +452,7 @@ __global__ __launch_bounds__(1024, 2) void DepthwiseConv2dGPUKernelNCHWSmall(
     const DepthwiseArgs args, const T* input, const T* filter, T* output) {
   assert(CanLaunchDepthwiseConv2dGPUSmall(args));
   // Holds block plus halo and filter data for blockDim.z depths.
-  extern __shared__ __align__(sizeof(T)) unsigned char shared_memory[];
+  extern __shared__ __align__(sizeof(T) > 16 ? sizeof(T) : 16) unsigned char shared_memory[];
   T* const shared_data = reinterpret_cast<T*>(shared_memory);
 
   const int num_batches = args.batch;
@@ -1118,7 +1118,7 @@ __launch_bounds__(1024, 2) void DepthwiseConv2dBackpropFilterGPUKernelNHWCSmall(
     const DepthwiseArgs args, const T* output, const T* input, T* filter) {
   assert(CanLaunchDepthwiseConv2dBackpropFilterGPUSmall(args, blockDim.z));
   // Holds block plus halo and filter data for blockDim.x depths.
-  extern __shared__ __align__(sizeof(T)) unsigned char shared_memory[];
+  extern __shared__ __align__(sizeof(T) > 16 ? sizeof(T) : 16) unsigned char shared_memory[];
   T* const shared_data = reinterpret_cast<T*>(shared_memory);
 
   const int num_batches = args.batch;
@@ -1388,7 +1388,7 @@ __launch_bounds__(1024, 2) void DepthwiseConv2dBackpropFilterGPUKernelNCHWSmall(
     const DepthwiseArgs args, const T* output, const T* input, T* filter) {
   assert(CanLaunchDepthwiseConv2dBackpropFilterGPUSmall(args, blockDim.x));
   // Holds block plus halo and filter data for blockDim.z depths.
-  extern __shared__ __align__(sizeof(T)) unsigned char shared_memory[];
+  extern __shared__ __align__(sizeof(T) > 16 ? sizeof(T) : 16) unsigned char shared_memory[];
   T* const shared_data = reinterpret_cast<T*>(shared_memory);
 
   const int num_batches = args.batch;
diff --git a/tensorflow/core/kernels/split_lib_gpu.cu.cc b/tensorflow/core/kernels/split_lib_gpu.cu.cc
index 393818730b..58a1294005 100644
--- a/tensorflow/core/kernels/split_lib_gpu.cu.cc
+++ b/tensorflow/core/kernels/split_lib_gpu.cu.cc
@@ -121,7 +121,7 @@ __global__ void split_v_kernel(const T* input_ptr,
   int num_outputs = output_ptr_data.size;
 
   // verbose declaration needed due to template
-  extern __shared__ __align__(sizeof(T)) unsigned char smem[];
+  extern __shared__ __align__(sizeof(T) > 16 ? sizeof(T) : 16) unsigned char smem[];
   IntType* smem_col_scan = reinterpret_cast<IntType*>(smem);
 
   if (useSmem) {
diff --git a/third_party/gpus/cuda/BUILD.tpl b/third_party/gpus/cuda/BUILD.tpl
index 2a37c65bc7..43446dd99b 100644
--- a/third_party/gpus/cuda/BUILD.tpl
+++ b/third_party/gpus/cuda/BUILD.tpl
@@ -110,7 +110,7 @@ cc_library(
         ".",
         "cuda/include",
     ],
-    linkopts = ["-lgomp"],
+    #linkopts = ["-lgomp"],
     linkstatic = 1,
     visibility = ["//visibility:public"],
 )
diff --git a/third_party/toolchains/gpus/cuda/BUILD b/third_party/toolchains/gpus/cuda/BUILD
index 4cb8380938..d025c4f3aa 100644
--- a/third_party/toolchains/gpus/cuda/BUILD
+++ b/third_party/toolchains/gpus/cuda/BUILD
@@ -115,7 +115,7 @@ cc_library(
         ".",
         "cuda/include",
     ],
-    linkopts = ["-lgomp"],
+    #linkopts = ["-lgomp"],
     linkstatic = 1,
     visibility = ["//visibility:public"],
 )
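
For clarity, every CUDA hunk above applies the same pattern: the alignment of the dynamically sized extern __shared__ buffer is clamped to at least 16 bytes instead of exactly sizeof(T). In isolation the idiom looks like this (a minimal sketch; the kernel, names, and launch are illustrative, not TensorFlow code):

// Same workaround as in the hunks above: clamp the alignment of the
// extern __shared__ buffer to max(sizeof(T), 16) bytes.
template <typename T>
__global__ void scale_kernel(const T* in, T* out, int n) {
  // verbose declaration needed due to template (same idiom as in the diffs)
  extern __shared__ __align__(sizeof(T) > 16 ? sizeof(T) : 16) unsigned char smem[];
  T* shared_data = reinterpret_cast<T*>(smem);
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) {
    shared_data[threadIdx.x] = in[i];
    out[i] = shared_data[threadIdx.x] * T(2);
  }
}

int main() {
  const int n = 256;
  float *d_in, *d_out;
  cudaMalloc(&d_in, n * sizeof(float));
  cudaMalloc(&d_out, n * sizeof(float));
  // dynamic shared memory size is the third launch-configuration argument
  scale_kernel<float><<<1, n, n * sizeof(float)>>>(d_in, d_out, n);
  cudaDeviceSynchronize();
  cudaFree(d_in);
  cudaFree(d_out);
  return 0;
}

I have not verified whether a floor smaller than 16 bytes would also work; 16 is simply the value used throughout the patch.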

About this issue

  • State: closed
  • Created 6 years ago
  • Comments: 15 (4 by maintainers)

Most upvoted comments

@saudet I was able to build with Xcode 9.4.1 installed. If you are seeing something like Symbol not found: _ncclAllReduce, try copying nccl.h from Bazel’s temp directories to tensorflow/contrib/nccl/kernels; see this comment.

Ah, that’s because TensorFlow no longer seems to build with Xcode 7.3, though it still works fine with Xcode 8.3. For reference, here is my collection of patches and workarounds for building on Mac with CUDA, which works with TensorFlow 1.9.0-rc0: https://github.com/bytedeco/javacpp-presets/blob/master/tensorflow/cppbuild.sh#L131

Using git bisect, I found that the error was introduced by 6e6dcf4ed62324069631cdcd23e7187ae75f6340.

On macOS, /usr/bin/gcc is a front end for Clang, so I am not sure whether the referenced -pie changes will apply.

I got v1.8.0 working with CUDA 9.2 and cuDNN 7.1 by following the steps in this link.

This is the wheel I built. (Note: I did not get NCCL working; since I am not using multiple GPUs, I commented the “import nccl” out after installing.)