tensorflow: Source compilation fails in multiple places on a musl-based system

System information

  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Alpine Linux edge
  • TensorFlow installed from (source or binary): source
  • TensorFlow version: 2.3.1
  • Python version: 3.8.6
  • Installed using virtualenv? pip? conda?: source
  • Bazel version (if compiling from source): 3.5.0
  • GCC/Compiler version (if compiling from source): 10.2.1_pre1

Describe the problem

I’m trying to compile TensorFlow from source on Alpine Linux, which is a musl-based system. However, it fails at several points:

ERROR: /builds/PureTryOut/aports/testing/tensorflow/src/tensorflow-2.3.1/tensorflow/core/platform/default/BUILD:75:11: Couldn't build file tensorflow/core/platform/default/_objs/env/0/env.pic.o: C++ compilation of rule '//tensorflow/core/platform/default:env' failed (Exit 1)
tensorflow/core/platform/default/env.cc: In member function 'virtual bool tensorflow::{anonymous}::PosixEnv::GetCurrentThreadName(std::string*)':
tensorflow/core/platform/default/env.cc:160:15: error: 'pthread_getname_np' was not declared in this scope; did you mean 'pthread_setname_np'?
  160 |     int res = pthread_getname_np(pthread_self(), buf, static_cast<size_t>(100));
      |               ^~~~~~~~~~~~~~~~~~
      |               pthread_setname_np
ERROR: /builds/PureTryOut/aports/testing/tensorflow/src/tensorflow-2.3.1/tensorflow/core/platform/s3/BUILD:44:11: Couldn't build file tensorflow/core/platform/s3/_objs/aws_crypto/aws_crypto.pic.o: C++ compilation of rule '//tensorflow/core/platform/s3:aws_crypto' failed (Exit 1)
tensorflow/core/platform/s3/aws_crypto.cc: In member function 'virtual Aws::Utils::Crypto::HashResult tensorflow::AWSSha256HMACOpenSSLImpl::Calculate(const ByteBuffer&, const ByteBuffer&)':
tensorflow/core/platform/s3/aws_crypto.cc:38:14: error: aggregate 'HMAC_CTX ctx' has incomplete type and cannot be defined
   38 |     HMAC_CTX ctx;
      |              ^~~
tensorflow/core/platform/s3/aws_crypto.cc:39:5: error: 'HMAC_CTX_init' was not declared in this scope; did you mean 'HMAC_CTX_new'?
   39 |     HMAC_CTX_init(&ctx);
      |     ^~~~~~~~~~~~~
      |     HMAC_CTX_new
tensorflow/core/platform/s3/aws_crypto.cc:45:5: error: 'HMAC_CTX_cleanup' was not declared in this scope
   45 |     HMAC_CTX_cleanup(&ctx);
      |     ^~~~~~~~~~~~~~~~
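For context, the first error appears because musl (unlike glibc) does not provide pthread_getname_np, so the call in env.cc has nothing to resolve against. Below is a minimal sketch of a guarded fallback; the __GLIBC__ check and the standalone function are illustrative assumptions, not TensorFlow's actual fix.

// Sketch only: guard the glibc-specific pthread_getname_np call so that a
// libc without it (e.g. musl) falls back to reporting "no thread name".
#include <pthread.h>
#include <string>

bool GetCurrentThreadName(std::string* name) {
#if defined(__GLIBC__)
  char buf[100];
  if (pthread_getname_np(pthread_self(), buf, sizeof(buf)) != 0) return false;
  *name = buf;
  return true;
#else
  (void)name;
  return false;  // thread names unavailable on this libc
#endif
}

The second error looks like an OpenSSL API mismatch rather than a musl problem: aws_crypto.cc uses the OpenSSL 1.0 HMAC interface (a stack-allocated HMAC_CTX plus HMAC_CTX_init/HMAC_CTX_cleanup), which was removed in OpenSSL 1.1+, where HMAC_CTX is opaque. A self-contained sketch of the equivalent 1.1-style usage (not TensorFlow's actual patch, and the function name is hypothetical):

// Sketch only: HMAC-SHA256 with the OpenSSL 1.1+ opaque-context API.
// The 1.0-era calls that fail above are noted in the comments.
#include <openssl/evp.h>
#include <openssl/hmac.h>
#include <vector>

std::vector<unsigned char> HmacSha256(const std::vector<unsigned char>& key,
                                      const std::vector<unsigned char>& data) {
  std::vector<unsigned char> digest(EVP_MAX_MD_SIZE);
  unsigned int len = 0;
  HMAC_CTX* ctx = HMAC_CTX_new();  // was: HMAC_CTX ctx; HMAC_CTX_init(&ctx);
  HMAC_Init_ex(ctx, key.data(), static_cast<int>(key.size()), EVP_sha256(), nullptr);
  HMAC_Update(ctx, data.data(), data.size());
  HMAC_Final(ctx, digest.data(), &len);
  HMAC_CTX_free(ctx);              // was: HMAC_CTX_cleanup(&ctx);
  digest.resize(len);
  return digest;
}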

Provide the exact sequence of commands / steps that you executed before running into the problem

The build script can be found at https://gitlab.alpinelinux.org/alpine/aports/-/raw/4cf626b10d2f4700cc5e5e9e7536061137c8c6a1/testing/tensorflow/APKBUILD. Note that prepare() there is called before build() and that the environment variables it sets carry over.

About this issue

  • Original URL
  • State: closed
  • Created 4 years ago
  • Comments: 42 (21 by maintainers)

Most upvoted comments

Hey there. I successfully built TF 2.8.0 on Alpine Linux with these changes: https://github.com/grebaza/tensorflow/commit/39381a1f453da0f37b7b1f94c2e64ea2736de4e1. I included the backtrace library (libexecinfo) and hardcoded POSIX defines. Hope this helps.

Well, the good news is that there are fewer copies of these build files now. They’ve been upstreamed at https://github.com/llvm/llvm-project/tree/main/utils/bazel. Additionally, we’ve started defining things based on C preprocessor macros in config.h, which are generally much easier to use than Bazel platform selects (with the limitation that we can’t execute arbitrary code like try-compile), so if you want to move HAVE_MALLINFO and the like out of config.bzl and into the config headers themselves, then go for it.

Hmm, now this is a grpc issue.

They also use Bazel; can you file an issue at https://github.com/grpc/grpc, please?

Hey there. The version of the LLVM build files used by TF is not the same as the one at https://github.com/google/llvm-bazel. Both versions hardcode HAVE_MALLINFO=1, however, so yes, it’s unsurprising that this doesn’t work. Since the goal of those build files does not include building LLVM in every configuration it supports, configurability is added in an ad-hoc manner. If you want to be able to build TF on this system, you’ll need to edit their build files, in particular at https://github.com/tensorflow/tensorflow/blob/a5c4882d6c953c24f8451b866b984abe98a8fa7b/third_party/llvm/llvm.bzl#L248

If you’d like to send a patch to llvm-bazel, that would also be welcome. The relevant configuration is here: https://github.com/google/llvm-bazel/blob/4c8b546e53eebc708c77ba19a2110926a8732642/llvm-bazel/llvm-project-overlay/llvm/config.bzl#L39. We do not yet have autodetection of library availability.

Yes, having all these different copies and versions of the build files is awful, and we’re trying to consolidate them.

If something being Alpine is detectable with Bazel config settings, that would be the easiest way. https://github.com/bazelbuild/bazel/blob/master/src/conditions/BUILD shows the ones already available as select targets in bazel_tools. If not, then if it’s detectable with https://docs.bazel.build/versions/master/configurable-attributes.html, that would be the next easiest. After that, the options are basically creating your own flag to indicate musl usage or getting a Bazel repo rule to do proper detection of system requirements. Bazel tends to be explicit rather than implicit, wanting to control things via the build invocation rather than through environment detection, because it aims for reproducibility (and prioritizes that well above ease of use, in my experience).

So, all that said, I think you might be able to pass --copt=-UHAVE_MALLINFO to Bazel. I'm not sure whether library defines or command-line copts take precedence.
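With GCC and Clang, -D and -U options are processed left to right, so whichever appears later on the final compile command wins; a tiny probe (illustrative only, the file name probe.cc is hypothetical) makes this easy to check:

// Probe for the -D/-U precedence question above:
//   g++ -DHAVE_MALLINFO=1 -UHAVE_MALLINFO probe.cc && ./a.out   -> "undefined"
//   g++ -UHAVE_MALLINFO -DHAVE_MALLINFO=1 probe.cc && ./a.out   -> "defined"
#include <cstdio>

int main() {
#if defined(HAVE_MALLINFO)
  std::puts("HAVE_MALLINFO is defined");
#else
  std::puts("HAVE_MALLINFO is undefined");
#endif
}

Whether the -U from --copt ends up after the overlay's -D on the actual command line is exactly the open question above, so this would need to be verified against the generated compile commands.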

Oh, it doesn’t use LLVM’s own build system at all? How strange; why would you ever do that?

😁 great question. Because getting a Bazel project to interact with a non-Bazel build system is…not great. In addition, Bazel is happier if it manages all the source files. With those issues combined, it turns out to be easier to just create a Bazel build configuration for LLVM 😕

I think it’s better to raise a bug on Bazel’s repo.

Well, I get different errors that way.

ERROR: /home/bart/.cache/bazel/_bazel_bart/f5cee94bd62e1be0d824a974250af519/external/llvm-project/llvm/BUILD:3766:11: C++ compilation of rule '@llvm-project//llvm:Support' failed (Exit 1)
In file included from external/llvm-project/llvm/lib/Support/Process.cpp:101:
external/llvm-project/llvm/lib/Support/Unix/Process.inc: In static member function 'static size_t llvm::sys::Process::GetMallocUsage()':
external/llvm-project/llvm/lib/Support/Unix/Process.inc:93:19: error: aggregate 'llvm::sys::Process::GetMallocUsage()::mallinfo mi' has incomplete type and cannot be defined
   93 |   struct mallinfo mi;
      |                   ^~
external/llvm-project/llvm/lib/Support/Unix/Process.inc:94:10: error: '::mallinfo' has not been declared
   94 |   mi = ::mallinfo();
      |          ^~~~~~~~
Target //tensorflow/tools/pip_package:build_pip_package failed to build
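This failure is in LLVM's Unix/Process.inc, which only compiles the struct mallinfo path when HAVE_MALLINFO is defined; because the Bazel overlay hardcodes HAVE_MALLINFO=1, the glibc-only branch is selected even on musl, which has no mallinfo(). A simplified sketch of the guard's intent (not the exact upstream code):

// Simplified sketch of the HAVE_MALLINFO guard around GetMallocUsage():
// hardcoding HAVE_MALLINFO=1 forces the glibc-only branch, which cannot
// compile against musl's malloc.h (no struct mallinfo / ::mallinfo()).
#include <cstddef>
#if defined(HAVE_MALLINFO)
#include <malloc.h>
#endif

size_t GetMallocUsage() {
#if defined(HAVE_MALLINFO)
  struct mallinfo mi = ::mallinfo();  // glibc extension; absent on musl
  return mi.uordblks;                 // bytes currently allocated by malloc
#else
  return 0;  // fallback when the libc exposes no malloc statistics
#endif
}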