datatable: segfault on Ubuntu 20.04 when used in combination with LightGBM

# on host
cd /tmp/
wget https://files.slack.com/files-pri/T0329MHH6-F013VU6RW94/download/dt_lgb.gz?pub_secret=fb7b5f3988
mv 'dt_lgb.gz?pub_secret=fb7b5f3988' dt_lgb.gz
tar xfz dt_lgb.gz
docker pull ubuntu:20.04
docker run -t -v `pwd`:/tmp --security-opt seccomp=unconfined -i ubuntu:20.04 /bin/bash

# on Ubuntu 20.04
chmod 1777 /tmp
apt-get update
DEBIAN_FRONTEND=noninteractive apt-get install -y software-properties-common
add-apt-repository -y ppa:deadsnakes/ppa
apt-get update
apt-get install -y python3.6 python3.6-dev virtualenv libgomp1 gdb vim valgrind

# repro failure
virtualenv -p python3.6 blah
source blah/bin/activate
pip install datatable
pip install lightgbm
pip install pandas
cd /tmp/
python lgb_prefit_df669346-4e47-4ecf-b131-0838ae8f9474.py

fails with:

/blah/lib/python3.6/site-packages/lightgbm/basic.py:1295: UserWarning: categorical_feature in Dataset is overridden.
New categorical_feature is []
  'New categorical_feature is {}'.format(sorted(list(categorical_feature))))
/blah/lib/python3.6/site-packages/lightgbm/basic.py:842: UserWarning: categorical_feature keyword has been found in `params` and will be ignored.
Please use categorical_feature argument of the Dataset constructor to pass this parameter.
  .format(key))
Segmentation fault (core dumped)

About this issue

  • State: closed
  • Created 4 years ago
  • Comments: 27 (24 by maintainers)

Most upvoted comments

For anyone else who stumbles upon this: I ran into a similar (possibly the same) issue, which occurred because tensorflow was binding certain std:: symbols from another library (apache/tvm) instead of from libstdc++, as one would expect. In particular, there appear to have been some changes to the random number logic in more recent libstdc++ versions. I tracked this down by running my program with env LD_DEBUG=bindings <program> and noticing that some symbols were being bound to the wrong library.
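
The LD_DEBUG=bindings output is very long, so it can help to filter it. Below is a minimal sketch, not the exact procedure used above: it assumes the usual glibc line shape ("binding file A [0] to B [0]: normal symbol `name'"), and the function name, the sample paths, and the sample log lines are all made up for illustration.

```python
import re

# Assumed shape of a glibc LD_DEBUG=bindings line:
#   <pid>: binding file SRC [0] to DST [0]: normal symbol `NAME' [VERSION]
BINDING_RE = re.compile(
    r"binding file (?P<src>.+?) \[\d+\] to (?P<dst>.+?) \[\d+\]: "
    r"\w+ symbol `(?P<sym>[^']+)'"
)


def suspicious_bindings(log_lines, needle="random_device", expected="libstdc++"):
    """Return (src, dst, symbol) for symbols containing `needle`
    that were bound to a library other than `expected`."""
    hits = []
    for line in log_lines:
        match = BINDING_RE.search(line)
        if match is None:
            continue  # not a binding line
        if needle in match["sym"] and expected not in match["dst"]:
            hits.append((match["src"], match["dst"], match["sym"]))
    return hits


# Hypothetical log excerpt (paths and PIDs are invented for this example).
SAMPLE_LOG = """\
      1234:\tbinding file /opt/tf/_pywrap_tf.so [0] to /opt/tvm/libtvm.so [0]: normal symbol `_ZNSt13random_device7_M_initERKSs'
      1234:\tbinding file /opt/tf/_pywrap_tf.so [0] to /usr/lib/x86_64-linux-gnu/libstdc++.so.6 [0]: normal symbol `_ZNSt13random_device9_M_getvalEv' [GLIBCXX_3.4.18]
      1234:\tbinding file /opt/tvm/libtvm.so [0] to /lib/x86_64-linux-gnu/libc.so.6 [0]: normal symbol `malloc' [GLIBC_2.2.5]
"""

for src, dst, sym in suspicious_bindings(SAMPLE_LOG.splitlines()):
    print(f"{sym}: requested by {src}, bound to {dst}")
```

To feed it real data, the loader output can be written to a file by also setting LD_DEBUG_OUTPUT (the dynamic loader appends the PID to the file name) and passing that file's lines to the function.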

More details: I was building tvm inside a https://github.com/pypa/manylinux docker container. Manylinux uses Red Hat's Developer Toolset to maintain compatibility with specific libstdc++ versions. My understanding is that devtoolset will sometimes statically link std:: symbols when those symbols do not exist in libstdc++. It appears that some of the libstdc++ features related to random numbers are relatively new and not present in older libstdc++ versions, so I was ending up with std::random_device coming from libtvm.so instead of libstdc++.so. Even that shouldn't cause a problem, because the symbols from the libraries should be isolated from each other, but TVM loads itself using RTLD_GLOBAL, polluting the global namespace: https://github.com/apache/tvm/blob/dfe4cebbdadab3d4e6e6ba3951276a51a4ffeaf6/python/tvm/_ffi/base.py#L57.
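
To illustrate the RTLD_GLOBAL point, here is a minimal sketch of how a ctypes-based loader chooses between the two visibility modes. It is not TVM's code: pick_mode and load_library are hypothetical helpers, and whether a given library still works when loaded with RTLD_LOCAL depends on that library.

```python
import ctypes


def pick_mode(isolate: bool) -> int:
    """Choose the dlopen visibility flag.

    RTLD_LOCAL keeps the library's symbols out of the global lookup scope;
    RTLD_GLOBAL makes them available to libraries loaded afterwards.
    """
    return ctypes.RTLD_LOCAL if isolate else ctypes.RTLD_GLOBAL


def load_library(path: str, isolate: bool = True) -> ctypes.CDLL:
    """Load a shared library by path (hypothetical helper, not from any project)."""
    return ctypes.CDLL(path, mode=pick_mode(isolate))
```

With isolate=False, a statically linked copy of std::random_device inside the loaded library can end up satisfying lookups from libraries loaded later, which is the situation described above.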