datatable: segfault on Ubuntu 20.04 when used in combination with LightGBM
# on host
cd /tmp/
wget https://files.slack.com/files-pri/T0329MHH6-F013VU6RW94/download/dt_lgb.gz?pub_secret=fb7b5f3988
mv 'dt_lgb.gz?pub_secret=fb7b5f3988' dt_lgb.gz
tar xfz dt_lgb.gz
docker pull ubuntu:20.04
docker run -t -v `pwd`:/tmp --security-opt seccomp=unconfined -i ubuntu:20.04 /bin/bash
# on Ubuntu 20.04
chmod 1777 /tmp
apt-get update
DEBIAN_FRONTEND=noninteractive apt-get install -y software-properties-common
add-apt-repository -y ppa:deadsnakes/ppa
apt-get update
apt-get install -y python3.6 python3.6-dev virtualenv libgomp1 gdb vim valgrind
# repro failure
virtualenv -p python3.6 blah
source blah/bin/activate
pip install datatable
pip install lightgbm
pip install pandas
cd /tmp/
python lgb_prefit_df669346-4e47-4ecf-b131-0838ae8f9474.py
fails with:
/blah/lib/python3.6/site-packages/lightgbm/basic.py:1295: UserWarning: categorical_feature in Dataset is overridden.
New categorical_feature is []
'New categorical_feature is {}'.format(sorted(list(categorical_feature))))
/blah/lib/python3.6/site-packages/lightgbm/basic.py:842: UserWarning: categorical_feature keyword has been found in `params` and will be ignored.
Please use categorical_feature argument of the Dataset constructor to pass this parameter.
.format(key))
Segmentation fault (core dumped)
About this issue
- State: closed
- Created 4 years ago
- Comments: 27 (24 by maintainers)
For anyone else who happens to stumble upon this, I ran into a similar/same issue that occurred because tensorflow was binding certain std:: symbols from another library (apache/tvm) instead of libstdc++ as one might expect. In particular, there appear to have been some changes related to the random number logic in more recent libstdc++ versions. I was able to track this down by running my program with env LD_DEBUG=bindings <program> and noticing that some symbols were being bound to the wrong library.
More details: I was building tvm inside a https://github.com/pypa/manylinux docker container. Manylinux uses Red Hat’s Developer Toolset to maintain compatibility with specific libstdc++ versions. My understanding is that sometimes devtoolset will statically link std:: symbols if such symbols do not exist in libstdc++. It appears some of the libstdc++ features related to random numbers are relatively new and not present in older libstdc++, and I was ending up with std::random_device coming from libtvm.so instead of libstdc++.so. Even this shouldn’t happen because the symbols from the libraries should be isolated from each other, but TVM loads itself using RTLD_GLOBAL, polluting the global namespace - https://github.com/apache/tvm/blob/dfe4cebbdadab3d4e6e6ba3951276a51a4ffeaf6/python/tvm/_ffi/base.py#L57.
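The LD_DEBUG=bindings check described above produces a lot of output, so it may help to filter it. The following is a minimal sketch, assuming glibc's dynamic loader, which writes one "binding file A to B: normal symbol ..." line per resolved symbol to stderr. The function name suspicious_bindings is hypothetical. Running it against the repro script from this issue is also an assumption, since the comment above concerns tensorflow and tvm rather than datatable and LightGBM.

```shell
#!/usr/bin/env bash
# Sketch: from LD_DEBUG=bindings output on stdin, keep only bindings whose
# symbol name matches a pattern and whose target library is NOT libstdc++.
# Symbol names appear mangled, but the mangled names of std::random_device
# members still contain the text "random_device".

suspicious_bindings() {
    local pattern="$1"
    grep 'binding file' \
        | grep -- "$pattern" \
        | grep -v ' to [^ ]*libstdc++[^ ]* '   # drop bindings that resolve to libstdc++
}

# Usage (assumed invocation; substitute the program that crashes):
#   LD_DEBUG=bindings python lgb_prefit_df669346-4e47-4ecf-b131-0838ae8f9474.py 2>&1 \
#       | suspicious_bindings random_device
```

Any line that survives the filter names a library other than libstdc++ that is supplying the symbol, which is the situation the comment describes for libtvm.so. An empty result only means that no such binding was logged for that pattern; it does not rule out other causes of the segfault.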