LightGBM: R package install with GPU support fails
This used to work:
FROM nvidia/cuda:11.0-devel-ubuntu20.04
RUN apt-get update && \
DEBIAN_FRONTEND="noninteractive" apt-get install -y software-properties-common apt-transport-https
RUN apt-key adv --keyserver keyserver.ubuntu.com --recv-keys E298A3A825C0D65DFD57CBB651716619E084DAB9 && \
add-apt-repository 'deb [arch=amd64] https://cran.rstudio.com/bin/linux/ubuntu focal-cran40/' && \
apt-get update && \
apt-get install -y r-base
RUN apt-get install -y git wget libcurl4-openssl-dev default-jdk-headless libssl-dev libxml2-dev cmake
ENV MAKE="make -j$(nproc)"
RUN R -e 'install.packages(c("R6","data.table","jsonlite"), repos = "https://cran.rstudio.com/")'
RUN apt-get install -y libboost-dev libboost-system-dev libboost-filesystem-dev ocl-icd-opencl-dev opencl-headers clinfo
RUN mkdir -p /etc/OpenCL/vendors && \
echo "libnvidia-opencl.so.1" > /etc/OpenCL/vendors/nvidia.icd ## otherwise lightgbm segfaults at runtime (compiles fine without it)
RUN git clone --recursive https://github.com/microsoft/LightGBM && \
cd LightGBM && \
Rscript build_r.R --use-gpu
Now, I get this error:
Cloning into 'LightGBM'...
Submodule 'include/boost/compute' (https://github.com/boostorg/compute) registered for path 'compute'
Submodule 'eigen' (https://gitlab.com/libeigen/eigen.git) registered for path 'eigen'
Submodule 'external_libs/fast_double_parser' (https://github.com/lemire/fast_double_parser.git) registered for path 'external_libs/fast_double_parser'
Submodule 'external_libs/fmt' (https://github.com/fmtlib/fmt.git) registered for path 'external_libs/fmt'
Cloning into '/LightGBM/compute'...
Cloning into '/LightGBM/eigen'...
Cloning into '/LightGBM/external_libs/fast_double_parser'...
Cloning into '/LightGBM/external_libs/fmt'...
Submodule path 'compute': checked out '36c89134d4013b2e5e45bc55656a18bd6141995a'
Submodule path 'eigen': checked out '8ba1b0f41a7950dc3e1d4ed75859e36c73311235'
Submodule path 'external_libs/fast_double_parser': checked out 'ace60646c02dc54c57f19d644e49a61e7e7758ec'
Submodule 'benchmark/dependencies/abseil-cpp' (https://github.com/abseil/abseil-cpp.git) registered for path 'external_libs/fast_double_parser/benchmarks/dependencies/abseil-cpp'
Submodule 'benchmark/dependencies/double-conversion' (https://github.com/google/double-conversion.git) registered for path 'external_libs/fast_double_parser/benchmarks/dependencies/double-conversion'
Cloning into '/LightGBM/external_libs/fast_double_parser/benchmarks/dependencies/abseil-cpp'...
Cloning into '/LightGBM/external_libs/fast_double_parser/benchmarks/dependencies/double-conversion'...
Submodule path 'external_libs/fast_double_parser/benchmarks/dependencies/abseil-cpp': checked out 'd936052d32a5b7ca08b0199a6724724aea432309'
Submodule path 'external_libs/fast_double_parser/benchmarks/dependencies/double-conversion': checked out 'f4cb2384efa55dee0e6652f8674b05763441ab09'
Submodule path 'external_libs/fmt': checked out 'cc09f1a6798c085c325569ef466bcdcffdc266d4'
* checking for file '/LightGBM/lightgbm_r/DESCRIPTION' ... OK
* preparing 'lightgbm':
* checking DESCRIPTION meta-information ... OK
* cleaning src
Warning in system2(command, args, stdout = NULL, stderr = NULL, ...) :
error in running command
* checking for LF line-endings in source and make files and shell scripts
* checking for empty or unneeded directories
WARNING: directory 'lightgbm/src/compute/test' is empty
* looking to see if a 'data/datalist' file should be added
* building 'lightgbm_3.1.1.99.tar.gz'
* installing to library '/usr/local/lib/R/site-library'
* installing *source* package 'lightgbm' ...
** using staged installation
** libs
installing via 'install.libs.R' to /usr/local/lib/R/site-library/00LOCK-lightgbm/00new/lightgbm
-- The C compiler identification is GNU 9.3.0
-- The CXX compiler identification is GNU 9.3.0
-- Check for working C compiler: /usr/bin/cc
-- Check for working C compiler: /usr/bin/cc -- works
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Detecting C compile features
-- Detecting C compile features - done
-- Check for working CXX compiler: /usr/bin/c++
-- Check for working CXX compiler: /usr/bin/c++ -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- R version passed into FindLibR.cmake: 4.0.3
-- Found LibR: /usr/lib/R
-- LIBR_EXECUTABLE: /usr/bin/R
-- LIBR_INCLUDE_DIRS: /usr/share/R/include
-- LIBR_CORE_LIBRARY: /usr/lib/R/lib/libR.so
-- Found OpenMP_C: -fopenmp (found version "4.5")
-- Found OpenMP_CXX: -fopenmp (found version "4.5")
-- Found OpenMP: TRUE (found version "4.5")
-- Looking for CL_VERSION_2_2
-- Looking for CL_VERSION_2_2 - found
CMake Error at /usr/share/cmake-3.16/Modules/FindPackageHandleStandardArgs.cmake:146 (message):
Could NOT find OpenCL (missing: OpenCL_LIBRARY) (found version "2.2")
Call Stack (most recent call first):
/usr/share/cmake-3.16/Modules/FindPackageHandleStandardArgs.cmake:393 (_FPHSA_FAILURE_MESSAGE)
/usr/share/cmake-3.16/Modules/FindOpenCL.cmake:150 (find_package_handle_standard_args)
CMakeLists.txt:138 (find_package)
-- Configuring incomplete, errors occurred!
See also "/tmp/RtmpvcXiAX/R.INSTALL14755eba078/lightgbm/src/build/CMakeFiles/CMakeOutput.log".
Error in .run_shell_command("cmake", c(cmake_args, "..")) :
Command failed with exit code: 1
* removing '/usr/local/lib/R/site-library/lightgbm'
Error in .run_shell_command(install_cmd, install_args) :
Command failed with exit code: 1
Execution halted
The command '/bin/sh -c git clone --recursive https://github.com/microsoft/LightGBM && cd LightGBM && Rscript build_r.R --use-gpu' returned a non-zero code: 1
If I build the docker image with the last RUN entry commented out:
FROM nvidia/cuda:11.0-devel-ubuntu20.04
RUN apt-get update && \
DEBIAN_FRONTEND="noninteractive" apt-get install -y software-properties-common apt-transport-https
RUN apt-key adv --keyserver keyserver.ubuntu.com --recv-keys E298A3A825C0D65DFD57CBB651716619E084DAB9 && \
add-apt-repository 'deb [arch=amd64] https://cran.rstudio.com/bin/linux/ubuntu focal-cran40/' && \
apt-get update && \
apt-get install -y r-base
RUN apt-get install -y git wget libcurl4-openssl-dev default-jdk-headless libssl-dev libxml2-dev cmake
ENV MAKE="make -j$(nproc)"
RUN R -e 'install.packages(c("R6","data.table","jsonlite"), repos = "https://cran.rstudio.com/")'
RUN apt-get install -y libboost-dev libboost-system-dev libboost-filesystem-dev ocl-icd-opencl-dev opencl-headers clinfo
RUN mkdir -p /etc/OpenCL/vendors && \
echo "libnvidia-opencl.so.1" > /etc/OpenCL/vendors/nvidia.icd ## otherwise lightgbm segfaults at runtime (compiles fine without it)
#RUN git clone --recursive https://github.com/microsoft/LightGBM && \
# cd LightGBM && \
# Rscript build_r.R --use-gpu
with
sudo docker build -t gbmperf_gpu .
and then run it:
sudo nvidia-docker run --rm -ti gbmperf_gpu /bin/bash
then I can run things manually:
git clone --recursive https://github.com/microsoft/LightGBM && \
cd LightGBM && \
Rscript build_r.R --use-gpu
gives the same error.
However, just compiling lightgbm (not the R package) seems fine:
git clone --recursive https://github.com/microsoft/LightGBM && \
cd LightGBM && mkdir build && cd build && cmake -DUSE_GPU=1 .. && make -j4
as here:
...
Submodule path 'external_libs/fast_double_parser/benchmarks/dependencies/abseil-cpp': checked out 'd936052d32a5b7ca08b0199a6724724aea432309'
Submodule path 'external_libs/fast_double_parser/benchmarks/dependencies/double-conversion': checked out 'f4cb2384efa55dee0e6652f8674b05763441ab09'
Submodule path 'external_libs/fmt': checked out 'cc09f1a6798c085c325569ef466bcdcffdc266d4'
-- The C compiler identification is GNU 9.3.0
-- The CXX compiler identification is GNU 9.3.0
-- Check for working C compiler: /usr/bin/cc
-- Check for working C compiler: /usr/bin/cc -- works
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Detecting C compile features
-- Detecting C compile features - done
-- Check for working CXX compiler: /usr/bin/c++
-- Check for working CXX compiler: /usr/bin/c++ -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Found OpenMP_C: -fopenmp (found version "4.5")
-- Found OpenMP_CXX: -fopenmp (found version "4.5")
-- Found OpenMP: TRUE (found version "4.5")
-- Looking for CL_VERSION_2_2
-- Looking for CL_VERSION_2_2 - found
-- Found OpenCL: /usr/lib/x86_64-linux-gnu/libOpenCL.so (found version "2.2")
-- OpenCL include directory: /usr/include
-- Found Boost: /usr/lib/x86_64-linux-gnu/cmake/Boost-1.71.0/BoostConfig.cmake (found suitable version "1.71.0", minimum required is "1.56.0") found components: filesystem system
-- Performing Test MM_PREFETCH
-- Performing Test MM_PREFETCH - Success
-- Using _mm_prefetch
-- Performing Test MM_MALLOC
-- Performing Test MM_MALLOC - Success
-- Using _mm_malloc
-- Configuring done
-- Generating done
-- Build files have been written to: /LightGBM/LightGBM/build
make[1]: warning: -j0 forced in submake: resetting jobserver mode.
Scanning dependencies of target lightgbm
Scanning dependencies of target _lightgbm
[ 1%] Building CXX object CMakeFiles/_lightgbm.dir/src/boosting/boosting.cpp.o
[ 2%] Building CXX object CMakeFiles/_lightgbm.dir/src/boosting/gbdt.cpp.o
[ 4%] Building CXX object CMakeFiles/lightgbm.dir/src/boosting/gbdt.cpp.o
[ 7%] Building CXX object CMakeFiles/lightgbm.dir/src/boosting/boosting.cpp.o
[ 7%] Building CXX object CMakeFiles/lightgbm.dir/src/main.cpp.o
[ 8%] Building CXX object CMakeFiles/_lightgbm.dir/src/boosting/prediction_early_stop.cpp.o
[ 10%] Building CXX object CMakeFiles/lightgbm.dir/src/application/application.cpp.o
...
though I also see
/usr/include/CL/cl_version.h:34:104: note: #pragma message: cl_version.h: CL_TARGET_OPENCL_VERSION is not defined. Defaulting to 220 (OpenCL 2.2)
34 | #pragma message("cl_version.h: CL_TARGET_OPENCL_VERSION is not defined. Defaulting to 220 (OpenCL 2.2)")
but it compiles anyway:
[ 98%] Linking CXX shared library ../lib_lightgbm.so
[100%] Linking CXX executable ../lightgbm
[100%] Built target _lightgbm
[100%] Built target lightgbm
So there must be something in the R package(?) cc @jameslamb
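One way to narrow this down is to confirm that the OpenCL library the standalone CMake run reported (/usr/lib/x86_64-linux-gnu/libOpenCL.so in the log above) is actually present inside the container. The following is only a minimal sketch: the function name find_opencl_lib and the list of searched directories are illustrative assumptions, not part of LightGBM.

```shell
#!/bin/sh
# find_opencl_lib DIR...: print the first libOpenCL.so* found directly in the
# given directories. Returns 0 if a library was found, 1 otherwise.
find_opencl_lib() {
    for dir in "$@"; do
        for candidate in "$dir"/libOpenCL.so*; do
            # An unmatched glob stays literal, so check the file really exists.
            if [ -e "$candidate" ]; then
                printf '%s\n' "$candidate"
                return 0
            fi
        done
    done
    return 1
}

# The directories below are assumptions for an Ubuntu 20.04 CUDA image.
find_opencl_lib /usr/lib/x86_64-linux-gnu /usr/local/cuda/lib64 /usr/lib \
    || echo "libOpenCL.so not found in the searched directories"
```

If the library is found here but the R build still reports a missing OpenCL_LIBRARY, that would point at how the R package invokes CMake rather than at the container setup.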
About this issue
- State: closed
- Created 3 years ago
- Comments: 29 (17 by maintainers)
Commits related to this issue
- [R-package] Add GPU install options (fixes #3765) — committed to microsoft/LightGBM by jameslamb 3 years ago
or use nvidia-smi
Well, as I corrected myself later, the sed version actually does not work properly anymore either (it compiles, but it does not add GPU support actually).
Yeah, my Dockerfile has a history of additions over the years (and hacks like the echo "libnvidia-opencl.so.1" > /etc/OpenCL/vendors/nvidia.icd thing), I'll see if I can clean it up with your suggestions @StrikerRUS.
However, lightgbm compiles fine outside the R package, so it seems it's only the R package that gets confused about OpenCL.
Thanks @StrikerRUS, I fixed it now. Yeah, strange indeed that it was compiling with the == as well.
@szilard I'm afraid you have a typo (duplicated = sign) in the commit you've linked:
Quite strange that compilation succeeded even with the typo.
Thanks @jameslamb for the fix and for merging it into LightGBM master. I changed the Dockerfile in my repo GBM-perf to take advantage of this fix (replaced the sed hack with flags to the build script): https://github.com/szilard/GBM-perf/commit/3b56bf0b474edd5dcf8039c9ddd86cddb9c1d845 Thanks.
Thanks to both of you for all the great information, and a nice reproducible example!
I've proposed what I think could be a fix, in https://github.com/microsoft/LightGBM/pull/3779. It wouldn't "just work", but would at least allow you to pass in these paths as command-line args like you can in the Python package, so no one would need to use sed to re-write install.libs.R.
Thanks for such nice reproducible examples @szilard! I can look into this this weekend, and probably expose more options via the build_r.R command-line args, so you don't have to use sed.
Nice, given that the error happens on a non-GPU machine! Indeed a good sign!
But please note that successfully compiling the GPU version and using device_type='gpu' in params may still result in training on the CPU. This can occur with CPUs that have onboard graphics and some combination of system-wide default platform and device (refer to gpu_platform_id and gpu_device_id). So to be 100% sure that LightGBM uses the real GPU, please take a look at the training log and find this line
All this is strange, because last time I ran the benchmarks (September 2020) it was all working.
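To illustrate the gpu_platform_id and gpu_device_id remark, here is a small sketch that builds key=value parameters pinning training to one OpenCL platform and device rather than relying on the system default. The helper name gpu_params is hypothetical, and the indices 0 and 0 are placeholders; the real ones depend on the machine and can be listed with clinfo -l (clinfo is installed in the Dockerfile above).

```shell
#!/bin/sh
# gpu_params PLATFORM_ID DEVICE_ID: print LightGBM key=value parameters that
# select the GPU device type and one specific OpenCL platform and device.
gpu_params() {
    printf 'device_type=gpu gpu_platform_id=%s gpu_device_id=%s\n' "$1" "$2"
}

# Platform 0 / device 0 is only an example; check `clinfo -l` for real indices.
gpu_params 0 0

# Hypothetical use with the CLI binary from the standalone build
# (train.conf is a placeholder config file, not something from this issue):
#   ./lightgbm config=train.conf $(gpu_params 0 0) 2>&1 | tee train.log
```

The same three parameters can be passed in the params list of the R or Python package; saving the log, as in the commented usage line, makes it easier to check afterwards which device was used.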
@jameslamb I believe the R-package needs the same additional command-line options for the GPU version as our Python-package:
https://github.com/microsoft/LightGBM/tree/master/python-package#build-gpu-version
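For readers landing here later: the commit linked to this issue added GPU install options to build_r.R. The sketch below prints (does not run) an invocation that passes the OpenCL and Boost locations explicitly. The flag names --opencl-library, --opencl-include-dir and --boost-librarydir are given as I recall them from the R-package README, so treat them as an assumption and check build_r.R in your checkout. The paths are taken from the standalone CMake output earlier in this issue; the Boost library directory is inferred from the BoostConfig.cmake location. The helper name build_r_gpu_cmd is hypothetical.

```shell
#!/bin/sh
# build_r_gpu_cmd OPENCL_LIB OPENCL_INCLUDE_DIR BOOST_LIBDIR:
# print (do not run) a build_r.R call with explicit OpenCL and Boost paths.
# Flag names are assumed from the R-package README; verify before use.
build_r_gpu_cmd() {
    printf 'Rscript build_r.R --use-gpu --opencl-library=%s --opencl-include-dir=%s --boost-librarydir=%s\n' \
        "$1" "$2" "$3"
}

# Paths as reported by the standalone CMake run on Ubuntu 20.04.
build_r_gpu_cmd /usr/lib/x86_64-linux-gnu/libOpenCL.so /usr/include /usr/lib/x86_64-linux-gnu
```

Printing the command first makes it easy to paste into a Dockerfile RUN step once the paths have been confirmed for the image in question.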