scikit-learn: test_omp_cv fails with MKL and AVX-512

In the newly released scikit-learn 0.20.1 (and also in 0.20.0), the test_omp_cv test case fails when running the test suite. Any help is appreciated!

_______________________________ test_omp_cv _______________________________

    def test_omp_cv():
        y_ = y[:, 0]
        gamma_ = gamma[:, 0]
        ompcv = OrthogonalMatchingPursuitCV(normalize=True, fit_intercept=False,
                                            max_iter=10, cv=5)
        ompcv.fit(X, y_)
>       assert_equal(ompcv.n_nonzero_coefs_, n_nonzero_coefs)

/usr/local/lib/python3.6/dist-packages/scikit_learn-0.20.1-py3.6-linux-x86_64.egg/sklearn/linear_model/tests/test_omp.py:209: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
/usr/lib/python3.6/unittest/case.py:829: in assertEqual
    assertion_func(first, second, msg=msg)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = <sklearn.utils._unittest_backport.TestCase testMethod=__init__>, first = 6, second = 5, msg = '6 != 5'

    def _baseAssertEqual(self, first, second, msg=None):
        """The default assertEqual implementation, not type specific."""
        if not first == second:
            standardMsg = '%s != %s' % _common_shorten_repr(first, second)
            msg = self._formatMessage(msg, standardMsg)
>           raise self.failureException(msg)
E           AssertionError: 6 != 5

/usr/lib/python3.6/unittest/case.py:822: AssertionError

Versions

System:
    python: 3.6.7 (default, Oct 22 2018, 11:32:17) [GCC 8.2.0]
executable: /usr/bin/python3
   machine: Linux-4.15.0-39-generic-x86_64-with-Ubuntu-18.04-bionic

BLAS:
    macros: SCIPY_MKL_H=None, HAVE_CBLAS=None
  lib_dirs: /opt/intel/mkl/lib/intel64
cblas_libs: mkl_rt, pthread

Python deps:
       pip: 18.1
setuptools: 40.6.2
   sklearn: 0.20.1
     numpy: 1.15.4
     scipy: 1.1.0
    Cython: 0.29
    pandas: None

About this issue

  • State: closed
  • Created 6 years ago
  • Reactions: 1
  • Comments: 27 (14 by maintainers)

Most upvoted comments

@rth, I can confirm that the test is not failing anymore. Thanks for fixing it!

In case it is helpful: I also tested the Windows build (with MKL) from https://www.lfd.uci.edu/~gohlke/pythonlibs on the same PC. The OMP test fails there in the same way as on Linux.

I cannot reproduce this on my system. The test_omp tests pass.

I re-ran the tests on a Xeon W-2155 CPU with AVX-512 and could reproduce the issue.
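
Since the failure seems to depend on whether the CPU supports AVX-512, it may help to confirm that on each machine before comparing results. Below is a minimal sketch, assuming a Linux system where /proc/cpuinfo has a "flags" line. The helper names cpu_flags and has_avx512 are made up for illustration and are not part of scikit-learn or MKL.

```python
def cpu_flags(cpuinfo_text):
    """Return the set of CPU feature flags found in /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            # The first "flags" line is enough; all cores normally report the same set.
            return set(value.split())
    return set()


def has_avx512(cpuinfo_text):
    """True if any AVX-512 feature flag (avx512f, avx512dq, ...) is listed."""
    return any(flag.startswith("avx512") for flag in cpu_flags(cpuinfo_text))


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as f:
            text = f.read()
    except OSError:
        # Not Linux, or /proc is unavailable: report no flags rather than crash.
        text = ""
    print("AVX-512 available:", has_avx512(text))
```

On other platforms (or on ARM, where the line is named differently) this simply reports False, so it is only a quick check, not a definitive one.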

@rth

Versions

>>> sklearn.show_versions()

System:
    python: 3.7.3 (default, Jun 11 2019, 01:11:15)  [GCC 6.3.0 20170516]
executable: /usr/local/bin/python
   machine: Linux-4.9.0-8-amd64-x86_64-with-debian-9.9

BLAS:
    macros: SCIPY_MKL_H=None, HAVE_CBLAS=None
  lib_dirs: /opt/intel/mkl/lib/intel64/
cblas_libs: mkl_rt, pthread

Python deps:
       pip: 19.1.1
setuptools: 41.0.1
   sklearn: 0.20.3
     numpy: 1.15.4
     scipy: 1.2.1
    Cython: 0.29.10
    pandas: None

Traceback

$ pytest --pyargs sklearn --disable-warnings -k "test_omp_cv"
============================= test session starts ==============================
platform linux -- Python 3.7.3, pytest-5.0.0, py-1.8.0, pluggy-0.12.0
rootdir: /builds/tomotech/build/python-numpy
collected 10193 items / 10192 deselected / 1 selected

linear_model/tests/test_omp.py F                                         [100%]

=================================== FAILURES ===================================
_________________________________ test_omp_cv __________________________________

    def test_omp_cv():
        y_ = y[:, 0]
        gamma_ = gamma[:, 0]
        ompcv = OrthogonalMatchingPursuitCV(normalize=True, fit_intercept=False,
                                            max_iter=10, cv=5)
        ompcv.fit(X, y_)
>       assert_equal(ompcv.n_nonzero_coefs_, n_nonzero_coefs)

/usr/local/lib/python3.7/site-packages/sklearn/linear_model/tests/test_omp.py:209: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
/usr/local/lib/python3.7/unittest/case.py:839: in assertEqual
    assertion_func(first, second, msg=msg)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = <sklearn.utils._unittest_backport.TestCase testMethod=__init__>
first = 6, second = 5, msg = '6 != 5'

    def _baseAssertEqual(self, first, second, msg=None):
        """The default assertEqual implementation, not type specific."""
        if not first == second:
            standardMsg = '%s != %s' % _common_shorten_repr(first, second)
            msg = self._formatMessage(msg, standardMsg)
>           raise self.failureException(msg)
E           AssertionError: 6 != 5

/usr/local/lib/python3.7/unittest/case.py:832: AssertionError
============ 1 failed, 10192 deselected, 5 warnings in 8.00 seconds ============

Dockerfile

ARG FROM_IMAGE=python:3.7-slim-stretch
FROM $FROM_IMAGE

# Install MKL.
RUN set -x && \
    BUILD_DEPS="apt-transport-https curl" && \
    command -v gpg > /dev/null || BUILD_DEPS="gnupg $BUILD_DEPS" && \
    apt-get update -q && \
    apt-get install -yq --no-install-recommends $BUILD_DEPS && \
    curl https://apt.repos.intel.com/intel-gpg-keys/GPG-PUB-KEY-INTEL-SW-PRODUCTS-2019.PUB | apt-key add - && \
    echo deb https://apt.repos.intel.com/mkl all main > /etc/apt/sources.list.d/intel-mkl.list && \
    apt-get update -q && \
    apt-get install -yq --no-install-recommends intel-mkl-64bit-2019.0-045 && \
    update-alternatives --install /usr/lib/x86_64-linux-gnu/libblas.so \
        libblas.so-x86_64-linux-gnu /opt/intel/mkl/lib/intel64/libmkl_rt.so 50 && \
    update-alternatives --install /usr/lib/x86_64-linux-gnu/libblas.so.3 \
        libblas.so.3-x86_64-linux-gnu /opt/intel/mkl/lib/intel64/libmkl_rt.so 50 && \
    update-alternatives --install /usr/lib/x86_64-linux-gnu/liblapack.so \
        liblapack.so-x86_64-linux-gnu /opt/intel/mkl/lib/intel64/libmkl_rt.so 50 && \
    update-alternatives --install /usr/lib/x86_64-linux-gnu/liblapack.so.3 \
        liblapack.so.3-x86_64-linux-gnu /opt/intel/mkl/lib/intel64/libmkl_rt.so 50 && \
    echo "/opt/intel/mkl/lib/intel64" >> /etc/ld.so.conf.d/intel.conf && \
    ldconfig && \
    rm /etc/apt/sources.list.d/intel-mkl.list && \
    apt-get remove -yq $BUILD_DEPS && \
    apt-get autoremove -yq && \
    apt-get clean -q && \
    rm -rf /var/lib/apt/lists/* && \
    rm -rf /tmp/*

# Install Cython (numpy's dependency).
RUN pip install --no-cache-dir Cython && \
    find / -name '*.py[co]' -delete && \
    rm -rf /tmp/*

# Compile and install numpy, scipy and scikit-learn with MKL.
ARG NUMPY_VERSION=1.15.4
ARG SCIPY_VERSION=1.2.1
ARG SKLEARN_VERSION=0.20.3
RUN set -x && \
    echo "STEP 1: Preparing." && \
    apt-get update -q && \
    BUILD_DEPS="git gcc g++ gfortran" && \
    RUN_DEPS="libgomp1 libgfortran3" && \
    apt-get install -yq --no-install-recommends $BUILD_DEPS $RUN_DEPS && \
    set +x && \
    COMPILERVARS_ARCHITECTURE=intel64 . /opt/intel/bin/compilervars.sh && \
    set -x && \
    echo "STEP 2: Installing numpy." && \
    git clone --depth=1 --branch=v$NUMPY_VERSION \
        https://github.com/numpy/numpy.git /tmp/numpy && \
    cd /tmp/numpy && \
    cp site.cfg.example site.cfg && \
    echo "[mkl]" >> site.cfg && \
    echo "include_dirs = /opt/intel/mkl/include/intel64/" >> site.cfg && \
    echo "library_dirs = /opt/intel/mkl/lib/intel64/" >> site.cfg && \
    echo "mkl_libs = mkl_rt" >> site.cfg && \
    echo "lapack_libs =" >> site.cfg && \
    export CFLAGS='-fopenmp -O3 -march=core2 -mtune=native -ftree-vectorize' && \
    export LDFLAGS='-lm -lpthread -lgomp' && \
    python setup.py build -j 3 --fcompiler=gnu95 && \
    python setup.py install && \
    echo "STEP 3: Installing scipy." && \
    git clone --branch=v$SCIPY_VERSION --depth=1 \
        https://github.com/scipy/scipy.git /tmp/scipy && \
    cd /tmp/scipy && \
    export CFLAGS='-fopenmp -O3 -march=core2 -mtune=native -ftree-vectorize' && \
    export LDFLAGS='-lm -lpthread -lgomp -shared' && \
    python setup.py build -j 3 && \
    python setup.py install && \
    echo "STEP 4: Installing scikit-learn." && \
    git clone --branch=$SKLEARN_VERSION --depth=1 \
        https://github.com/scikit-learn/scikit-learn.git /tmp/sklearn && \
    cd /tmp/sklearn && \
    export CFLAGS='-fopenmp -O3 -march=core2 -mtune=native -ftree-vectorize' && \
    export LDFLAGS='-lm -lpthread -lgomp -shared' && \
    python setup.py build -j 3 && \
    python setup.py install && \
    echo "STEP 5: Clean up." && \
    cd / && \
    apt-get remove -yq $BUILD_DEPS && \
    apt-get autoremove -yq && \
    apt-get clean -q && \
    rm -rf /var/lib/apt/lists/* && \
    find / -name '*.py[co]' -delete && \
    rm -rf /tmp/*

I built the Docker image on an EC2 m5 instance. When I use that image to run the tests on an m5 instance, they fail. The same image works on an m4 instance and on my laptop, neither of which supports AVX-512.

Please let me know if there’s anything else I can provide to help diagnose the issue. If you don’t have access to a machine with AVX-512, I think I could also give you access to one.
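
One more diagnostic that might narrow this down: Intel MKL documents an MKL_ENABLE_INSTRUCTIONS environment variable that can restrict which instruction set its kernels dispatch to. The sketch below re-runs the same pytest command as in the traceback above with MKL capped at AVX2. This is untested here; whether the MKL build in this image honors the variable, and whether it changes the test outcome, are assumptions. The function names build_env, build_command and run_test are hypothetical helpers.

```python
import os
import subprocess
import sys


def build_env(isa="AVX2", base=None):
    """Copy an environment and ask MKL to cap its code path at `isa`."""
    env = dict(os.environ if base is None else base)
    # Must be set before MKL is loaded, hence a fresh child process.
    env["MKL_ENABLE_INSTRUCTIONS"] = isa
    return env


def build_command():
    """Same pytest selection as in the traceback, run via this interpreter."""
    return [sys.executable, "-m", "pytest", "--pyargs", "sklearn",
            "--disable-warnings", "-k", "test_omp_cv"]


def run_test(isa="AVX2"):
    """Run test_omp_cv in a child process and return pytest's exit code."""
    return subprocess.run(build_command(), env=build_env(isa)).returncode
```

Calling run_test("AVX2") on the m5 instance and comparing it with a plain run would show whether the result depends on the AVX-512 code path in MKL.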