scikit-learn: Debian test failures (was test_preserve_trustworthiness_approximately fails on 32bit: AssertionError: 0.89166666666666661 not greater than 0.9)

Building 0.19b2 on Debian/Ubuntu releases … still ongoing, but I see a consistent failure on Debian stretch (nd90, current stable) and testing (nd100), on 32-bit only (the amd64 build is OK):

neurodebian@smaug ~/deb/builds/scikit-learn/0.19~b2-1 % grep -5 AssertionError: *build
scikit-learn_0.19~b2-1~nd100+1_i386.build-Traceback (most recent call last):
scikit-learn_0.19~b2-1~nd100+1_i386.build-  File "/usr/lib/python2.7/dist-packages/nose/case.py", line 197, in runTest
scikit-learn_0.19~b2-1~nd100+1_i386.build-    self.test(*self.arg)
scikit-learn_0.19~b2-1~nd100+1_i386.build-  File "/build/scikit-learn-0.19~b2/debian/tmp/usr/lib/python2.7/dist-packages/sklearn/manifold/tests/test_t_sne.py", line 247, in test_preserve_trustworthiness_approximately
scikit-learn_0.19~b2-1~nd100+1_i386.build-    assert_greater(t, 0.9)
scikit-learn_0.19~b2-1~nd100+1_i386.build:AssertionError: 0.89166666666666661 not greater than 0.9
scikit-learn_0.19~b2-1~nd100+1_i386.build-
scikit-learn_0.19~b2-1~nd100+1_i386.build-----------------------------------------------------------------------
scikit-learn_0.19~b2-1~nd100+1_i386.build-Ran 7969 tests in 285.883s
scikit-learn_0.19~b2-1~nd100+1_i386.build-
scikit-learn_0.19~b2-1~nd100+1_i386.build-FAILED (SKIP=73, failures=1)
--
scikit-learn_0.19~b2-1~nd90+1_i386.build-Traceback (most recent call last):
scikit-learn_0.19~b2-1~nd90+1_i386.build-  File "/usr/lib/python2.7/dist-packages/nose/case.py", line 197, in runTest
scikit-learn_0.19~b2-1~nd90+1_i386.build-    self.test(*self.arg)
scikit-learn_0.19~b2-1~nd90+1_i386.build-  File "/build/scikit-learn-0.19~b2/debian/tmp/usr/lib/python2.7/dist-packages/sklearn/manifold/tests/test_t_sne.py", line 247, in test_preserve_trustworthiness_approximately
scikit-learn_0.19~b2-1~nd90+1_i386.build-    assert_greater(t, 0.9)
scikit-learn_0.19~b2-1~nd90+1_i386.build:AssertionError: 0.89166666666666661 not greater than 0.9
scikit-learn_0.19~b2-1~nd90+1_i386.build-
scikit-learn_0.19~b2-1~nd90+1_i386.build-----------------------------------------------------------------------
scikit-learn_0.19~b2-1~nd90+1_i386.build-Ran 7969 tests in 288.113s
scikit-learn_0.19~b2-1~nd90+1_i386.build-
scikit-learn_0.19~b2-1~nd90+1_i386.build-FAILED (SKIP=73, failures=1)

In both cases python-numpy is 1:1.12.1-3 (i.e. numpy 1.12.1); the test passed with numpy 1.8.2 on Debian jessie.

About this issue

State: closed
Created 7 years ago
Comments: 86 (75 by maintainers)

Commits related to this issue

TST Improve SelectFromModel tests Should fix one of the issues in #9393 — committed to jnothman/scikit-learn by jnothman 7 years ago
TST Improve SelectFromModel tests (#9733) Should fix one of the issues in #9393 — committed to scikit-learn/scikit-learn by jnothman 7 years ago
TST Improve SelectFromModel tests (#9733) Should fix one of the issues in #9393 — committed to maskani-moh/scikit-learn by jnothman 7 years ago
TST Improve SelectFromModel tests (#9733) Should fix one of the issues in #9393 — committed to jwjohnson314/scikit-learn by jnothman 7 years ago
debian/patches/changeset_6c99d797d7c71d216503612a242bffb8d006582d.diff to avoid regression due to forgotten in the release fix (see https://github.com/scikit-learn/scikit-learn/issues/9393) — committed to yarikoptic/scikit-learn by yarikoptic 7 years ago
scikit-learn (0.19.1-3) unstable; urgency=medium * debian/patches/changeset_6c99d797d7c71d216503612a242bffb8d006582d.diff to avoid regression due to forgotten in the release fix (see https:... — committed to raspbian-packages/scikit-learn by yarikoptic 6 years ago

Most upvoted comments

OK, after experimenting extensively with different random seeds and platforms (MKL vs. OpenBLAS PCA for the init), I think that 0.9 is just too strict. We could keep the 0.9 threshold and stabilize this test by:

running TSNE on larger datasets (in which case the trustworthiness score is more stable), or
running the test several times with different random seeds and making an assertion on the median score.

However, both approaches are too expensive in my opinion. Running my test with several hundred seeds on the original 50-sample random dataset, I never saw the score go below 0.87. So I think setting the threshold to 0.85 should fix the issue. I will submit a PR.
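
A minimal sketch of the second option (asserting on the median score over several seeds) could look like the following. The function `toy_trustworthiness` is a hypothetical stand-in for fitting t-SNE with a given `random_state` and scoring the embedding; its return values are made up for illustration and are not measurements. Only the standard library is used, so the cost of the repeated fits is not represented.

```python
import statistics


def toy_trustworthiness(seed):
    # Hypothetical stand-in for "fit TSNE(random_state=seed) and compute the
    # trustworthiness of the embedding". The numbers are invented: most seeds
    # score 0.93, while every seventh seed is an "unlucky" run at 0.89.
    return 0.89 if seed % 7 == 0 else 0.93


def median_score(score_fn, seeds):
    """Evaluate score_fn once per seed and return the median of the scores."""
    return statistics.median(score_fn(seed) for seed in seeds)


seeds = range(21)
threshold = 0.9

single_run = toy_trustworthiness(0)                 # one unlucky seed
median = median_score(toy_trustworthiness, seeds)   # aggregated over all seeds
worst = min(toy_trustworthiness(seed) for seed in seeds)

# A single-seed assertion is at the mercy of that one seed, whereas the
# median is unaffected by a minority of low-scoring runs.
print("single run passes:", single_run > threshold)
print("median passes:", median > threshold)
print("worst score seen:", worst)
```

With a real scoring function, this loop costs one full fit per seed, which is the expense the comment objects to. Lowering the threshold keeps the test at a single fit.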

ogrisel on Sep 20, 2017

Right. I see in the logs an alarming number of failures for a final release 😦

And none of them is in test_preserve_trustworthiness_approximately:

======================================================================
FAIL: sklearn.feature_extraction.tests.test_feature_hasher.test_hasher_alternate_sign
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/nose/case.py", line 197, in runTest
    self.test(*self.arg)
  File "/<<PKGBUILDDIR>>/debian/tmp/usr/lib/python2.7/dist-packages/sklearn/utils/testing.py", line 291, in wrapper
    return fn(*args, **kwargs)
  File "/<<PKGBUILDDIR>>/debian/tmp/usr/lib/python2.7/dist-packages/sklearn/feature_extraction/tests/test_feature_hasher.py", line 122, in test_hasher_alternate_sign
    assert_true(len(Xt.data) < len(X[0]))
AssertionError: False is not true

----------------------------------------------------------------------

======================================================================
FAIL: sklearn.feature_selection.tests.test_from_model.test_feature_importances
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/nose/case.py", line 197, in runTest
    self.test(*self.arg)
  File "/<<PKGBUILDDIR>>/debian/tmp/usr/lib/python2.7/dist-packages/sklearn/utils/testing.py", line 667, in run_test
    return func(*args, **kwargs)
  File "/<<PKGBUILDDIR>>/debian/tmp/usr/lib/python2.7/dist-packages/sklearn/feature_selection/tests/test_from_model.py", line 72, in test_feature_importances
    assert_almost_equal(importances, importances_bis, decimal=4)
  File "/usr/lib/python2.7/dist-packages/numpy/testing/utils.py", line 573, in assert_almost_equal
    return assert_array_almost_equal(actual, desired, decimal, err_msg)
  File "/usr/lib/python2.7/dist-packages/numpy/testing/utils.py", line 979, in assert_array_almost_equal
    precision=decimal)
  File "/usr/lib/python2.7/dist-packages/numpy/testing/utils.py", line 796, in assert_array_compare
    raise AssertionError(msg)
AssertionError: 
Arrays are not almost equal to 4 decimals

(mismatch 70.0%)
 x: array([ 0.1537,  0.2294,  0.1825,  0.0667,  0.0485,  0.0587,  0.0643,
        0.0642,  0.066 ,  0.066 ])
 y: array([ 0.1527,  0.2294,  0.1822,  0.0675,  0.0483,  0.0587,  0.0648,
        0.0642,  0.0656,  0.0665])

----------------------------------------------------------------------
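
The reported 70% mismatch can be checked by hand from the two arrays as printed in the traceback. The sketch below assumes the criterion given in the current NumPy documentation for `assert_array_almost_equal`, namely `abs(desired - actual) < 1.5 * 10**(-decimal)`. The numpy 1.12.1 used in this build may implement the check slightly differently, and the printed values are already rounded to four decimals, so treat this as an approximation of what the test saw.

```python
# Values copied from the traceback (as printed, i.e. rounded to 4 decimals).
x = [0.1537, 0.2294, 0.1825, 0.0667, 0.0485,
     0.0587, 0.0643, 0.0642, 0.066, 0.066]
y = [0.1527, 0.2294, 0.1822, 0.0675, 0.0483,
     0.0587, 0.0648, 0.0642, 0.0656, 0.0665]

decimal = 4
# Assumed tolerance: the criterion documented for assert_array_almost_equal.
tolerance = 1.5 * 10 ** (-decimal)

# Indices of elements whose absolute difference is not below the tolerance.
mismatched = [i for i, (a, b) in enumerate(zip(x, y))
              if not abs(a - b) < tolerance]
mismatch_pct = 100.0 * len(mismatched) / len(x)

print("mismatched indices:", mismatched)
print("mismatch: {:.1f}%".format(mismatch_pct))
```

Seven of the ten importances differ by 0.0002 or more, which agrees with the 70% in the assertion message. The differences are of the order 1e-4 to 1e-3, well above what `decimal=4` tolerates.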
ON HURD-I386:
======================================================================
FAIL: sklearn.tests.test_multioutput.test_multi_output_classification_partial_fit_parallelism
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/nose/case.py", line 197, in runTest
    self.test(*self.arg)
  File "/<<PKGBUILDDIR>>/debian/tmp/usr/lib/python2.7/dist-packages/sklearn/tests/test_multioutput.py", line 171, in test_multi_output_classification_partial_fit_parallelism
    assert_false(est1 is est2)
AssertionError: True is not false

jnothman on Aug 13, 2017