scikit-learn: MNIST classification example seems to be broken

Description

I cannot reproduce the "MNIST classification using multinomial logistic + L1" example.

Steps/Code to Reproduce

I copied the example's full code, unchanged, into a Jupyter Notebook.

Expected Results

No errors, and results similar to those shown on the example's web page.

Actual Results

Automatically created module for IPython interactive environment
---------------------------------------------------------------------------
JSONDecodeError                           Traceback (most recent call last)
<ipython-input-5-e53e77eeb3b2> in <module>
     19 
     20 # Load data from https://www.openml.org/d/554
---> 21 X, y = fetch_openml('mnist_784', version=1, return_X_y=True)
     22 
     23 random_state = check_random_state(0)

/opt/tljh/user/lib/python3.6/site-packages/sklearn/datasets/openml.py in fetch_openml(name, version, data_id, data_home, target_column, cache, return_X_y)
    478             "data_id.")
    479 
--> 480     data_description = _get_data_description_by_id(data_id, data_home)
    481     if data_description['status'] != "active":
    482         warn("Version {} of dataset {} is inactive, meaning that issues have "

/opt/tljh/user/lib/python3.6/site-packages/sklearn/datasets/openml.py in _get_data_description_by_id(data_id, data_home)
    293     error_message = "Dataset with data_id {} not found.".format(data_id)
    294     json_data = _get_json_content_from_openml_api(url, error_message, True,
--> 295                                                   data_home)
    296     return json_data['data_set_description']
    297 

/opt/tljh/user/lib/python3.6/site-packages/sklearn/datasets/openml.py in _get_json_content_from_openml_api(url, error_message, raise_if_error, data_home)
    131         else:
    132             return None
--> 133     json_data = json.loads(response.read().decode("utf-8"))
    134     response.close()
    135     return json_data

/opt/tljh/user/lib/python3.6/json/__init__.py in loads(s, encoding, cls, object_hook, parse_float, parse_int, parse_constant, object_pairs_hook, **kw)
    352             parse_int is None and parse_float is None and
    353             parse_constant is None and object_pairs_hook is None and not kw):
--> 354         return _default_decoder.decode(s)
    355     if cls is None:
    356         cls = JSONDecoder

/opt/tljh/user/lib/python3.6/json/decoder.py in decode(self, s, _w)
    337 
    338         """
--> 339         obj, end = self.raw_decode(s, idx=_w(s, 0).end())
    340         end = _w(s, end).end()
    341         if end != len(s):

/opt/tljh/user/lib/python3.6/json/decoder.py in raw_decode(self, s, idx)
    355             obj, end = self.scan_once(s, idx)
    356         except StopIteration as err:
--> 357             raise JSONDecodeError("Expecting value", s, err.value) from None
    358         return obj, end

JSONDecodeError: Expecting value: line 1 column 1 (char 0)

Versions

System
------
    python: 3.6.5 |Anaconda, Inc.| (default, Apr 29 2018, 16:14:56)  [GCC 7.2.0]
executable: /opt/tljh/user/bin/python
   machine: Linux-4.15.0-38-generic-x86_64-with-debian-buster-sid

BLAS
----
    macros: 
  lib_dirs: 
cblas_libs: cblas

Python deps
-----------
       pip: 10.0.1
setuptools: 39.1.0
   sklearn: 0.20.0
     numpy: 1.15.3
     scipy: 1.1.0
    Cython: None
    pandas: None

I also get the following warnings:

/opt/tljh/user/lib/python3.6/site-packages/numpy/distutils/system_info.py:625: UserWarning: 
    Atlas (http://math-atlas.sourceforge.net/) libraries not found.
    Directories to search for the libraries can be specified in the
    numpy/distutils/site.cfg file (section [atlas]) or by setting
    the ATLAS environment variable.
  self.calc_info()
/opt/tljh/user/lib/python3.6/site-packages/numpy/distutils/system_info.py:625: UserWarning: 
    Blas (http://www.netlib.org/blas/) libraries not found.
    Directories to search for the libraries can be specified in the
    numpy/distutils/site.cfg file (section [blas]) or by setting
    the BLAS environment variable.
  self.calc_info()
/opt/tljh/user/lib/python3.6/site-packages/numpy/distutils/system_info.py:625: UserWarning: 
    Blas (http://www.netlib.org/blas/) sources not found.
    Directories to search for the sources can be specified in the
    numpy/distutils/site.cfg file (section [blas_src]) or by setting
    the BLAS_SRC environment variable.
  self.calc_info()

About this issue

  • State: closed
  • Created 6 years ago
  • Comments: 17 (13 by maintainers)

Most upvoted comments

Thanks for the quick response, can confirm it works on scikit-learn master now.

Sorry, this was caused by a bad configuration setting on the server. Fixed now.

Oh god. PHP rendering HTML-wrapped error messages to stdout! 😦 @janvanrijn
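The last frame of the traceback is `json.loads(response.read().decode("utf-8"))`, so any reply that is not JSON, such as an HTML-wrapped PHP error page, fails on its very first character. The offline sketch below reproduces that exact message without contacting OpenML. `decode_openml_body` is a hypothetical helper, not a scikit-learn function, and the sample bodies are made up; it only mimics the decode step and re-raises with the start of the body so an HTML page is easy to recognise.

```python
import json


def decode_openml_body(body: bytes) -> dict:
    """Decode a response body the way the traceback shows fetch_openml doing it.

    Hypothetical helper: if the body is not JSON, re-raise with the first
    characters of the body so that an HTML error page is easy to spot.
    """
    text = body.decode("utf-8")
    try:
        return json.loads(text)
    except json.JSONDecodeError as err:
        # An HTML page starts with '<', which is never valid JSON, so the
        # decoder fails at "line 1 column 1 (char 0)".
        snippet = text[:60].replace("\n", " ")
        raise ValueError(
            "Server did not return JSON ({}); body starts with: {!r}".format(err, snippet)
        ) from err


# A well-formed reply decodes normally.
good = decode_openml_body(b'{"data_set_description": {"status": "active"}}')
print(good["data_set_description"]["status"])

# An HTML-wrapped error page triggers the same message seen in the traceback.
try:
    decode_openml_body(b"<br />\n<b>Warning</b>: misconfigured server<br />")
except ValueError as exc:
    print(exc)
```

Run against an HTML body, the message contains both the original `Expecting value: line 1 column 1 (char 0)` text and the opening `<br />`, which points at the server rather than at scikit-learn.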

The important code is:

from sklearn.datasets import fetch_openml
X, y = fetch_openml('mnist_784', version=1, return_X_y=True)
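If the server problem is intermittent, one workaround is to retry the call from the snippet above and, if it keeps failing, raise a clearer error. This is a minimal sketch, not part of scikit-learn: `fetch_with_retry` and its `fetch`, `retries` and `delay` parameters are hypothetical. It relies only on the traceback showing the failure surfacing as `json.JSONDecodeError`. Retrying cannot fix a persistent server misconfiguration like the one in this issue; in that case the wrapper simply fails with a more descriptive message.

```python
import json
import time


def fetch_with_retry(fetch=None, retries=3, delay=5.0):
    """Call fetch('mnist_784', version=1, return_X_y=True), retrying on non-JSON replies.

    Hypothetical wrapper: `fetch` defaults to sklearn.datasets.fetch_openml but
    can be replaced (e.g. in tests) by any callable accepting the same arguments.
    """
    if fetch is None:
        # Imported lazily so the wrapper can be exercised without scikit-learn.
        from sklearn.datasets import fetch_openml
        fetch = fetch_openml
    last_err = None
    for attempt in range(1, retries + 1):
        try:
            return fetch('mnist_784', version=1, return_X_y=True)
        except json.JSONDecodeError as err:
            # The server answered with something that is not JSON
            # (for example an HTML error page); wait before the next attempt.
            last_err = err
            if attempt < retries:
                time.sleep(delay)
    raise RuntimeError(
        "OpenML returned a non-JSON response {} times in a row; "
        "the server may be misconfigured or down.".format(retries)
    ) from last_err
```

Usage: `X, y = fetch_with_retry()`. Passing a stand-in `fetch` callable makes the wrapper testable without network access.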

I was able to reproduce the error on the current release of scikit-learn but not on the most recent development version in this repo. I know some work was done recently on fetch_openml, so I believe the issue has been resolved. However, it would be great if someone else could confirm this.