scikit-learn: [Windows] PermissionError in datasets fetchers when trying to remove the downloaded archive

Description

I have just installed scikit-learn through pip. I am trying to load a dataset (california_housing), but I keep getting a permission error when scikit-learn tries to delete the downloaded file. I can delete the file myself from File Explorer, which suggests that it is Python that locks the file.
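On Windows, a file that still has an open handle generally cannot be deleted. On Linux and macOS the delete usually succeeds, and the data stays readable through the handle that is already open. A minimal sketch that probes this behavior with a temporary file follows. The helper name `try_remove_while_open` is hypothetical; it only illustrates the OS semantics and is not scikit-learn code.

```python
import os
import tempfile


def try_remove_while_open():
    """Try to delete a file while a handle to it is still open.

    Returns "removed" if the OS allowed the delete (typical on Linux/macOS)
    or "PermissionError" if it refused (typical on Windows, WinError 32).
    """
    fd, path = tempfile.mkstemp(suffix=".tgz")
    os.close(fd)
    handle = open(path, "rb")  # keep a handle open, like an unfinished archive read
    try:
        os.remove(path)
        return "removed"
    except PermissionError:
        return "PermissionError"
    finally:
        handle.close()
        if os.path.exists(path):
            os.remove(path)  # clean up now that no handle blocks the delete
```

Calling `try_remove_while_open()` on the Windows setup described in this issue would be expected to return `"PermissionError"`.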

Steps/Code to Reproduce

import numpy as np
from sklearn.datasets import fetch_california_housing

housing = fetch_california_housing()

Expected Results

No error should be thrown, and the data should be loaded correctly.

Actual Results

Downloading Cal. housing from https://ndownloader.figshare.com/files/5976036 to C:\Users\lucia\scikit_learn_data
Traceback (most recent call last):
  File "test.py", line 4, in <module>
    housing = fetch_california_housing()
  File "C:\Users\lucia\AppData\Local\Programs\Python\Python36\lib\site-packages\sklearn\datasets\california_housing.py", line 109, in fetch_california_housing
    remove(archive_path)
PermissionError: [WinError 32] The process cannot access the file because it is being used by another process: 'C:\\Users\\lucia\\scikit_learn_data\\cal_housing.tgz'

Versions

Windows-10-10.0.15063-SP0
Python 3.6.2 (v3.6.2:5fd33b5, Jul 8 2017, 04:57:36) [MSC v.1900 64 bit (AMD64)]
NumPy 1.13.1
SciPy 0.19.1
Scikit-Learn 0.19.0

About this issue

  • State: closed
  • Created 7 years ago
  • Comments: 18 (15 by maintainers)

Most upvoted comments

You will have to wait for 0.19.1 to be released. In the meantime, as a workaround, you can find the file and comment out the line with the remove(...) call.

For example, this is the stack trace I get when trying to use fetch_california_housing on Windows:

c:\msys64\home\lesteve\dev>ipython -c "from sklearn import datasets; datasets.fetch_california_housing()"
Downloading Cal. housing from https://ndownloader.figshare.com/files/5976036 to C:\Users\lesteve\scikit_learn_data
---------------------------------------------------------------------------
PermissionError                           Traceback (most recent call last)
<ipython-input-1-7f347e637e46> in <module>()
----> 1 from sklearn import datasets; datasets.fetch_california_housing()

C:\Users\lesteve\Miniconda3\lib\site-packages\sklearn\datasets\california_housing.py in fetch_california_housing(data_home, download_if_missing)
    103             name=archive_path).extractfile(
    104                 'CaliforniaHousing/cal_housing.data')
--> 105         remove(archive_path)
    106
    107         cal_housing = np.loadtxt(fileobj, delimiter=',')

PermissionError: [WinError 32] The process cannot access the file because it is being used by another process: 'C:\\Users\\lesteve\\scikit_learn_data\\cal_housing.tgz'

This tells me that, to work around the problem, I have to comment out the line remove(archive_path) (line 105) in C:\Users\lesteve\Miniconda3\lib\site-packages\sklearn\datasets\california_housing.py.
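To locate that line without browsing site-packages by hand, a small sketch can scan a fetcher's source file and list every line that calls `remove(`, since some fetchers contain more than one such call. The helper name `find_remove_calls` is hypothetical; it assumes the source file is UTF-8 and matches on a simple regular expression rather than parsing Python.

```python
import re

# Matches calls such as `remove(archive_path)` or `os.remove(path)`,
# but not names like `remove_archive(path)`.
REMOVE_CALL = re.compile(r"\bremove\(")


def find_remove_calls(source_path):
    """Return (line_number, line_text) pairs for lines calling remove(...)."""
    matches = []
    with open(source_path, encoding="utf-8") as source:
        for number, line in enumerate(source, start=1):
            if REMOVE_CALL.search(line):
                matches.append((number, line.rstrip()))
    return matches
```

With scikit-learn installed, `inspect.getsourcefile(fetch_california_housing)` returns the module path to pass in. The result should include the `remove(archive_path)` call seen in the traceback, although line numbers differ between versions.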

Notes:

  • Depending on the fetcher, there may be multiple similar calls to remove that you need to comment out.
  • This is just a temporary workaround until 0.19.1 is released. In general, I would not encourage you to manually patch packages like this.
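The traceback also hints at the underlying cause: `extractfile(...)` returns a file object that still reads from the open `.tgz`, and `remove(archive_path)` runs before that object has been consumed. Windows therefore refuses to delete the archive. A minimal sketch of an ordering that avoids this follows, assuming a gzip-compressed tar archive with a CSV member. It reads the member inside a `with tarfile.open(...)` block and deletes the archive only after that block has closed it. The helper name `load_member_then_remove` is hypothetical. It parses with the standard `csv` module instead of `np.loadtxt` to stay dependency-free, and it is not necessarily how 0.19.1 fixes the fetcher.

```python
import csv
import io
import os
import tarfile


def load_member_then_remove(archive_path, member_name):
    """Read a CSV member from a .tgz archive, then delete the archive.

    Leaving the `with` block closes the archive (and the file handle it
    opened by name) before os.remove runs, so no open handle blocks the
    deletion on Windows.
    """
    with tarfile.open(name=archive_path, mode="r:gz") as archive:
        fileobj = archive.extractfile(member_name)
        raw = fileobj.read()  # consume the member while the archive is still open
    os.remove(archive_path)  # safe: the archive has been closed
    text = io.StringIO(raw.decode("utf-8"))
    return [[float(value) for value in row] for row in csv.reader(text) if row]
```

The key design choice is simply the order: fully read, then close, then delete. In the traceback, the fetcher deletes the archive between `extractfile` and `np.loadtxt`, while the archive is still open.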