auto-sklearn: _pickle.UnpicklingError: pickle data was truncated

Describe the bug

I was running auto-sklearn for an hour on the dataset kin8nm (OpenML id 189), and then auto-sklearn stopped with the exception _pickle.UnpicklingError: pickle data was truncated

To Reproduce

Steps to reproduce the behavior:

  1. Download dataset 189 with the OpenML API
  2. Apply train_test_split
  3. Create the automl instance:
automl = autosklearn.regression.AutoSklearnRegressor(
    time_left_for_this_task=timelife * 60,
    per_run_time_limit=30,
    memory_limit=psutil.virtual_memory().available,
    n_jobs=-1,
    resampling_strategy_arguments={'cv': 10},
)

timelife in this case is equal to 60.

  4. Run the fit: automl.fit(X_train, y_train)
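One detail worth noting in the constructor above: auto-sklearn's memory_limit parameter is expressed in MB, while psutil.virtual_memory().available returns bytes, so the call as written requests a limit roughly a million times larger than intended. A minimal sketch of the needed conversion (the 8 GiB figure is a stand-in for the psutil value, not taken from this report):

```python
# auto-sklearn's memory_limit is in MB, but
# psutil.virtual_memory().available returns bytes.
def bytes_to_mb(n_bytes):
    return n_bytes // (1024 * 1024)

# With 8 GiB available, passing the raw byte count would request
# an ~8-million-MB (~8 TB) limit instead of ~8192 MB:
available = 8 * 1024**3          # stand-in for psutil.virtual_memory().available
print(bytes_to_mb(available))    # 8192
```

With n_jobs=-1, the limit also applies per worker process, which makes an over-large value even easier to exceed in aggregate.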

Actual behavior, stacktrace or logfile

Traceback (most recent call last):
  File "/usr/lib/python3.8/multiprocessing/forkserver.py", line 280, in main
    code = _serve_one(child_r, fds,
  File "/usr/lib/python3.8/multiprocessing/forkserver.py", line 319, in _serve_one
    code = spawn._main(child_r, parent_sentinel)
  File "/usr/lib/python3.8/multiprocessing/spawn.py", line 126, in _main
    self = reduction.pickle.load(from_parent)
_pickle.UnpicklingError: pickle data was truncated

Environment and installation:

Please give details about your installation:

  • OS: Ubuntu 20.04.2 LTS
  • Is your installation in a virtual environment or conda environment? Virtual environment
  • Python version: 3.8.10 64-bit
  • Auto-sklearn version: 0.12.6

About this issue

  • Original URL
  • State: closed
  • Created 3 years ago
  • Comments: 15 (8 by maintainers)

Most upvoted comments

Unfortunately I’m not aware of a way to compute the optimal amount of RAM per core for a specific dataset. As long as datasets are small it doesn’t really matter. However, as you realized, it has an impact once they get larger; figuring out how much to use automatically would be great, but is so far beyond the scope of Auto-sklearn.
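Absent an automatic answer, a common heuristic is to split the currently available RAM evenly across the worker processes and keep a little headroom for the parent. A hedged sketch (the function name, the 0.9 safety fraction, and the GiB figure are illustrative assumptions, not an auto-sklearn recommendation; the result is in MB, the unit memory_limit expects):

```python
import os

def per_core_memory_limit_mb(total_available_bytes, n_jobs=None, safety_fraction=0.9):
    """Split available RAM evenly across worker processes.

    Returns a value in MB. safety_fraction leaves headroom for the
    parent process and the OS (an assumption, tune as needed).
    """
    if n_jobs is None or n_jobs == -1:
        # Mirror the n_jobs=-1 convention: one worker per CPU core.
        n_jobs = os.cpu_count() or 1
    return int(total_available_bytes * safety_fraction / n_jobs / (1024 * 1024))

# e.g. 16 GiB available split across 4 workers:
print(per_core_memory_limit_mb(16 * 1024**3, n_jobs=4))  # 3686
```

The returned value could then be passed as memory_limit to AutoSklearnRegressor; for large datasets it may still need to be raised by reducing n_jobs.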