auto-sklearn: _pickle.UnpicklingError: pickle data was truncated

Describe the bug

I was running auto-sklearn for an hour on the dataset kin8nm (OpenML id 189), and then auto-sklearn stopped with the exception _pickle.UnpicklingError: pickle data was truncated

To Reproduce

Steps to reproduce the behavior:

  1. Download dataset 189 with the OpenML API
  2. Apply train_test_split
  3. Create the automl instance:
automl = autosklearn.regression.AutoSklearnRegressor(
    time_left_for_this_task=timelife * 60,
    per_run_time_limit=30,
    memory_limit=psutil.virtual_memory().available,
    n_jobs=-1,
    resampling_strategy_arguments={'cv': 10},
)

timelife in this case is equal to 60.

  4. Run the fit: automl.fit(X_train, y_train)
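One detail worth noting in the constructor above: auto-sklearn's memory_limit parameter is expressed in MB, while psutil.virtual_memory().available returns bytes, so the call as written requests a limit roughly a million times larger than intended. A minimal sketch of the needed conversion (the 8 GiB figure is a stand-in for the psutil value, not taken from this report):

```python
# auto-sklearn's memory_limit is in MB, but
# psutil.virtual_memory().available returns bytes.
def bytes_to_mb(n_bytes):
    return n_bytes // (1024 * 1024)

# With 8 GiB available, passing the raw byte count would request
# an ~8-million-MB (~8 TB) limit instead of ~8192 MB:
available = 8 * 1024**3          # stand-in for psutil.virtual_memory().available
print(bytes_to_mb(available))    # 8192
```

With n_jobs=-1, the limit also applies per worker process, which makes an over-large value even easier to exceed in aggregate.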

Actual behavior, stacktrace or logfile

Traceback (most recent call last):
  File "/usr/lib/python3.8/multiprocessing/forkserver.py", line 280, in main
    code = _serve_one(child_r, fds,
  File "/usr/lib/python3.8/multiprocessing/forkserver.py", line 319, in _serve_one
    code = spawn._main(child_r, parent_sentinel)
  File "/usr/lib/python3.8/multiprocessing/spawn.py", line 126, in _main
    self = reduction.pickle.load(from_parent)
_pickle.UnpicklingError: pickle data was truncated

Environment and installation:

Please give details about your installation:

  • OS: Ubuntu 20.04.2 LTS
  • Is your installation in a virtual environment or conda environment? Virtual environment
  • Python version: 3.8.10 64-bit
  • Auto-sklearn version: 0.12.6

About this issue

  • Original URL
  • State: closed
  • Created 3 years ago
  • Comments: 15 (8 by maintainers)

Most upvoted comments

Unfortunately I’m not aware of a way to compute the optimal amount of RAM per core for a specific dataset. As long as datasets are small it doesn’t really matter. However, as you realized, it has an impact once they get larger; figuring out how much to use automatically would be great, but is so far beyond the scope of Auto-sklearn.
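Absent an automatic answer, a common heuristic is to split the currently available RAM evenly across the worker processes and keep a little headroom for the parent. A hedged sketch (the function name, the 0.9 safety fraction, and the GiB figure are illustrative assumptions, not an auto-sklearn recommendation; the result is in MB, the unit memory_limit expects):

```python
import os

def per_core_memory_limit_mb(total_available_bytes, n_jobs=None, safety_fraction=0.9):
    """Split available RAM evenly across worker processes.

    Returns a value in MB. safety_fraction leaves headroom for the
    parent process and the OS (an assumption, tune as needed).
    """
    if n_jobs is None or n_jobs == -1:
        # Mirror the n_jobs=-1 convention: one worker per CPU core.
        n_jobs = os.cpu_count() or 1
    return int(total_available_bytes * safety_fraction / n_jobs / (1024 * 1024))

# e.g. 16 GiB available split across 4 workers:
print(per_core_memory_limit_mb(16 * 1024**3, n_jobs=4))  # 3686
```

The returned value could then be passed as memory_limit to AutoSklearnRegressor; for large datasets it may still need to be raised by reducing n_jobs.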