auto-sklearn: Error running "fit" with many cores.
Hi! I’m experiencing a problem when I fit an AutoSklearn instance in a virtual machine with many cores.
I have run exactly the same code, with the same dataset in three different virtual machines:
- in a VM with 4 cores and 15 GB of RAM: works ok ✅
- in a VM with 8 cores and 30 GB of RAM: works ok ✅
- in a VM with 40 cores and 157 GB of RAM: fails ❌ with the following error:
ValueError: Dummy prediction failed with run state StatusType.CRASHED and additional output: {'error': 'Result queue is empty', 'exit_status': "<class 'pynisher.limit_function_call.AnythingException'>", 'subprocess_stdout': '', 'subprocess_stderr': 'Process pynisher function call:\nTraceback (most recent call last):\n File "/usr/local/lib/python3.7/multiprocessing/process.py", line 297, in _bootstrap\n self.run()\n File "/usr/local/lib/python3.7/multiprocessing/process.py", line 99, in run\n self._target(*self._args, **self._kwargs)\n File "/usr/local/lib/python3.7/site-packages/pynisher/limit_function_call.py", line 133, in subprocess_func\n return_value = ((func(*args, **kwargs), 0))\n File "/usr/local/lib/python3.7/site-packages/autosklearn/evaluation/__init__.py", line 40, in fit_predict_try_except_decorator\n return ta(queue=queue, **kwargs)\n File "/usr/local/lib/python3.7/site-packages/autosklearn/evaluation/train_evaluator.py", line 1164, in eval_holdout\n budget_type=budget_type,\n File "/usr/local/lib/python3.7/site-packages/autosklearn/evaluation/train_evaluator.py", line 194, in __init__\n budget_type=budget_type,\n File "/usr/local/lib/python3.7/site-packages/autosklearn/evaluation/abstract_evaluator.py", line 199, in __init__\n threadpool_limits(limits=1)\n File "/usr/local/lib/python3.7/site-packages/threadpoolctl.py", line 171, in __init__\n self._original_info = self._set_threadpool_limits()\n File "/usr/local/lib/python3.7/site-packages/threadpoolctl.py", line 280, in _set_threadpool_limits\n module.set_num_threads(num_threads)\n File "/usr/local/lib/python3.7/site-packages/threadpoolctl.py", line 659, in set_num_threads\n return set_func(num_threads)\nKeyboardInterrupt\n', 'exitcode': 1, 'configuration_origin': 'DUMMY'}.
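The innermost frames of that traceback are in threadpoolctl: auto-sklearn calls threadpool_limits(limits=1) in each evaluator process to pin the BLAS libraries to a single thread, and that is where the dummy run dies. As a hypothetical isolation check (not part of the original report), one could try that same call on its own on the large VM:

# Hypothetical isolation check: does limiting the BLAS thread pool work on this machine?
import numpy as np                      # loads OpenBLAS, so there is a pool to limit
from threadpoolctl import threadpool_info, threadpool_limits

print(threadpool_info())                # which BLAS/OpenMP libraries were detected
with threadpool_limits(limits=1):       # the call auto-sklearn makes before fitting
    np.ones((1000, 1000)) @ np.ones((1000, 1000))
print("threadpool_limits(limits=1) completed without error")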
This is the code I was running:
from autosklearn.classification import AutoSklearnClassifier
from autosklearn.metrics import roc_auc
automl = AutoSklearnClassifier(time_left_for_this_task=600, metric=roc_auc)
automl.fit(x_train, y_train, x_validation, y_validation)
Limiting the number of cores with the param nproc seems to work, but it’s a pity that we cannot take advantage of larger infra 😦
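The report doesn't show how the cores were limited. One hypothetical, Linux-only way to emulate it (an assumption, not the reporter's actual command) is to pin the process to a few CPUs before auto-sklearn and its BLAS backend start up; whether this actually avoids the crash depends on whether OpenBLAS honours the affinity mask when sizing its thread pool:

import os
# Hypothetical sketch (Linux-only): restrict this process to 4 CPUs so that
# libraries which size their thread pools from the visible core count see a
# small machine. Must happen before numpy/autosklearn are imported.
os.sched_setaffinity(0, {0, 1, 2, 3})
from autosklearn.classification import AutoSklearnClassifier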
The dataset doesn’t seem to be the problem: I reproduced the bug with datasets of different sizes and different feature types, and every time it raises the same error (it’s not something that happens stochastically).
Also, the error is almost instantaneous: clearly it doesn’t even start to fit when it fails.
Environment and installation:
- OS: linux
- Python version: 3.7
- Auto-sklearn version: 0.13.0
About this issue
- Original URL
- State: open
- Created 3 years ago
- Reactions: 13
- Comments: 17 (4 by maintainers)
The workaround I found to fix this issue is to limit the number of cores with the env var OPENBLAS_NUM_THREADS before importing anything from autosklearn. For example:
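A minimal sketch of that workaround, assuming a limit of one OpenBLAS thread:

import os
# Set the limit before numpy/scipy/autosklearn are imported, otherwise OpenBLAS
# has already created one thread per core.
os.environ["OPENBLAS_NUM_THREADS"] = "1"
from autosklearn.classification import AutoSklearnClassifier  # import only after the env var is set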
I have been getting this error as well on macOS Monterey 12.0 and auto-sklearn==0.13.0, and I had not updated any libraries in my environment before this error started showing up. It happens when calling fit regardless of parameters:

Hey @erinaldi,
We recently started reworking pynisher, which is in charge of limiting resources for spawned processes. This error line is directly from pynisher and is in the comment you linked:

resource.setrlimit(resource.RLIMIT_AS, (mem_in_b, mem_in_b))
ValueError: current limit exceeds maximum limit

We have another push on getting it to work tomorrow, hopefully, but we still need a solution for Windows before we can make a release with it.

If you'd like more context or have any solutions: we can use the builtin Python module resource for limiting memory on Unix-based systems, but there is no Windows equivalent, as it's a Unix-only module. We need to find a substitute and then set up some local testing for it (we have no Windows machines). There are also other discrepancies between the three core operating systems. The error above seems to happen regardless of the memory you provide for RLIMIT_XXX, and we think that RLIMIT_AS only works on Linux, or at least doesn't work on newer macOS systems.

If we can't get a Windows version working soon, we will push the Mac-fixed version as soon as we can and hopefully it will solve the issue for you 😃
Best, Eddie
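For readers unfamiliar with the Unix-only API discussed above, here is a minimal sketch of how resource.setrlimit is used (illustrative values, not code from pynisher or auto-sklearn); Python raises the quoted ValueError when the OS rejects the requested pair of limits:

import resource

# RLIMIT_AS caps the process's virtual address space (Unix only).
mem_in_b = 4 * 1024 ** 3                                 # hypothetical 4 GiB cap
soft, hard = resource.getrlimit(resource.RLIMIT_AS)
print("current RLIMIT_AS:", soft, hard)

try:
    # pynisher sets both the soft and the hard limit to the requested value
    # (as in the error line quoted above); if the OS rejects the pair with
    # EINVAL, Python raises "ValueError: current limit exceeds maximum limit".
    resource.setrlimit(resource.RLIMIT_AS, (mem_in_b, mem_in_b))
except ValueError as e:
    print("setrlimit rejected the value:", e)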
Hi @sofidenner,
We don’t have the infrastructure (a machine with that many cores) to test this properly, which makes things difficult, but we just want to write here to say we are aware of the issue and are sorry that we have no solution as of yet.
I get the same error as https://github.com/automl/auto-sklearn/issues/360#issuecomment-963293965, on macOS Monterey with an M1 Pro. The installation was successful and importing the package works, so it seems to be related to this. I understand auto-sklearn is not tested on macOS, but I thought I would report this known issue anyway in case someone finds a solution that does not require downgrading the OS.
Hey @mfeurer, I understand. I tried with a lower memory configuration and it still didn't go away. However, in case anyone faces this issue like me: the error stopped happening when I downgraded my macOS version to below 12.0 (Monterey).