auto-sklearn: Unable to process a large data set?
I have a data set with more than 100k records. When I try to fit it with AutoSklearnRegressor, it always throws warnings, and as a result I do not seem to get the expected output.
However, if the number of records is small enough (say, fewer than 20k), it runs without any warning or error. Could you advise on this situation? I am using version 0.2.
Sample code
import autosklearn.regression
import numpy as np
x = np.random.randint(2, size=(250000, 100))
y = np.random.randint(2, size=(250000, 1))
feature_types = ['numerical'] * 100
automl = autosklearn.regression.AutoSklearnRegressor(
    time_left_for_this_task=120, per_run_time_limit=30
)
automl.fit(x, y, dataset_name='boston', feat_type=feature_types)
The output (warnings rather than an exception) is:
[WARNING] [2017-10-26 11:21:28,580:AutoMLSMBO(1)::boston] Could not find meta-data directory /home/anaconda3/lib/python3.5/site-packages/autosklearn/metalearning/files/r2_regression_dense
/home/anaconda3/lib/python3.5/site-packages/autosklearn/smbo.py:737: RuntimeWarning: invalid value encountered in true_divide
  (1. - dataset_minimum))
/home/anaconda3/lib/python3.5/site-packages/scipy/stats/_distn_infrastructure.py:879: RuntimeWarning: invalid value encountered in greater
  return (self.a < x) & (x < self.b)
/home/anaconda3/lib/python3.5/site-packages/scipy/stats/_distn_infrastructure.py:879: RuntimeWarning: invalid value encountered in less
  return (self.a < x) & (x < self.b)
/home/anaconda3/lib/python3.5/site-packages/scipy/stats/_distn_infrastructure.py:1735: RuntimeWarning: invalid value encountered in greater_equal
  cond2 = (x >= self.b) & cond0
/home/anaconda3/lib/python3.5/site-packages/scipy/stats/_distn_infrastructure.py:876: RuntimeWarning: invalid value encountered in greater_equal
About this issue
- State: closed
- Created 7 years ago
- Reactions: 1
- Comments: 17 (8 by maintainers)
I tried the above code, but the warning is thrown after 1 to 2 minutes.
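Since the question reports that fewer than 20k records works and 100k+ does not, one way to narrow this down might be to fit on random subsets of increasing size and note where the warnings first appear. Below is a minimal sketch of the subsampling part only, using NumPy. The helper name subsample_rows, the seed, and the subset sizes are hypothetical and not part of auto-sklearn, and the small stand-in arrays would be replaced with the real data.

```python
import numpy as np


def subsample_rows(x, y, n_rows, seed=0):
    """Return n_rows randomly chosen rows of x with the matching rows of y.

    Hypothetical helper for illustration; not part of auto-sklearn.
    """
    if n_rows > len(x):
        raise ValueError("n_rows exceeds the number of available rows")
    rng = np.random.RandomState(seed)
    # Draw row indices without replacement so no record is duplicated.
    idx = rng.choice(len(x), size=n_rows, replace=False)
    return x[idx], y[idx]


# Small stand-in data with the same column layout as in the question.
x = np.random.randint(2, size=(1000, 100))
y = np.random.randint(2, size=(1000, 1))

# Hypothetical subset sizes; with the real data these might run from the
# known-good range (about 20k rows) up to the full data set.
for n_rows in (100, 500, 1000):
    x_sub, y_sub = subsample_rows(x, y, n_rows)
    print(n_rows, x_sub.shape, y_sub.shape)
    # A fresh AutoSklearnRegressor would be fitted on (x_sub, y_sub) here.
```

If the warnings only show up above a certain subset size while the time limits stay fixed, that would suggest the limits, rather than the data itself, are worth looking at, though this is a guess to be checked and not a confirmed diagnosis.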