auto-sklearn: Classifier processes eat up all memory and freeze
auto-sklearn is run in multiprocessing mode using a Python multiprocessing Pool. My dataset is fairly small and mostly needs little memory (about 1 GB per process). But some processes grow to as much as 58 GB and then sit idle forever. Once four of them reach that state, the box appears to be out of memory, so the other processes seem to be blocked as well.
14177 ekobylki 20 0 58.9g 58g 1648 S 0.0 24.6 37:56.59 python
14183 ekobylki 20 0 57.0g 56g 1528 S 0.0 23.8 37:24.30 python
14191 ekobylki 20 0 56.9g 55g 12 S 0.0 23.6 38:27.54 python
14190 ekobylki 20 0 56.9g 54g 12 S 0.0 23.2 39:12.09 python
28971 ekobylki 20 0 931m 29m 816 S 0.0 0.0 0:00.09 python
26886 ekobylki 20 0 931m 28m 4 S 0.0 0.0 0:00.03 python
26785 ekobylki 20 0 931m 28m 4 S 0.0 0.0 0:00.03 python
26743 ekobylki 20 0 931m 28m 4 S 0.0 0.0 0:00.00 python
These are the only errors reported in run-err*.txt, so the processes above may or may not be the libsvm_svc models that eat up this memory.
21:04:20.572 [CLI TAE (Master Thread - #0)] ERROR c.u.c.b.a.t.b.c.CommandLineAlgorithmRun - The following algorithm call failed: cd “/x/truffles/./nfs_share/atsklrn_tmp” ; runsolver --watcher-data /dev/null -W 9865 -d 30 -M 5000 python /home/ekobylkin/anaconda2/lib/python2.7/site-packages/AutoSklearn-0.0.1.dev0-py2.7-linux-x86_64.egg/autosklearn/cli/SMAC_interface.py holdout ./nfs_share/atsklrn_tmp/.auto-sklearn/datamanager.pkl 9865.0 2147483647 -1 -balancing:strategy ‘none’ -classifier:choice ‘libsvm_svc’ -classifier:libsvm_svc:C ‘967.1406585393779’ -classifier:libsvm_svc:coef0 ‘-0.8043440523197929’ -classifier:libsvm_svc:degree ‘4’ -classifier:libsvm_svc:gamma ‘0.9629176596789994’ -classifier:libsvm_svc:kernel ‘poly’ -classifier:libsvm_svc:max_iter ‘-1’ -classifier:libsvm_svc:shrinking ‘False’ -classifier:libsvm_svc:tol ‘3.780913961396449E-5’ -imputation:strategy ‘median’ -one_hot_encoding:minimum_fraction ‘6.536726975871556E-4’ -one_hot_encoding:use_minimum_fraction ‘True’ -preprocessor:choice ‘no_preprocessing’ -rescaling:choice ‘standardize’
21:11:58.611 [CLI TAE (Master Thread - #0)] ERROR c.u.c.b.a.t.b.c.CommandLineAlgorithmRun - The following algorithm call failed: cd “/x/truffles/./nfs_share/atsklrn_tmp” ; runsolver --watcher-data /dev/null -W 9865 -d 30 -M 5000 python /home/ekobylkin/anaconda2/lib/python2.7/site-packages/AutoSklearn-0.0.1.dev0-py2.7-linux-x86_64.egg/autosklearn/cli/SMAC_interface.py holdout ./nfs_share/atsklrn_tmp/.auto-sklearn/datamanager.pkl 9865.0 2147483647 -1 -balancing:strategy ‘weighting’ -classifier:choice ‘libsvm_svc’ -classifier:libsvm_svc:C ‘980.1693781609448’ -classifier:libsvm_svc:coef0 ‘0.9655846663184429’ -classifier:libsvm_svc:degree ‘5’ -classifier:libsvm_svc:gamma ‘1.1160123630458856’ -classifier:libsvm_svc:kernel ‘poly’ -classifier:libsvm_svc:max_iter ‘-1’ -classifier:libsvm_svc:shrinking ‘False’ -classifier:libsvm_svc:tol ‘0.06035882680156773’ -imputation:strategy ‘most_frequent’ -one_hot_encoding:use_minimum_fraction ‘False’ -preprocessor:choice ‘no_preprocessing’ -rescaling:choice ‘standardize’
20:51:51.392 [CLI TAE (Master Thread - #0)] ERROR c.u.c.b.a.t.b.c.CommandLineAlgorithmRun - The following algorithm call failed: cd “/x/truffles/./nfs_share/atsklrn_tmp” ; runsolver --watcher-data /dev/null -W 9865 -d 30 -M 5000 python /home/ekobylkin/anaconda2/lib/python2.7/site-packages/AutoSklearn-0.0.1.dev0-py2.7-linux-x86_64.egg/autosklearn/cli/SMAC_interface.py holdout ./nfs_share/atsklrn_tmp/.auto-sklearn/datamanager.pkl 9865.0 2147483647 -1 -balancing:strategy ‘weighting’ -classifier:choice ‘libsvm_svc’ -classifier:libsvm_svc:C ‘1928.806880985533’ -classifier:libsvm_svc:coef0 ‘0.4875440101240127’ -classifier:libsvm_svc:degree ‘5’ -classifier:libsvm_svc:gamma ‘0.2262949152205443’ -classifier:libsvm_svc:kernel ‘poly’ -classifier:libsvm_svc:max_iter ‘-1’ -classifier:libsvm_svc:shrinking ‘False’ -classifier:libsvm_svc:tol ‘1.994581419473514E-5’ -imputation:strategy ‘most_frequent’ -one_hot_encoding:minimum_fraction ‘0.002428618650930115’ -one_hot_encoding:use_minimum_fraction ‘True’ -preprocessor:choice ‘no_preprocessing’ -rescaling:choice ‘standardize’
16:45:32.235 [CLI TAE (Master Thread - #0)] ERROR c.u.c.b.a.t.b.c.CommandLineAlgorithmRun - The following algorithm call failed: cd “/x/truffles/./nfs_share/atsklrn_tmp” ; runsolver --watcher-data /dev/null -W 9865 -d 30 -M 5000 python /home/ekobylkin/anaconda2/lib/python2.7/site-packages/AutoSklearn-0.0.1.dev0-py2.7-linux-x86_64.egg/autosklearn/cli/SMAC_interface.py holdout ./nfs_share/atsklrn_tmp/.auto-sklearn/datamanager.pkl 9865.0 2147483647 -1 -balancing:strategy ‘weighting’ -classifier:choice ‘multinomial_nb’ -classifier:multinomial_nb:alpha ‘0.26529447713685506’ -classifier:multinomial_nb:fit_prior ‘True’ -imputation:strategy ‘most_frequent’ -one_hot_encoding:minimum_fraction ‘0.010693674573559887’ -one_hot_encoding:use_minimum_fraction ‘True’ -preprocessor:choice ‘no_preprocessing’ -rescaling:choice ‘min/max’
20:37:52.185 [CLI TAE (Master Thread - #0)] ERROR c.u.c.b.a.t.b.c.CommandLineAlgorithmRun - The following algorithm call failed: cd “/x/truffles/./nfs_share/atsklrn_tmp” ; runsolver --watcher-data /dev/null -W 9865 -d 30 -M 5000 python /home/ekobylkin/anaconda2/lib/python2.7/site-packages/AutoSklearn-0.0.1.dev0-py2.7-linux-x86_64.egg/autosklearn/cli/SMAC_interface.py holdout ./nfs_share/atsklrn_tmp/.auto-sklearn/datamanager.pkl 9865.0 2147483647 -1 -balancing:strategy ‘none’ -classifier:choice ‘libsvm_svc’ -classifier:libsvm_svc:C ‘1113.6255293600818’ -classifier:libsvm_svc:coef0 ‘-0.9090303757992946’ -classifier:libsvm_svc:degree ‘4’ -classifier:libsvm_svc:gamma ‘0.5701674840721168’ -classifier:libsvm_svc:kernel ‘poly’ -classifier:libsvm_svc:max_iter ‘-1’ -classifier:libsvm_svc:shrinking ‘True’ -classifier:libsvm_svc:tol ‘1.65010853496975E-5’ -imputation:strategy ‘most_frequent’ -one_hot_encoding:minimum_fraction ‘0.021107927190034653’ -one_hot_encoding:use_minimum_fraction ‘True’ -preprocessor:choice ‘no_preprocessing’ -rescaling:choice ‘standardize’
00:01:14.702 [CLI TAE (Master Thread - #0)] ERROR c.u.c.b.a.t.b.c.CommandLineAlgorithmRun - The following algorithm call failed: cd “/x/truffles/./nfs_share/atsklrn_tmp” ; runsolver --watcher-data /dev/null -W 9865 -d 30 -M 5000 python /home/ekobylkin/anaconda2/lib/python2.7/site-packages/AutoSklearn-0.0.1.dev0-py2.7-linux-x86_64.egg/autosklearn/cli/SMAC_interface.py holdout ./nfs_share/atsklrn_tmp/.auto-sklearn/datamanager.pkl 9865.0 2147483647 -1 -balancing:strategy ‘none’ -classifier:choice ‘libsvm_svc’ -classifier:libsvm_svc:C ‘175.35300387835784’ -classifier:libsvm_svc:coef0 ‘-0.44666123056615636’ -classifier:libsvm_svc:degree ‘5’ -classifier:libsvm_svc:gamma ‘4.213441046368573’ -classifier:libsvm_svc:kernel ‘poly’ -classifier:libsvm_svc:max_iter ‘-1’ -classifier:libsvm_svc:shrinking ‘False’ -classifier:libsvm_svc:tol ‘1.540015151031467E-4’ -imputation:strategy ‘most_frequent’ -one_hot_encoding:minimum_fraction ‘0.0021976812568160094’ -one_hot_encoding:use_minimum_fraction ‘True’ -preprocessor:choice ‘no_preprocessing’ -rescaling:choice ‘standardize’
21:14:14.883 [CLI TAE (Master Thread - #0)] ERROR c.u.c.b.a.t.b.c.CommandLineAlgorithmRun - The following algorithm call failed: cd “/x/truffles/./nfs_share/atsklrn_tmp” ; runsolver --watcher-data /dev/null -W 9865 -d 30 -M 5000 python /home/ekobylkin/anaconda2/lib/python2.7/site-packages/AutoSklearn-0.0.1.dev0-py2.7-linux-x86_64.egg/autosklearn/cli/SMAC_interface.py holdout ./nfs_share/atsklrn_tmp/.auto-sklearn/datamanager.pkl 9865.0 2147483647 -1 -balancing:strategy ‘none’ -classifier:choice ‘libsvm_svc’ -classifier:libsvm_svc:C ‘16.117183011803604’ -classifier:libsvm_svc:coef0 ‘-0.2687983462436514’ -classifier:libsvm_svc:degree ‘5’ -classifier:libsvm_svc:gamma ‘1.322366658499254’ -classifier:libsvm_svc:kernel ‘poly’ -classifier:libsvm_svc:max_iter ‘-1’ -classifier:libsvm_svc:shrinking ‘False’ -classifier:libsvm_svc:tol ‘0.05943499267707366’ -imputation:strategy ‘mean’ -one_hot_encoding:minimum_fraction ‘0.0036462574639183’ -one_hot_encoding:use_minimum_fraction ‘True’ -preprocessor:choice ‘no_preprocessing’ -rescaling:choice 'standardize
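To check whether the failed calls really are dominated by one model type, the error log can be tallied by classifier. The following is a minimal sketch, not part of auto-sklearn or SMAC: the function names `summarize_failures` and `summarize_files` are hypothetical, it is written for Python 3 rather than the Python 2.7 install shown in the log, and it assumes the log keeps the `-classifier:choice <name>` argument format seen above.

```python
import glob
import re
from collections import Counter

# Each pattern allows one optional quote character (straight or curly)
# before the value, since the log excerpt contains both kinds.
CHOICE_RE = re.compile(r"-classifier:choice\s+[^\w\s]?(\w+)")
KERNEL_RE = re.compile(r"-classifier:libsvm_svc:kernel\s+[^\w\s]?(\w+)")


def summarize_failures(log_text):
    """Count classifiers and SVC kernels named in failed algorithm calls."""
    return {
        "classifiers": Counter(CHOICE_RE.findall(log_text)),
        "svc_kernels": Counter(KERNEL_RE.findall(log_text)),
    }


def summarize_files(pattern="run-err*.txt"):
    """Concatenate every file matching the glob pattern and summarize it."""
    chunks = []
    for path in sorted(glob.glob(pattern)):
        with open(path, encoding="utf-8", errors="replace") as handle:
            chunks.append(handle.read())
    return summarize_failures("\n".join(chunks))
```

In the excerpt above, six of the seven failed calls use `libsvm_svc`, all six with the `poly` kernel, and the remaining one uses `multinomial_nb`. That fits the suspicion about the SVC models, but it does not prove those are the processes holding the memory.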
About this issue
- State: closed
- Created 8 years ago
- Comments: 16 (14 by maintainers)
Yes, that would probably mitigate some issues. I also added an example for parallel processing in the examples directory of the development branch since I had to change the interface a little bit.
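Independently of the interface change, one generic way to keep a single runaway worker from exhausting the box is to cap each Pool worker's address space at the OS level. This is a minimal sketch under stated assumptions, not something auto-sklearn provides: the helper names (`effective_limit`, `cap_worker_memory`, `run_with_cap`) are hypothetical, the 5000 MB default only mirrors the `-M 5000` value in the log, and `RLIMIT_AS` is a Unix facility that is reliably enforced on Linux but not on every platform.

```python
import multiprocessing
import resource

# Hypothetical default cap, chosen to mirror "-M 5000" in the log above.
DEFAULT_LIMIT_BYTES = 5000 * 1024 * 1024


def effective_limit(requested, hard):
    """Return the soft limit to apply: never above a finite hard limit."""
    if hard == resource.RLIM_INFINITY:
        return requested
    return min(requested, hard)


def cap_worker_memory(limit_bytes=DEFAULT_LIMIT_BYTES):
    """Pool initializer: lower this worker's soft address-space limit."""
    _, hard = resource.getrlimit(resource.RLIMIT_AS)
    soft = effective_limit(limit_bytes, hard)
    resource.setrlimit(resource.RLIMIT_AS, (soft, hard))


def run_with_cap(func, items, limit_bytes=DEFAULT_LIMIT_BYTES, processes=2):
    """Map func over items in a Pool whose workers are memory-capped."""
    with multiprocessing.Pool(
        processes=processes,
        initializer=cap_worker_memory,
        initargs=(limit_bytes,),
    ) as pool:
        return pool.map(func, items)
```

With such a cap in place, an allocation beyond the limit inside a worker is expected to fail with `MemoryError` instead of growing unchecked, and child processes started by the worker inherit the limit. Whether this addresses the cause of the 58 GB processes in this issue is not established here.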