GridSearchCV process doesn't complete with n_jobs set to -1 and some parameters of the algorithm #10625
agavazuk opened this issue Feb 12, 2018 · 1 comment

Description

I'm running GridSearchCV for a Gradient Boosting classifier, and for some parameter settings it just hangs after 10-15 minutes on my laptop. The first iterations start one Python process per CPU (8 in my case), each with visible load; then the processes disappear and no further progress is reported. I found a similar issue, "GridSearchCV processes hang with n_jobs", but it is closed, so I've decided to open a new one.
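With n_jobs=-1, the joblib layer that GridSearchCV uses for parallelism starts one worker per CPU, which is why 8 Python processes appear on an 8-core laptop. For negative values, the joblib.Parallel documentation gives the rule n_cpus + 1 + n_jobs, so -2 means "all CPUs but one". The helper below is a hypothetical sketch of that rule, not joblib's own code:

```python
import os


def resolve_n_jobs(n_jobs, n_cpus=None):
    """Hypothetical helper: how many workers a joblib-style n_jobs value asks for."""
    if n_cpus is None:
        n_cpus = os.cpu_count() or 1
    if n_jobs == 0:
        raise ValueError("n_jobs == 0 has no meaning")
    if n_jobs < 0:
        # -1 -> all CPUs, -2 -> all but one, ...; never fewer than one worker
        return max(n_cpus + 1 + n_jobs, 1)
    return n_jobs
```

For example, resolve_n_jobs(-1, 8) gives 8, matching the eight processes described above.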

Steps/Code to Reproduce

My data setup:
It's the Census data from the UCI repository.
Originally there are 13 features, plus the income target column:

RangeIndex: 45222 entries, 0 to 45221
Data columns (total 14 columns):
age                45222 non-null int64
workclass          45222 non-null object
education_level    45222 non-null object
education-num      45222 non-null float64
marital-status     45222 non-null object
occupation         45222 non-null object
relationship       45222 non-null object
race               45222 non-null object
sex                45222 non-null object
capital-gain       45222 non-null float64
capital-loss       45222 non-null float64
hours-per-week     45222 non-null float64
native-country     45222 non-null object
income             45222 non-null object
dtypes: float64(4), int64(1), object(9)
memory usage: 4.8+ MB

After one-hot encoding I get 103 features in total.
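One-hot encoding replaces every categorical column with one 0/1 indicator column per distinct value. Assuming the five numeric columns (age, education-num, capital-gain, capital-loss, hours-per-week) pass through unchanged and income is kept aside as the target, the 103 features break down as 5 numeric columns plus 98 indicator columns from the other eight categorical columns. The pure-Python sketch below mimics what a helper such as pandas.get_dummies does; the function name and the three sample rows are made up for illustration.

```python
# Hypothetical mini-sample using three of the census column names; values are illustrative.
rows = [
    {"age": 39, "sex": "Male", "race": "White"},
    {"age": 50, "sex": "Female", "race": "Black"},
    {"age": 38, "sex": "Male", "race": "Black"},
]


def one_hot(rows, categorical):
    """Expand each categorical column into 0/1 indicator columns named '<col>_<value>'."""
    # Distinct levels per categorical column, sorted so the column order is stable.
    levels = {col: sorted({r[col] for r in rows}) for col in categorical}
    encoded = []
    for r in rows:
        # Non-categorical (numeric) columns are copied through unchanged.
        out = {k: v for k, v in r.items() if k not in categorical}
        for col in categorical:
            for level in levels[col]:
                out[f"{col}_{level}"] = int(r[col] == level)
        encoded.append(out)
    return encoded


encoded = one_hot(rows, ["sex", "race"])
```

Here one numeric column plus two levels each for sex and race gives 5 columns per row.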

My code:

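from sklearn.ensemble import GradientBoostingClassifier
# sklearn.grid_search is deprecated since 0.18 (removed in 0.20); sklearn.model_selection.GridSearchCV replaces it
from sklearn.grid_search import GridSearchCV
from sklearn.metrics import fbeta_score, make_scorer, accuracy_score

# X_train, y_train: the one-hot-encoded census features and income labels (preparation not shown)
clf = GradientBoostingClassifier()
parameters = {'loss': ['deviance', 'exponential'],
              'warm_start': [True, False],
              'max_depth': [4, 5, 6, 7],  # original note, cut off: "with this setup went away for"
              'n_estimators': [100, 200, 300]}  # 2 * 2 * 4 * 3 = 48 candidates
scorer = make_scorer(fbeta_score, beta=0.5)
# n_jobs=-1: one worker per CPU; verbose=10: log every fit as it starts and finishes
grid_obj = GridSearchCV(clf, param_grid=parameters, scoring=scorer, n_jobs=-1, verbose=10)
grid_fit = grid_obj.fit(X_train, y_train)
best_clf = grid_fit.best_estimator_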
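make_scorer(fbeta_score, beta=0.5) scores each fold with F-beta, the weighted harmonic mean of precision P and recall R: F_beta = (1 + beta^2) * P * R / (beta^2 * P + R). With beta = 0.5, precision counts more than recall. The plain-Python sketch below recomputes it from hypothetical confusion-matrix counts; it is only an illustration of the formula, not how scikit-learn implements fbeta_score.

```python
def fbeta_from_counts(tp, fp, fn, beta=0.5):
    """F-beta for the positive class from true-positive, false-positive and false-negative counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    b2 = beta ** 2
    denom = b2 * precision + recall
    # beta < 1 weights precision more heavily than recall; return 0.0 when undefined
    return (1 + b2) * precision * recall / denom if denom else 0.0
```

With tp=60, fp=20, fn=40, precision (0.75) exceeds recall (0.6), so F0.5 = 5/7 (about 0.714) comes out above F1 = 2/3 (about 0.667).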

Versions

Darwin-16.7.0-x86_64-i386-64bit
('Python', '2.7.10 (default, Sep 23 2015, 04:34:14) \n[GCC 4.2.1 Compatible Apple LLVM 7.0.0 (clang-700.0.72)]')
('NumPy', '1.13.3')
('SciPy', '1.0.0')
('Scikit-Learn', '0.19.1')
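From scikit-learn 0.20 onward, sklearn.show_versions() prints this information directly. As a stdlib-only alternative, the hypothetical helper below (assuming Python 3.8+ for importlib.metadata) collects the same fields; packages that are not installed map to None.

```python
import platform
import sys
from importlib import metadata  # Python 3.8+


def collect_versions(packages=("numpy", "scipy", "scikit-learn")):
    """Hypothetical helper: platform, Python and package versions for a bug report."""
    info = {"platform": platform.platform(), "python": sys.version}
    for name in packages:
        try:
            info[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            info[name] = None  # not installed in this environment
    return info
```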

Actual Results

Output of the unfinished run after 3 hours:

Fitting 3 folds for each of 48 candidates, totalling 144 fits
[CV] n_estimators=100, loss=deviance, warm_start=True, max_depth=4 ...
[CV] n_estimators=100, loss=deviance, warm_start=True, max_depth=4 ...
[CV] n_estimators=100, loss=deviance, warm_start=True, max_depth=4 ...
[CV] n_estimators=100, loss=deviance, warm_start=False, max_depth=4 ..
[CV] n_estimators=100, loss=deviance, warm_start=False, max_depth=4 ..
[CV] n_estimators=100, loss=deviance, warm_start=False, max_depth=4 ..
[CV] n_estimators=200, loss=deviance, warm_start=True, max_depth=4 ...
[CV] n_estimators=200, loss=deviance, warm_start=True, max_depth=4 ...
[CV]  n_estimators=100, loss=deviance, warm_start=True, max_depth=4, score=0.748376 - 1.1min
[CV] n_estimators=200, loss=deviance, warm_start=True, max_depth=4 ...
[CV]  n_estimators=100, loss=deviance, warm_start=True, max_depth=4, score=0.744681 - 1.1min
[CV]  n_estimators=100, loss=deviance, warm_start=False, max_depth=4, score=0.748376 - 1.1min
[CV] n_estimators=200, loss=deviance, warm_start=False, max_depth=4 ..
[CV] n_estimators=200, loss=deviance, warm_start=False, max_depth=4 ..
[Parallel(n_jobs=-1)]: Done   2 tasks      | elapsed:  1.1min
[CV]  n_estimators=100, loss=deviance, warm_start=True, max_depth=4, score=0.755791 - 1.1min
[CV] n_estimators=200, loss=deviance, warm_start=False, max_depth=4 ..
[CV]  n_estimators=100, loss=deviance, warm_start=False, max_depth=4, score=0.744681 - 1.1min
[CV] n_estimators=300, loss=deviance, warm_start=True, max_depth=4 ...
[CV]  n_estimators=100, loss=deviance, warm_start=False, max_depth=4, score=0.755791 - 1.1min
[CV] n_estimators=300, loss=deviance, warm_start=True, max_depth=4 ...
[CV]  n_estimators=200, loss=deviance, warm_start=True, max_depth=4, score=0.755238 - 2.0min
[CV] n_estimators=300, loss=deviance, warm_start=True, max_depth=4 ...
[CV]  n_estimators=200, loss=deviance, warm_start=True, max_depth=4, score=0.748726 - 2.0min
[CV] n_estimators=300, loss=deviance, warm_start=False, max_depth=4 ..
[CV]  n_estimators=200, loss=deviance, warm_start=False, max_depth=4, score=0.755390 - 2.3min
[CV] n_estimators=300, loss=deviance, warm_start=False, max_depth=4 ..
[Parallel(n_jobs=-1)]: Done   9 tasks      | elapsed:  3.4min
[CV]  n_estimators=200, loss=deviance, warm_start=False, max_depth=4, score=0.748651 - 2.3min
[CV] n_estimators=300, loss=deviance, warm_start=False, max_depth=4 ..
[CV]  n_estimators=200, loss=deviance, warm_start=True, max_depth=4, score=0.757694 - 2.3min
[CV] n_estimators=100, loss=deviance, warm_start=True, max_depth=5 ...
[CV]  n_estimators=200, loss=deviance, warm_start=False, max_depth=4, score=0.757694 - 2.3min
[CV] n_estimators=100, loss=deviance, warm_start=True, max_depth=5 ...
[CV]  n_estimators=300, loss=deviance, warm_start=True, max_depth=4, score=0.753342 - 3.1min
[CV] n_estimators=100, loss=deviance, warm_start=True, max_depth=5 ...
[CV]  n_estimators=300, loss=deviance, warm_start=True, max_depth=4, score=0.749265 - 3.2min
[CV] n_estimators=100, loss=deviance, warm_start=False, max_depth=5 ..
[CV]  n_estimators=300, loss=deviance, warm_start=False, max_depth=4, score=0.753419 - 3.2min
[CV] n_estimators=100, loss=deviance, warm_start=False, max_depth=5 ..
[CV]  n_estimators=100, loss=deviance, warm_start=True, max_depth=5, score=0.757060 - 1.8min
[CV] n_estimators=100, loss=deviance, warm_start=False, max_depth=5 ..
[Parallel(n_jobs=-1)]: Done  16 tasks      | elapsed:  5.2min
[CV]  n_estimators=100, loss=deviance, warm_start=True, max_depth=5, score=0.749191 - 1.8min
[CV] n_estimators=200, loss=deviance, warm_start=True, max_depth=5 ...
[CV]  n_estimators=300, loss=deviance, warm_start=True, max_depth=4, score=0.759966 - 3.4min
[CV] n_estimators=200, loss=deviance, warm_start=True, max_depth=5 ...
[CV]  n_estimators=100, loss=deviance, warm_start=True, max_depth=5, score=0.759275 - 1.9min
[CV] n_estimators=200, loss=deviance, warm_start=True, max_depth=5 ...
[CV]  n_estimators=100, loss=deviance, warm_start=False, max_depth=5, score=0.758360 - 1.9min
[CV] n_estimators=200, loss=deviance, warm_start=False, max_depth=5 ..
[CV]  n_estimators=300, loss=deviance, warm_start=False, max_depth=4, score=0.750019 - 3.2min
[CV] n_estimators=200, loss=deviance, warm_start=False, max_depth=5 ..
[CV]  n_estimators=300, loss=deviance, warm_start=False, max_depth=4, score=0.759813 - 3.3min
[CV] n_estimators=200, loss=deviance, warm_start=False, max_depth=5 ..
[CV]  n_estimators=100, loss=deviance, warm_start=False, max_depth=5, score=0.748366 - 1.8min
[CV] n_estimators=300, loss=deviance, warm_start=True, max_depth=5 ...
[CV]  n_estimators=100, loss=deviance, warm_start=False, max_depth=5, score=0.759275 - 1.8min
[CV] n_estimators=300, loss=deviance, warm_start=True, max_depth=5 ...
[CV]  n_estimators=200, loss=deviance, warm_start=True, max_depth=5, score=0.752455 - 3.0min
[CV] n_estimators=300, loss=deviance, warm_start=True, max_depth=5 ...
[Parallel(n_jobs=-1)]: Done  25 tasks      | elapsed:  8.3min
[CV]  n_estimators=200, loss=deviance, warm_start=True, max_depth=5, score=0.749569 - 3.0min
[CV] n_estimators=300, loss=deviance, warm_start=False, max_depth=5 ..
[CV]  n_estimators=200, loss=deviance, warm_start=False, max_depth=5, score=0.752148 - 3.0min
[CV] n_estimators=300, loss=deviance, warm_start=False, max_depth=5 ..
[CV]  n_estimators=200, loss=deviance, warm_start=True, max_depth=5, score=0.757131 - 3.1min
[CV] n_estimators=300, loss=deviance, warm_start=False, max_depth=5 ..
[CV]  n_estimators=200, loss=deviance, warm_start=False, max_depth=5, score=0.749570 - 2.9min
[CV] n_estimators=100, loss=deviance, warm_start=True, max_depth=6 ...
[CV]  n_estimators=200, loss=deviance, warm_start=False, max_depth=5, score=0.757797 - 2.9min
[CV] n_estimators=100, loss=deviance, warm_start=True, max_depth=6 ...
[CV]  n_estimators=300, loss=deviance, warm_start=True, max_depth=5, score=0.747161 - 3.8min
[CV] n_estimators=100, loss=deviance, warm_start=True, max_depth=6 ...
[CV]  n_estimators=300, loss=deviance, warm_start=True, max_depth=5, score=0.747114 - 3.8min
[CV] n_estimators=100, loss=deviance, warm_start=False, max_depth=6 ..
[CV]  n_estimators=100, loss=deviance, warm_start=True, max_depth=6, score=0.752622 - 1.8min
[CV] n_estimators=100, loss=deviance, warm_start=False, max_depth=6 ..
[CV]  n_estimators=100, loss=deviance, warm_start=True, max_depth=6, score=0.747398 - 1.8min
[CV] n_estimators=100, loss=deviance, warm_start=False, max_depth=6 ..
[Parallel(n_jobs=-1)]: Done  34 tasks      | elapsed: 11.4min
[CV]  n_estimators=300, loss=deviance, warm_start=True, max_depth=5, score=0.757843 - 3.6min
[CV] n_estimators=200, loss=deviance, warm_start=True, max_depth=6 ...
[CV]  n_estimators=300, loss=deviance, warm_start=False, max_depth=5, score=0.747010 - 3.5min
[CV] n_estimators=200, loss=deviance, warm_start=True, max_depth=6 ...
[CV]  n_estimators=300, loss=deviance, warm_start=False, max_depth=5, score=0.748530 - 3.3min
[CV] n_estimators=200, loss=deviance, warm_start=True, max_depth=6 ...
[CV]  n_estimators=300, loss=deviance, warm_start=False, max_depth=5, score=0.757692 - 3.3min
[CV] n_estimators=200, loss=deviance, warm_start=False, max_depth=6 ..
[CV]  n_estimators=100, loss=deviance, warm_start=True, max_depth=6, score=0.757374 - 1.8min
[CV] n_estimators=200, loss=deviance, warm_start=False, max_depth=6 ..
[CV]  n_estimators=100, loss=deviance, warm_start=False, max_depth=6, score=0.752622 - 1.8min
[CV] n_estimators=200, loss=deviance, warm_start=False, max_depth=6 ..
[CV]  n_estimators=100, loss=deviance, warm_start=False, max_depth=6, score=0.750770 - 1.8min
[CV] n_estimators=300, loss=deviance, warm_start=True, max_depth=6 ...
[CV]  n_estimators=100, loss=deviance, warm_start=False, max_depth=6, score=0.757374 - 1.9min
[CV] n_estimators=300, loss=deviance, warm_start=True, max_depth=6 ...
[CV]  n_estimators=200, loss=deviance, warm_start=True, max_depth=6, score=0.749492 - 4.6min
[CV] n_estimators=300, loss=deviance, warm_start=True, max_depth=6 ...
[CV]  n_estimators=200, loss=deviance, warm_start=True, max_depth=6, score=0.745467 - 4.7min
[CV] n_estimators=300, loss=deviance, warm_start=False, max_depth=6 ..
[CV]  n_estimators=200, loss=deviance, warm_start=True, max_depth=6, score=0.756200 - 5.1min
[Parallel(n_jobs=-1)]: Done  45 tasks      | elapsed: 17.6min
[CV] n_estimators=300, loss=deviance, warm_start=False, max_depth=6 ..
[CV]  n_estimators=200, loss=deviance, warm_start=False, max_depth=6, score=0.752206 - 5.1min
[CV] n_estimators=300, loss=deviance, warm_start=False, max_depth=6 ..
[CV]  n_estimators=200, loss=deviance, warm_start=False, max_depth=6, score=0.745467 - 5.2min
[CV] n_estimators=100, loss=deviance, warm_start=True, max_depth=7 ...
[CV]  n_estimators=200, loss=deviance, warm_start=False, max_depth=6, score=0.755925 - 5.2min
[CV] n_estimators=100, loss=deviance, warm_start=True, max_depth=7 ...
[CV]  n_estimators=300, loss=deviance, warm_start=True, max_depth=6, score=0.749421 - 7.4min
[CV] n_estimators=100, loss=deviance, warm_start=True, max_depth=7 ...
[CV]  n_estimators=300, loss=deviance, warm_start=True, max_depth=6, score=0.744302 - 7.6min
[CV] n_estimators=100, loss=deviance, warm_start=False, max_depth=7 ..
[CV]  n_estimators=100, loss=deviance, warm_start=True, max_depth=7, score=0.753965 - 3.7min
[CV] n_estimators=100, loss=deviance, warm_start=False, max_depth=7 ..
[CV]  n_estimators=100, loss=deviance, warm_start=True, max_depth=7, score=0.742774 - 3.7min
[CV] n_estimators=100, loss=deviance, warm_start=False, max_depth=7 ..
[CV]  n_estimators=300, loss=deviance, warm_start=False, max_depth=6, score=0.749497 - 7.0min
[CV] n_estimators=200, loss=deviance, warm_start=True, max_depth=7 ...
[CV]  n_estimators=300, loss=deviance, warm_start=True, max_depth=6, score=0.750551 - 7.2min
[CV] n_estimators=200, loss=deviance, warm_start=True, max_depth=7 ...
[CV]  n_estimators=100, loss=deviance, warm_start=True, max_depth=7, score=0.755611 - 3.9min
[CV] n_estimators=200, loss=deviance, warm_start=True, max_depth=7 ...
[CV]  n_estimators=300, loss=deviance, warm_start=False, max_depth=6, score=0.740919 - 7.2min
[CV] n_estimators=200, loss=deviance, warm_start=False, max_depth=7 ..
[Parallel(n_jobs=-1)]: Done  56 tasks      | elapsed: 24.8min
[CV]  n_estimators=100, loss=deviance, warm_start=False, max_depth=7, score=0.753425 - 3.9min
[CV] n_estimators=200, loss=deviance, warm_start=False, max_depth=7 ..
[CV]  n_estimators=300, loss=deviance, warm_start=False, max_depth=6, score=0.751236 - 7.2min
[CV] n_estimators=200, loss=deviance, warm_start=False, max_depth=7 ..
[CV]  n_estimators=100, loss=deviance, warm_start=False, max_depth=7, score=0.746959 - 4.1min
[CV] n_estimators=300, loss=deviance, warm_start=True, max_depth=7 ...
[CV]  n_estimators=100, loss=deviance, warm_start=False, max_depth=7, score=0.754571 - 4.3min
[CV] n_estimators=300, loss=deviance, warm_start=True, max_depth=7 ...
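With verbose=10, the log prints one "[CV]" line when a fit starts and another, carrying its score, when it ends. Comparing the two shows which fits were still running when the output stopped. The sketch below assumes exactly the line format seen above (one space after "[CV]" for a start; two spaces and ", score=" for a finish), which may differ in other scikit-learn versions. It also recounts the grid to confirm 48 candidates times 3 folds = 144 fits. The function name is hypothetical.

```python
import itertools
from collections import Counter

# Same grid as in the reproduction code; this run used 3 folds.
parameters = {'loss': ['deviance', 'exponential'],
              'warm_start': [True, False],
              'max_depth': [4, 5, 6, 7],
              'n_estimators': [100, 200, 300]}
n_folds = 3

# Every combination of parameter values is one candidate.
n_candidates = len(list(itertools.product(*parameters.values())))
total_fits = n_candidates * n_folds


def unfinished_fits(log_lines):
    """Count parameter settings whose fits started but never printed a score."""
    started, finished = Counter(), Counter()
    for line in log_lines:
        line = line.rstrip()  # drop trailing newline / spaces when reading from a file
        if line.startswith("[CV]  "):
            # Two spaces after [CV]: a finished fit, "<params>, score=<value> - <time>"
            finished[line[6:].split(", score=")[0]] += 1
        elif line.startswith("[CV] "):
            # One space after [CV]: a fit that has just started, "<params> ..."
            started[line[5:].rstrip(" .")] += 1
    # Counter subtraction keeps only settings with more starts than finishes.
    return started - finished

# Usage (hypothetical file name):
# with open("gridsearch.log") as f:
#     print(unfinished_fits(f))
```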

Expected Results

Comment:

Thanks for the issue report.

This is essentially a duplicate of #10533 (comment). Not much can be done about it on the scikit-learn side, though solutions are proposed in the FAQ and in the linked issues.
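The scikit-learn FAQ attributes such freezes to libraries that keep their own thread pools and misbehave after a bare fork() (for example Accelerate/vecLib on macOS), and for Python 3.4+ it suggests a non-fork multiprocessing start method such as "forkserver". A minimal sketch, assuming Python 3.4+ and a joblib release that reads the JOBLIB_START_METHOD environment variable once at import time (as, to my understanding, the joblib bundled with scikit-learn 0.19 did). It does not help the Python 2.7 setup in this report, and scikit-learn 0.20+ defaults to joblib's loky backend, which is designed to avoid this fork-related problem. The helper name is hypothetical.

```python
import multiprocessing
import os


def prefer_non_fork_start_method(preferred=("forkserver", "spawn")):
    """Hypothetical helper: export the first available non-fork start method for joblib.

    Call it before importing scikit-learn, because older joblib releases read
    JOBLIB_START_METHOD only once, when they are imported.
    """
    available = multiprocessing.get_all_start_methods()  # Python 3.4+
    for method in preferred:
        if method in available:
            os.environ["JOBLIB_START_METHOD"] = method
            return method
    return None


if __name__ == "__main__":
    method = prefer_non_fork_start_method()
    print("joblib start method:", method)
    # Import sklearn and run GridSearchCV(..., n_jobs=-1) here, inside the __main__ guard:
    # forkserver/spawn workers re-import the main module, so it must be safe to import.
```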

Feel free to close this issue if that answers your questions. Thanks.