GridSearchCV process doesn't complete with n_jobs set to -1 and some parameters of the algorithm
I'm running GridSearchCV for a Gradient Boosting classifier, and for some parameter settings it hangs after 10-15 minutes on my laptop. The first iterations spawn Python processes matching the CPU count (8 in my case) with visible workload; then the processes disappear and no further progress is reported. I found a similar issue,
GridSearchCV processes hang with n_jobs
but it is closed, so I decided to open a new one.
RangeIndex: 45222 entries, 0 to 45221
Data columns (total 14 columns):
age 45222 non-null int64
workclass 45222 non-null object
education_level 45222 non-null object
education-num 45222 non-null float64
marital-status 45222 non-null object
occupation 45222 non-null object
relationship 45222 non-null object
race 45222 non-null object
sex 45222 non-null object
capital-gain 45222 non-null float64
capital-loss 45222 non-null float64
hours-per-week 45222 non-null float64
native-country 45222 non-null object
income 45222 non-null object
dtypes: float64(4), int64(1), object(9)
memory usage: 4.8+ MB
And then I get 103 total features after one-hot encoding.
My code:
from sklearn.ensemble import GradientBoostingClassifier
# sklearn.grid_search is deprecated since 0.18; sklearn.model_selection replaces it
from sklearn.grid_search import GridSearchCV
from sklearn.metrics import fbeta_score, make_scorer, accuracy_score

clf = GradientBoostingClassifier()
parameters = {'loss': ['deviance', 'exponential'],
              'warm_start': [True, False],
              'max_depth': [4, 5, 6, 7],  # original note (truncated): with this setup went away for
              'n_estimators': [100, 200, 300]}
scorer = make_scorer(fbeta_score, beta=0.5)
# X_train, y_train: the one-hot encoded training data described above
grid_obj = GridSearchCV(clf, param_grid=parameters, scoring=scorer, n_jobs=-1, verbose=10)
grid_fit = grid_obj.fit(X_train, y_train)
best_clf = grid_fit.best_estimator_
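To get a feel for how much work this grid implies, here is a minimal sketch that enumerates the parameter combinations with the standard library only. It assumes the 3-fold default that `GridSearchCV` uses in 0.19 when `cv` is not given; `n_folds`, `candidates`, `n_candidates` and `n_fits` are illustrative names, not part of scikit-learn.

```python
from itertools import product

# Same grid as in the report above.
parameters = {
    'loss': ['deviance', 'exponential'],
    'warm_start': [True, False],
    'max_depth': [4, 5, 6, 7],
    'n_estimators': [100, 200, 300],
}

# Assumption: cv is left at its default, which is 3 folds in scikit-learn 0.19.
n_folds = 3

# Cartesian product of all value lists: one dict per candidate setting.
candidates = [dict(zip(parameters, values))
              for values in product(*parameters.values())]

n_candidates = len(candidates)
n_fits = n_candidates * n_folds
print(n_candidates, n_fits)
```

Since `GridSearchCV` fits a fresh clone of the estimator for every candidate and fold, `warm_start` probably has no effect here, so dropping it from the grid would likely halve the number of fits.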
Versions
Darwin-16.7.0-x86_64-i386-64bit
('Python', '2.7.10 (default, Sep 23 2015, 04:34:14) \n[GCC 4.2.1 Compatible Apple LLVM 7.0.0 (clang-700.0.72)]')
('NumPy', '1.13.3')
('SciPy', '1.0.0')
('Scikit-Learn', '0.19.1')
Actual Results
Output of the incomplete run after 3 hours:
Fitting 3 folds for each of 48 candidates, totalling 144 fits
[CV] n_estimators=100, loss=deviance, warm_start=True, max_depth=4 ...
[CV] n_estimators=100, loss=deviance, warm_start=True, max_depth=4 ...
[CV] n_estimators=100, loss=deviance, warm_start=True, max_depth=4 ...
[CV] n_estimators=100, loss=deviance, warm_start=False, max_depth=4 ..
[CV] n_estimators=100, loss=deviance, warm_start=False, max_depth=4 ..
[CV] n_estimators=100, loss=deviance, warm_start=False, max_depth=4 ..
[CV] n_estimators=200, loss=deviance, warm_start=True, max_depth=4 ...
[CV] n_estimators=200, loss=deviance, warm_start=True, max_depth=4 ...
[CV] n_estimators=100, loss=deviance, warm_start=True, max_depth=4, score=0.748376 - 1.1min
[CV] n_estimators=200, loss=deviance, warm_start=True, max_depth=4 ...
[CV] n_estimators=100, loss=deviance, warm_start=True, max_depth=4, score=0.744681 - 1.1min
[CV] n_estimators=100, loss=deviance, warm_start=False, max_depth=4, score=0.748376 - 1.1min
[CV] n_estimators=200, loss=deviance, warm_start=False, max_depth=4 ..
[CV] n_estimators=200, loss=deviance, warm_start=False, max_depth=4 ..
[Parallel(n_jobs=-1)]: Done 2 tasks | elapsed: 1.1min
[CV] n_estimators=100, loss=deviance, warm_start=True, max_depth=4, score=0.755791 - 1.1min
[CV] n_estimators=200, loss=deviance, warm_start=False, max_depth=4 ..
[CV] n_estimators=100, loss=deviance, warm_start=False, max_depth=4, score=0.744681 - 1.1min
[CV] n_estimators=300, loss=deviance, warm_start=True, max_depth=4 ...
[CV] n_estimators=100, loss=deviance, warm_start=False, max_depth=4, score=0.755791 - 1.1min
[CV] n_estimators=300, loss=deviance, warm_start=True, max_depth=4 ...
[CV] n_estimators=200, loss=deviance, warm_start=True, max_depth=4, score=0.755238 - 2.0min
[CV] n_estimators=300, loss=deviance, warm_start=True, max_depth=4 ...
[CV] n_estimators=200, loss=deviance, warm_start=True, max_depth=4, score=0.748726 - 2.0min
[CV] n_estimators=300, loss=deviance, warm_start=False, max_depth=4 ..
[CV] n_estimators=200, loss=deviance, warm_start=False, max_depth=4, score=0.755390 - 2.3min
[CV] n_estimators=300, loss=deviance, warm_start=False, max_depth=4 ..
[Parallel(n_jobs=-1)]: Done 9 tasks | elapsed: 3.4min
[CV] n_estimators=200, loss=deviance, warm_start=False, max_depth=4, score=0.748651 - 2.3min
[CV] n_estimators=300, loss=deviance, warm_start=False, max_depth=4 ..
[CV] n_estimators=200, loss=deviance, warm_start=True, max_depth=4, score=0.757694 - 2.3min
[CV] n_estimators=100, loss=deviance, warm_start=True, max_depth=5 ...
[CV] n_estimators=200, loss=deviance, warm_start=False, max_depth=4, score=0.757694 - 2.3min
[CV] n_estimators=100, loss=deviance, warm_start=True, max_depth=5 ...
[CV] n_estimators=300, loss=deviance, warm_start=True, max_depth=4, score=0.753342 - 3.1min
[CV] n_estimators=100, loss=deviance, warm_start=True, max_depth=5 ...
[CV] n_estimators=300, loss=deviance, warm_start=True, max_depth=4, score=0.749265 - 3.2min
[CV] n_estimators=100, loss=deviance, warm_start=False, max_depth=5 ..
[CV] n_estimators=300, loss=deviance, warm_start=False, max_depth=4, score=0.753419 - 3.2min
[CV] n_estimators=100, loss=deviance, warm_start=False, max_depth=5 ..
[CV] n_estimators=100, loss=deviance, warm_start=True, max_depth=5, score=0.757060 - 1.8min
[CV] n_estimators=100, loss=deviance, warm_start=False, max_depth=5 ..
[Parallel(n_jobs=-1)]: Done 16 tasks | elapsed: 5.2min
[CV] n_estimators=100, loss=deviance, warm_start=True, max_depth=5, score=0.749191 - 1.8min
[CV] n_estimators=200, loss=deviance, warm_start=True, max_depth=5 ...
[CV] n_estimators=300, loss=deviance, warm_start=True, max_depth=4, score=0.759966 - 3.4min
[CV] n_estimators=200, loss=deviance, warm_start=True, max_depth=5 ...
[CV] n_estimators=100, loss=deviance, warm_start=True, max_depth=5, score=0.759275 - 1.9min
[CV] n_estimators=200, loss=deviance, warm_start=True, max_depth=5 ...
[CV] n_estimators=100, loss=deviance, warm_start=False, max_depth=5, score=0.758360 - 1.9min
[CV] n_estimators=200, loss=deviance, warm_start=False, max_depth=5 ..
[CV] n_estimators=300, loss=deviance, warm_start=False, max_depth=4, score=0.750019 - 3.2min
[CV] n_estimators=200, loss=deviance, warm_start=False, max_depth=5 ..
[CV] n_estimators=300, loss=deviance, warm_start=False, max_depth=4, score=0.759813 - 3.3min
[CV] n_estimators=200, loss=deviance, warm_start=False, max_depth=5 ..
[CV] n_estimators=100, loss=deviance, warm_start=False, max_depth=5, score=0.748366 - 1.8min
[CV] n_estimators=300, loss=deviance, warm_start=True, max_depth=5 ...
[CV] n_estimators=100, loss=deviance, warm_start=False, max_depth=5, score=0.759275 - 1.8min
[CV] n_estimators=300, loss=deviance, warm_start=True, max_depth=5 ...
[CV] n_estimators=200, loss=deviance, warm_start=True, max_depth=5, score=0.752455 - 3.0min
[CV] n_estimators=300, loss=deviance, warm_start=True, max_depth=5 ...
[Parallel(n_jobs=-1)]: Done 25 tasks | elapsed: 8.3min
[CV] n_estimators=200, loss=deviance, warm_start=True, max_depth=5, score=0.749569 - 3.0min
[CV] n_estimators=300, loss=deviance, warm_start=False, max_depth=5 ..
[CV] n_estimators=200, loss=deviance, warm_start=False, max_depth=5, score=0.752148 - 3.0min
[CV] n_estimators=300, loss=deviance, warm_start=False, max_depth=5 ..
[CV] n_estimators=200, loss=deviance, warm_start=True, max_depth=5, score=0.757131 - 3.1min
[CV] n_estimators=300, loss=deviance, warm_start=False, max_depth=5 ..
[CV] n_estimators=200, loss=deviance, warm_start=False, max_depth=5, score=0.749570 - 2.9min
[CV] n_estimators=100, loss=deviance, warm_start=True, max_depth=6 ...
[CV] n_estimators=200, loss=deviance, warm_start=False, max_depth=5, score=0.757797 - 2.9min
[CV] n_estimators=100, loss=deviance, warm_start=True, max_depth=6 ...
[CV] n_estimators=300, loss=deviance, warm_start=True, max_depth=5, score=0.747161 - 3.8min
[CV] n_estimators=100, loss=deviance, warm_start=True, max_depth=6 ...
[CV] n_estimators=300, loss=deviance, warm_start=True, max_depth=5, score=0.747114 - 3.8min
[CV] n_estimators=100, loss=deviance, warm_start=False, max_depth=6 ..
[CV] n_estimators=100, loss=deviance, warm_start=True, max_depth=6, score=0.752622 - 1.8min
[CV] n_estimators=100, loss=deviance, warm_start=False, max_depth=6 ..
[CV] n_estimators=100, loss=deviance, warm_start=True, max_depth=6, score=0.747398 - 1.8min
[CV] n_estimators=100, loss=deviance, warm_start=False, max_depth=6 ..
[Parallel(n_jobs=-1)]: Done 34 tasks | elapsed: 11.4min
[CV] n_estimators=300, loss=deviance, warm_start=True, max_depth=5, score=0.757843 - 3.6min
[CV] n_estimators=200, loss=deviance, warm_start=True, max_depth=6 ...
[CV] n_estimators=300, loss=deviance, warm_start=False, max_depth=5, score=0.747010 - 3.5min
[CV] n_estimators=200, loss=deviance, warm_start=True, max_depth=6 ...
[CV] n_estimators=300, loss=deviance, warm_start=False, max_depth=5, score=0.748530 - 3.3min
[CV] n_estimators=200, loss=deviance, warm_start=True, max_depth=6 ...
[CV] n_estimators=300, loss=deviance, warm_start=False, max_depth=5, score=0.757692 - 3.3min
[CV] n_estimators=200, loss=deviance, warm_start=False, max_depth=6 ..
[CV] n_estimators=100, loss=deviance, warm_start=True, max_depth=6, score=0.757374 - 1.8min
[CV] n_estimators=200, loss=deviance, warm_start=False, max_depth=6 ..
[CV] n_estimators=100, loss=deviance, warm_start=False, max_depth=6, score=0.752622 - 1.8min
[CV] n_estimators=200, loss=deviance, warm_start=False, max_depth=6 ..
[CV] n_estimators=100, loss=deviance, warm_start=False, max_depth=6, score=0.750770 - 1.8min
[CV] n_estimators=300, loss=deviance, warm_start=True, max_depth=6 ...
[CV] n_estimators=100, loss=deviance, warm_start=False, max_depth=6, score=0.757374 - 1.9min
[CV] n_estimators=300, loss=deviance, warm_start=True, max_depth=6 ...
[CV] n_estimators=200, loss=deviance, warm_start=True, max_depth=6, score=0.749492 - 4.6min
[CV] n_estimators=300, loss=deviance, warm_start=True, max_depth=6 ...
[CV] n_estimators=200, loss=deviance, warm_start=True, max_depth=6, score=0.745467 - 4.7min
[CV] n_estimators=300, loss=deviance, warm_start=False, max_depth=6 ..
[CV] n_estimators=200, loss=deviance, warm_start=True, max_depth=6, score=0.756200 - 5.1min
[Parallel(n_jobs=-1)]: Done 45 tasks | elapsed: 17.6min
[CV] n_estimators=300, loss=deviance, warm_start=False, max_depth=6 ..
[CV] n_estimators=200, loss=deviance, warm_start=False, max_depth=6, score=0.752206 - 5.1min
[CV] n_estimators=300, loss=deviance, warm_start=False, max_depth=6 ..
[CV] n_estimators=200, loss=deviance, warm_start=False, max_depth=6, score=0.745467 - 5.2min
[CV] n_estimators=100, loss=deviance, warm_start=True, max_depth=7 ...
[CV] n_estimators=200, loss=deviance, warm_start=False, max_depth=6, score=0.755925 - 5.2min
[CV] n_estimators=100, loss=deviance, warm_start=True, max_depth=7 ...
[CV] n_estimators=300, loss=deviance, warm_start=True, max_depth=6, score=0.749421 - 7.4min
[CV] n_estimators=100, loss=deviance, warm_start=True, max_depth=7 ...
[CV] n_estimators=300, loss=deviance, warm_start=True, max_depth=6, score=0.744302 - 7.6min
[CV] n_estimators=100, loss=deviance, warm_start=False, max_depth=7 ..
[CV] n_estimators=100, loss=deviance, warm_start=True, max_depth=7, score=0.753965 - 3.7min
[CV] n_estimators=100, loss=deviance, warm_start=False, max_depth=7 ..
[CV] n_estimators=100, loss=deviance, warm_start=True, max_depth=7, score=0.742774 - 3.7min
[CV] n_estimators=100, loss=deviance, warm_start=False, max_depth=7 ..
[CV] n_estimators=300, loss=deviance, warm_start=False, max_depth=6, score=0.749497 - 7.0min
[CV] n_estimators=200, loss=deviance, warm_start=True, max_depth=7 ...
[CV] n_estimators=300, loss=deviance, warm_start=True, max_depth=6, score=0.750551 - 7.2min
[CV] n_estimators=200, loss=deviance, warm_start=True, max_depth=7 ...
[CV] n_estimators=100, loss=deviance, warm_start=True, max_depth=7, score=0.755611 - 3.9min
[CV] n_estimators=200, loss=deviance, warm_start=True, max_depth=7 ...
[CV] n_estimators=300, loss=deviance, warm_start=False, max_depth=6, score=0.740919 - 7.2min
[CV] n_estimators=200, loss=deviance, warm_start=False, max_depth=7 ..
[Parallel(n_jobs=-1)]: Done 56 tasks | elapsed: 24.8min
[CV] n_estimators=100, loss=deviance, warm_start=False, max_depth=7, score=0.753425 - 3.9min
[CV] n_estimators=200, loss=deviance, warm_start=False, max_depth=7 ..
[CV] n_estimators=300, loss=deviance, warm_start=False, max_depth=6, score=0.751236 - 7.2min
[CV] n_estimators=200, loss=deviance, warm_start=False, max_depth=7 ..
[CV] n_estimators=100, loss=deviance, warm_start=False, max_depth=7, score=0.746959 - 4.1min
[CV] n_estimators=300, loss=deviance, warm_start=True, max_depth=7 ...
[CV] n_estimators=100, loss=deviance, warm_start=False, max_depth=7, score=0.754571 - 4.3min
[CV] n_estimators=300, loss=deviance, warm_start=True, max_depth=7 ...
Expected Results
Thanks for the issue report.
This is essentially a duplicate of #10533 (comment). Not much can be done about it on the scikit-learn side, though workarounds are proposed in the FAQ and the linked issues.
Feel free to close this issue if that answers your questions. Thanks.
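For reference, one workaround the scikit-learn FAQ describes for freezes with `n_jobs > 1` on OSX or Linux is to switch multiprocessing to the `forkserver` start method. The sketch below assumes Python 3.4 or later (it is not available on the Python 2.7 used in this report), and whether it resolves this particular hang is not confirmed. `configure_start_method` and `main` are hypothetical helper names, and the grid search itself is left as a placeholder.

```python
import multiprocessing


def configure_start_method(method="forkserver"):
    """Select the multiprocessing start method and return the active one.

    force=True avoids a RuntimeError if a start method was already set.
    """
    multiprocessing.set_start_method(method, force=True)
    return multiprocessing.get_start_method()


def main():
    # Placeholder (assumption): build the GridSearchCV object and call
    # .fit(X_train, y_train) here, as in the report above.
    pass


if __name__ == "__main__":
    # The start method should be set once, inside the main guard,
    # before any worker processes are created.
    configure_start_method("forkserver")
    main()
```

As far as I know, setting the `JOBLIB_START_METHOD` environment variable to `forkserver` before starting Python is an alternative that avoids changing the script.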