sklearn classifiers have a predict. Most classifiers also have a predict_proba or decision_function. One ought to be able to set a threshold on these to make a prediction.
This is useful, for example, in implementing asymmetric costs. Currently I can use a recall score to penalize false negatives in the cross-validator. But having trained the model, I don't want the old predictions; I want new ones that apply an appropriate new threshold to the classifier output so as to reflect this risk asymmetry.
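As a minimal sketch of the idea (the LogisticRegression, the synthetic data and the 0.3 threshold are purely illustrative; the actual threshold would come from the cost asymmetry):

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(random_state=0)
clf = LogisticRegression().fit(X, y)

# predict() implicitly thresholds P(class 1) at 0.5 ...
default_pred = clf.predict(X)

# ... whereas with asymmetric costs one may want a lower threshold so that
# fewer positives are missed, at the price of more false positives
threshold = 0.3
custom_pred = (clf.predict_proba(X)[:, 1] >= threshold).astype(int)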
Yeah, I think a meta-estimator is probably better.
How do you pick the threshold? Optimizing a cost-matrix?
I also think that a meta-estimator picking the threshold to optimize f1 would be cool.
I would suggest allowing the threshold parameter to be either a scalar or an array of size X_test.shape[0]. This would be highly useful when using the estimated probabilities as part of a model with example-dependent thresholds, such as Bayes minimum risk.
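A sketch of what an array-valued threshold would mean (all values purely illustrative):

import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(random_state=0)
proba = LogisticRegression().fit(X, y).predict_proba(X)[:, 1]

# one threshold per sample, e.g. derived from example-dependent costs
# (Bayes minimum risk)
thresholds = np.full(X.shape[0], 0.5)
thresholds[::2] = 0.3  # pretend every other sample has a higher false-negative cost
y_pred = (proba >= thresholds).astype(int)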
I recently needed this and constructed a pair of meta-estimators that together allow you to optimise the threshold as if it were yet another hyperparameter:

import numpy as np
import scipy as sp
import scipy.stats  # so that sp.stats is available below
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import make_pipeline

pipe = make_pipeline(PredictionTransformer(RandomForestClassifier()),
                     ThresholdClassifier())

pipe_param_grid = {#'predictiontransformer__clf__max_depth': [1, 2, 5, 10, 20, 30, 40, 50],
                   #'predictiontransformer__clf__max_features': [8, 16, 32, 64, 80, 100],
                   'thresholdclassifier__threshold': np.linspace(0, 1, num=100)}

# grid_search() is a helper defined elsewhere (not shown here); the search is
# repeated ten times to get an uncertainty on the best score
grids = [grid_search(n,
                     clf=pipe,
                     param_grid=pipe_param_grid) for n in range(10)]

scores = [g.best_score_ for g in grids]
print("Average score: %.4f+-%.4f" % (np.mean(scores), sp.stats.sem(scores)))
The code for the two estimators is the following:
import numpy as np
from sklearn.base import BaseEstimator, ClassifierMixin, MetaEstimatorMixin, TransformerMixin


class PredictionTransformer(BaseEstimator, TransformerMixin, MetaEstimatorMixin):
    def __init__(self, clf):
        """Replaces all features with `clf.predict_proba(X)`."""
        self.clf = clf

    def fit(self, X, y):
        self.clf.fit(X, y)
        return self

    def transform(self, X):
        return self.clf.predict_proba(X)


class ThresholdClassifier(BaseEstimator, ClassifierMixin):
    def __init__(self, threshold=0.5):
        """Classify samples based on whether they are above or below `threshold`."""
        self.threshold = threshold

    def fit(self, X, y):
        self.classes_ = np.unique(y)
        return self

    def predict(self, X):
        # the implementation used here breaks ties differently
        # from the one used in RFs:
        # return self.classes_.take(np.argmax(X, axis=1), axis=0)
        return np.where(X[:, 0] > self.threshold, *self.classes_)
A bit more detail and part of the use case I had is in a blog post on unbiased performance estimates.
I have a nagging voice in my head that says that the PredictionTransformer is illegal because it changes the size of X. Is this a rule that transformers should follow?
I'd be interested in working on this together with you @joshlk if you want to.
I have a nagging voice in my head that says that the PredictionTransformer is illegal because it changes the size of X. Is this a rule that transformers should follow?

Don't listen to it! :)
In this case there is nothing wrong with the API because X.shape[0] does not change. Changing X.shape[1] is fine and we have many examples of that in scikit-learn; PCA is one of them.
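For example, PCA changes the number of columns but not the number of rows:

from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X, y = load_iris(return_X_y=True)
print(X.shape)                                     # (150, 4)
print(PCA(n_components=2).fit_transform(X).shape)  # (150, 2)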
In some instances I think it would be more appropriate to mark the samples whose prediction falls short of the threshold with an unclassified marker instead of removing them completely.
For instance, in the semi-supervised context (LabelPropagation and LabelSpreading) unclassified samples have a y value of -1. You may, for example, want to do the thresholding and then pass the result to a semi-supervised classifier, or you may want to use an evaluation metric which specifically takes the number of unclassified samples into consideration.
Maybe there should be a switch which allows you to either remove such samples or mark them as unclassified.
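A rough sketch of what such a switch could look like (the function name, the mark_unclassified flag and the -1 sentinel, borrowed from the semi-supervised convention, are only illustrative):

import numpy as np

def predict_with_reject(proba, classes, threshold=0.8, mark_unclassified=True):
    """Pick the most probable class, but flag samples whose top
    probability is below `threshold` as unclassified (-1)."""
    # assumes integer class labels so that -1 can act as the sentinel
    best = np.argmax(proba, axis=1)
    pred = classes.take(best)
    if mark_unclassified:
        low_confidence = proba[np.arange(len(proba)), best] < threshold
        pred = np.where(low_confidence, -1, pred)
    return pred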
@betatim Yes, I'm interested in working on this with you.
Why not incorporate the PredictionTransformer and ThresholdClassifier into one class? Then all you would need to do is something like:

clf = ThresholdTransformer(RandomForestClassifier())
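Roughly, the combined class could look like this (a sketch for the binary case, merging the two estimators above):

import numpy as np
from sklearn.base import BaseEstimator, ClassifierMixin, MetaEstimatorMixin

class ThresholdTransformer(BaseEstimator, ClassifierMixin, MetaEstimatorMixin):
    """Fit the wrapped classifier and threshold its predict_proba output."""
    def __init__(self, clf, threshold=0.5):
        self.clf = clf
        self.threshold = threshold

    def fit(self, X, y):
        self.classes_ = np.unique(y)
        self.clf.fit(X, y)
        return self

    def predict(self, X):
        # same tie-breaking convention as the ThresholdClassifier above
        return np.where(self.clf.predict_proba(X)[:, 0] > self.threshold,
                        *self.classes_)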
@betatim do you mind elaborating on this point more:
In HEP we often tend to pick the threshold by optimising something like N1/sqrt(N1+N2) (N1 = samples in class 1, N2 = samples in class 2), or purity, or N1/sqrt(N2), or ... -> 👍
I'm currently doing a lot of multi-class classification and use the micro-F1 score as my evaluation metric. Do you know how I could optimise a threshold in this context?
Not sure why I didn't make it one class. It might have been because I wanted to have PredictionTransformer as a single step, e.g. to do this for several different classifiers and then make a FeatureUnion.
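For example (a sketch reusing the PredictionTransformer and ThresholdClassifier from the earlier comment; the base classifiers are arbitrary, and since ThresholdClassifier only looks at the first column, a combining step would go in between in practice):

from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import FeatureUnion, make_pipeline

# put the predicted probabilities of several classifiers side by side
probas = FeatureUnion([
    ('rf', PredictionTransformer(RandomForestClassifier())),
    ('lr', PredictionTransformer(LogisticRegression())),
])
pipe = make_pipeline(probas, ThresholdClassifier())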
Not quite sure I understand your question about threshold optimisation. If you use GridSearchCV you can pass your own scorer, together with a set of threshold values to try. This should then pick the best one.
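Concretely, something like this (a sketch, reusing the pipe from the earlier comment and F1 as an arbitrary example scorer):

import numpy as np
from sklearn.metrics import f1_score, make_scorer
from sklearn.model_selection import GridSearchCV

search = GridSearchCV(pipe,
                      param_grid={'thresholdclassifier__threshold': np.linspace(0, 1, num=100)},
                      scoring=make_scorer(f1_score))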
@amueller and others, any thoughts re: optimisation on exposing the threshold as yet another hyper-parameter? If we have a minimal amount of consensus that this would work (as opposed to, say, evaluating the scoring metric n_samples times or doing something smarter) I will start a PR and set things up with @joshlk so we can both add patches.
Yes, you would have to retrain; this is what I had in mind as a starting point.
You could also imagine a ThresholdClassifierCV that runs its own optimisation loop on the inside. Then you could save yourself having to refit. You'd want this to use some form of CV to find the best threshold. When used inside GridSearchCV, the ThresholdClassifierCV would further split the training dataset given to it by GridSearchCV into folds. Something like this:
import numpy as np
from sklearn.base import BaseEstimator, ClassifierMixin
from sklearn.model_selection import StratifiedKFold


class ThresholdClassifierCV(BaseEstimator, ClassifierMixin):
    def __init__(self, scorer, thresholds=np.linspace(0, 1, num=100), cv=3):
        self.scorer = scorer
        self.cv = cv
        self.thresholds = thresholds

    def find_best_threshold(self, X, y):
        # assumes `scorer` is a metric such as sklearn.metrics.f1_score,
        # i.e. a callable taking (y_true, y_pred)
        scores = [self.scorer(y, np.where(X[:, 0] > t, *self.classes_))
                  for t in self.thresholds]
        return self.thresholds[np.argmax(scores)]

    def fit(self, X, y):
        self.classes_ = np.unique(y)
        thresholds = []
        # split the training data we are given into internal folds
        for _, test_idx in StratifiedKFold(n_splits=self.cv).split(X, y):
            thresholds.append(self.find_best_threshold(X[test_idx], y[test_idx]))
        self.threshold_ = np.mean(thresholds)
        return self

    def predict(self, X):
        return np.where(X[:, 0] > self.threshold_, *self.classes_)
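A minimal usage sketch (synthetic data; f1_score stands in for whatever metric you care about, and PredictionTransformer is the one defined earlier in this thread):

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score
from sklearn.pipeline import make_pipeline

X, y = make_classification(random_state=0)
pipe = make_pipeline(PredictionTransformer(RandomForestClassifier()),
                     ThresholdClassifierCV(scorer=f1_score, cv=3))
pipe.fit(X, y)
print(pipe.predict(X)[:10])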
Looks like a good idea to me to find the threshold according to some score. Once you have found it you could then use the ThresholdClassifier you mentioned earlier.
@joshlk I don't know about other areas of HEP, but in astroparticle-related research many people use the numbers mentioned by Tim. I'm not sure if that's the right way to go when comparing between different models (changing the prediction threshold effectively gives you a different model, right?).
I've also seen many models where these scores do not show a clear maximum. But I also couldn't find a clear, concise definition of these anywhere. (I think astro people often call these numbers q-values.)
The actual prediction threshold chosen, however, still depends on what you want to do with the data the model classifies for you. Sometimes we need more statistics and can live with some false positives, and sometimes we can't. In these cases the physicist often chooses the threshold from a gut feeling using hand-waving arguments. If there was a more formal approach I'd welcome that very much.
Btw, the way I usually pick thresholds is by specifying a slice on the ROC curve, either precision or recall, and then defining a metric "precision at recall=X" and grid-searching that. Internally that can find the right threshold.
I guess that doesn't really provide one consistent threshold to use for the model in the end though... hmm...
@amueller do you have an example of this? I have the same need, but haven't figured out how to do it yet.
Btw, the way I usually pick thresholds is by specifying a slice on the ROC curve, either precision or recall, and then defining a metric "precision at recall=X" and grid-searching that.

@amueller does this look right for "precision at recall=X"?
import numpy as np
from sklearn.metrics import make_scorer, precision_recall_curve

def precision_at_recall(y_true, y_score, constraint):
    precision, recall, thresholds = precision_recall_curve(y_true, y_score)
    return np.max(precision[recall >= constraint])

precision_at_recall_score = make_scorer(precision_at_recall, needs_threshold=True, constraint=0.95)
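For context, such a scorer would then be passed as the scoring argument of a cross-validated search, e.g. (a sketch; estimator and data are arbitrary):

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

X, y = make_classification(random_state=0)
search = GridSearchCV(LogisticRegression(),
                      param_grid={'C': [0.1, 1.0, 10.0]},
                      scoring=precision_at_recall_score)
search.fit(X, y)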
Also very interested in this. Did the code provided work as intended?
[WIP] FEA New meta-estimator to post-tune the decision_function/predict_proba threshold for binary classifiers #16525