sklearn classifiers have a predict. Most classifiers also have a predict_proba or decision_function. One ought to be able to set a threshold on these to make a prediction.
This is useful, for example, in implementing asymmetric costs. Currently I can use a recall score to penalize false negatives in the cross-validator. But having trained the model, I don't want the old predictions; I want new ones that apply an appropriate new threshold to the classifier output so as to reflect this risk asymmetry.
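As a minimal sketch of the idea (the LogisticRegression, the synthetic data and the 0.3 threshold are purely illustrative; the actual threshold would come from the cost asymmetry):

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(random_state=0)
clf = LogisticRegression().fit(X, y)

# predict() implicitly thresholds P(class 1) at 0.5 ...
default_pred = clf.predict(X)

# ... whereas with asymmetric costs one may want a lower threshold so that
# fewer positives are missed, at the price of more false positives
threshold = 0.3
custom_pred = (clf.predict_proba(X)[:, 1] >= threshold).astype(int)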
Yeah, I think a meta-estimator is probably better.
How do you pick the threshold? Optimizing a cost-matrix?
I also think that a meta-estimator picking the threshold to optimize f1 would be cool.
I would suggest allowing the threshold parameter to be either a scalar or an array of size X_test.shape[0]. This would be highly useful when using the estimated probabilities as part of a model with example-dependent thresholds, such as Bayes minimum risk.
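A sketch of what an array-valued threshold would mean (all values purely illustrative):

import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(random_state=0)
proba = LogisticRegression().fit(X, y).predict_proba(X)[:, 1]

# one threshold per sample, e.g. derived from example-dependent costs
# (Bayes minimum risk)
thresholds = np.full(X.shape[0], 0.5)
thresholds[::2] = 0.3  # pretend every other sample has a higher false-negative cost
y_pred = (proba >= thresholds).astype(int)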
I recently needed this and constructed a pair of meta-estimators that together allow you to optimise the threshold as if it were yet another hyperparameter:

import numpy as np
import scipy as sp
import scipy.stats  # so that sp.stats is available below
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import make_pipeline

pipe = make_pipeline(PredictionTransformer(RandomForestClassifier()),
                     ThresholdClassifier())

pipe_param_grid = {#'predictiontransformer__clf__max_depth': [1, 2, 5, 10, 20, 30, 40, 50],
                   #'predictiontransformer__clf__max_features': [8, 16, 32, 64, 80, 100],
                   'thresholdclassifier__threshold': np.linspace(0, 1, num=100)}

# grid_search() is a helper defined elsewhere (not shown here); the search is
# repeated ten times to get an uncertainty on the best score
grids = [grid_search(n,
                     clf=pipe,
                     param_grid=pipe_param_grid) for n in range(10)]

scores = [g.best_score_ for g in grids]
print("Average score: %.4f+-%.4f" % (np.mean(scores), sp.stats.sem(scores)))
The code for the two estimators is the following:
import numpy as np
from sklearn.base import BaseEstimator, ClassifierMixin, MetaEstimatorMixin, TransformerMixin


class PredictionTransformer(BaseEstimator, TransformerMixin, MetaEstimatorMixin):
    def __init__(self, clf):
        """Replaces all features with `clf.predict_proba(X)`."""
        self.clf = clf

    def fit(self, X, y):
        self.clf.fit(X, y)
        return self

    def transform(self, X):
        return self.clf.predict_proba(X)


class ThresholdClassifier(BaseEstimator, ClassifierMixin):
    def __init__(self, threshold=0.5):
        """Classify samples based on whether they are above or below `threshold`."""
        self.threshold = threshold

    def fit(self, X, y):
        self.classes_ = np.unique(y)
        return self

    def predict(self, X):
        # the implementation used here breaks ties differently
        # from the one used in RFs:
        # return self.classes_.take(np.argmax(X, axis=1), axis=0)
        return np.where(X[:, 0] > self.threshold, *self.classes_)
A bit more detail and part of the use case I had is in a blog post on unbiased performance estimates.
I have a nagging voice in my head that says that the PredictionTransformer is illegal because it changes the size of X. Is this a rule that transformers should follow?
I'd be interested in working on this together with you @joshlk if you want to.
I have a nagging voice in my head that says that the PredictionTransformer is illegal because it changes the size of X. Is this a rule that transformers should follow?

Don't listen to it! :)
In this case there is nothing wrong with the API because X.shape[0] does not change. Changing X.shape[1] is fine and we have many examples of that in scikit-learn; PCA is one of them.
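For example, PCA changes the number of columns but not the number of rows:

from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X, y = load_iris(return_X_y=True)
print(X.shape)                                     # (150, 4)
print(PCA(n_components=2).fit_transform(X).shape)  # (150, 2)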
In some instances I think it would be more appropriate to mark the samples whose prediction falls short of the threshold with an unclassified marker instead of removing them completely.
For instance, in the semi-supervised context (LabelPropagation and LabelSpreading) unclassified samples have a y value of -1. You may, for example, want to do the thresholding and then pass the result to a semi-supervised classifier, or you may want to use an evaluation metric which specifically takes the number of unclassified samples into consideration.
Maybe there should be a switch which allows you to either remove such samples or mark them as unclassified.
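A rough sketch of what such a switch could look like (the function name, the mark_unclassified flag and the -1 sentinel, borrowed from the semi-supervised convention, are only illustrative):

import numpy as np

def predict_with_reject(proba, classes, threshold=0.8, mark_unclassified=True):
    """Pick the most probable class, but flag samples whose top
    probability is below `threshold` as unclassified (-1)."""
    # assumes integer class labels so that -1 can act as the sentinel
    best = np.argmax(proba, axis=1)
    pred = classes.take(best)
    if mark_unclassified:
        low_confidence = proba[np.arange(len(proba)), best] < threshold
        pred = np.where(low_confidence, -1, pred)
    return pred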
@betatim Yes, I'm interested in working on this with you.
Why not incorporate the PredictionTransformer and ThresholdClassifier into one class? Then all you would need to do is something like:

clf = ThresholdTransformer(RandomForestClassifier())
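Roughly, the combined class could look like this (a sketch for the binary case, merging the two estimators above):

import numpy as np
from sklearn.base import BaseEstimator, ClassifierMixin, MetaEstimatorMixin

class ThresholdTransformer(BaseEstimator, ClassifierMixin, MetaEstimatorMixin):
    """Fit the wrapped classifier and threshold its predict_proba output."""
    def __init__(self, clf, threshold=0.5):
        self.clf = clf
        self.threshold = threshold

    def fit(self, X, y):
        self.classes_ = np.unique(y)
        self.clf.fit(X, y)
        return self

    def predict(self, X):
        # same tie-breaking convention as the ThresholdClassifier above
        return np.where(self.clf.predict_proba(X)[:, 0] > self.threshold,
                        *self.classes_)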
@betatim do you mind elaborating on this point more:
In HEP we often tend to pick the threshold by optimising something like N1/sqrt(N1+N2) (N1 = samples in class 1, N2 = samples in class 2), or purity, or N1/sqrt(N2), or ... -> 👍
I'm currently doing a lot of multi-class classification and use the micro-F1 score as my evaluation metric. Do you know how I could optimise a threshold in this context?
Not sure why I didn't make it one class. It might have been because I wanted to have PredictionTransformer as a single step, e.g. to do this for several different classifiers and then make a FeatureUnion.
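For example (a sketch reusing the PredictionTransformer and ThresholdClassifier from the earlier comment; the base classifiers are arbitrary, and since ThresholdClassifier only looks at the first column, a combining step would go in between in practice):

from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import FeatureUnion, make_pipeline

# put the predicted probabilities of several classifiers side by side
probas = FeatureUnion([
    ('rf', PredictionTransformer(RandomForestClassifier())),
    ('lr', PredictionTransformer(LogisticRegression())),
])
pipe = make_pipeline(probas, ThresholdClassifier())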
Not quite sure I understand your question about threshold optimisation. If you use GridSearchCV you can pass your own scorer, together with a set of threshold values to try. This should then pick the best one.
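Concretely, something like this (a sketch, reusing the pipe from the earlier comment and F1 as an arbitrary example scorer):

import numpy as np
from sklearn.metrics import f1_score, make_scorer
from sklearn.model_selection import GridSearchCV

search = GridSearchCV(pipe,
                      param_grid={'thresholdclassifier__threshold': np.linspace(0, 1, num=100)},
                      scoring=make_scorer(f1_score))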
@amueller and others, any thoughts re: optimisation on exposing the threshold as yet another hyper-parameter? If we have a minimal amount of consensus that this would work (as opposed to, say, evaluating the scoring metric n_samples times or doing something smarter) I will start a PR and set things up with @joshlk so we can both add patches.
Yes, you would have to retrain; this is what I had in mind as a starting point.
You could also imagine a ThresholdClassifierCV that runs its own optimisation loop on the inside. Then you could save yourself having to refit. You'd want this to use some form of CV to find the best threshold. When used inside GridSearchCV, the ThresholdClassifierCV would further split the training dataset given to it by GridSearchCV into folds. Something like this:
import numpy as np
from sklearn.base import BaseEstimator, ClassifierMixin
from sklearn.model_selection import StratifiedKFold


class ThresholdClassifierCV(BaseEstimator, ClassifierMixin):
    def __init__(self, scorer, thresholds=np.linspace(0, 1, num=100), cv=3):
        self.scorer = scorer
        self.cv = cv
        self.thresholds = thresholds

    def find_best_threshold(self, X, y):
        # assumes `scorer` is a metric such as sklearn.metrics.f1_score,
        # i.e. a callable taking (y_true, y_pred)
        scores = [self.scorer(y, np.where(X[:, 0] > t, *self.classes_))
                  for t in self.thresholds]
        return self.thresholds[np.argmax(scores)]

    def fit(self, X, y):
        self.classes_ = np.unique(y)
        thresholds = []
        # split the training data we are given into internal folds
        for _, test_idx in StratifiedKFold(n_splits=self.cv).split(X, y):
            thresholds.append(self.find_best_threshold(X[test_idx], y[test_idx]))
        self.threshold_ = np.mean(thresholds)
        return self

    def predict(self, X):
        return np.where(X[:, 0] > self.threshold_, *self.classes_)
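A minimal usage sketch (synthetic data; f1_score stands in for whatever metric you care about, and PredictionTransformer is the one defined earlier in this thread):

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score
from sklearn.pipeline import make_pipeline

X, y = make_classification(random_state=0)
pipe = make_pipeline(PredictionTransformer(RandomForestClassifier()),
                     ThresholdClassifierCV(scorer=f1_score, cv=3))
pipe.fit(X, y)
print(pipe.predict(X)[:10])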
Looks like a good idea to me to find the threshold according to some score. Once you have found it you could then use the ThresholdClassifier you mentioned earlier.
@joshlk I don't know about other areas of HEP, but in astroparticle-related research many people use the numbers mentioned by Tim. I'm not sure if that's the right way to go when comparing between different models (changing the prediction threshold effectively gives you a different model, right?).
I've also seen many models where these scores do not show a clear maximum. But I also couldn't find a clear, concise definition of these anywhere. (I think astro people often call these numbers q-values.)
The actual prediction threshold chosen, however, still depends on what you want to do with the data the model classifies for you. Sometimes we need more statistics and can live with some false positives, and sometimes we can't. In these cases the physicist often chooses the threshold from a gut feeling using hand-waving arguments. If there was a more formal approach I'd welcome that very much.
Btw, the way I usually pick thresholds is by specifying a slice on the ROC curve, either precision or recall, and then defining a metric "precision at recall=X" and grid-searching that. Internally that can find the right threshold.
I guess that doesn't really provide one consistent threshold to use for the model in the end though... hmm...
@amueller do you have an example of this? I have the same need, but haven't figured out how to do it yet.
Btw, the way I usually pick thresholds is by specifying a slice on the ROC curve, either precision or recall, and then defining a metric "precision at recall=X" and grid-searching that.

@amueller does this look right for "precision at recall=X"?
import numpy as np
from sklearn.metrics import make_scorer, precision_recall_curve

def precision_at_recall(y_true, y_score, constraint):
    precision, recall, thresholds = precision_recall_curve(y_true, y_score)
    return np.max(precision[recall >= constraint])

precision_at_recall_score = make_scorer(precision_at_recall, needs_threshold=True, constraint=0.95)
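For context, such a scorer would then be passed as the scoring argument of a cross-validated search, e.g. (a sketch; estimator and data are arbitrary):

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

X, y = make_classification(random_state=0)
search = GridSearchCV(LogisticRegression(),
                      param_grid={'C': [0.1, 1.0, 10.0]},
                      scoring=precision_at_recall_score)
search.fit(X, y)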
Also very interested in this. Did the code provided work as intended?
[WIP] FEA New meta-estimator to post-tune the decision_function/predict_proba threshold for binary classifiers #16525