Sklearn OneVsRestClassifier - get probabilities for all possibilities of target class



























I have a pipeline that performs feature engineering and model selection.



Feature engineering and model selection



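from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import LinearSVC
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier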


Pipeline of feature engineering and model



model = Pipeline([('vectorizer', CountVectorizer()),
                  ('tfidf', TfidfTransformer()),
                  ('clf', OneVsRestClassifier(LinearSVC(class_weight="balanced")))])


Parameter selection



from sklearn.model_selection import GridSearchCV
parameters = {'vectorizer__ngram_range': [(1, 1), (1, 2), (2, 2)],
              'tfidf__use_idf': (True, False)}

gs_clf_svm = GridSearchCV(model, parameters, n_jobs=-1)
gs_clf_svm = gs_clf_svm.fit(X, y)
print(gs_clf_svm.best_score_)
print(gs_clf_svm.best_params_)


Preparing the final pipeline using the selected parameters



model = Pipeline([('vectorizer', CountVectorizer(ngram_range=(1, 2))),
                  ('tfidf', TfidfTransformer(use_idf=True)),
                  ('clf', OneVsRestClassifier(LinearSVC(class_weight="balanced")))])


Fit model with training data
model.fit(X_train, y_train)



Save the model



import joblib  # sklearn.externals.joblib was removed in scikit-learn 0.23
joblib.dump(model, 'model_question_topic.pkl', compress=1)


Now, in another file, I load the model and predict



import joblib  # sklearn.externals.joblib was removed in scikit-learn 0.23
model = joblib.load('model_question_topic.pkl')


It predicts the class correctly, as class 1



question = "apply leave"
model.predict([question])[0]


However, I need a confidence score or probability for every class, like this:




Class1 = 0.8 -- Class2 = 0.05 -- Class3 = 0.05 -- Class4 = 0.1




model.predict_proba([question])[0]


How do I do this in Python 3?
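One thing that is available without changing the model is the raw decision score. The sketch below assumes the same pipeline as above and uses hypothetical toy training data, since the real X_train and y_train are not shown. decision_function returns one signed margin per class; these rank the classes but are not probabilities and do not sum to 1.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC

# Hypothetical toy data standing in for the real X_train / y_train.
X_train = [
    "apply leave", "leave request", "apply for sick leave",
    "reset password", "forgot my password", "change password",
    "salary slip", "download payslip", "salary not credited",
]
y_train = ["leave"] * 3 + ["password"] * 3 + ["salary"] * 3

model = Pipeline([
    ("vectorizer", CountVectorizer(ngram_range=(1, 2))),
    ("tfidf", TfidfTransformer(use_idf=True)),
    ("clf", OneVsRestClassifier(LinearSVC(class_weight="balanced"))),
])
model.fit(X_train, y_train)

question = "apply leave"

# One signed margin per class for the single input document.
# A larger value means the class is favoured more strongly.
scores = model.decision_function([question])[0]

# Pair each score with its class label (order follows model.classes_).
score_by_class = dict(zip(model.classes_, scores))
print(score_by_class)
```

The class with the largest score is the one model.predict returns, so the scores are useful for ranking even though they are not percentages.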



































  • You are aware that Class2 and Class3 have the same probability in your description? If you really, really want it that way, you calculate which of these thresholds is "closest" to your actual probability. If the result is not unique (like with Class2 and Class3), then use random choice.

    – Thomas Lang
    Nov 29 '18 at 5:29






  • What do you get when you run model.predict_proba([question])[0]?

    – Clock Slave
    Nov 29 '18 at 6:31













  • model.predict_proba() will do the same. Have you tried it?

    – Vivek Kumar
    Nov 29 '18 at 6:55











  • model.predict_proba([question])[0] gives class1

    – Chethan Kumar GN
    Nov 29 '18 at 12:22











  • But I need the confidence rate like this: Class1 = 0.8 -- Class2 = 0.04 -- Class3 = 0.06 -- Class4 = 0.1. When I use model.predict_proba(), I get this error: AttributeError: 'LinearSVC' object has no attribute 'predict_proba'

    – Chethan Kumar GN
    Nov 29 '18 at 12:53
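The AttributeError in the last comment arises because LinearSVC does not implement predict_proba, and OneVsRestClassifier only exposes predict_proba when its base estimator does. One possible workaround, sketched here on the assumption that retraining is acceptable, is to wrap LinearSVC in CalibratedClassifierCV, which fits a mapping from decision scores to probabilities. The toy data and cv=2 are illustrative assumptions, not values from the question.

```python
from sklearn.calibration import CalibratedClassifierCV
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC

# Hypothetical toy data standing in for the real X_train / y_train.
X_train = [
    "apply leave", "leave request", "apply for sick leave", "cancel my leave",
    "reset password", "forgot my password", "change password", "password expired",
    "salary slip", "download payslip", "salary not credited", "salary revision",
]
y_train = ["leave"] * 4 + ["password"] * 4 + ["salary"] * 4

# Each one-vs-rest binary problem gets its own calibrated LinearSVC.
# cv=2 is chosen only because the toy data set is tiny.
calibrated_svc = CalibratedClassifierCV(LinearSVC(class_weight="balanced"), cv=2)

model = Pipeline([
    ("vectorizer", CountVectorizer(ngram_range=(1, 2))),
    ("tfidf", TfidfTransformer(use_idf=True)),
    ("clf", OneVsRestClassifier(calibrated_svc)),
])
model.fit(X_train, y_train)

question = "apply leave"

# predict_proba now exists; for single-label problems OneVsRestClassifier
# normalises the per-class probabilities so each row sums to 1.
proba = model.predict_proba([question])[0]
for label, p in zip(model.classes_, proba):
    print(f"{label} = {p:.2f}")
```

With so little data the calibrated values are rough. If calibration is not wanted, another option is to swap in an estimator that provides predict_proba natively, such as the LogisticRegression already imported in the question.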


















python machine-learning scikit-learn nlp svm














edited Nov 29 '18 at 6:27 by Clock Slave

asked Nov 29 '18 at 5:21 by Chethan Kumar GN
























