Significantly lower accuracy when using class_weight for an imbalanced dataset in Keras
I have a rather unintuitive problem. I am doing sentiment analysis on Amazon book reviews, and the dataset is heavily imbalanced: there are almost 10 times as many positive reviews as negative ones. With the imbalanced dataset, accuracy for both training and testing is around 90%. However, when I try to compensate for the imbalance with class_weight = {0:10, 1:1}, both training and testing accuracy drop to around 65%. If I instead set class_weight = {0:1, 1:10}, accuracy shoots back up. So apparently I am setting class_weight wrong, but as I understand it, since the number of positive reviews (1) is 10 times the number of negative reviews (0), shouldn't class_weight be set to {0:10, 1:1}?
This is how I split the data into training and test sets:

from sklearn.model_selection import train_test_split

x_train, x_test, y_train, y_test = train_test_split(
    sequences, labels, test_size=0.33, random_state=42)
This is my model:
model = Sequential()
model.add(Embedding(max_words, embedding_dim))
model.add(Dropout(0.5))
model.add(LSTM(32, dropout=0.5, recurrent_dropout=0.5))
model.add(Dense(2, activation='sigmoid'))
model.compile(optimizer='rmsprop', loss='binary_crossentropy',metrics=['acc'])
model.fit(x_train, y_train, epochs=10, batch_size=320, class_weight = {0:1 , 1:10})
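For reference, weights like these are often derived from the label frequencies rather than hard-coded; a minimal sketch, assuming scikit-learn is available and y_train is an integer 0/1 array:

import numpy as np
from sklearn.utils.class_weight import compute_class_weight

# 'balanced' weights each class inversely to its frequency, so with
# ~10x more positives (1) than negatives (0), class 0 gets ~10x the weight.
weights = compute_class_weight(class_weight='balanced',
                               classes=np.array([0, 1]),
                               y=y_train)
class_weight = {0: weights[0], 1: weights[1]}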
python tensorflow keras sentiment-analysis
asked Nov 25 '18 at 11:47
BlueMango
Note that one of the things the model learns is 'how probable' a sample is to be 0 or 1, even without seeing the sample. When you use class weights like that, you actually change the model.
– Dinari
Nov 25 '18 at 11:57
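To make this concrete: weighting the loss is equivalent to changing how often each class effectively appears during training, which shifts the base rate the model learns. A toy illustration of a per-class weighted binary cross-entropy (hypothetical numbers, not the questioner's data):

import numpy as np

def weighted_bce(y_true, p_pred, class_weight):
    # Multiply each sample's cross-entropy term by its class's weight.
    w = np.where(y_true == 1, class_weight[1], class_weight[0])
    return -w * (y_true * np.log(p_pred) + (1 - y_true) * np.log(1 - p_pred))

y = np.array([0, 1, 1, 1])           # one negative among three positives
p = np.array([0.3, 0.8, 0.9, 0.7])   # predicted P(class = 1)

# With {0: 10, 1: 1} the lone negative dominates the mean loss,
# as if the dataset contained ten copies of it.
print(weighted_bce(y, p, {0: 1, 1: 1}).mean())
print(weighted_bce(y, p, {0: 10, 1: 1}).mean())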
2 Answers
Of course you get better accuracy if you do not balance the loss than if you do; that is exactly the reason for balancing. Without it, a model that predicts the positive class for every review already gets 90% accuracy, and such a model is useless. Accuracy is generally a bad metric for such strongly imbalanced datasets. Use F1 instead, and you will see that the unbalanced model gives a much worse F1.
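For instance, a quick way to compare F1 scores on the test set; a sketch assuming integer 0/1 test labels and the 2-unit output layer from the question, so hard predictions are taken by argmax:

from sklearn.metrics import f1_score

# Convert the two per-class scores into hard 0/1 predictions.
y_pred = model.predict(x_test).argmax(axis=-1)

# F1 on the minority (negative) class exposes what plain accuracy hides.
print('F1, negative class:', f1_score(y_test, y_pred, pos_label=0))
print('F1, positive class:', f1_score(y_test, y_pred, pos_label=1))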
edited Nov 25 '18 at 15:11
answered Nov 25 '18 at 14:30
Andrey Kite Gorin
Setting a higher class weight for the class with lower frequency in the dataset is the right approach. Rather than accuracy, look at more informative metrics such as precision, recall, F1 score, ROC AUC (concordance), and the confusion matrix to understand what the model is actually learning.
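A sketch of such an evaluation, under the same assumptions as above (integer 0/1 labels, 2-unit output layer, scikit-learn available):

from sklearn.metrics import classification_report, confusion_matrix, roc_auc_score

probs = model.predict(x_test)
y_pred = probs.argmax(axis=-1)

# Per-class precision, recall, and F1 in one table.
print(classification_report(y_test, y_pred, target_names=['negative', 'positive']))

# Rows are true classes, columns are predicted classes.
print(confusion_matrix(y_test, y_pred))

# ROC AUC is computed from the score assigned to the positive class.
print('ROC AUC:', roc_auc_score(y_test, probs[:, 1]))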
answered Nov 25 '18 at 12:32
AI_Learning

Okay, but I want to understand the significantly lower accuracy with class_weight.
– BlueMango
Nov 25 '18 at 14:09
The lower accuracy can have multiple causes. Try changing your model architecture, data preprocessing, model hyperparameters, etc.
– AI_Learning
Nov 25 '18 at 16:22