Ensemble model in case of outliers
I am building a predictive model with linear regression, but my target variable contains a large number of outliers: roughly 30-40% of the data. Would it be a good idea to use an ensemble model here? What I mean is:
1. Build one model on the non-outlier data.
2. Build another model on the outlier data.
3. Predict using the average of the two models' predictions (as is done in ensemble modeling).
Note: the outliers are still present after transforming the target, so in my investigation a transformation was not a feasible option either.
I cannot share the data for security reasons.
I tried to find a solution or suggestions in several discussion groups, but could not reach any useful conclusion.
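For illustration, the three steps above could be sketched as follows. This is only a minimal sketch on synthetic data: the `is_outlier` flags, the helper names, and the two generating lines are assumptions made up for the example, since the real data cannot be shared.

```python
import random
import statistics


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope


def predict_line(model, x):
    intercept, slope = model
    return intercept + slope * x


def fit_split_models(xs, ys, is_outlier):
    """Steps 1 and 2: one model per group, using precomputed outlier flags."""
    inliers = [(x, y) for x, y, flag in zip(xs, ys, is_outlier) if not flag]
    outliers = [(x, y) for x, y, flag in zip(xs, ys, is_outlier) if flag]
    return fit_line(*zip(*inliers)), fit_line(*zip(*outliers))


def predict_average(model_in, model_out, x):
    """Step 3: plain average of the two models' predictions."""
    return 0.5 * (predict_line(model_in, x) + predict_line(model_out, x))


# Synthetic data (assumption): about 35% of targets come from a different line.
random.seed(0)
xs, ys, is_outlier = [], [], []
for _ in range(200):
    x = random.uniform(0, 10)
    flag = random.random() < 0.35
    y = (20 + 5 * x if flag else 1 + 2 * x) + random.gauss(0, 1)
    xs.append(x)
    ys.append(y)
    is_outlier.append(flag)

model_in, model_out = fit_split_models(xs, ys, is_outlier)
print("non-outlier model (intercept, slope):", model_in)
print("outlier model (intercept, slope):", model_out)
print("averaged prediction at x = 5:", predict_average(model_in, model_out, 5.0))
```

Note that the averaged prediction lies between the two fitted lines by construction.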
machine-learning statistics linear-regression data-science outliers
This was downvoted. Could the downvoter please give a reason, so that I can improve the question?
– Abhishek
Nov 21 at 12:50
I didn't downvote, but it seems that you don't have a programming question here. Maybe ask in the Stats or Data Science SE.
– Matias Valdenegro
Nov 21 at 15:58
Thanks @Matias, I will keep that in mind.
– Abhishek
Nov 21 at 16:07
I think making separate models for different effects is a good idea, but my advice is to not think about "normal" effects versus "outliers". My advice is to think about the different ways that observable data may be generated; there may be any number of ways. Build a model that expresses what you know about each data generating mechanism, and then train the whole collection at the same time via EM or whatever, i.e. my advice is, don't filter out "outliers" and then train the "normal" model on the leftovers. Good luck, this is a good problem. Also stats.stackexchange.com will have more to say.
– Robert Dodier
Nov 21 at 17:48
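A minimal sketch of what "train the whole collection at the same time via EM" could look like, assuming exactly two data-generating mechanisms, each a straight line with Gaussian noise. The two-component choice, the function names, the initialisation from the pooled fit, and the synthetic data are all assumptions for illustration, not something stated in the comment above.

```python
import math
import random


def weighted_line(xs, ys, ws):
    """Weighted least squares for y = intercept + slope * x."""
    sw = sum(ws)
    mx = sum(w * x for w, x in zip(ws, xs)) / sw
    my = sum(w * y for w, y in zip(ws, ys)) / sw
    sxx = sum(w * (x - mx) ** 2 for w, x in zip(ws, xs))
    sxy = sum(w * (x - mx) * (y - my) for w, x, y in zip(ws, xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope


def log_normal_pdf(y, mean, sigma):
    return -0.5 * math.log(2 * math.pi * sigma ** 2) - (y - mean) ** 2 / (2 * sigma ** 2)


def em_mixture_of_lines(xs, ys, n_iter=50):
    """EM for a two-component mixture of linear regressions (sketch)."""
    n = len(xs)
    # Initialisation (assumption): split points by the sign of the pooled OLS residual.
    a0, b0 = weighted_line(xs, ys, [1.0] * n)
    resp = [[1.0, 0.0] if y < a0 + b0 * x else [0.0, 1.0] for x, y in zip(xs, ys)]
    params = []
    for _ in range(n_iter):
        # M-step: refit each line with the current responsibilities as weights.
        params = []
        for k in range(2):
            ws = [max(r[k], 1e-12) for r in resp]  # floor avoids empty components
            sw = sum(ws)
            a, b = weighted_line(xs, ys, ws)
            var = sum(w * (y - a - b * x) ** 2 for w, x, y in zip(ws, xs, ys)) / sw
            params.append({"weight": sw / n, "intercept": a, "slope": b,
                           "sigma": max(math.sqrt(var), 1e-6)})
        # E-step: posterior probability that each point came from each line.
        for i, (x, y) in enumerate(zip(xs, ys)):
            logs = [math.log(p["weight"])
                    + log_normal_pdf(y, p["intercept"] + p["slope"] * x, p["sigma"])
                    for p in params]
            top = max(logs)
            exps = [math.exp(v - top) for v in logs]
            total = sum(exps)
            resp[i] = [e / total for e in exps]
    return params, resp


# Synthetic data (assumption): two mechanisms, mixed roughly 65% / 35%.
random.seed(1)
xs = [random.uniform(0, 10) for _ in range(300)]
ys = [(20 + 5 * x if random.random() < 0.35 else 1 + 2 * x) + random.gauss(0, 1)
      for x in xs]

params, resp = em_mixture_of_lines(xs, ys)
for p in params:
    print(p)
```

No point is labelled as an "outlier" beforehand here; each point only receives a probability of belonging to each line, and both lines are fitted together.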
Thanks @RobertDodier. I did build a single model (linear regression specifically) on the whole data, but I have not reached any conclusion so far, so I thought I would try building separate models. I will go forward with the separate-model approach, and if I get something fruitful out of it, I will share it with everyone.
– Abhishek
Nov 22 at 4:25
edited Nov 22 at 4:20
asked Nov 21 at 12:25
Abhishek