Ensemble model in case of outliers

I am building a predictive model (linear regression), but my target variable has a huge number of outliers: roughly 30-40% of the data. I want to know whether it is a good idea to use an ensemble model, that is:
1. build one model for the non-outlier data
2. build another model for the outlier data
3. then predict using the average of the predictions from both models (as is done in ensemble modeling)
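
To make the three steps concrete, here is a minimal sketch in plain Python. Everything in it is an assumption for illustration: the data is synthetic, `flag_outliers` is a hypothetical median/MAD rule for marking outlying targets, and `fit_line` is a one-feature least-squares fit standing in for whatever regression is actually used.

```python
import random
import statistics

def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b

def flag_outliers(ys, k=3.0):
    """Hypothetical rule: flag targets more than k scaled MADs from the median."""
    med = statistics.median(ys)
    mad = statistics.median([abs(y - med) for y in ys])
    return [abs(y - med) > k * 1.4826 * mad for y in ys]

def make_data(n=200, seed=0):
    """Synthetic data (an assumption): every third target is shifted by +100."""
    rng = random.Random(seed)
    xs, ys = [], []
    for i in range(n):
        x = rng.uniform(0, 10)
        shift = 100 if i % 3 == 0 else 0
        xs.append(x)
        ys.append(1 + 2 * x + rng.gauss(0, 1) + shift)
    return xs, ys

def fit_two_models(xs, ys):
    """Steps 1 and 2: one line for the non-outliers, one for the outliers."""
    flags = flag_outliers(ys)
    normal = [(x, y) for x, y, f in zip(xs, ys, flags) if not f]
    outlying = [(x, y) for x, y, f in zip(xs, ys, flags) if f]
    return fit_line(*zip(*normal)), fit_line(*zip(*outlying))

def predict_average(models, x):
    """Step 3: average the predictions of the fitted models."""
    return statistics.fmean(a + b * x for a, b in models)

xs, ys = make_data()
models = fit_two_models(xs, ys)
print("non-outlier model (a, b):", models[0])
print("outlier model (a, b):   ", models[1])
print("averaged prediction at x=5:", predict_average(models, 5.0))
```

On this made-up data the averaged prediction lands roughly halfway between the two groups, so it matches neither of them; whether that is acceptable depends on what the prediction is used for.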

Note: outliers remain even after transforming the target, so, as far as my research goes, transformation is not a feasible option either.

I cannot share the data for security reasons.

I tried to find a solution (or suggestions) in many discussion groups but could not reach any fruitful conclusion.

edited Nov 22 at 4:20

asked Nov 21 at 12:25

Abhishek

This was downvoted. Could you please give the reason so that I can improve it?
– Abhishek
Nov 21 at 12:50

I didn't downvote, but it seems that you don't have a programming question here. Maybe ask in the Stats or Data Science SE.
– Matias Valdenegro
Nov 21 at 15:58

Thanks @Matias, I will keep that in mind.
– Abhishek
Nov 21 at 16:07

I think making separate models for different effects is a good idea, but my advice is to not think about "normal" effects versus "outliers". My advice is to think about the different ways that observable data may be generated; there may be any number of ways. Build a model that expresses what you know about each data generating mechanism, and then train the whole collection at the same time via EM or whatever, i.e. my advice is, don't filter out "outliers" and then train the "normal" model on the leftovers. Good luck, this is a good problem. Also stats.stackexchange.com will have more to say.
– Robert Dodier
Nov 21 at 17:48
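
One way to read the suggestion above (train all the mechanisms together via EM) is a mixture of linear regressions. The following is only a sketch under assumptions that are not in the thread: exactly two mechanisms, one feature, Gaussian noise, synthetic data, and a crude starting split at the median of the target. The names `em_two_lines`, `weighted_line` and `make_mixture_data` are made up for the example.

```python
import math
import random

def weighted_line(xs, ys, ws):
    """Weighted least squares for y = a + b*x; returns (a, b)."""
    sw = sum(ws)
    mx = sum(w * x for w, x in zip(ws, xs)) / sw
    my = sum(w * y for w, y in zip(ws, ys)) / sw
    sxx = sum(w * (x - mx) ** 2 for w, x in zip(ws, xs))
    sxy = sum(w * (x - mx) * (y - my) for w, x, y in zip(ws, xs, ys))
    b = sxy / sxx
    return my - b * mx, b

def normal_pdf(r, sigma):
    """Density of a zero-mean Gaussian with standard deviation sigma at r."""
    return math.exp(-0.5 * (r / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def em_two_lines(xs, ys, n_iter=50):
    """EM for a two-component mixture of lines.

    Returns ([(a, b, sigma, weight), ...], responsibilities of component 1).
    """
    # Crude start (an assumption): lean points above the median toward component 1.
    med = sorted(ys)[len(ys) // 2]
    resp = [0.9 if y > med else 0.1 for y in ys]
    comps = []
    for _ in range(n_iter):
        # M-step: refit each line, noise level and mixing weight from soft assignments.
        comps = []
        for k in (0, 1):
            ws = [r if k == 1 else 1.0 - r for r in resp]
            a, b = weighted_line(xs, ys, ws)
            var = sum(w * (y - a - b * x) ** 2
                      for w, x, y in zip(ws, xs, ys)) / sum(ws)
            comps.append((a, b, math.sqrt(max(var, 1e-6)), sum(ws) / len(xs)))
        # E-step: posterior probability that each point came from component 1.
        resp = []
        for x, y in zip(xs, ys):
            p = [pi * normal_pdf(y - a - b * x, s) for a, b, s, pi in comps]
            total = p[0] + p[1]
            resp.append(p[1] / total if total > 0 else 0.5)
    return comps, resp

def make_mixture_data(n=200, seed=1):
    """Synthetic data (an assumption): two generating mechanisms, about 2:1."""
    rng = random.Random(seed)
    xs, ys = [], []
    for i in range(n):
        x = rng.uniform(0, 10)
        if i % 3 == 0:
            y = 40 + 5 * x + rng.gauss(0, 2)   # second mechanism
        else:
            y = 1 + 2 * x + rng.gauss(0, 1)    # first mechanism
        xs.append(x)
        ys.append(y)
    return xs, ys

xs, ys = make_mixture_data()
comps, resp = em_two_lines(xs, ys)
for a, b, sigma, weight in comps:
    print(f"a={a:.2f} b={b:.2f} sigma={sigma:.2f} weight={weight:.2f}")
```

No point is filtered out beforehand here: every observation contributes to both lines, weighted by how plausible each mechanism is for it. EM can settle in a poor local optimum, so on real data it is worth trying several starting points.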

Thanks @RobertDodier. I did build a single model (linear regression specifically) on the whole data, but I have not reached any conclusion so far, so I thought I would try building separate models. I will go forward with the separate-model approach, and if I get something fruitful out of it, I will share it with everyone.
– Abhishek
Nov 22 at 4:25

machine-learning statistics linear-regression data-science outliers


