Need help troubleshooting a .NET Core 2.1 API in a Linux Docker container



























We have a bad situation with an API that runs in a Linux Docker container on AWS ECS. The API currently runs on ASP.NET Core 2.1, but we had the same problem on ASP.NET Core 2.0 (we hoped upgrading to 2.1 would fix it, but it didn't).



The problem: containers are frequently killed with exit code 139. From what I can gather in my research so far, this means a segmentation fault (SIGSEGV), which is typically raised when the application tries to access a part of memory that it does not have permission to access.
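For reference, 139 follows the usual shell convention of 128 plus the signal number, and signal 11 is SIGSEGV on Linux. A minimal Python sketch of that arithmetic (the helper name `describe_exit_code` is made up for illustration and is not part of Docker or ECS):

```python
import signal


def describe_exit_code(code: int) -> str:
    """Translate a container exit code into a signal name where possible."""
    # By shell convention, 128 + N means the process was killed by signal N.
    if code > 128:
        try:
            return signal.Signals(code - 128).name
        except ValueError:
            return f"unknown signal {code - 128}"
    return f"normal exit with status {code}"


print(describe_exit_code(139))
```

The same helper can be pointed at other exit codes reported by ECS to tell a crash from, say, a kill by the container runtime.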



I would not expect such a thing to happen with managed code, but it might be a library or lower level function in the framework that triggers this.



We have middleware configured for logging unhandled exceptions in the API, but we do not get any logs when this happens. This means we don't have a lot to go on to troubleshoot this.



I know there is not a lot to go on here, so I am basically looking for ways to get some idea of what the problem might be.



Maybe I could capture a memory dump at the time of the crash, or somehow get more details from Docker or ECS?
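On the memory dump idea: as far as I understand, the kernel only writes a core file on SIGSEGV if the core size limit of the process is non-zero, and where it goes is governed by `kernel.core_pattern`, which is a host-wide setting that containers share. A minimal sketch, assuming Python is available in the container or on the ECS host, that reports both settings (`core_dump_settings` is a hypothetical helper name):

```python
import resource
from pathlib import Path


def core_dump_settings() -> dict:
    """Report whether the kernel would write a core file for this process."""
    # A soft limit of 0 disables core files; -1 (RLIM_INFINITY) means unlimited.
    soft, hard = resource.getrlimit(resource.RLIMIT_CORE)
    try:
        pattern = Path("/proc/sys/kernel/core_pattern").read_text().strip()
    except OSError:
        pattern = None  # not Linux, or /proc is not readable here
    return {
        "soft_limit": soft,
        "hard_limit": hard,
        "enabled": soft != 0,
        "core_pattern": pattern,
    }


print(core_dump_settings())
```

This only shows the limits of the process that runs the script; the limits of the `dotnet` process itself depend on how the container was started.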



Any advice is greatly appreciated!



UPDATE



One of the site reliability engineers here was able to do some more analysis on this. He has identified two types of segfaults that kill the containers:



ip-10-50-128-175 kernel: [336491.431816] traps: dotnet[14200] general protection ip:7f7e14fc2529 sp:7f7b41ff8080 error:0 in libc-2.24.so[7f7e14f8e000+195000]



ip-10-50-128-219 kernel: [481011.825532] dotnet[31035]: segfault at 0 ip (null) sp 00007f50897f7658 error 14 in dotnet[400000+18000]



I am not sure what this means, but I thought I would put it here in case it gives someone a hint.
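A hedged reading of those two lines: on x86-64 the kernel prints the `error` field of a `segfault` line in hexadecimal, and its bits are the page-fault error code (bit 0 protection violation, bit 1 write, bit 2 user mode, bit 4 instruction fetch). Under that assumption `error 14` is 0x14, a user-mode instruction fetch from an unmapped page, which together with `segfault at 0 ip (null)` looks like a call through a null function pointer. The sketch below decodes the bits and computes the faulting offset inside libc for the first line; the function names are illustrative only and do not come from any tool.

```python
# Page-fault error code bits on x86: (meaning if clear, meaning if set).
PF_BITS = {
    0: ("page not present", "protection violation"),
    1: ("not a write", "write access"),
    2: ("kernel mode", "user mode"),
    3: (None, "reserved bit set in page table"),
    4: (None, "instruction fetch"),
}


def decode_page_fault(error_hex: str) -> list:
    """Decode the hex 'error' field of a kernel segfault line."""
    code = int(error_hex, 16)
    flags = []
    for bit, (if_clear, if_set) in PF_BITS.items():
        text = if_set if code & (1 << bit) else if_clear
        if text:
            flags.append(text)
    return flags


def module_offset(ip_hex: str, base_hex: str) -> str:
    """Offset of the instruction pointer from the module's load address."""
    return hex(int(ip_hex, 16) - int(base_hex, 16))


print(decode_page_fault("14"))
print(module_offset("7f7e14fc2529", "7f7e14f8e000"))
```

The offset from the first line could then be resolved to a function name with a symbolizer such as addr2line, given the exact same libc-2.24.so build and its debug symbols.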



UPDATE 2



So, we were not able to determine the root cause of the issue yet, but we mitigated the crashing API by stopping one of our internal services from calling one of the endpoints in large volumes. We basically duplicated the logic in the internal service to test if the crashes stopped, and they did stop.
This is not a very satisfactory solution, and it won't really help anyone else experiencing this issue, but at least our API was stable throughout Black Friday and Cyber Monday :)



































  • If you depend on any unmanaged code, look there first.

    – spender
    Oct 17 '18 at 14:16






  • Can you confirm this doesn't happen when your code runs outside of Docker, so that you can tell whether it is a Docker issue or something to do with your code?

    – Yamaç Kurtuluş
    Nov 13 '18 at 12:49











  • Have you tried trace-level logging to see what exactly is happening before the logging stops?

    – Yahya Hussein
    Nov 13 '18 at 13:00











  • How about enabling storage of crashdumps docs.microsoft.com/en-us/windows/desktop/wer/… ? If you get those, then you can open them in VS and debug. But you should have debug info enabled, to see where your program is.

    – MiroJanosik
    Nov 13 '18 at 13:55











  • Do you have any application level logging to hint what it's doing at time of crash?

    – Clayton Harbich
    Nov 14 '18 at 16:31























c# docker asp.net-core amazon-ecs














edited Nov 27 '18 at 17:24







Søren Pedersen

















asked Oct 17 '18 at 14:06









Søren Pedersen































