Cleaning up after killing a running docker container











My goal is to write a Docker image that runs a Python script producing a lot of CSV files full of random numbers which, once finished, are to be written to an external storage drive, after which the container quits. Assume it writes so many of these CSV files that they cannot all be held in memory.



What I'm worried about are the cases where the container encounters an error and quits (or is made to quit by the user), leaving behind a bunch of garbage files that have to be cleaned up manually.



The first solution is to mount a fast drive (like an SSD) directly into the container and write to it. Once done, the container transfers the data from this SSD to the external storage drive. The downside is that if the container quits unexpectedly, it leaves garbage on the SSD.



The second solution is to create a volume backed by the SSD, start a container with this volume, and then do much the same as in the first solution. In this case, if the container dies unexpectedly, what happens to the volume? Will it be removed automatically as well? Can it be configured to be removed automatically, thereby deleting any garbage that was created?



In case you're curious, the final goal is to use these containers with some sort of orchestration system.










  • Stack Overflow is a site for programming and development questions. This question appears to be off-topic because it is not about programming or development. See What topics can I ask about here in the Help Center. Perhaps Super User or Unix & Linux Stack Exchange would be a better place to ask.
    – jww
    Sep 25 at 12:58















Tags: linux, docker






asked Sep 24 at 20:39 by Mr. Fegur












2 Answers

















Accepted answer:











What I'm worried about are the cases where the container encounters an error and quits (or is made to quit by the user), leaving behind a bunch of garbage files that have to be cleaned up manually.




Note that you may configure your ENTRYPOINT Python script to automatically perform the necessary cleanup.



To give you some guidelines/examples of that approach:




  • I gave one such example (implemented in Bash, namely with a trap) in this SO answer.

  • Another possible example (implemented in Python) is given in this blog article.
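To give a flavor of that approach, here is a minimal Python sketch of an entrypoint that cleans up after itself; the scratch directory and handler names are assumptions for illustration, not taken from the linked examples. In the question's setup, the scratch path would be the SSD-backed location where the intermediate CSV files are written.

```python
import atexit
import shutil
import signal
import sys
import tempfile

# Hypothetical scratch directory; in the question's setup this would be
# the SSD-backed path where the intermediate CSV files are written.
SCRATCH_DIR = tempfile.mkdtemp(prefix="csv-job-")

def cleanup():
    # Remove any partial output left behind by an aborted run.
    shutil.rmtree(SCRATCH_DIR, ignore_errors=True)

# Fire cleanup on normal interpreter exit (including after an
# unhandled exception)...
atexit.register(cleanup)

# ...and convert SIGTERM (what `docker stop` sends to PID 1) into a
# normal exit so the atexit handler still runs.
signal.signal(signal.SIGTERM, lambda signum, frame: sys.exit(0))
```

With this in place, both a crash of the script and a `docker stop` end up removing the scratch directory, so no garbage survives the container.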


Note that beyond the graceful termination of your containers, you may want to set up a restart policy, such as always or unless-stopped. See for example this Codeship blog article.




The first solution is to mount a fast drive (like an SSD) directly into the container and write to it. Once done, the container transfers the data from this SSD to the external storage drive. The downside is that if the container quits unexpectedly, it leaves garbage on the SSD.



The second solution is to create a volume backed by the SSD, start a container with this volume, and then do much the same as in the first solution. In this case, if the container dies unexpectedly, what happens to the volume? Will it be removed automatically as well?




Although the two solutions you present aren't strictly necessary to address the main question here, I should mention that in general it is a best practice to use volumes in production rather than a mere bind mount. Of course, using either of these two approaches (-v volume-name:/path or a bind mount -v /path:/path) is far better than not using the -v option at all, because data written directly to the writable layer of the container is lost if the container is recreated from the image.






answered Sep 24 at 21:58 by ErikMD, edited Sep 24 at 22:04
What I'm worried about are the cases where the container encounters an error and quits (or is made to quit by the user), leaving behind a bunch of garbage files that have to be cleaned up manually.




If you write your intermediate files into the container filesystem, rather than to a persistent volume, then Docker can do all the hard work for you. Simply run your container with the remove option (--rm). E.g., if you ran:



    docker run --rm -v /path/to/external/storage:/final/result your_image


Then your application can write anywhere other than /final/result, and upon exit of the container (successful or any error condition), the container will be automatically deleted by the Docker daemon. On successful completion of your task, write your content to /final/result so it persists after the container exits. This path is completely made up and you'll likely want to adjust it for your usage.
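The write-then-publish flow described above can be sketched in Python. The file names, counts, and temp-dir stand-ins here are assumptions for illustration: in the docker run example, the final directory would be the bind-mounted /final/result, and the scratch area any path in the container filesystem.

```python
import csv
import random
import shutil
import tempfile
from pathlib import Path

# Stand-ins for the real paths: FINAL_DIR plays the role of the
# bind-mounted /final/result, SCRATCH the container filesystem.
# Temp dirs are used so the sketch runs anywhere.
SCRATCH = Path(tempfile.mkdtemp(prefix="scratch-"))
FINAL_DIR = Path(tempfile.mkdtemp(prefix="final-"))

def produce(n_files=3, rows=5):
    # Write intermediate CSVs of random numbers into the scratch area.
    for i in range(n_files):
        with open(SCRATCH / f"part-{i}.csv", "w", newline="") as f:
            writer = csv.writer(f)
            for _ in range(rows):
                writer.writerow([random.random() for _ in range(4)])

def publish():
    # Only after everything succeeded: move the finished files to the
    # mounted final directory. If the run aborts before this point, the
    # intermediates disappear along with the --rm'd container.
    for path in sorted(SCRATCH.glob("*.csv")):
        shutil.move(str(path), str(FINAL_DIR / path.name))

produce()
publish()
```

The key design point is that publish() runs only after produce() completes, so a crash mid-run never leaves partial files on the external storage.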



Note that if you are running in a desktop environment (Mac/Windows) rather than native Linux, there is an issue with the VM disk expanding with usage and not shrinking as files are deleted. This is the nature of VM filesystems that allocate on use, and it is outside Docker's control. In that scenario, you'd likely want the entire setup to use an external volume, and to configure your entrypoint to clean up any temporary files left over from the last run of your container.






answered Sep 24 at 22:45 by BMitch
    • The reason I didn't go with this option is because I did not know where docker did its file IO when left to its devices. For example if I have two hard disks, an HDD and an SSD, and I want my python script to do its file IO on the SSD because of speed, then I don't know how to get docker to do that using this method. With volumes, I can specify this fast SSD and have my script use it.
      – Mr. Fegur
      Sep 24 at 22:49






  • @Mr.Fegur the container files will be stored under /var/lib/docker while the container is running.
    – BMitch
    Sep 25 at 9:22










