Housekeeping in git repository

I have a very large Subversion repository (more than 200,000 commits) that we recently migrated to Git.



Over the years, many people made small mistakes, such as adding ISO or MSI packages or adding folders that were not supposed to be added. We cleaned these up by deleting the files/folders and committing. The repository grew, but that wasn't an issue because SVN handles sparse checkouts well.



With Git, however, every client has to fetch the whole history, and a local clone is now about 50 GB. Time for some housekeeping…



Is there a way to remove from the history all files that were deleted at some point in the past?
Or to create a new repository and carry over only the files that exist in the latest commit?
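You don't have to read every deletion by hand to find those files. Compare every path that was ever added against the paths in the latest commit, and the difference is the list you want. This comparison also skips files that were deleted and later re-added. Below is a minimal sketch that uses core Git plus sort and comm. The helper name list_removed_paths is hypothetical. It assumes file names contain no newlines, tabs, quotes or backslashes, because Git escapes those in its output.

```shell
# Hypothetical helper: print every path that was added in some commit on
# any ref but is absent from the tip commit ($1, default HEAD).
# Assumes ordinary file names (no newlines, tabs, quotes or backslashes).
list_removed_paths() {
  tip=${1:-HEAD}
  scratch=$(mktemp -d) || return 1
  # All paths ever added. --no-renames makes a rename show up as
  # delete + add, so the new name of a renamed file is listed too.
  # core.quotePath=false keeps non-ASCII names unescaped.
  git -c core.quotePath=false log --all --no-renames --diff-filter=A \
      --name-only --format= |
    sed '/^$/d' | LC_ALL=C sort -u > "$scratch/ever"
  # All files present in the tip commit.
  git -c core.quotePath=false ls-tree -r --name-only "$tip" |
    LC_ALL=C sort -u > "$scratch/now"
  # Keep only the lines that appear in the first list but not the second.
  LC_ALL=C comm -23 "$scratch/ever" "$scratch/now"
  rm -rf "$scratch"
}
```

Run list_removed_paths > removed.txt inside the converted repository. The result is a file you can review before rewriting anything.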



I have used the git filter-branch command, which helped, but only for files whose paths I know.
I also used git log --diff-filter=D --summary to get a list of all deletions, but there are thousands…
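Once the deleted paths are in a file (one path per line), git filter-branch can remove all of them in a single pass. You don't need to handle one known path at a time. Below is a minimal sketch. It assumes a clean working tree, plain file names and an absolute path to the list file. The helper names drop_listed_paths and purge_old_history are hypothetical.

Keep two caveats in mind:
- Rewriting changes every commit ID, so everyone has to re-clone afterwards.
- On 200,000 commits, filter-branch will be very slow. The separately installed git filter-repo tool does the same job with --invert-paths together with --paths-from-file, and is usually much faster.

```shell
# Hypothetical helper: remove every path listed in file $1 (one per line)
# from all commits on all refs. $1 must be an ABSOLUTE path, because
# filter-branch runs its filters from a temporary directory.
drop_listed_paths() {
  list=$1
  # FILTER_BRANCH_SQUELCH_WARNING only silences the notice (and pause)
  # that newer Git versions print when filter-branch starts.
  FILTER_BRANCH_SQUELCH_WARNING=1 git filter-branch --force --index-filter \
    "tr '\n' '\0' < '$list' | xargs -0 git rm -r -q --cached --ignore-unmatch --" \
    --prune-empty --tag-name-filter cat -- --all
}

# Hypothetical helper: delete the backup refs and reflog entries that still
# point at the old history, then let gc remove the unreachable objects.
purge_old_history() {
  git for-each-ref --format='%(refname)' refs/original/ |
    while read -r ref; do git update-ref -d "$ref"; done
  git reflog expire --expire=now --all
  git gc -q --prune=now
}
```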



In the end, I could simply start a new repository and copy the latest files into it. I would lose the history, but I could keep the original large repository as an archive to look up history when required.
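If you start over, you don't need to copy the files by hand. Git can create a commit with no parents whose tree is exactly the tree of the latest commit. Below is a minimal sketch using the plumbing command git commit-tree. The helper name start_fresh_branch and the default branch name fresh are hypothetical.

```shell
# Hypothetical helper: create branch $1 (default "fresh") whose only commit
# has exactly the same tree as $2 (default HEAD) but no parents.
# The existing history is left untouched and can serve as the archive.
start_fresh_branch() {
  branch=${1:-fresh}
  tip=${2:-HEAD}
  new=$(git commit-tree "$tip^{tree}" \
        -m "Snapshot of $(git rev-parse --short "$tip")") || return 1
  git branch "$branch" "$new"
}
```

You could then push the new branch to an empty repository, for example with git push <new-repository-url> fresh:master (the URL is a placeholder). The old repository stays as the archive.

To see the old history locally, fetch the archive and run git replace --graft <snapshot-commit> <old-tip>. This should make the snapshot appear to have the old tip as its parent, without changing any commit IDs.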



I really hope there are better approaches...

asked Nov 22 at 11:34

tstrob

  • That's a tough problem since, well, those files are part of the history. Any method for excluding these files must therefore rewrite history. That said, you may want to take a look at Git's shallow clone feature: it allows you to exclude any number of commits when running git clone. This effectively prunes the commit DAG at the places that you specify. I guess it should be possible to have one Git repository with the full history and a shallow clone of it that excludes your past sins. The latter would be used for new development, the former for archeology only.
    – cmaster
    Nov 23 at 15:33
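The shallow clone suggested in the comment can be sketched as follows. With --depth N, git clone fetches only the last N commits and the objects they reference, so files deleted before that point are not downloaded. The helper name shallow_clone is hypothetical. For a repository on local disk, pass a file:// URL, because Git ignores --depth for plain local paths.

```shell
# Hypothetical helper: clone only the most recent history.
# $1 = source URL (use file:///... for a repository on local disk),
# $2 = target directory, $3 = number of commits to keep (default 1).
shallow_clone() {
  src=$1; dest=$2; depth=${3:-1}
  git clone -q --depth "$depth" "$src" "$dest"
}
```

A few related options:
- --shallow-since=<date> sets a date cut-off instead of a commit count.
- git fetch --unshallow retrieves the full history later if it turns out to be needed.
- --depth implies --single-branch unless --no-single-branch is given.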

1 Answer

I found that this is too difficult to achieve after the migration to Git, but it can be done before.



I did this:



svnadmin dump …


to create a dump file.



svndumpfilter exclude …


to exclude everything I no longer need.
To get a complete list of the repository, including deleted items, I ran:



svndumpfilter exclude "*" …


The nice thing is that svndumpfilter lists every path it excluded in a structured, sorted output. Since I excluded everything, I got a complete listing of the repository.
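If a dump file already exists, the same inventory can be read directly from its headers. In the Subversion dump format, every node record starts with a Node-path: header and carries a Node-action: header (add, change, delete or replace). Below is a minimal sketch with hypothetical helper names. It has two limitations:
- A deleted directory appears only under its own path.
- A versioned file whose content happens to contain such header lines would confuse this quick scan.

```shell
# Hypothetical helper: every path that appears anywhere in dump file $1,
# sorted and deduplicated.
dump_all_paths() {
  sed -n 's/^Node-path: //p' "$1" | LC_ALL=C sort -u
}

# Hypothetical helper: every path that was deleted in some revision.
# Remembers the most recent Node-path and prints it when that node's
# action is "delete".
dump_deleted_paths() {
  awk '/^Node-path: / { path = substr($0, 12) }
       /^Node-action: delete$/ { print path }' "$1" | LC_ALL=C sort -u
}
```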



I ran svndumpfilter a couple of times to remove all unwanted content. Then I loaded the result into a new Subversion repository and used that one for the migration to Git.
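The dump, filter and load steps described above can also run as one pipeline, which avoids writing the intermediate dump file to disk. Below is a minimal sketch. The helper name, the example prefix /trunk/installers and the use of --drop-empty-revs are assumptions; the answer's exact arguments are elided.

Expect to iterate on the prefix list. svndumpfilter may stop with an error when a path that is kept was copied from a path that is excluded.

```shell
# Hypothetical helper: copy an SVN repository without the given path
# prefixes. $1 = existing repository directory, $2 = new repository
# directory (must not exist yet), remaining args = prefixes to drop,
# e.g. /trunk/installers.
filter_svn_repo() {
  src=$1; dest=$2; shift 2
  svnadmin create "$dest" || return 1
  # dump -> drop the prefixes and any revisions left empty -> load.
  # svndumpfilter reports what it dropped on stderr.
  svnadmin dump -q "$src" |
    svndumpfilter exclude --drop-empty-revs "$@" |
    svnadmin load -q "$dest"
}
```

The filtered repository can then be migrated with the same tool as before. If git svn was used, that would be git svn clone file:///path/to/filtered-repo (the path is a placeholder).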

        answered Nov 23 at 15:10

        tstrob