Housekeeping in git repository
I have a very big Subversion repository (> 200,000 commits) that we recently migrated to git.
Over the years a lot of people made small mistakes, like adding ISO or MSI packages or adding folders that were not supposed to be added. We cleaned that up by removing the files/folders and committing. The repository grew, but that wasn't an issue because SVN handles sparse checkouts well.
Now, on git, a client needs to pull the whole history, and the local clone is about 50 GB. Time for some housekeeping…
Is there a way to remove from history all files that have been deleted at some point in the past?
Or to create a new repo and move over all those files that exist in the latest commit?
I have worked with the git filter-branch command, which helped, but only for files whose path I know.
I also used git log --diff-filter=D --summary to get a list of all deletes, but there are thousands…
OK, in the end I can simply start a new repository and copy the latest files into it. I will lose the history then, but I can keep the original big repo as an archive to look up history when required.
I really hope there are better approaches...
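The list of deletions mentioned above can be reduced to something more manageable. The following is only a sketch, assuming the usual summary line format (a leading space, then "delete mode", the file mode, and the path); the function names `deleted_paths` and `gone_paths` are made up for illustration, and paths with unusual characters may appear quoted in git's output and would need extra handling.

```shell
# Reads `git log --diff-filter=D --summary` output on stdin and prints
# every deleted path exactly once, sorted in the C locale.
deleted_paths() {
  sed -n 's/^ delete mode [0-7]* //p' | LC_ALL=C sort -u
}

# Prints the paths listed in file $1 that do not appear in file $2.
# Both files must already be sorted in the C locale.
gone_paths() {
  LC_ALL=C comm -23 "$1" "$2"
}
```

One possible use: write `git log --all --diff-filter=D --summary | deleted_paths` to one file and `git ls-files | LC_ALL=C sort` to another, then pass both to `gone_paths`. The second step matters because a path that was deleted and later re-added still shows up in the delete list.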
git
That's a tough problem since, well, those files are part of the history. Any method for excluding these files must therefore rewrite history. That said, maybe you'll want to take a look at the shallow clone feature of git: it allows you to exclude any number of commits from the git clone command. This effectively prunes the commit DAG at the places that you specify. I guess it should be possible to have one git repo with the full history, and a shallow clone of it that excludes your past sins, where the latter is used for new development while the former is used only for archeology.
– cmaster
Nov 23 at 15:33
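A minimal sketch of the shallow clone idea from this comment, assuming that truncating history to a fixed number of recent commits is acceptable. The function name `shallow_clone` and its arguments are placeholders for illustration; `--depth` is a standard `git clone` option.

```shell
# Clone only the most recent commits of a repository.
#   $1 = number of commits to keep from the branch tip
#   $2 = repository URL (hypothetical placeholder)
#   $3 = target directory (hypothetical placeholder)
shallow_clone() {
  git clone --depth "$1" "$2" "$3"
}
```

For example, `shallow_clone 1 <url> work` would fetch just the latest commit. Note that `--depth` implies `--single-branch` unless `--no-single-branch` is given, and that `git clone` also accepts `--shallow-since=<date>` to cut history at a date instead of a commit count.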
asked Nov 22 at 11:34
tstrob
111
1 Answer
I found that this is too difficult to achieve after the migration to git, but it can be done before the migration, on the Subversion side.
I did this:
svnadmin dump …
to create a dump file, then
svndumpfilter exclude …
to exclude everything I no longer need.
To get a complete listing of the repository, including deleted items, I did this:
svndumpfilter exclude "*" …
The nice thing is that svndumpfilter lists all the files it excluded in a structured, sorted output. Since I excluded everything, I got a complete directory listing.
I ran svndumpfilter a couple of times to remove all the unwanted stuff, then loaded the filtered dump into a new Subversion repository and migrated that repository to git.
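The steps above can be sketched roughly as follows. This is an illustration rather than the exact commands used: the function names, repository paths, and excluded prefixes are all hypothetical, and the elided arguments in the answer are not reconstructed here. Only `svnadmin dump`, `svndumpfilter exclude`, `svnadmin create`, and `svnadmin load` are taken as given.

```shell
# Dump the repository at $1, drop every path prefix listed after $2,
# and write the filtered dump stream to the file $2.
filter_dump() {
  src=$1
  out=$2
  shift 2
  svnadmin dump --quiet "$src" | svndumpfilter exclude "$@" > "$out"
}

# Create a fresh repository at $1 and load the filtered dump file $2 into it.
load_filtered() {
  svnadmin create "$1" && svnadmin load --quiet "$1" < "$2"
}
```

A call might look like `filter_dump /path/to/old-repo filtered.dump some/unwanted/folder`, followed by `load_filtered /path/to/new-repo filtered.dump`, with the new repository then serving as the source for the git migration.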
answered Nov 23 at 15:10
tstrob
111