Delete words from text file if they exist in another textfile
I have created one text file (alltext.txt) by merging five other text files. I also have a text file with words on each line (removewords.txt). I would like to remove the words listed in removewords.txt from alltext.txt, without creating a new text file and without typing the words from removewords.txt manually.
I have thought about using sets, but I am a bit confused about how to approach this.
My merging of the files looks like this:
    files = ["file1.txt", "file2.txt"]  # ... through "file5.txt"
    with open("compare_out.txt", "w") as fout:
        for file in files:
            with open(file) as complete_file:
                for line in complete_file:
                    fout.write(line)
Any suggestions? Thank you very much.
python
asked Nov 28 '18 at 14:21
Madman12
173
2 Answers
I would do the following:
- read all words from "removewords.txt" into a list called remove_words
- read all words from "alltext.txt" into a list called all_words
- open the file "alltext.txt" in write mode ("w") and write content to it as follows:
  - for each word in all_words, check if that word is in the list remove_words. If it is not, write it to "alltext.txt"
Are these steps detailed enough so that you can solve your problem?
If not, comment below on what you are having problems with.
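The steps above can be sketched in Python roughly as follows. This is a minimal sketch, assuming both files are small enough to read into memory and that words are separated by whitespace. The function name remove_words_in_place is made up for illustration, and a set is used instead of a list only to speed up the membership test. Line breaks in the text file are kept, but runs of whitespace within a line collapse to single spaces.

```python
def remove_words_in_place(text_path, remove_path):
    """Rewrite text_path without the words listed in remove_path."""
    # Step 1: read all words to remove (split() copes with one word per line).
    with open(remove_path) as f:
        remove_words = set(f.read().split())

    # Step 2: read the whole text before reopening the file for writing,
    # because opening it in "w" mode truncates it immediately.
    with open(text_path) as f:
        lines = f.read().splitlines()

    # Step 3: write back only the words that are not in remove_words.
    with open(text_path, "w") as f:
        for line in lines:
            kept = [word for word in line.split() if word not in remove_words]
            f.write(" ".join(kept) + "\n")
```

With the file names from the question, the call would be remove_words_in_place("alltext.txt", "removewords.txt").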
Thank you for a great answer! But when I do this, I will manipulate the text more than necessary. Is there any way to only remove the words from removewords.txt and replace them with a space?
– Madman12
Nov 28 '18 at 16:50
A (text) file is basically just a sequence of characters on your disk. This sequence needs to be continuous, there may not be "holes" in it. If you want to delete some words in the middle of the file, you would have to "move" all following words to fill the gap. The easiest option is to just write the whole file again (as I've shown above).
– Felix
Nov 29 '18 at 13:59
Is performance really a problem for your program? How large are your text files?
– Felix
Nov 29 '18 at 14:00
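If the goal is to touch the text as little as possible, one option is to substitute each unwanted word with a single space and leave everything else, including punctuation and line breaks, unchanged. The file still has to be rewritten as a whole, for the reason given in the comment above. A minimal sketch, assuming whole-word, case-sensitive matches, words made of letters or digits, and files that fit in memory; blank_out_words is a hypothetical helper name:

```python
import re


def blank_out_words(text_path, remove_path):
    """Replace each word listed in remove_path with one space in text_path."""
    with open(remove_path) as f:
        remove_words = set(f.read().split())
    if not remove_words:
        return  # nothing to remove, leave the file untouched

    # \b anchors the match at word boundaries, so "cat" does not match
    # inside "category"; re.escape guards against regex metacharacters.
    alternatives = "|".join(re.escape(word) for word in remove_words)
    pattern = re.compile(r"\b(?:" + alternatives + r")\b")

    # Read everything first, then reopen in "w" mode to overwrite.
    with open(text_path) as f:
        text = f.read()
    with open(text_path, "w") as f:
        f.write(pattern.sub(" ", text))
```

Because every removed word leaves a space behind, the result can contain several consecutive spaces where words used to be.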
If that is not a problem, you can load all the words to remove into a set using split, then check each word before you write it to the output file.
split separates a string into list elements based on a delimiter. Called without arguments, it splits on any whitespace, including newlines, so it works both for reading removewords.txt and for separating the words within a line.
    # Load the words to remove into a set for fast membership tests.
    with open("removewords.txt") as rm_word_file:
        remove_words = set(rm_word_file.read().split())

    files = ["file1.txt", "file2.txt"]  # ... through "file5.txt"
    with open("compare_out.txt", "w") as fout:
        for file in files:
            with open(file) as complete_file:
                for line in complete_file:
                    # Write each line once, keeping only the words that are not in remove_words.
                    kept = [word for word in line.split() if word not in remove_words]
                    fout.write(" ".join(kept) + "\n")
Something else to think about: if there is punctuation in your text, how will you handle it?
You could simply remove all punctuation, but then "its" and "it's" would be treated as the same word, which may not be the intended behaviour.
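One possible middle ground, sketched below, is to strip punctuation only from the edges of each token before comparing it, so that "dog." matches "dog" while the apostrophe inside "it's" is left alone. normalize is a hypothetical helper name, the sample words are made up, and whether lower-casing is wanted depends on your data:

```python
import string


def normalize(token):
    """Strip leading/trailing ASCII punctuation and lower-case the token."""
    return token.strip(string.punctuation).lower()


remove_words = {"dog", "its"}
line = "Its dog. It's barking!"

# Compare the normalized form, but keep the original token in the output.
kept = [word for word in line.split() if normalize(word) not in remove_words]
print(" ".join(kept))  # prints: It's barking!
```

The original tokens are written out unchanged, so punctuation attached to the words that are kept survives.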
There's an indentation error and you should close "removewords.txt".
– Felix
Nov 28 '18 at 14:36
This is not working for me. The text repeats itself several times :(
– Madman12
Nov 28 '18 at 16:50
Can you describe the error/issue you are having in more depth?
– Adam Dadvar
Nov 28 '18 at 16:59
Each text piece - separated by a newline - is printed five times in the output file
– Madman12
Nov 28 '18 at 18:52
That is probably because there are 5 files no? I am confused what you actually want from your question -> These 5 text files you want to combine them how? Because currently we are just reading them one at a time.
– Adam Dadvar
Nov 29 '18 at 11:31
answered Nov 28 '18 at 14:32
Felix
3,2212930
edited Nov 28 '18 at 14:39
answered Nov 28 '18 at 14:33
Adam Dadvar
1997