Delete words from a text file if they exist in another text file



























I have created one .txt file, alltext.txt, that combines five other text files. I also have a text file with one word per line (removewords.txt). I would like to remove the words listed in removewords.txt from alltext.txt, without creating a new text file and without typing the words from removewords.txt in manually.

I have thought about using sets, but I am a bit confused about how to approach this.

My merging of the files looks like this:

    files = ["file1.txt", "file2.txt", ..., "file5.txt"]  # ... stands for the remaining file names
    with open("compare_out.txt", "w") as fout:
        for file in files:
            with open(file) as complete_file:
                for line in complete_file:
                    fout.write(line)

Any suggestions? Thank you very much.

















      python
















      asked Nov 28 '18 at 14:21









          Madman12

























          2 Answers

          I would do the following:




          1. Read all words from "removewords.txt" into a list called remove_words.

          2. Read all words from "alltext.txt" into a list called all_words. Do this before step 3, because opening a file in write mode empties it.

          3. Open "alltext.txt" in write mode ("w") and write content to it as follows:


            • For each word in all_words, check whether that word is in remove_words. If it is not, write it to "alltext.txt".
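The three steps above can be sketched in Python roughly as follows. This is a minimal sketch, assuming both files contain plain whitespace-separated words; the helper name `remove_listed_words` is hypothetical. It stores remove_words in a set rather than a list, which only makes the membership test faster. Joining the kept words with single spaces also discards the original line breaks and spacing.

```python
def remove_listed_words(text_path="alltext.txt", remove_path="removewords.txt"):
    """Hypothetical helper: rewrite text_path without the words listed in remove_path."""
    # Step 1: collect the words to remove (one per line, so split on any whitespace)
    with open(remove_path) as f:
        remove_words = set(f.read().split())

    # Step 2: read everything first, because opening with "w" truncates the file
    with open(text_path) as f:
        all_words = f.read().split()

    # Step 3: overwrite the file with only the words that are not in remove_words
    with open(text_path, "w") as f:
        f.write(" ".join(word for word in all_words if word not in remove_words))
```

Calling `remove_listed_words()` with no arguments works on the two file names from the question. Because the result is a single space-separated line, this is the "manipulates the text more than necessary" effect raised in the comments.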




          Are these steps detailed enough so that you can solve your problem?



          If not, comment below on what you are having problems with.

















          answered Nov 28 '18 at 14:32









          Felix














          • Thank you for a great answer! But when I do this, I will manipulate the text more than necessary. Is there any way to only remove the words from removewords.txt and replace them with a space?

            – Madman12
            Nov 28 '18 at 16:50











          • A (text) file is basically just a sequence of characters on your disk. This sequence has to be contiguous; there cannot be "holes" in it. If you want to delete some words in the middle of the file, you would have to "move" all following characters to fill the gap. The easiest option is to just write the whole file again (as I've shown above).

            – Felix
            Nov 29 '18 at 13:59











          • Is performance really a problem for your program? How large are your text files?

            – Felix
            Nov 29 '18 at 14:00
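Following the point that the file has to be rewritten as a whole, here is a sketch of what the follow-up question asks for: each listed word is replaced with a single space and every other character, including line breaks, is left unchanged. The helper name `blank_out_words` is hypothetical. It assumes the listed words are ordinary alphanumeric words, since the `\b` word-boundary anchors behave differently for words that start or end with punctuation, and matching is case-sensitive.

```python
import re


def blank_out_words(text_path="alltext.txt", remove_path="removewords.txt"):
    """Hypothetical helper: replace each listed word with one space, keeping layout."""
    with open(remove_path) as f:
        words = set(f.read().split())  # one word per line; blank lines are ignored

    with open(text_path) as f:
        text = f.read()

    if words:  # an empty alternation would match at every word boundary
        # Whole-word match: "cat" does not match inside "category"
        pattern = re.compile(r"\b(?:" + "|".join(map(re.escape, words)) + r")\b")
        text = pattern.sub(" ", text)

    # Bytes cannot be cut out of the middle of a file, so write the whole text back
    with open(text_path, "w") as f:
        f.write(text)
```

The file is still read and written in full; only the words themselves change.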



















          If it is not a problem, you can load all the words to remove into a set using split(), then check each word before you write it to the output file.
          split() breaks a string into list elements at a delimiter. Called with no argument, it splits on any run of whitespace (spaces and newlines alike), which separates the words in the text and also the one-word-per-line entries in removewords.txt.



              # removewords.txt has one word per line, so split() on any whitespace
              with open("removewords.txt") as rm_word_file:
                  remove_words = set(rm_word_file.read().split())

              files = ["file1.txt", "file2.txt", ..., "file5.txt"]  # ... stands for the remaining file names

              with open("compare_out.txt", "w") as fout:
                  for file in files:
                      with open(file) as complete_file:
                          for line in complete_file:
                              # keep the words that are not listed, and write each line only once
                              kept = [word for word in line.split() if word not in remove_words]
                              fout.write(" ".join(kept) + "\n")
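A quick illustration of why the delimiter matters here: removewords.txt has one word per line, so splitting its contents on " " leaves everything in a single element, while `split()` with no argument splits on any run of whitespace, newlines included. The sample string is made up for illustration.

```python
text = "apple\nbanana\ncherry\n"  # stands in for the contents of removewords.txt

print(text.split(" "))  # ['apple\nbanana\ncherry\n']
print(text.split())     # ['apple', 'banana', 'cherry']
```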


          Something else to think about: if there is punctuation in your text, how will you handle it?



          You could just remove all punctuation, but then "its" and "it's" would be treated as the same word, which may not be the intended behaviour.
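One possible middle ground, sketched under the assumption that only leading and trailing punctuation should be ignored when comparing: strip `string.punctuation` from the edges of each token before the lookup. That way "happy." matches "happy", while the apostrophe inside "it's" is kept, so "its" and "it's" stay distinct. The helper name `keep_word` is hypothetical, the comparison is case-sensitive, and a removed token takes its attached punctuation with it.

```python
import string


def keep_word(token, remove_words):
    """Hypothetical check: compare a token with its edge punctuation stripped."""
    # strip() only removes characters at the ends, so "it's" keeps its apostrophe
    return token.strip(string.punctuation) not in remove_words


remove_words = {"its", "happy"}
tokens = "the dog wagged its tail, and it's happy.".split()
print([t for t in tokens if keep_word(t, remove_words)])
# ['the', 'dog', 'wagged', 'tail,', 'and', "it's"]
```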















          edited Nov 28 '18 at 14:39

























          answered Nov 28 '18 at 14:33









          Adam Dadvar














          • There's an indentation error and you should close "removewords.txt".

            – Felix
            Nov 28 '18 at 14:36











          • This is not working for me. The text repeats itself several times :(

            – Madman12
            Nov 28 '18 at 16:50











          • Can you describe the error/issue you are having in more depth?

            – Adam Dadvar
            Nov 28 '18 at 16:59











          • Each text piece - separated by a newline - is printed five times in the output file

            – Madman12
            Nov 28 '18 at 18:52











          • That is probably because there are 5 files, no? I am confused about what you actually want: how do you want to combine these 5 text files? Currently we are just reading them one at a time.

            – Adam Dadvar
            Nov 29 '18 at 11:31
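For anyone seeing the same repetition: a loop of the form `for word in line.split(" "): if word not in remove_words: fout.write(line)` writes the whole line once for every word it keeps, so each line can be repeated several times even with a single input file. The sketch below uses hypothetical helper names. It counts those writes and contrasts them with filtering the words and writing each line once.

```python
def copies_written(line, remove_words):
    """Number of times the per-word loop would write the whole line."""
    return sum(1 for word in line.split(" ") if word not in remove_words)


def filter_line(line, remove_words):
    """Build the line once, with the removed words left out."""
    return " ".join(w for w in line.split() if w not in remove_words) + "\n"


line = "keep this drop\n"
print(copies_written(line, {"drop"}))  # 3: "drop\n" still carries the newline
print(filter_line(line, {"drop"}))     # keep this
```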


















