What is faster/better practice: a for loop that greps a file once per pattern, or grepping the file once with a pattern file?

I used to have a script like the following:

for i in $(cat list.txt)
do
grep $i sales.txt
done


Here, list.txt contains:

tomatoes
peppers
onions


And sales.txt contains:

Price Products
$8.88 bread
$6.75 tomatoes
$3.34 fish
$5.57 peppers
$0.95 beans
$4.56 onions


I am a beginner in Bash/shell scripting, and after reading posts like "Why is using a shell loop to process text considered bad practice?" I changed the previous script to the following:

grep -f list.txt sales.txt


Is this last way of doing it really better than using a for loop? At first I thought it was, but then it occurred to me that it is probably the same, since grep would have to read the query file again for each line it checks in the target file. Does anyone know whether it's actually better, and why? If it is better, I'm probably missing something about how grep processes this task, but I can't figure out what.

Tags: bash, shell, loops, grep, text-processing

edited Nov 23 '18 at 20:37 by MikeKatz45
asked Nov 23 '18 at 19:56 by MikeKatz45

  • It's easier to read, and since the loop logic is built into the program itself and only a single program has to be called, it's almost certainly faster. I can't think of a scenario where grep -f list.txt sales.txt wouldn't be considered the "better" option. I suppose if you needed some intermediate processing between switching through your patterns in list.txt and grepping, then maybe a loop, depending on what that was... maybe...
    – JNevill
    Nov 23 '18 at 20:55
2 Answers

Expanding on my comment...

You can download the source for grep via git with:

git clone https://git.savannah.gnu.org/git/grep.git

At the time of writing, around line 96 of src/grep.c, there is this comment:

/* A list of lineno,filename pairs corresponding to -f FILENAME
arguments. Since we store the concatenation of all patterns in
a single array, KEYS, be they from the command line via "-e PAT"
or read from one or more -f-specified FILENAMES. Given this
invocation, grep -f <(seq 5) -f <(seq 2) -f <(seq 3) FILE, there
will be three entries in LF_PAIR: {1, x} {6, y} {8, z}, where
x, y and z are just place-holders for shell-generated names. */

That is about all the clue we need: the patterns being searched, whether they come in through -e or through -f with a file, are concatenated into a single array when grep starts. That array, not the pattern file, is then the source of the search, so grep does not go back and re-read list.txt for every line of sales.txt. Moving through that array in C is going to be faster than having your shell loop through a file and start a separate grep for each pattern. So this alone will win the speed race.

Also, as I mentioned in my comment, grep -f list.txt sales.txt is easier to read, easier to maintain, and only a single program (grep) has to be invoked.
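A minimal sketch for checking this yourself, assuming a POSIX shell, mktemp, a grep that supports -F and -f (GNU and BSD grep both do), and a /dev/stdin device (present on Linux and macOS). It recreates the question's sample files in a scratch directory. The -F flag, which treats each line of list.txt as a fixed string rather than a regular expression, is an addition here and not part of the original command. The second grep reads its patterns from a pipe, which can only be read once; that it still finds every matching line of sales.txt is consistent with grep loading all patterns up front rather than re-reading them per line.

```shell
#!/bin/sh
# Sketch: recreate the question's sample data in a throwaway directory.
tmpdir=$(mktemp -d) || exit 1
trap 'rm -rf "$tmpdir"' EXIT
cd "$tmpdir" || exit 1

printf '%s\n' tomatoes peppers onions > list.txt
printf '%s\n' 'Price Products' '$8.88 bread' '$6.75 tomatoes' '$3.34 fish' \
    '$5.57 peppers' '$0.95 beans' '$4.56 onions' > sales.txt

# One grep process and one pass over sales.txt.
# -F treats each line of list.txt as a fixed string, not a regex.
grep -F -f list.txt sales.txt

# Same patterns supplied through a pipe, which can only be read once.
printf '%s\n' tomatoes peppers onions | grep -F -f /dev/stdin sales.txt
```

Both commands should print the tomatoes, peppers and onions lines, in the order they appear in sales.txt.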

answered Nov 23 '18 at 21:02 by JNevill

  • The time saving is more likely to come from doing a single execution with a single file pass, not from C iterating over a small array faster than bash.
    – that other guy
    Nov 23 '18 at 21:24

  • This is pretty much the explanation I was looking for. I didn't consider that grep works in C and that this would give it an edge over a pure bash search through a file. This makes sense, thank you.
    – MikeKatz45
    Nov 23 '18 at 23:14

Your second version is better because:

1. It only requires a single pass over the file (it does not need multiple passes as you thought).

2. It has no globbing or word-splitting bugs (your first attempt behaves poorly for patterns such as "green beans" or /*/*/*/*).

It's totally fine to read files purely in shell code when (1) you do it correctly and (2) the overhead is negligible, but your first example doesn't do it correctly, and its overhead is negligible only because the files are currently small.
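If you do need a loop (for example, for the intermediate processing JNevill mentions), here is a sketch of the safer idiom, assuming a POSIX shell and mktemp. It reads list.txt line by line with IFS= read -r and quotes "$pattern", so a pattern like "green beans" stays a single pattern and /*/*/*/* is never expanded as a glob. The file contents below are hypothetical test data trimmed from the question's files, and the -F and -e flags are additions, not part of the original script.

```shell
#!/bin/sh
# Sketch: a pattern loop without the word-splitting and globbing problems
# of "for i in $(cat list.txt)". File contents are hypothetical test data.
tmpdir=$(mktemp -d) || exit 1
trap 'rm -rf "$tmpdir"' EXIT
cd "$tmpdir" || exit 1

printf '%s\n' tomatoes 'green beans' > list.txt
printf '%s\n' 'Price Products' '$6.75 tomatoes' '$0.95 beans' > sales.txt

# IFS= keeps leading/trailing blanks, -r keeps backslashes literal, and
# quoting "$pattern" prevents word splitting and pathname expansion.
# -e guards against patterns starting with "-"; -F matches literally.
while IFS= read -r pattern; do
    grep -F -e "$pattern" sales.txt
done < list.txt
```

With this data the loop prints only the tomatoes line, whereas the original for i in $(cat list.txt) loop splits "green beans" into two words and also prints the beans line. The loop still starts one grep, and reads sales.txt once, per pattern, so for many patterns or a large file grep -f remains the faster choice.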

answered Nov 23 '18 at 21:07 by that other guy