What is faster/better practice: grepping a file in a for loop, or grepping it once with a pattern file?

I used to have a script like the following:

```sh
for i in $(cat list.txt)
do
  grep $i sales.txt
done
```

Where `cat list.txt` gives

```
tomatoes
peppers
onions
```

and `cat sales.txt` gives

```
Price Products
$8.88 bread
$6.75 tomatoes
$3.34 fish
$5.57 peppers
$0.95 beans
$4.56 onions
```

I am a beginner in Bash/shell scripting, and after reading posts like "Why is using a shell loop to process text considered bad practice?" I changed the previous script to the following:

```sh
grep -f list.txt sales.txt
```

Is this last way of doing it really better than using a for loop? At first I thought it was, but then I realized it is probably the same, since grep has to read the query file each time it greps a different line in the target file. Does anyone know if it's actually better, and why? If it is better, I'm probably missing something about how grep processes this task, but I can't figure it out.
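A minimal sketch for reproducing the comparison on the sample data, assuming a POSIX shell with `mktemp` and a standard grep. The scratch directory and the variable names `loop_out` and `file_out` are illustrative only:

```sh
#!/bin/sh
# Work in a throwaway directory so the sample files don't clobber anything.
cd "$(mktemp -d)" || exit 1

# Recreate the sample files from the question.
printf '%s\n' tomatoes peppers onions > list.txt
# The quoted 'EOF' keeps the shell from expanding $8, $6, ... in the prices.
cat > sales.txt <<'EOF'
Price Products
$8.88 bread
$6.75 tomatoes
$3.34 fish
$5.57 peppers
$0.95 beans
$4.56 onions
EOF

# Version 1, the loop as written in the question:
# one grep run, and one full read of sales.txt, per pattern.
loop_out=$(for i in $(cat list.txt); do grep $i sales.txt; done)

# Version 2: one grep run that reads list.txt once and sales.txt once.
file_out=$(grep -f list.txt sales.txt)

printf '%s\n' "$file_out"
if [ "$loop_out" = "$file_out" ]; then echo "same output"; fi
```

On this data both versions select the same three lines (tomatoes, peppers, onions); the difference lies in how much work each one does, not in the result.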

asked Nov 23 '18 at 19:56 by MikeKatz45 (edited Nov 23 '18 at 20:37)

It's easier to read; the loop logic is written into the program itself, so it's almost certainly faster; and only a single program has to be called. I can't think of a scenario where `grep -f list.txt sales.txt` wouldn't be considered the "better" option. I suppose if you needed some intermediate processing between switching through your patterns in list.txt and grepping, then maybe a loop, depending on what that was... maybe...
– JNevill
Nov 23 '18 at 20:55

Tags: bash, shell, loops, grep, text-processing

2 Answers

Expanding on my comment...

You can download the source for grep via git with:

```sh
git clone https://git.savannah.gnu.org/git/grep.git
```

At line 96 of src/grep.c (at the time of writing) you can see this comment:

```c
/* A list of lineno,filename pairs corresponding to -f FILENAME
   arguments. Since we store the concatenation of all patterns in
   a single array, KEYS, be they from the command line via "-e PAT"
   or read from one or more -f-specified FILENAMES.  Given this
   invocation, grep -f <(seq 5) -f <(seq 2) -f <(seq 3) FILE, there
   will be three entries in LF_PAIR: {1, x} {6, y} {8, z}, where
   x, y and z are just place-holders for shell-generated names.  */
```

That is about all the clue we need: the patterns being searched, whether they come in through `-e` or through `-f` with a file, are concatenated into a single array (`KEYS`). So `list.txt` is read once, up front, not once per line of `sales.txt`. That array is then the source of the search. Moving through that array in C is going to be faster than your shell looping through a file, so this alone will win the speed race.
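To see that patterns given with `-e` and patterns read with `-f` really do end up in one list, here is a small sketch assuming a POSIX shell with `mktemp`. The file `more.txt` is a hypothetical second pattern file, used in place of the `<(seq ...)` process substitutions from the grep comment so that no Bash-specific syntax is needed:

```sh
#!/bin/sh
cd "$(mktemp -d)" || exit 1
printf '%s\n' tomatoes peppers onions > list.txt
printf '%s\n' peppers onions > more.txt   # hypothetical second pattern file
cat > sales.txt <<'EOF'
Price Products
$8.88 bread
$6.75 tomatoes
$3.34 fish
$5.57 peppers
$0.95 beans
$4.56 onions
EOF

# All patterns on the command line with -e.
from_e=$(grep -e tomatoes -e peppers -e onions sales.txt)
# All patterns read from one file with -f.
from_f=$(grep -f list.txt sales.txt)
# Mixed: one -e pattern plus the rest from a file; grep merges them into one list.
mixed=$(grep -e tomatoes -f more.txt sales.txt)

if [ "$from_e" = "$from_f" ] && [ "$from_f" = "$mixed" ]; then
    echo "all three select the same lines"
fi
```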

Also, as I mentioned in my comment, `grep -f list.txt sales.txt` is easier to read and easier to maintain, and only a single program (grep) has to be invoked.

answered Nov 23 '18 at 21:02 by JNevill

The time saving is more likely to come from doing a single execution with a single pass over the file, not from C iterating over a small array faster than bash does.
– that other guy
Nov 23 '18 at 21:24

This is pretty much the explanation I was looking for. I didn't consider that grep is written in C and that this would give it an edge over a pure Bash search through a file. This makes sense, thank you.
– MikeKatz45
Nov 23 '18 at 23:14
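One way to make the "single execution, single file pass" point from these comments concrete is to count how many times grep is launched. This sketch assumes a POSIX shell with `mktemp`; `counted_grep` is a hypothetical wrapper function, not a grep feature:

```sh
#!/bin/sh
cd "$(mktemp -d)" || exit 1
printf '%s\n' tomatoes peppers onions > list.txt
cat > sales.txt <<'EOF'
Price Products
$8.88 bread
$6.75 tomatoes
$3.34 fish
$5.57 peppers
$0.95 beans
$4.56 onions
EOF

runs=0
# Hypothetical wrapper: bump a counter, then run the real grep.
counted_grep() {
    runs=$((runs + 1))
    grep "$@"
}

# The loop starts grep once per pattern, and each run reads all of sales.txt.
# Redirecting the whole loop keeps it in the current shell, so $runs survives.
for i in $(cat list.txt); do counted_grep "$i" sales.txt; done > /dev/null
loop_runs=$runs

# grep -f starts grep once and reads sales.txt once, whatever the pattern count.
runs=0
counted_grep -f list.txt sales.txt > /dev/null
file_runs=$runs

echo "loop: $loop_runs runs, grep -f: $file_runs run"
```

The loop's count grows with the number of lines in `list.txt`, while the `grep -f` count stays at one.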


Your second version is better because:

- It only requires a single pass over the file (it does not need multiple passes, as you thought).
- It has no globbing or word-splitting bugs (your first attempt behaves poorly for patterns like `green beans` or `/*/*/*/*`).
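The word-splitting and globbing problem is easy to see by printing what the loop actually hands to grep. In this sketch (assuming a POSIX shell with `mktemp`) the pattern list is hypothetical, and `*.txt` stands in for `/*/*/*/*` so the glob only expands inside a scratch directory:

```sh
#!/bin/sh
cd "$(mktemp -d)" || exit 1
# Two intended patterns: one containing a space, one containing a glob character.
printf '%s\n' 'green beans' '*.txt' > list.txt
: > sales.txt   # empty; it only needs to exist so *.txt has two matches

# Same loop shape as the question, but print each $i instead of grepping for it.
words=$(for i in $(cat list.txt); do printf '[%s]\n' "$i"; done)
printf '%s\n' "$words"
```

This prints `[green]`, `[beans]`, `[list.txt]` and `[sales.txt]`: four greps would run instead of two, and none of them would search for `green beans`. `grep -f list.txt` instead takes each line of the file as one pattern.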

It's totally fine to read files purely in shell code when (1) you do it correctly and (2) the overhead is negligible, but neither really applies to your first example (apart from the overhead being negligible while the files are this small).
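When a per-pattern loop really is needed (for example, to do some processing between patterns), reading the file correctly usually means a `while read` loop. A minimal sketch, assuming a POSIX shell with `mktemp`; the sample data and the per-pattern match count are hypothetical stand-ins for whatever processing you actually need:

```sh
#!/bin/sh
cd "$(mktemp -d)" || exit 1
printf '%s\n' tomatoes 'green beans' > list.txt
cat > sales.txt <<'EOF'
Price Products
$6.75 tomatoes
$2.10 green beans
$0.95 beans
EOF

# One whole line per iteration: IFS= keeps leading/trailing blanks, -r keeps
# backslashes, and quoting "$pattern" prevents word splitting and globbing.
# -e makes sure a pattern starting with "-" is not taken as an option.
counts=$(while IFS= read -r pattern; do
    printf '%s: %s\n' "$pattern" "$(grep -c -e "$pattern" sales.txt)"
done < list.txt)
printf '%s\n' "$counts"
```

This still runs grep once per pattern, so when no per-pattern step is needed, the single `grep -f list.txt sales.txt` remains the simpler choice.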

answered Nov 23 '18 at 21:07 by that other guy
