Which is faster/better practice: grepping a file in a for loop, or grepping it once with a pattern file?
I used to have a script like the following:
for i in $(cat list.txt)
do
grep $i sales.txt
done
where list.txt contains:
tomatoes
peppers
onions
and sales.txt contains:
Price Products
$8.88 bread
$6.75 tomatoes
$3.34 fish
$5.57 peppers
$0.95 beans
$4.56 onions
I am a beginner in Bash/shell scripting, and after reading posts like "Why is using a shell loop to process text considered bad practice?" I changed the previous script to the following:
grep -f list.txt sales.txt
Is this last way of doing it really better than using a for loop? At first I thought it was, but then I realized it is probably the same, since grep has to read the query file each time it greps a different line in the target file. Does anyone know if it's actually better, and why? If it is better, I'm probably missing something about how grep processes this task, but I can't figure it out.
bash shell loops grep text-processing
asked Nov 23 '18 at 19:56 by MikeKatz45 (edited Nov 23 '18 at 20:37)
It's easier to read, the loop logic is written into the program itself so it's almost definitely faster, and only a single program has to be called... I can't think of a scenario where grep -f list.txt sales.txt wouldn't be considered the "better" option. I suppose if you needed some intermediate processing between switching through your patterns in list.txt and grepping, then maybe a loop, depending on what that was... maybe...
– JNevill
Nov 23 '18 at 20:55
2 Answers
Expanding on my comment...
You can download the source for grep via git with:
git clone https://git.savannah.gnu.org/git/grep.git
You can see at line 96 of src/grep.c a comment:
/* A list of lineno,filename pairs corresponding to -f FILENAME
arguments. Since we store the concatenation of all patterns in
a single array, KEYS, be they from the command line via "-e PAT"
or read from one or more -f-specified FILENAMES. Given this
invocation, grep -f <(seq 5) -f <(seq 2) -f <(seq 3) FILE, there
will be three entries in LF_PAIR: {1, x} {6, y} {8, z}, where
x, y and z are just place-holders for shell-generated names. */
That is about all the clue we need: the patterns being searched, whether they come in through -e or through -f with a file, are concatenated into a single array, and that array is then the source of the search. Moving through that array in C is going to be faster than your shell looping through a file, so this alone will win the speed race.
Also, as I mentioned in my comment, grep -f list.txt sales.txt is easier to read, easier to maintain, and only a single program (grep) has to be invoked.
answered Nov 23 '18 at 21:02 by JNevill
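As a small illustration of the point that -e and -f patterns end up in the same pattern list, the sketch below runs both forms against the sample data from the question and compares the results. The scratch directory and the variable names (workdir, from_file, from_args) are assumptions made for this demo only.

```shell
# Recreate the question's sample files in a scratch directory (demo-only paths).
workdir=$(mktemp -d)
printf '%s\n' tomatoes peppers onions > "$workdir/list.txt"
cat > "$workdir/sales.txt" <<'EOF'
Price Products
$8.88 bread
$6.75 tomatoes
$3.34 fish
$5.57 peppers
$0.95 beans
$4.56 onions
EOF

# Patterns read from a file with -f ...
from_file=$(grep -f "$workdir/list.txt" "$workdir/sales.txt")
# ... and the same three patterns given one by one with -e.
from_args=$(grep -e tomatoes -e peppers -e onions "$workdir/sales.txt")

if [ "$from_file" = "$from_args" ]; then
    echo "-f and -e select the same lines"
fi

rm -rf "$workdir"
```

Either way, grep is started once and is handed the whole set of patterns up front.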
The time saving is more likely to come from doing a single execution with a single file pass, not from C iterating a small array faster than bash
– that other guy
Nov 23 '18 at 21:24
This is pretty much the explanation I was looking for. I didn't consider that grep works in C and that this would give it an edge over a pure bash search through a file. This makes sense, thank you.
– MikeKatz45
Nov 23 '18 at 23:14
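The "single execution, single file pass" point can be made concrete with a rough sketch like the one below. It counts how many grep processes a per-pattern loop starts on the question's sample data and compares the output with a single grep -f run. The scratch directory, file names (loop.out, single.out) and counters are assumptions for the demo, not part of the original scripts.

```shell
# Recreate the question's sample files in a scratch directory (demo-only paths).
workdir=$(mktemp -d)
printf '%s\n' tomatoes peppers onions > "$workdir/list.txt"
cat > "$workdir/sales.txt" <<'EOF'
Price Products
$8.88 bread
$6.75 tomatoes
$3.34 fish
$5.57 peppers
$0.95 beans
$4.56 onions
EOF

# Loop version: one grep process, and one full read of sales.txt, per pattern.
grep_runs=0
: > "$workdir/loop.out"
while IFS= read -r pattern; do
    grep -- "$pattern" "$workdir/sales.txt" >> "$workdir/loop.out"
    grep_runs=$((grep_runs + 1))
done < "$workdir/list.txt"

# Single invocation: one process, all patterns loaded before sales.txt is scanned.
grep -f "$workdir/list.txt" "$workdir/sales.txt" > "$workdir/single.out"

same=no
if cmp -s "$workdir/loop.out" "$workdir/single.out"; then
    same=yes
fi
echo "loop started $grep_runs grep processes; grep -f started 1; same output: $same"

rm -rf "$workdir"
```

The outputs only coincide here because the patterns happen to be listed in the same order as the matching lines and no line matches two patterns. In general the loop prints results grouped by pattern and can print a line more than once, while grep -f prints each matching line once, in file order.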
Your second version is better because:
- It only requires a single pass over the file (it does not need multiple passes like you think)
- It has no globbing and spacing bugs (your first attempt behaves poorly for green beans or /*/*/*/*)
It's totally fine to read files purely in shell code when 1. you do it correctly and 2. the overhead is negligible, but neither really applies to your first example (except for the fact that the files are currently small).
answered Nov 23 '18 at 21:07 by that other guy
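To make the spacing bug concrete, here is a minimal sketch using a hypothetical list entry "green beans" and made-up sales lines (neither is in the original data). It shows how an unquoted $(cat list.txt) splits the entry into two words, and one way a shell loop could read the list correctly. The -F option (treat each pattern as a literal string) is an addition that assumes the list holds plain words rather than regular expressions.

```shell
# Demo data: a pattern containing a space (hypothetical, not from the question).
workdir=$(mktemp -d)
printf '%s\n' 'green beans' > "$workdir/list.txt"
printf '%s\n' '$0.95 beans' '$1.20 green beans' '$2.00 green peas' > "$workdir/sales.txt"

# Unquoted $(cat ...) is word-split: "green beans" becomes "green" and "beans",
# so the original loop would grep for each word separately.
split_count=0
for word in $(cat "$workdir/list.txt"); do
    split_count=$((split_count + 1))
done

# Reading line by line with quoted expansions keeps each pattern intact.
line_count=0
safe_out=
while IFS= read -r pattern; do
    line_count=$((line_count + 1))
    safe_out=$(grep -F -- "$pattern" "$workdir/sales.txt")
done < "$workdir/list.txt"

echo "for loop saw $split_count words; while-read loop saw $line_count line"
echo "match for the intact pattern: $safe_out"

rm -rf "$workdir"
```

This fixes the correctness problem only. The loop still starts one grep per pattern, so grep -f (or grep -F -f for literal strings) remains the simpler choice when no per-pattern processing is needed.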