Awk to update file based on match and condition in another
The below awk
will produce the tab-delimeted file1
with the difference between $3-$2
calulated for each line and printed in $6
. Before the awk
is executed only 5 fields exist.
What I am having trouble with updated each $2
value in file2
with the $7
value of file1
if the $1
value of file2
matches the $5
of file1
and $6
in file1
is not intron
. If the value of $5
is intron
then then the value of $7
in file1
is zero. So for example line 1 in file1
is intron
so that is equvilant to zero or skipped (those lines are not needed in the calculation).
It is possible that a $1
value in file2
may not exist in file1
and in this case the value of $2
in file2
is zero. Line3 infile2
is an example and is set to zero because it does not exist in file1
. Thank you:).
Awk w/ output
awk '
FNR==NR{ # process same line
b[$4]=$3-$2;
next # process next line
}
{
a[$5]+=($3-$2)
}
{
split($1, b, " "); print b[0], a[b[0]]
}' OFS="t" file1 file2
Output
-2135
-2135
-2222
-2351
-2351
-2414
File1 tab-delimited
chr5 86667863 86667879 RASA1 intron 16
chr5 86669977 86669995 RASA1 splicing 18
chr5 86670703 86670805 RASA1 exon 102
chr5 86679453 86679547 RASA1 intron 94
chr5 86679571 86679673 RASA1 exon 102
chr19 15088950 15088961 NOTCH2 intron 50
chr19 15288950 15288961 NOTCH3 intron 11
chr19 15308240 15308275 NOTCH3 exon 35
File2 space delimited
RASA1 2135
NOTCH2 0
GIMAP8 87
NOTCH3 129
FOXF2 0
PRB3 63
Desired out after file2 is updated
RASA1 222 `(102+102+18)`
NOTCH2 0
GIMAP8 0
NOTCH3 35 `(35)`
FOXF2 0
PRB3 0
Maybe adding a |
after the first awk
with:
awk 'FNR==NR { a[$1]=$7; next } { if(a[$5]){$1=a[$5] }; print }'
To update file2
awk
add a comment |
The below awk
will produce the tab-delimeted file1
with the difference between $3-$2
calulated for each line and printed in $6
. Before the awk
is executed only 5 fields exist.
What I am having trouble with updated each $2
value in file2
with the $7
value of file1
if the $1
value of file2
matches the $5
of file1
and $6
in file1
is not intron
. If the value of $5
is intron
then then the value of $7
in file1
is zero. So for example line 1 in file1
is intron
so that is equvilant to zero or skipped (those lines are not needed in the calculation).
It is possible that a $1
value in file2
may not exist in file1
and in this case the value of $2
in file2
is zero. Line3 infile2
is an example and is set to zero because it does not exist in file1
. Thank you:).
Awk w/ output
awk '
FNR==NR{ # process same line
b[$4]=$3-$2;
next # process next line
}
{
a[$5]+=($3-$2)
}
{
split($1, b, " "); print b[0], a[b[0]]
}' OFS="t" file1 file2
Output
-2135
-2135
-2222
-2351
-2351
-2414
File1 tab-delimited
chr5 86667863 86667879 RASA1 intron 16
chr5 86669977 86669995 RASA1 splicing 18
chr5 86670703 86670805 RASA1 exon 102
chr5 86679453 86679547 RASA1 intron 94
chr5 86679571 86679673 RASA1 exon 102
chr19 15088950 15088961 NOTCH2 intron 50
chr19 15288950 15288961 NOTCH3 intron 11
chr19 15308240 15308275 NOTCH3 exon 35
File2 space delimited
RASA1 2135
NOTCH2 0
GIMAP8 87
NOTCH3 129
FOXF2 0
PRB3 63
Desired out after file2 is updated
RASA1 222 `(102+102+18)`
NOTCH2 0
GIMAP8 0
NOTCH3 35 `(35)`
FOXF2 0
PRB3 0
Maybe adding a |
after the first awk
with:
awk 'FNR==NR { a[$1]=$7; next } { if(a[$5]){$1=a[$5] }; print }'
To update file2
awk
1
Instead of making the index of the arrayFNR
make it$4
and increment the value stored in the array at that index by the result of your$3-$2
calcuation. Likea[$5]+=($3-$2)
Then when processing the second file you can just pull the item from the array at the index that matches the first string in$1
like{split($1, b, " "); print b[0], a[b[0]]
(pardon if that syntax isn't 100%, but that should get in the ballpark.
– JNevill
Nov 23 '18 at 18:54
I made the edits and don't think I follow, but included the output in the original post. Thank you :).
– cm0728
Nov 23 '18 at 20:55
add a comment |
The below awk
will produce the tab-delimeted file1
with the difference between $3-$2
calulated for each line and printed in $6
. Before the awk
is executed only 5 fields exist.
What I am having trouble with updated each $2
value in file2
with the $7
value of file1
if the $1
value of file2
matches the $5
of file1
and $6
in file1
is not intron
. If the value of $5
is intron
then then the value of $7
in file1
is zero. So for example line 1 in file1
is intron
so that is equvilant to zero or skipped (those lines are not needed in the calculation).
It is possible that a $1
value in file2
may not exist in file1
and in this case the value of $2
in file2
is zero. Line3 infile2
is an example and is set to zero because it does not exist in file1
. Thank you:).
Awk w/ output
awk '
FNR==NR{ # process same line
b[$4]=$3-$2;
next # process next line
}
{
a[$5]+=($3-$2)
}
{
split($1, b, " "); print b[0], a[b[0]]
}' OFS="t" file1 file2
Output
-2135
-2135
-2222
-2351
-2351
-2414
File1 tab-delimited
chr5 86667863 86667879 RASA1 intron 16
chr5 86669977 86669995 RASA1 splicing 18
chr5 86670703 86670805 RASA1 exon 102
chr5 86679453 86679547 RASA1 intron 94
chr5 86679571 86679673 RASA1 exon 102
chr19 15088950 15088961 NOTCH2 intron 50
chr19 15288950 15288961 NOTCH3 intron 11
chr19 15308240 15308275 NOTCH3 exon 35
File2 space delimited
RASA1 2135
NOTCH2 0
GIMAP8 87
NOTCH3 129
FOXF2 0
PRB3 63
Desired out after file2 is updated
RASA1 222 `(102+102+18)`
NOTCH2 0
GIMAP8 0
NOTCH3 35 `(35)`
FOXF2 0
PRB3 0
Maybe adding a |
after the first awk
with:
awk 'FNR==NR { a[$1]=$7; next } { if(a[$5]){$1=a[$5] }; print }'
To update file2
awk
The below awk
will produce the tab-delimeted file1
with the difference between $3-$2
calulated for each line and printed in $6
. Before the awk
is executed only 5 fields exist.
What I am having trouble with updated each $2
value in file2
with the $7
value of file1
if the $1
value of file2
matches the $5
of file1
and $6
in file1
is not intron
. If the value of $5
is intron
then then the value of $7
in file1
is zero. So for example line 1 in file1
is intron
so that is equvilant to zero or skipped (those lines are not needed in the calculation).
It is possible that a $1
value in file2
may not exist in file1
and in this case the value of $2
in file2
is zero. Line3 infile2
is an example and is set to zero because it does not exist in file1
. Thank you:).
Awk w/ output
awk '
FNR==NR{ # process same line
b[$4]=$3-$2;
next # process next line
}
{
a[$5]+=($3-$2)
}
{
split($1, b, " "); print b[0], a[b[0]]
}' OFS="t" file1 file2
Output
-2135
-2135
-2222
-2351
-2351
-2414
File1 tab-delimited
chr5 86667863 86667879 RASA1 intron 16
chr5 86669977 86669995 RASA1 splicing 18
chr5 86670703 86670805 RASA1 exon 102
chr5 86679453 86679547 RASA1 intron 94
chr5 86679571 86679673 RASA1 exon 102
chr19 15088950 15088961 NOTCH2 intron 50
chr19 15288950 15288961 NOTCH3 intron 11
chr19 15308240 15308275 NOTCH3 exon 35
File2 space delimited
RASA1 2135
NOTCH2 0
GIMAP8 87
NOTCH3 129
FOXF2 0
PRB3 63
Desired out after file2 is updated
RASA1 222 `(102+102+18)`
NOTCH2 0
GIMAP8 0
NOTCH3 35 `(35)`
FOXF2 0
PRB3 0
Maybe adding a |
after the first awk
with:
awk 'FNR==NR { a[$1]=$7; next } { if(a[$5]){$1=a[$5] }; print }'
To update file2
awk
awk
edited Nov 23 '18 at 20:58
cm0728
asked Nov 23 '18 at 18:41
cm0728cm0728
1,486819
1,486819
1
Instead of making the index of the arrayFNR
make it$4
and increment the value stored in the array at that index by the result of your$3-$2
calcuation. Likea[$5]+=($3-$2)
Then when processing the second file you can just pull the item from the array at the index that matches the first string in$1
like{split($1, b, " "); print b[0], a[b[0]]
(pardon if that syntax isn't 100%, but that should get in the ballpark.
– JNevill
Nov 23 '18 at 18:54
I made the edits and don't think I follow, but included the output in the original post. Thank you :).
– cm0728
Nov 23 '18 at 20:55
add a comment |
1
Instead of making the index of the arrayFNR
make it$4
and increment the value stored in the array at that index by the result of your$3-$2
calcuation. Likea[$5]+=($3-$2)
Then when processing the second file you can just pull the item from the array at the index that matches the first string in$1
like{split($1, b, " "); print b[0], a[b[0]]
(pardon if that syntax isn't 100%, but that should get in the ballpark.
– JNevill
Nov 23 '18 at 18:54
I made the edits and don't think I follow, but included the output in the original post. Thank you :).
– cm0728
Nov 23 '18 at 20:55
1
1
Instead of making the index of the array
FNR
make it $4
and increment the value stored in the array at that index by the result of your $3-$2
calcuation. Like a[$5]+=($3-$2)
Then when processing the second file you can just pull the item from the array at the index that matches the first string in $1
like {split($1, b, " "); print b[0], a[b[0]]
(pardon if that syntax isn't 100%, but that should get in the ballpark.– JNevill
Nov 23 '18 at 18:54
Instead of making the index of the array
FNR
make it $4
and increment the value stored in the array at that index by the result of your $3-$2
calcuation. Like a[$5]+=($3-$2)
Then when processing the second file you can just pull the item from the array at the index that matches the first string in $1
like {split($1, b, " "); print b[0], a[b[0]]
(pardon if that syntax isn't 100%, but that should get in the ballpark.– JNevill
Nov 23 '18 at 18:54
I made the edits and don't think I follow, but included the output in the original post. Thank you :).
– cm0728
Nov 23 '18 at 20:55
I made the edits and don't think I follow, but included the output in the original post. Thank you :).
– cm0728
Nov 23 '18 at 20:55
add a comment |
2 Answers
2
active
oldest
votes
Could you please try following. It will provide you sequence of output in same order of Input_file's order.
awk '
FNR==NR{
if(!b[$1]++){
c[++count]=$1
}
a[$1]
next
}
($4 in a) && $5!="intron"{
a[$4]+=$NF
}
END{
for(i=1;i<=count;i++){
print c[i],a[c[i]]?a[c[i]]:0
}
}' Input_file2 Input_file1
Since your Input_file1 is NOT TAB delimited as per your claim, so in case it is then edit Input_file2 Input_file1
-----> Input_file2 FS="t" Input_file1
. To get output as TAB delimited either append above code's output to | column -t
command or set OFS="t"
near to FS="t"
too.
Output will be as follows.
RASA1 222
NOTCH2 0
GIMAP8 0
NOTCH3 35
FOXF2 0
PRB3 0
add a comment |
if I understood it correctly, this should do what you expect
$ awk 'FNR==NR && $5!="intron" {a[$4]+=$3-$2; next}
{$2=($1 in a)?a[$1]:0}1' file1 file2 > file2.updated
IMHO karafka sir, IMHO, I think OP mentioned it wrongly as per shown samples field from Input_file1 is 4th please.
– RavinderSingh13
Nov 23 '18 at 23:48
1
Yes, thanks. Fixed to$5
– karakfa
Nov 23 '18 at 23:59
Thank you both very much :).
– cm0728
Nov 26 '18 at 13:24
add a comment |
Your Answer
StackExchange.ifUsing("editor", function () {
StackExchange.using("externalEditor", function () {
StackExchange.using("snippets", function () {
StackExchange.snippets.init();
});
});
}, "code-snippets");
StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "1"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);
StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});
function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});
}
});
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53451595%2fawk-to-update-file-based-on-match-and-condition-in-another%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
2 Answers
2
active
oldest
votes
2 Answers
2
active
oldest
votes
active
oldest
votes
active
oldest
votes
Could you please try following. It will provide you sequence of output in same order of Input_file's order.
awk '
FNR==NR{
if(!b[$1]++){
c[++count]=$1
}
a[$1]
next
}
($4 in a) && $5!="intron"{
a[$4]+=$NF
}
END{
for(i=1;i<=count;i++){
print c[i],a[c[i]]?a[c[i]]:0
}
}' Input_file2 Input_file1
Since your Input_file1 is NOT TAB delimited as per your claim, so in case it is then edit Input_file2 Input_file1
-----> Input_file2 FS="t" Input_file1
. To get output as TAB delimited either append above code's output to | column -t
command or set OFS="t"
near to FS="t"
too.
Output will be as follows.
RASA1 222
NOTCH2 0
GIMAP8 0
NOTCH3 35
FOXF2 0
PRB3 0
add a comment |
Could you please try following. It will provide you sequence of output in same order of Input_file's order.
awk '
FNR==NR{
if(!b[$1]++){
c[++count]=$1
}
a[$1]
next
}
($4 in a) && $5!="intron"{
a[$4]+=$NF
}
END{
for(i=1;i<=count;i++){
print c[i],a[c[i]]?a[c[i]]:0
}
}' Input_file2 Input_file1
Since your Input_file1 is NOT TAB delimited as per your claim, so in case it is then edit Input_file2 Input_file1
-----> Input_file2 FS="t" Input_file1
. To get output as TAB delimited either append above code's output to | column -t
command or set OFS="t"
near to FS="t"
too.
Output will be as follows.
RASA1 222
NOTCH2 0
GIMAP8 0
NOTCH3 35
FOXF2 0
PRB3 0
add a comment |
Could you please try following. It will provide you sequence of output in same order of Input_file's order.
awk '
FNR==NR{
if(!b[$1]++){
c[++count]=$1
}
a[$1]
next
}
($4 in a) && $5!="intron"{
a[$4]+=$NF
}
END{
for(i=1;i<=count;i++){
print c[i],a[c[i]]?a[c[i]]:0
}
}' Input_file2 Input_file1
Since your Input_file1 is NOT TAB delimited as per your claim, so in case it is then edit Input_file2 Input_file1
-----> Input_file2 FS="t" Input_file1
. To get output as TAB delimited either append above code's output to | column -t
command or set OFS="t"
near to FS="t"
too.
Output will be as follows.
RASA1 222
NOTCH2 0
GIMAP8 0
NOTCH3 35
FOXF2 0
PRB3 0
Could you please try following. It will provide you sequence of output in same order of Input_file's order.
awk '
FNR==NR{
if(!b[$1]++){
c[++count]=$1
}
a[$1]
next
}
($4 in a) && $5!="intron"{
a[$4]+=$NF
}
END{
for(i=1;i<=count;i++){
print c[i],a[c[i]]?a[c[i]]:0
}
}' Input_file2 Input_file1
Since your Input_file1 is NOT TAB delimited as per your claim, so in case it is then edit Input_file2 Input_file1
-----> Input_file2 FS="t" Input_file1
. To get output as TAB delimited either append above code's output to | column -t
command or set OFS="t"
near to FS="t"
too.
Output will be as follows.
RASA1 222
NOTCH2 0
GIMAP8 0
NOTCH3 35
FOXF2 0
PRB3 0
answered Nov 23 '18 at 23:44
RavinderSingh13RavinderSingh13
25.7k41438
25.7k41438
add a comment |
add a comment |
if I understood it correctly, this should do what you expect
$ awk 'FNR==NR && $5!="intron" {a[$4]+=$3-$2; next}
{$2=($1 in a)?a[$1]:0}1' file1 file2 > file2.updated
IMHO karafka sir, IMHO, I think OP mentioned it wrongly as per shown samples field from Input_file1 is 4th please.
– RavinderSingh13
Nov 23 '18 at 23:48
1
Yes, thanks. Fixed to$5
– karakfa
Nov 23 '18 at 23:59
Thank you both very much :).
– cm0728
Nov 26 '18 at 13:24
add a comment |
if I understood it correctly, this should do what you expect
$ awk 'FNR==NR && $5!="intron" {a[$4]+=$3-$2; next}
{$2=($1 in a)?a[$1]:0}1' file1 file2 > file2.updated
IMHO karafka sir, IMHO, I think OP mentioned it wrongly as per shown samples field from Input_file1 is 4th please.
– RavinderSingh13
Nov 23 '18 at 23:48
1
Yes, thanks. Fixed to$5
– karakfa
Nov 23 '18 at 23:59
Thank you both very much :).
– cm0728
Nov 26 '18 at 13:24
add a comment |
if I understood it correctly, this should do what you expect
$ awk 'FNR==NR && $5!="intron" {a[$4]+=$3-$2; next}
{$2=($1 in a)?a[$1]:0}1' file1 file2 > file2.updated
if I understood it correctly, this should do what you expect
$ awk 'FNR==NR && $5!="intron" {a[$4]+=$3-$2; next}
{$2=($1 in a)?a[$1]:0}1' file1 file2 > file2.updated
edited Nov 23 '18 at 23:58
answered Nov 23 '18 at 23:24
karakfakarakfa
48.2k52738
48.2k52738
IMHO karafka sir, IMHO, I think OP mentioned it wrongly as per shown samples field from Input_file1 is 4th please.
– RavinderSingh13
Nov 23 '18 at 23:48
1
Yes, thanks. Fixed to$5
– karakfa
Nov 23 '18 at 23:59
Thank you both very much :).
– cm0728
Nov 26 '18 at 13:24
add a comment |
IMHO karafka sir, IMHO, I think OP mentioned it wrongly as per shown samples field from Input_file1 is 4th please.
– RavinderSingh13
Nov 23 '18 at 23:48
1
Yes, thanks. Fixed to$5
– karakfa
Nov 23 '18 at 23:59
Thank you both very much :).
– cm0728
Nov 26 '18 at 13:24
IMHO karafka sir, IMHO, I think OP mentioned it wrongly as per shown samples field from Input_file1 is 4th please.
– RavinderSingh13
Nov 23 '18 at 23:48
IMHO karafka sir, IMHO, I think OP mentioned it wrongly as per shown samples field from Input_file1 is 4th please.
– RavinderSingh13
Nov 23 '18 at 23:48
1
1
Yes, thanks. Fixed to
$5
– karakfa
Nov 23 '18 at 23:59
Yes, thanks. Fixed to
$5
– karakfa
Nov 23 '18 at 23:59
Thank you both very much :).
– cm0728
Nov 26 '18 at 13:24
Thank you both very much :).
– cm0728
Nov 26 '18 at 13:24
add a comment |
Thanks for contributing an answer to Stack Overflow!
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
To learn more, see our tips on writing great answers.
Some of your past answers have not been well-received, and you're in danger of being blocked from answering.
Please pay close attention to the following guidance:
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
To learn more, see our tips on writing great answers.
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53451595%2fawk-to-update-file-based-on-match-and-condition-in-another%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
1
Instead of making the index of the array
FNR
make it$4
and increment the value stored in the array at that index by the result of your$3-$2
calcuation. Likea[$5]+=($3-$2)
Then when processing the second file you can just pull the item from the array at the index that matches the first string in$1
like{split($1, b, " "); print b[0], a[b[0]]
(pardon if that syntax isn't 100%, but that should get in the ballpark.– JNevill
Nov 23 '18 at 18:54
I made the edits and don't think I follow, but included the output in the original post. Thank you :).
– cm0728
Nov 23 '18 at 20:55