Awk to update file based on match and condition in another












0














The below awk will produce the tab-delimeted file1 with the difference between $3-$2 calulated for each line and printed in $6. Before the awk is executed only 5 fields exist.



What I am having trouble with updated each $2 value in file2 with the $7 value of file1 if the $1 value of file2 matches the $5 of file1 and $6 in file1 is not intron. If the value of $5 is intron then then the value of $7 in file1 is zero. So for example line 1 in file1 is intron so that is equvilant to zero or skipped (those lines are not needed in the calculation).



It is possible that a $1 value in file2 may not exist in file1 and in this case the value of $2 in file2 is zero. Line3 infile2 is an example and is set to zero because it does not exist in file1. Thank you:).



Awk w/ output



awk '
FNR==NR{ # process same line
b[$4]=$3-$2;
next # process next line
}
{
a[$5]+=($3-$2)
}
{
split($1, b, " "); print b[0], a[b[0]]
}' OFS="t" file1 file2


Output



-2135
-2135
-2222
-2351
-2351
-2414


File1 tab-delimited



chr5    86667863    86667879    RASA1   intron  16
chr5 86669977 86669995 RASA1 splicing 18
chr5 86670703 86670805 RASA1 exon 102
chr5 86679453 86679547 RASA1 intron 94
chr5 86679571 86679673 RASA1 exon 102
chr19 15088950 15088961 NOTCH2 intron 50
chr19 15288950 15288961 NOTCH3 intron 11
chr19 15308240 15308275 NOTCH3 exon 35


File2 space delimited



RASA1 2135
NOTCH2 0
GIMAP8 87
NOTCH3 129
FOXF2 0
PRB3 63


Desired out after file2 is updated



RASA1 222  `(102+102+18)`
NOTCH2 0
GIMAP8 0
NOTCH3 35 `(35)`
FOXF2 0
PRB3 0


Maybe adding a | after the first awk with:



awk 'FNR==NR { a[$1]=$7; next } { if(a[$5]){$1=a[$5] }; print }'


To update file2










share|improve this question




















  • 1




    Instead of making the index of the array FNR make it $4 and increment the value stored in the array at that index by the result of your $3-$2 calcuation. Like a[$5]+=($3-$2) Then when processing the second file you can just pull the item from the array at the index that matches the first string in $1 like {split($1, b, " "); print b[0], a[b[0]] (pardon if that syntax isn't 100%, but that should get in the ballpark.
    – JNevill
    Nov 23 '18 at 18:54












  • I made the edits and don't think I follow, but included the output in the original post. Thank you :).
    – cm0728
    Nov 23 '18 at 20:55
















0














The below awk will produce the tab-delimeted file1 with the difference between $3-$2 calulated for each line and printed in $6. Before the awk is executed only 5 fields exist.



What I am having trouble with updated each $2 value in file2 with the $7 value of file1 if the $1 value of file2 matches the $5 of file1 and $6 in file1 is not intron. If the value of $5 is intron then then the value of $7 in file1 is zero. So for example line 1 in file1 is intron so that is equvilant to zero or skipped (those lines are not needed in the calculation).



It is possible that a $1 value in file2 may not exist in file1 and in this case the value of $2 in file2 is zero. Line3 infile2 is an example and is set to zero because it does not exist in file1. Thank you:).



Awk w/ output



awk '
FNR==NR{ # process same line
b[$4]=$3-$2;
next # process next line
}
{
a[$5]+=($3-$2)
}
{
split($1, b, " "); print b[0], a[b[0]]
}' OFS="t" file1 file2


Output



-2135
-2135
-2222
-2351
-2351
-2414


File1 tab-delimited



chr5    86667863    86667879    RASA1   intron  16
chr5 86669977 86669995 RASA1 splicing 18
chr5 86670703 86670805 RASA1 exon 102
chr5 86679453 86679547 RASA1 intron 94
chr5 86679571 86679673 RASA1 exon 102
chr19 15088950 15088961 NOTCH2 intron 50
chr19 15288950 15288961 NOTCH3 intron 11
chr19 15308240 15308275 NOTCH3 exon 35


File2 space delimited



RASA1 2135
NOTCH2 0
GIMAP8 87
NOTCH3 129
FOXF2 0
PRB3 63


Desired out after file2 is updated



RASA1 222  `(102+102+18)`
NOTCH2 0
GIMAP8 0
NOTCH3 35 `(35)`
FOXF2 0
PRB3 0


Maybe adding a | after the first awk with:



awk 'FNR==NR { a[$1]=$7; next } { if(a[$5]){$1=a[$5] }; print }'


To update file2










share|improve this question




















  • 1




    Instead of making the index of the array FNR make it $4 and increment the value stored in the array at that index by the result of your $3-$2 calcuation. Like a[$5]+=($3-$2) Then when processing the second file you can just pull the item from the array at the index that matches the first string in $1 like {split($1, b, " "); print b[0], a[b[0]] (pardon if that syntax isn't 100%, but that should get in the ballpark.
    – JNevill
    Nov 23 '18 at 18:54












  • I made the edits and don't think I follow, but included the output in the original post. Thank you :).
    – cm0728
    Nov 23 '18 at 20:55














0












0








0







The below awk will produce the tab-delimeted file1 with the difference between $3-$2 calulated for each line and printed in $6. Before the awk is executed only 5 fields exist.



What I am having trouble with updated each $2 value in file2 with the $7 value of file1 if the $1 value of file2 matches the $5 of file1 and $6 in file1 is not intron. If the value of $5 is intron then then the value of $7 in file1 is zero. So for example line 1 in file1 is intron so that is equvilant to zero or skipped (those lines are not needed in the calculation).



It is possible that a $1 value in file2 may not exist in file1 and in this case the value of $2 in file2 is zero. Line3 infile2 is an example and is set to zero because it does not exist in file1. Thank you:).



Awk w/ output



awk '
FNR==NR{ # process same line
b[$4]=$3-$2;
next # process next line
}
{
a[$5]+=($3-$2)
}
{
split($1, b, " "); print b[0], a[b[0]]
}' OFS="t" file1 file2


Output



-2135
-2135
-2222
-2351
-2351
-2414


File1 tab-delimited



chr5    86667863    86667879    RASA1   intron  16
chr5 86669977 86669995 RASA1 splicing 18
chr5 86670703 86670805 RASA1 exon 102
chr5 86679453 86679547 RASA1 intron 94
chr5 86679571 86679673 RASA1 exon 102
chr19 15088950 15088961 NOTCH2 intron 50
chr19 15288950 15288961 NOTCH3 intron 11
chr19 15308240 15308275 NOTCH3 exon 35


File2 space delimited



RASA1 2135
NOTCH2 0
GIMAP8 87
NOTCH3 129
FOXF2 0
PRB3 63


Desired out after file2 is updated



RASA1 222  `(102+102+18)`
NOTCH2 0
GIMAP8 0
NOTCH3 35 `(35)`
FOXF2 0
PRB3 0


Maybe adding a | after the first awk with:



awk 'FNR==NR { a[$1]=$7; next } { if(a[$5]){$1=a[$5] }; print }'


To update file2










share|improve this question















The below awk will produce the tab-delimeted file1 with the difference between $3-$2 calulated for each line and printed in $6. Before the awk is executed only 5 fields exist.



What I am having trouble with updated each $2 value in file2 with the $7 value of file1 if the $1 value of file2 matches the $5 of file1 and $6 in file1 is not intron. If the value of $5 is intron then then the value of $7 in file1 is zero. So for example line 1 in file1 is intron so that is equvilant to zero or skipped (those lines are not needed in the calculation).



It is possible that a $1 value in file2 may not exist in file1 and in this case the value of $2 in file2 is zero. Line3 infile2 is an example and is set to zero because it does not exist in file1. Thank you:).



Awk w/ output



awk '
FNR==NR{ # process same line
b[$4]=$3-$2;
next # process next line
}
{
a[$5]+=($3-$2)
}
{
split($1, b, " "); print b[0], a[b[0]]
}' OFS="t" file1 file2


Output



-2135
-2135
-2222
-2351
-2351
-2414


File1 tab-delimited



chr5    86667863    86667879    RASA1   intron  16
chr5 86669977 86669995 RASA1 splicing 18
chr5 86670703 86670805 RASA1 exon 102
chr5 86679453 86679547 RASA1 intron 94
chr5 86679571 86679673 RASA1 exon 102
chr19 15088950 15088961 NOTCH2 intron 50
chr19 15288950 15288961 NOTCH3 intron 11
chr19 15308240 15308275 NOTCH3 exon 35


File2 space delimited



RASA1 2135
NOTCH2 0
GIMAP8 87
NOTCH3 129
FOXF2 0
PRB3 63


Desired out after file2 is updated



RASA1 222  `(102+102+18)`
NOTCH2 0
GIMAP8 0
NOTCH3 35 `(35)`
FOXF2 0
PRB3 0


Maybe adding a | after the first awk with:



awk 'FNR==NR { a[$1]=$7; next } { if(a[$5]){$1=a[$5] }; print }'


To update file2







awk






share|improve this question















share|improve this question













share|improve this question




share|improve this question








edited Nov 23 '18 at 20:58







cm0728

















asked Nov 23 '18 at 18:41









cm0728cm0728

1,486819




1,486819








  • 1




    Instead of making the index of the array FNR make it $4 and increment the value stored in the array at that index by the result of your $3-$2 calcuation. Like a[$5]+=($3-$2) Then when processing the second file you can just pull the item from the array at the index that matches the first string in $1 like {split($1, b, " "); print b[0], a[b[0]] (pardon if that syntax isn't 100%, but that should get in the ballpark.
    – JNevill
    Nov 23 '18 at 18:54












  • I made the edits and don't think I follow, but included the output in the original post. Thank you :).
    – cm0728
    Nov 23 '18 at 20:55














  • 1




    Instead of making the index of the array FNR make it $4 and increment the value stored in the array at that index by the result of your $3-$2 calcuation. Like a[$5]+=($3-$2) Then when processing the second file you can just pull the item from the array at the index that matches the first string in $1 like {split($1, b, " "); print b[0], a[b[0]] (pardon if that syntax isn't 100%, but that should get in the ballpark.
    – JNevill
    Nov 23 '18 at 18:54












  • I made the edits and don't think I follow, but included the output in the original post. Thank you :).
    – cm0728
    Nov 23 '18 at 20:55








1




1




Instead of making the index of the array FNR make it $4 and increment the value stored in the array at that index by the result of your $3-$2 calcuation. Like a[$5]+=($3-$2) Then when processing the second file you can just pull the item from the array at the index that matches the first string in $1 like {split($1, b, " "); print b[0], a[b[0]] (pardon if that syntax isn't 100%, but that should get in the ballpark.
– JNevill
Nov 23 '18 at 18:54






Instead of making the index of the array FNR make it $4 and increment the value stored in the array at that index by the result of your $3-$2 calcuation. Like a[$5]+=($3-$2) Then when processing the second file you can just pull the item from the array at the index that matches the first string in $1 like {split($1, b, " "); print b[0], a[b[0]] (pardon if that syntax isn't 100%, but that should get in the ballpark.
– JNevill
Nov 23 '18 at 18:54














I made the edits and don't think I follow, but included the output in the original post. Thank you :).
– cm0728
Nov 23 '18 at 20:55




I made the edits and don't think I follow, but included the output in the original post. Thank you :).
– cm0728
Nov 23 '18 at 20:55












2 Answers
2






active

oldest

votes


















2














Could you please try following. It will provide you sequence of output in same order of Input_file's order.



awk '
FNR==NR{
if(!b[$1]++){
c[++count]=$1
}
a[$1]
next
}
($4 in a) && $5!="intron"{
a[$4]+=$NF
}
END{
for(i=1;i<=count;i++){
print c[i],a[c[i]]?a[c[i]]:0
}
}' Input_file2 Input_file1


Since your Input_file1 is NOT TAB delimited as per your claim, so in case it is then edit Input_file2 Input_file1 -----> Input_file2 FS="t" Input_file1. To get output as TAB delimited either append above code's output to | column -t command or set OFS="t" near to FS="t" too.



Output will be as follows.



RASA1 222
NOTCH2 0
GIMAP8 0
NOTCH3 35
FOXF2 0
PRB3 0





share|improve this answer





























    1














    if I understood it correctly, this should do what you expect



    $ awk 'FNR==NR && $5!="intron" {a[$4]+=$3-$2; next}
    {$2=($1 in a)?a[$1]:0}1' file1 file2 > file2.updated





    share|improve this answer























    • IMHO karafka sir, IMHO, I think OP mentioned it wrongly as per shown samples field from Input_file1 is 4th please.
      – RavinderSingh13
      Nov 23 '18 at 23:48






    • 1




      Yes, thanks. Fixed to $5
      – karakfa
      Nov 23 '18 at 23:59










    • Thank you both very much :).
      – cm0728
      Nov 26 '18 at 13:24











    Your Answer






    StackExchange.ifUsing("editor", function () {
    StackExchange.using("externalEditor", function () {
    StackExchange.using("snippets", function () {
    StackExchange.snippets.init();
    });
    });
    }, "code-snippets");

    StackExchange.ready(function() {
    var channelOptions = {
    tags: "".split(" "),
    id: "1"
    };
    initTagRenderer("".split(" "), "".split(" "), channelOptions);

    StackExchange.using("externalEditor", function() {
    // Have to fire editor after snippets, if snippets enabled
    if (StackExchange.settings.snippets.snippetsEnabled) {
    StackExchange.using("snippets", function() {
    createEditor();
    });
    }
    else {
    createEditor();
    }
    });

    function createEditor() {
    StackExchange.prepareEditor({
    heartbeatType: 'answer',
    autoActivateHeartbeat: false,
    convertImagesToLinks: true,
    noModals: true,
    showLowRepImageUploadWarning: true,
    reputationToPostImages: 10,
    bindNavPrevention: true,
    postfix: "",
    imageUploader: {
    brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
    contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
    allowUrls: true
    },
    onDemand: true,
    discardSelector: ".discard-answer"
    ,immediatelyShowMarkdownHelp:true
    });


    }
    });














    draft saved

    draft discarded


















    StackExchange.ready(
    function () {
    StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53451595%2fawk-to-update-file-based-on-match-and-condition-in-another%23new-answer', 'question_page');
    }
    );

    Post as a guest















    Required, but never shown

























    2 Answers
    2






    active

    oldest

    votes








    2 Answers
    2






    active

    oldest

    votes









    active

    oldest

    votes






    active

    oldest

    votes









    2














    Could you please try following. It will provide you sequence of output in same order of Input_file's order.



    awk '
    FNR==NR{
    if(!b[$1]++){
    c[++count]=$1
    }
    a[$1]
    next
    }
    ($4 in a) && $5!="intron"{
    a[$4]+=$NF
    }
    END{
    for(i=1;i<=count;i++){
    print c[i],a[c[i]]?a[c[i]]:0
    }
    }' Input_file2 Input_file1


    Since your Input_file1 is NOT TAB delimited as per your claim, so in case it is then edit Input_file2 Input_file1 -----> Input_file2 FS="t" Input_file1. To get output as TAB delimited either append above code's output to | column -t command or set OFS="t" near to FS="t" too.



    Output will be as follows.



    RASA1 222
    NOTCH2 0
    GIMAP8 0
    NOTCH3 35
    FOXF2 0
    PRB3 0





    share|improve this answer


























      2














      Could you please try following. It will provide you sequence of output in same order of Input_file's order.



      awk '
      FNR==NR{
      if(!b[$1]++){
      c[++count]=$1
      }
      a[$1]
      next
      }
      ($4 in a) && $5!="intron"{
      a[$4]+=$NF
      }
      END{
      for(i=1;i<=count;i++){
      print c[i],a[c[i]]?a[c[i]]:0
      }
      }' Input_file2 Input_file1


      Since your Input_file1 is NOT TAB delimited as per your claim, so in case it is then edit Input_file2 Input_file1 -----> Input_file2 FS="t" Input_file1. To get output as TAB delimited either append above code's output to | column -t command or set OFS="t" near to FS="t" too.



      Output will be as follows.



      RASA1 222
      NOTCH2 0
      GIMAP8 0
      NOTCH3 35
      FOXF2 0
      PRB3 0





      share|improve this answer
























        2












        2








        2






        Could you please try following. It will provide you sequence of output in same order of Input_file's order.



        awk '
        FNR==NR{
        if(!b[$1]++){
        c[++count]=$1
        }
        a[$1]
        next
        }
        ($4 in a) && $5!="intron"{
        a[$4]+=$NF
        }
        END{
        for(i=1;i<=count;i++){
        print c[i],a[c[i]]?a[c[i]]:0
        }
        }' Input_file2 Input_file1


        Since your Input_file1 is NOT TAB delimited as per your claim, so in case it is then edit Input_file2 Input_file1 -----> Input_file2 FS="t" Input_file1. To get output as TAB delimited either append above code's output to | column -t command or set OFS="t" near to FS="t" too.



        Output will be as follows.



        RASA1 222
        NOTCH2 0
        GIMAP8 0
        NOTCH3 35
        FOXF2 0
        PRB3 0





        share|improve this answer












        Could you please try following. It will provide you sequence of output in same order of Input_file's order.



        awk '
        FNR==NR{
        if(!b[$1]++){
        c[++count]=$1
        }
        a[$1]
        next
        }
        ($4 in a) && $5!="intron"{
        a[$4]+=$NF
        }
        END{
        for(i=1;i<=count;i++){
        print c[i],a[c[i]]?a[c[i]]:0
        }
        }' Input_file2 Input_file1


        Since your Input_file1 is NOT TAB delimited as per your claim, so in case it is then edit Input_file2 Input_file1 -----> Input_file2 FS="t" Input_file1. To get output as TAB delimited either append above code's output to | column -t command or set OFS="t" near to FS="t" too.



        Output will be as follows.



        RASA1 222
        NOTCH2 0
        GIMAP8 0
        NOTCH3 35
        FOXF2 0
        PRB3 0






        share|improve this answer












        share|improve this answer



        share|improve this answer










        answered Nov 23 '18 at 23:44









        RavinderSingh13RavinderSingh13

        25.7k41438




        25.7k41438

























            1














            if I understood it correctly, this should do what you expect



            $ awk 'FNR==NR && $5!="intron" {a[$4]+=$3-$2; next}
            {$2=($1 in a)?a[$1]:0}1' file1 file2 > file2.updated





            share|improve this answer























            • IMHO karafka sir, IMHO, I think OP mentioned it wrongly as per shown samples field from Input_file1 is 4th please.
              – RavinderSingh13
              Nov 23 '18 at 23:48






            • 1




              Yes, thanks. Fixed to $5
              – karakfa
              Nov 23 '18 at 23:59










            • Thank you both very much :).
              – cm0728
              Nov 26 '18 at 13:24
















            1














            if I understood it correctly, this should do what you expect



            $ awk 'FNR==NR && $5!="intron" {a[$4]+=$3-$2; next}
            {$2=($1 in a)?a[$1]:0}1' file1 file2 > file2.updated





            share|improve this answer























            • IMHO karafka sir, IMHO, I think OP mentioned it wrongly as per shown samples field from Input_file1 is 4th please.
              – RavinderSingh13
              Nov 23 '18 at 23:48






            • 1




              Yes, thanks. Fixed to $5
              – karakfa
              Nov 23 '18 at 23:59










            • Thank you both very much :).
              – cm0728
              Nov 26 '18 at 13:24














            1












            1








            1






            if I understood it correctly, this should do what you expect



            $ awk 'FNR==NR && $5!="intron" {a[$4]+=$3-$2; next}
            {$2=($1 in a)?a[$1]:0}1' file1 file2 > file2.updated





            share|improve this answer














            if I understood it correctly, this should do what you expect



            $ awk 'FNR==NR && $5!="intron" {a[$4]+=$3-$2; next}
            {$2=($1 in a)?a[$1]:0}1' file1 file2 > file2.updated






            share|improve this answer














            share|improve this answer



            share|improve this answer








            edited Nov 23 '18 at 23:58

























            answered Nov 23 '18 at 23:24









            karakfakarakfa

            48.2k52738




            48.2k52738












            • IMHO karafka sir, IMHO, I think OP mentioned it wrongly as per shown samples field from Input_file1 is 4th please.
              – RavinderSingh13
              Nov 23 '18 at 23:48






            • 1




              Yes, thanks. Fixed to $5
              – karakfa
              Nov 23 '18 at 23:59










            • Thank you both very much :).
              – cm0728
              Nov 26 '18 at 13:24


















            • IMHO karafka sir, IMHO, I think OP mentioned it wrongly as per shown samples field from Input_file1 is 4th please.
              – RavinderSingh13
              Nov 23 '18 at 23:48






            • 1




              Yes, thanks. Fixed to $5
              – karakfa
              Nov 23 '18 at 23:59










            • Thank you both very much :).
              – cm0728
              Nov 26 '18 at 13:24
















            IMHO karafka sir, IMHO, I think OP mentioned it wrongly as per shown samples field from Input_file1 is 4th please.
            – RavinderSingh13
            Nov 23 '18 at 23:48




            IMHO karafka sir, IMHO, I think OP mentioned it wrongly as per shown samples field from Input_file1 is 4th please.
            – RavinderSingh13
            Nov 23 '18 at 23:48




            1




            1




            Yes, thanks. Fixed to $5
            – karakfa
            Nov 23 '18 at 23:59




            Yes, thanks. Fixed to $5
            – karakfa
            Nov 23 '18 at 23:59












            Thank you both very much :).
            – cm0728
            Nov 26 '18 at 13:24




            Thank you both very much :).
            – cm0728
            Nov 26 '18 at 13:24


















            draft saved

            draft discarded




















































            Thanks for contributing an answer to Stack Overflow!


            • Please be sure to answer the question. Provide details and share your research!

            But avoid



            • Asking for help, clarification, or responding to other answers.

            • Making statements based on opinion; back them up with references or personal experience.


            To learn more, see our tips on writing great answers.





            Some of your past answers have not been well-received, and you're in danger of being blocked from answering.


            Please pay close attention to the following guidance:


            • Please be sure to answer the question. Provide details and share your research!

            But avoid



            • Asking for help, clarification, or responding to other answers.

            • Making statements based on opinion; back them up with references or personal experience.


            To learn more, see our tips on writing great answers.




            draft saved


            draft discarded














            StackExchange.ready(
            function () {
            StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53451595%2fawk-to-update-file-based-on-match-and-condition-in-another%23new-answer', 'question_page');
            }
            );

            Post as a guest















            Required, but never shown





















































            Required, but never shown














            Required, but never shown












            Required, but never shown







            Required, but never shown

































            Required, but never shown














            Required, but never shown












            Required, but never shown







            Required, but never shown







            Popular posts from this blog

            A CLEAN and SIMPLE way to add appendices to Table of Contents and bookmarks

            Calculate evaluation metrics using cross_val_predict sklearn

            Insert data from modal to MySQL (multiple modal on website)