populating pandas columns based on values in other columns












2














My dataframe contains these columns



ID   Address1   Address1-State   Address1-City  Address2  Address2-State  Address2-City   Address    State      City
1 6th street MN Mpls
2 15th St MI Flint
3 MA Boston Essex St NY New York
4 7 street SE MN Mpls 8th St IL Chicago


Now I want to populate Address field in such a way that if Address1 is blank,populate Address2 and the state city fields of Address2



In the above case final dataframe would be like this



 ID     Address   State    City
1 6th street MN Mpls
2 15th St MI Flint
3 Essex St NY New York
4 7 street SE MN Mpls


Currently,I am doing this



def fill_add(address1,address2):
if address1!='':
address=address1

elif address1=='' and address2!='':
address=address2
elif address1=='' and address2=='':
address=''

return address


def fill_add_apply(df):
df['Address']=df.apply(lambda row:fill_add(row['Address1'],row['Address2']),axis=1)


Do I have to do the same for all the other columns?Is there a better way?



Just to clarify, in ID=3 Address,State,City should be "Essex St NY New York" because Address 1 is blank so it should choose Address2 and the city and state of Address2.
In short, if Address1 is blank it should choose Address2,Address2-State and Address2-City,even if Address1-State and Address1-City are not blank.










share|improve this question





























    2














    My dataframe contains these columns



    ID   Address1   Address1-State   Address1-City  Address2  Address2-State  Address2-City   Address    State      City
    1 6th street MN Mpls
    2 15th St MI Flint
    3 MA Boston Essex St NY New York
    4 7 street SE MN Mpls 8th St IL Chicago


    Now I want to populate Address field in such a way that if Address1 is blank,populate Address2 and the state city fields of Address2



    In the above case final dataframe would be like this



     ID     Address   State    City
    1 6th street MN Mpls
    2 15th St MI Flint
    3 Essex St NY New York
    4 7 street SE MN Mpls


    Currently,I am doing this



    def fill_add(address1,address2):
    if address1!='':
    address=address1

    elif address1=='' and address2!='':
    address=address2
    elif address1=='' and address2=='':
    address=''

    return address


    def fill_add_apply(df):
    df['Address']=df.apply(lambda row:fill_add(row['Address1'],row['Address2']),axis=1)


    Do I have to do the same for all the other columns?Is there a better way?



    Just to clarify, in ID=3 Address,State,City should be "Essex St NY New York" because Address 1 is blank so it should choose Address2 and the city and state of Address2.
    In short, if Address1 is blank it should choose Address2,Address2-State and Address2-City,even if Address1-State and Address1-City are not blank.










    share|improve this question



























      2












      2








      2







      My dataframe contains these columns



      ID   Address1   Address1-State   Address1-City  Address2  Address2-State  Address2-City   Address    State      City
      1 6th street MN Mpls
      2 15th St MI Flint
      3 MA Boston Essex St NY New York
      4 7 street SE MN Mpls 8th St IL Chicago


      Now I want to populate Address field in such a way that if Address1 is blank,populate Address2 and the state city fields of Address2



      In the above case final dataframe would be like this



       ID     Address   State    City
      1 6th street MN Mpls
      2 15th St MI Flint
      3 Essex St NY New York
      4 7 street SE MN Mpls


      Currently,I am doing this



      def fill_add(address1,address2):
      if address1!='':
      address=address1

      elif address1=='' and address2!='':
      address=address2
      elif address1=='' and address2=='':
      address=''

      return address


      def fill_add_apply(df):
      df['Address']=df.apply(lambda row:fill_add(row['Address1'],row['Address2']),axis=1)


      Do I have to do the same for all the other columns?Is there a better way?



      Just to clarify, in ID=3 Address,State,City should be "Essex St NY New York" because Address 1 is blank so it should choose Address2 and the city and state of Address2.
      In short, if Address1 is blank it should choose Address2,Address2-State and Address2-City,even if Address1-State and Address1-City are not blank.










      share|improve this question















      My dataframe contains these columns



      ID   Address1   Address1-State   Address1-City  Address2  Address2-State  Address2-City   Address    State      City
      1 6th street MN Mpls
      2 15th St MI Flint
      3 MA Boston Essex St NY New York
      4 7 street SE MN Mpls 8th St IL Chicago


      Now I want to populate Address field in such a way that if Address1 is blank,populate Address2 and the state city fields of Address2



      In the above case final dataframe would be like this



       ID     Address   State    City
      1 6th street MN Mpls
      2 15th St MI Flint
      3 Essex St NY New York
      4 7 street SE MN Mpls


      Currently,I am doing this



      def fill_add(address1,address2):
      if address1!='':
      address=address1

      elif address1=='' and address2!='':
      address=address2
      elif address1=='' and address2=='':
      address=''

      return address


      def fill_add_apply(df):
      df['Address']=df.apply(lambda row:fill_add(row['Address1'],row['Address2']),axis=1)


      Do I have to do the same for all the other columns?Is there a better way?



      Just to clarify, in ID=3 Address,State,City should be "Essex St NY New York" because Address 1 is blank so it should choose Address2 and the city and state of Address2.
      In short, if Address1 is blank it should choose Address2,Address2-State and Address2-City,even if Address1-State and Address1-City are not blank.







      python pandas






      share|improve this question















      share|improve this question













      share|improve this question




      share|improve this question








      edited Nov 23 at 4:15

























      asked Nov 23 at 3:34









      amanda smith

      244




      244
























          3 Answers
          3






          active

          oldest

          votes


















          1














          1st modify your column , then using groupby +first



          df=df.replace('',np.nan)#prepare for first 

          df.columns=df.columns.str.replace('d+','')
          df.columns=df.columns.str.split('-').str[-1]
          newdf=df.groupby(level=0,axis=1).first()
          newdf.loc[df.iloc[:,1].isnull(),:]=df.groupby(level=0,axis=1).last()
          newdf
          Out[40]:
          Address City ID State
          0 6th street Mpls 1 MN
          1 15th St Flint 2 MI
          2 Essexb St New York 3 NY
          3 7 street SE Mpls 4 MN





          share|improve this answer























          • Thanks! But in ID=3 it should be "Essex St NY New York" because Address 1 is blank so it should choose Address2 and the city and state of Address2.
            – amanda smith
            Nov 23 at 4:07










          • @amandasmith check the update, also if this is what you need , would you like accept it ? check mark at the left
            – W-B
            Nov 23 at 4:10





















          0














          (Given you do not have any duplicate index)



          Select indices you want to fill with Adress1:



          Address1_index = df.loc[!df.Address1.empty() and !df.Address1-State.empty() and !df.Address1-City.empty()].index


          then put Address1 data in your desired columns:



          df.loc[Adress1_index, ["Adress", "State", "City"]] = df.loc[Adress1_index, ["Adress1", "Adress1-State", "Adress1-City"]]


          Now select indices you want to fill with address2:



          Address2_index = df.loc[df.Adress1.empty() or df.Adress1-State.empty() or df.Adress1-City.empty()].index


          then fill these also:



          df.loc[Adress2_index, ["Adress", "State", "City"]] = df.loc[Adress2_index, ["Adress2", "Adress2-State", "Adress2-City"]]


          Remove columns that you do not want:



          df.drop(["Address1", "Adress1-State", "Adress1-City", "Address2", "Adress2-State", "Adress2-City"], axis = 1, inplace = True)





          share|improve this answer































            0














            import numpy as np

            df=df.replace('',np.nan)

            addr_1=['ID','Address1','Address1-State','Address1-City']
            addr_2=['ID','Address2','Address2-State','Address2-City']

            new_df=pd.DataFrame(df[addr_1].values.copy(),columns=['ID','Address','State','City'])

            new_df.loc[new_df['Address'].isnull(),:]=df.loc[df['Address1'].isnull(),addr_2].values

            #print(new_df)
            ID Address State City
            0 1 6th street MN Mpls
            1 2 15th St MI Flint
            2 3 Essex St NY New York
            3 4 7 street SE MN Mpls





            share|improve this answer























              Your Answer






              StackExchange.ifUsing("editor", function () {
              StackExchange.using("externalEditor", function () {
              StackExchange.using("snippets", function () {
              StackExchange.snippets.init();
              });
              });
              }, "code-snippets");

              StackExchange.ready(function() {
              var channelOptions = {
              tags: "".split(" "),
              id: "1"
              };
              initTagRenderer("".split(" "), "".split(" "), channelOptions);

              StackExchange.using("externalEditor", function() {
              // Have to fire editor after snippets, if snippets enabled
              if (StackExchange.settings.snippets.snippetsEnabled) {
              StackExchange.using("snippets", function() {
              createEditor();
              });
              }
              else {
              createEditor();
              }
              });

              function createEditor() {
              StackExchange.prepareEditor({
              heartbeatType: 'answer',
              autoActivateHeartbeat: false,
              convertImagesToLinks: true,
              noModals: true,
              showLowRepImageUploadWarning: true,
              reputationToPostImages: 10,
              bindNavPrevention: true,
              postfix: "",
              imageUploader: {
              brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
              contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
              allowUrls: true
              },
              onDemand: true,
              discardSelector: ".discard-answer"
              ,immediatelyShowMarkdownHelp:true
              });


              }
              });














              draft saved

              draft discarded


















              StackExchange.ready(
              function () {
              StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53440366%2fpopulating-pandas-columns-based-on-values-in-other-columns%23new-answer', 'question_page');
              }
              );

              Post as a guest















              Required, but never shown

























              3 Answers
              3






              active

              oldest

              votes








              3 Answers
              3






              active

              oldest

              votes









              active

              oldest

              votes






              active

              oldest

              votes









              1














              1st modify your column , then using groupby +first



              df=df.replace('',np.nan)#prepare for first 

              df.columns=df.columns.str.replace('d+','')
              df.columns=df.columns.str.split('-').str[-1]
              newdf=df.groupby(level=0,axis=1).first()
              newdf.loc[df.iloc[:,1].isnull(),:]=df.groupby(level=0,axis=1).last()
              newdf
              Out[40]:
              Address City ID State
              0 6th street Mpls 1 MN
              1 15th St Flint 2 MI
              2 Essexb St New York 3 NY
              3 7 street SE Mpls 4 MN





              share|improve this answer























              • Thanks! But in ID=3 it should be "Essex St NY New York" because Address 1 is blank so it should choose Address2 and the city and state of Address2.
                – amanda smith
                Nov 23 at 4:07










              • @amandasmith check the update, also if this is what you need , would you like accept it ? check mark at the left
                – W-B
                Nov 23 at 4:10


















              1














              1st modify your column , then using groupby +first



              df=df.replace('',np.nan)#prepare for first 

              df.columns=df.columns.str.replace('d+','')
              df.columns=df.columns.str.split('-').str[-1]
              newdf=df.groupby(level=0,axis=1).first()
              newdf.loc[df.iloc[:,1].isnull(),:]=df.groupby(level=0,axis=1).last()
              newdf
              Out[40]:
              Address City ID State
              0 6th street Mpls 1 MN
              1 15th St Flint 2 MI
              2 Essexb St New York 3 NY
              3 7 street SE Mpls 4 MN





              share|improve this answer























              • Thanks! But in ID=3 it should be "Essex St NY New York" because Address 1 is blank so it should choose Address2 and the city and state of Address2.
                – amanda smith
                Nov 23 at 4:07










              • @amandasmith check the update, also if this is what you need , would you like accept it ? check mark at the left
                – W-B
                Nov 23 at 4:10
















              1












              1








              1






              1st modify your column , then using groupby +first



              df=df.replace('',np.nan)#prepare for first 

              df.columns=df.columns.str.replace('d+','')
              df.columns=df.columns.str.split('-').str[-1]
              newdf=df.groupby(level=0,axis=1).first()
              newdf.loc[df.iloc[:,1].isnull(),:]=df.groupby(level=0,axis=1).last()
              newdf
              Out[40]:
              Address City ID State
              0 6th street Mpls 1 MN
              1 15th St Flint 2 MI
              2 Essexb St New York 3 NY
              3 7 street SE Mpls 4 MN





              share|improve this answer














              1st modify your column , then using groupby +first



              df=df.replace('',np.nan)#prepare for first 

              df.columns=df.columns.str.replace('d+','')
              df.columns=df.columns.str.split('-').str[-1]
              newdf=df.groupby(level=0,axis=1).first()
              newdf.loc[df.iloc[:,1].isnull(),:]=df.groupby(level=0,axis=1).last()
              newdf
              Out[40]:
              Address City ID State
              0 6th street Mpls 1 MN
              1 15th St Flint 2 MI
              2 Essexb St New York 3 NY
              3 7 street SE Mpls 4 MN






              share|improve this answer














              share|improve this answer



              share|improve this answer








              edited Nov 23 at 4:10

























              answered Nov 23 at 4:03









              W-B

              100k73163




              100k73163












              • Thanks! But in ID=3 it should be "Essex St NY New York" because Address 1 is blank so it should choose Address2 and the city and state of Address2.
                – amanda smith
                Nov 23 at 4:07










              • @amandasmith check the update, also if this is what you need , would you like accept it ? check mark at the left
                – W-B
                Nov 23 at 4:10




















              • Thanks! But in ID=3 it should be "Essex St NY New York" because Address 1 is blank so it should choose Address2 and the city and state of Address2.
                – amanda smith
                Nov 23 at 4:07










              • @amandasmith check the update, also if this is what you need , would you like accept it ? check mark at the left
                – W-B
                Nov 23 at 4:10


















              Thanks! But in ID=3 it should be "Essex St NY New York" because Address 1 is blank so it should choose Address2 and the city and state of Address2.
              – amanda smith
              Nov 23 at 4:07




              Thanks! But in ID=3 it should be "Essex St NY New York" because Address 1 is blank so it should choose Address2 and the city and state of Address2.
              – amanda smith
              Nov 23 at 4:07












              @amandasmith check the update, also if this is what you need , would you like accept it ? check mark at the left
              – W-B
              Nov 23 at 4:10






              @amandasmith check the update, also if this is what you need , would you like accept it ? check mark at the left
              – W-B
              Nov 23 at 4:10















              0














              (Given you do not have any duplicate index)



              Select indices you want to fill with Adress1:



              Address1_index = df.loc[!df.Address1.empty() and !df.Address1-State.empty() and !df.Address1-City.empty()].index


              then put Address1 data in your desired columns:



              df.loc[Adress1_index, ["Adress", "State", "City"]] = df.loc[Adress1_index, ["Adress1", "Adress1-State", "Adress1-City"]]


              Now select indices you want to fill with address2:



              Address2_index = df.loc[df.Adress1.empty() or df.Adress1-State.empty() or df.Adress1-City.empty()].index


              then fill these also:



              df.loc[Adress2_index, ["Adress", "State", "City"]] = df.loc[Adress2_index, ["Adress2", "Adress2-State", "Adress2-City"]]


              Remove columns that you do not want:



              df.drop(["Address1", "Adress1-State", "Adress1-City", "Address2", "Adress2-State", "Adress2-City"], axis = 1, inplace = True)





              share|improve this answer




























                0














                (Given you do not have any duplicate index)



                Select indices you want to fill with Adress1:



                Address1_index = df.loc[!df.Address1.empty() and !df.Address1-State.empty() and !df.Address1-City.empty()].index


                then put Address1 data in your desired columns:



                df.loc[Adress1_index, ["Adress", "State", "City"]] = df.loc[Adress1_index, ["Adress1", "Adress1-State", "Adress1-City"]]


                Now select indices you want to fill with address2:



                Address2_index = df.loc[df.Adress1.empty() or df.Adress1-State.empty() or df.Adress1-City.empty()].index


                then fill these also:



                df.loc[Adress2_index, ["Adress", "State", "City"]] = df.loc[Adress2_index, ["Adress2", "Adress2-State", "Adress2-City"]]


                Remove columns that you do not want:



                df.drop(["Address1", "Adress1-State", "Adress1-City", "Address2", "Adress2-State", "Adress2-City"], axis = 1, inplace = True)





                share|improve this answer


























                  0












                  0








                  0






                  (Given you do not have any duplicate index)



                  Select indices you want to fill with Adress1:



                  Address1_index = df.loc[!df.Address1.empty() and !df.Address1-State.empty() and !df.Address1-City.empty()].index


                  then put Address1 data in your desired columns:



                  df.loc[Adress1_index, ["Adress", "State", "City"]] = df.loc[Adress1_index, ["Adress1", "Adress1-State", "Adress1-City"]]


                  Now select indices you want to fill with address2:



                  Address2_index = df.loc[df.Adress1.empty() or df.Adress1-State.empty() or df.Adress1-City.empty()].index


                  then fill these also:



                  df.loc[Adress2_index, ["Adress", "State", "City"]] = df.loc[Adress2_index, ["Adress2", "Adress2-State", "Adress2-City"]]


                  Remove columns that you do not want:



                  df.drop(["Address1", "Adress1-State", "Adress1-City", "Address2", "Adress2-State", "Adress2-City"], axis = 1, inplace = True)





                  share|improve this answer














                  (Given you do not have any duplicate index)



                  Select indices you want to fill with Adress1:



                  Address1_index = df.loc[!df.Address1.empty() and !df.Address1-State.empty() and !df.Address1-City.empty()].index


                  then put Address1 data in your desired columns:



                  df.loc[Adress1_index, ["Adress", "State", "City"]] = df.loc[Adress1_index, ["Adress1", "Adress1-State", "Adress1-City"]]


                  Now select indices you want to fill with address2:



                  Address2_index = df.loc[df.Adress1.empty() or df.Adress1-State.empty() or df.Adress1-City.empty()].index


                  then fill these also:



                  df.loc[Adress2_index, ["Adress", "State", "City"]] = df.loc[Adress2_index, ["Adress2", "Adress2-State", "Adress2-City"]]


                  Remove columns that you do not want:



                  df.drop(["Address1", "Adress1-State", "Adress1-City", "Address2", "Adress2-State", "Adress2-City"], axis = 1, inplace = True)






                  share|improve this answer














                  share|improve this answer



                  share|improve this answer








                  edited Nov 23 at 4:24

























                  answered Nov 23 at 4:19









                  Joseph Choi

                  788




                  788























                      0














                      import numpy as np

                      df=df.replace('',np.nan)

                      addr_1=['ID','Address1','Address1-State','Address1-City']
                      addr_2=['ID','Address2','Address2-State','Address2-City']

                      new_df=pd.DataFrame(df[addr_1].values.copy(),columns=['ID','Address','State','City'])

                      new_df.loc[new_df['Address'].isnull(),:]=df.loc[df['Address1'].isnull(),addr_2].values

                      #print(new_df)
                      ID Address State City
                      0 1 6th street MN Mpls
                      1 2 15th St MI Flint
                      2 3 Essex St NY New York
                      3 4 7 street SE MN Mpls





                      share|improve this answer




























                        0














                        import numpy as np

                        df=df.replace('',np.nan)

                        addr_1=['ID','Address1','Address1-State','Address1-City']
                        addr_2=['ID','Address2','Address2-State','Address2-City']

                        new_df=pd.DataFrame(df[addr_1].values.copy(),columns=['ID','Address','State','City'])

                        new_df.loc[new_df['Address'].isnull(),:]=df.loc[df['Address1'].isnull(),addr_2].values

                        #print(new_df)
                        ID Address State City
                        0 1 6th street MN Mpls
                        1 2 15th St MI Flint
                        2 3 Essex St NY New York
                        3 4 7 street SE MN Mpls





                        share|improve this answer


























                          0












                          0








                          0






                          import numpy as np

                          df=df.replace('',np.nan)

                          addr_1=['ID','Address1','Address1-State','Address1-City']
                          addr_2=['ID','Address2','Address2-State','Address2-City']

                          new_df=pd.DataFrame(df[addr_1].values.copy(),columns=['ID','Address','State','City'])

                          new_df.loc[new_df['Address'].isnull(),:]=df.loc[df['Address1'].isnull(),addr_2].values

                          #print(new_df)
                          ID Address State City
                          0 1 6th street MN Mpls
                          1 2 15th St MI Flint
                          2 3 Essex St NY New York
                          3 4 7 street SE MN Mpls





                          share|improve this answer














                          import numpy as np

                          df=df.replace('',np.nan)

                          addr_1=['ID','Address1','Address1-State','Address1-City']
                          addr_2=['ID','Address2','Address2-State','Address2-City']

                          new_df=pd.DataFrame(df[addr_1].values.copy(),columns=['ID','Address','State','City'])

                          new_df.loc[new_df['Address'].isnull(),:]=df.loc[df['Address1'].isnull(),addr_2].values

                          #print(new_df)
                          ID Address State City
                          0 1 6th street MN Mpls
                          1 2 15th St MI Flint
                          2 3 Essex St NY New York
                          3 4 7 street SE MN Mpls






                          share|improve this answer














                          share|improve this answer



                          share|improve this answer








                          edited Nov 23 at 5:16

























                          answered Nov 23 at 5:10









                          pyd

                          1,8241924




                          1,8241924






























                              draft saved

                              draft discarded




















































                              Thanks for contributing an answer to Stack Overflow!


                              • Please be sure to answer the question. Provide details and share your research!

                              But avoid



                              • Asking for help, clarification, or responding to other answers.

                              • Making statements based on opinion; back them up with references or personal experience.


                              To learn more, see our tips on writing great answers.





                              Some of your past answers have not been well-received, and you're in danger of being blocked from answering.


                              Please pay close attention to the following guidance:


                              • Please be sure to answer the question. Provide details and share your research!

                              But avoid



                              • Asking for help, clarification, or responding to other answers.

                              • Making statements based on opinion; back them up with references or personal experience.


                              To learn more, see our tips on writing great answers.




                              draft saved


                              draft discarded














                              StackExchange.ready(
                              function () {
                              StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53440366%2fpopulating-pandas-columns-based-on-values-in-other-columns%23new-answer', 'question_page');
                              }
                              );

                              Post as a guest















                              Required, but never shown





















































                              Required, but never shown














                              Required, but never shown












                              Required, but never shown







                              Required, but never shown

































                              Required, but never shown














                              Required, but never shown












                              Required, but never shown







                              Required, but never shown







                              Popular posts from this blog

                              Lallio

                              Futebolista

                              Jornalista