How can I fill up and fill up the missing values of each group in Dataframe using Python?


























This is an example of the dataframe:

df =

   Name  Type       price
0  gg    apartment  8
1  hh    apartment  4
2  tty   apartment  0
3  ttyt  None       6
4  re    house      6
5  ew    house      2
6  rr    house      0
7  tr    None       5
8  mm    None       0


I first converted the missing ("unknown") values in "Type" to "NoInfo":



import pandas as pd
import numpy as np
from scipy.stats import zscore

df = pd.read_csv("C:/Users/User/Desktop/properties.csv")

# Replace missing Type values with the label "NoInfo"
df['Type'] = df['Type'].fillna('NoInfo')



The dataframe then looks like this:

df =

   Name  Type       price
0  gg    apartment  8
1  hh    apartment  4
2  tty   apartment  0
3  ttyt  NoInfo     6
4  re    house      6
5  ew    house      2
6  rr    house      0
7  tr    NoInfo     5
8  mm    NoInfo     0


After that, I tried to replace the 0 values with the average price of each group ("apartment", "house" and "NoInfo") and to take the z-score of each group:



df['price'] = df['price'].replace(0, np.nan)



df['price'] = pd.to_numeric(df.price, errors='coerce')



df['price'] = df.groupby('Type')['price'].transform(lambda x : x.mean())



df['price_zscore'] = df[['price']].apply(zscore)



After running this code, every price in every property group has been changed, and all values in the 'price_zscore' column are NaN.

What I want is to compute the average price of each property group in "Type" and to replace the 0 values in 'price' with the average of the corresponding group.

For example, the 0 prices in the "apartment" group should be replaced with the average price of the "apartment" group, the 0 prices in the "house" group with the average price of the "house" group, and the 0 prices in the "NoInfo" group with the average price of the "NoInfo" group:



df =

   Name  Type       price
0  gg    apartment  8
1  hh    apartment  4
2  tty   apartment  6    # (8+4)/2 = 6
3  ttyt  NoInfo     6
4  re    house      6
5  ew    house      2
6  rr    house      4    # (6+2)/2 = 4
7  tr    NoInfo     5
8  mm    NoInfo     0


After that, I want the z-score of each property group: the z-scores of the "apartment" group, of the "house" group and of the "NoInfo" group, all stored in a single column 'price_zscore'.

I would really appreciate your help in fixing the code above.










      python group-by














      edited Nov 22 at 17:46

























      asked Nov 20 at 13:58









      Elwakdy

























          1 Answer
































          In pandas, you can replace the 0 placeholders with NaN using replace(), then impute them with the mean of their own group via groupby().transform(). Finally, you can compute the z-score of the price with the zscore function from scipy.stats.

          Here is the code:

          import numpy as np
          import pandas as pd
          from scipy.stats import zscore

          df = pd.read_csv('./data.csv')

          # Treat 0 as a missing price
          df['price'] = df['price'].replace(0, np.nan)
          # Select the 'price' column before transform so that non-numeric columns
          # are not passed to mean(); each NaN is filled with its own group's mean
          df['price'] = df.groupby('Type')['price'].transform(lambda x: x.fillna(x.mean()))

          # z-score over the whole column; the result is all NaN if any NaN remains
          df['price_zscore'] = zscore(df['price'])
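To make the group-wise fill concrete, here is a minimal sketch on the sample rows from the question. Building the frame inline instead of reading the CSV is an assumption made only so the example runs on its own. It also shows why assigning the bare group mean, as in the question, overwrites every price: `transform("mean")` returns the group mean for every row, so it should only be used to fill the gaps.

```python
import numpy as np
import pandas as pd

# Sample rows copied from the question; the inline frame is an assumption
# made so the sketch runs without the CSV file.
df = pd.DataFrame({
    "Name": ["gg", "hh", "tty", "ttyt", "re", "ew", "rr", "tr", "mm"],
    "Type": ["apartment", "apartment", "apartment", None,
             "house", "house", "house", None, None],
    "price": [8, 4, 0, 6, 6, 2, 0, 5, 0],
})
df["Type"] = df["Type"].fillna("NoInfo")

# Treat 0 as a missing price.
price = df["price"].replace(0, np.nan)

# Group mean broadcast to every row; assigning this directly would
# replace ALL prices, which is what the code in the question does.
group_mean = price.groupby(df["Type"]).transform("mean")

# Keep the known prices and fill only the gaps with the group's mean.
df["price"] = price.fillna(group_mean)
print(df)
```

Under this rule the "NoInfo" row with price 0 becomes (6+5)/2 = 5.5, in the same way the apartment and house zeros become 6 and 4.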




























          • Thank you so much for your help. I did what you advised, as follows:
            – Elwakdy
            Nov 20 at 19:45










          • new_df = df.replace(['Unknown', 'na'], np.nan); df["price"] = df.groupby("Type").transform(lambda x: x.fillna(x.mean())), but I got this error: "Transform function invalid for data types"
            – Elwakdy
            Nov 20 at 19:47










          • @Elwakdy I updated the answer accordingly
            – leoburgy
            Nov 21 at 7:42










          • Thank you so much for your help. As I understand your code, it fills every 0 in 'price' with the average of all prices across all properties (apartments and houses together).
            – Elwakdy
            Nov 21 at 14:38










          • My question was how to get the average price of each property group (apartments, houses) in "Type" and replace the 0 values in 'price' with the average of the corresponding group.
            – Elwakdy
            Nov 21 at 14:38
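The per-group z-score requested in the question can be sketched in the same way. This is only an illustration under stated assumptions: the prices are hard-coded as they would look after the group-mean fill, and the helper `group_zscore` is a hypothetical name, written in plain pandas with `ddof=0` to match the default of `scipy.stats.zscore`.

```python
import pandas as pd

# Prices as they would look after the group-mean fill (hard-coded here
# as an assumption so the sketch is self-contained).
df = pd.DataFrame({
    "Type": ["apartment", "apartment", "apartment", "NoInfo",
             "house", "house", "house", "NoInfo", "NoInfo"],
    "price": [8.0, 4.0, 6.0, 6.0, 6.0, 2.0, 4.0, 5.0, 5.5],
})

def group_zscore(x):
    """Standardize one group's prices (hypothetical helper).

    ddof=0 is the population standard deviation, the same default
    that scipy.stats.zscore uses.
    """
    return (x - x.mean()) / x.std(ddof=0)

# transform() applies the helper to each Type group separately and
# returns a result aligned with the original rows.
df["price_zscore"] = df.groupby("Type")["price"].transform(group_zscore)
print(df)
```

Each group is standardized against its own mean and spread, so every group's z-scores average to zero. A group whose prices are all identical (including a group with a single row) has zero standard deviation and yields NaN here.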



















          edited Nov 21 at 21:25

























          answered Nov 20 at 19:32









          leoburgy












