How to replace a float value with NaN in pandas?























I'm aware of the replace function in pandas: https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.replace.html

But I ran this simple test, and it does not work as expected when I try to replace a float value:



import pandas as pd
import numpy as np

df = pd.DataFrame(np.random.randn(50, 4), columns=list('ABCD'))
print(df.head(n=1))

          A         B        C         D
0  1.437202  1.919894 -1.40674 -0.316737

df = df.replace(1.437202, np.nan)
print(df.head(n=1))

          A         B        C         D
0  1.437202  1.919894 -1.40674 -0.316737


As you can see, the value at row 0, column A is unchanged. Any idea what could be causing this?










      python pandas replace nan
















      asked Nov 22 at 8:35 by ralvarez
























          2 Answers













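          The problem is float precision: print shows values rounded to six decimal places, so the literal 1.437202 is not exactly equal to the stored float and replace finds no match. Use numpy.isclose with mask instead: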



          import numpy as np
          import pandas as pd

          np.random.seed(123)
          df = pd.DataFrame(np.random.randn(50, 4), columns=list('ABCD'))
          print(df.head(n=1))
                    A         B         C         D
          0 -1.085631  0.997345  0.282978 -1.506295

          # mask sets a cell to NaN wherever the condition is True
          df = df.mask(np.isclose(df.values, 0.997345))


          Or use numpy.where:



          arr = np.where(np.isclose(df.values, 0.997345), np.nan, df.values)
          df = pd.DataFrame(arr, index=df.index, columns=df.columns)




          print(df.head(n=1))
                    A   B         C         D
          0 -1.085631 NaN  0.282978 -1.506295
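
To make the precision point concrete, here is a minimal sketch using the same seed as above. The names stored, shown, unchanged, exact, close and masked are only illustrative, and the tolerances passed to numpy.isclose are simply its documented defaults written out; choose values that suit your data.

```python
import numpy as np
import pandas as pd

np.random.seed(123)
df = pd.DataFrame(np.random.randn(50, 4), columns=list('ABCD'))

stored = df.loc[0, 'B']      # full-precision float64 value held in the frame
shown = round(stored, 6)     # the six-decimal value that print displays

print(repr(stored))          # carries more digits than the displayed value
print(stored == shown)       # exact comparison against the rounded value

# Exact replacement with the rounded literal finds nothing to replace.
unchanged = df.replace(shown, np.nan)

# Exact replacement with the stored value itself does match.
exact = df.replace(stored, np.nan)

# Tolerance-based matching; rtol and atol are the np.isclose defaults,
# spelled out so they can be tightened or loosened.
close = np.isclose(df.to_numpy(), 0.997345, rtol=1e-05, atol=1e-08)
masked = df.mask(close)
```

Matching on the exact stored value works but is rarely practical, because you usually only know the rounded number you saw on screen; that is why a tolerance-based comparison is the more robust choice.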


          EDIT: If the DataFrame also has non-numeric columns, select only the numeric columns with select_dtypes and apply the mask to that subset:



          np.random.seed(123)
          df = pd.DataFrame(np.random.randn(50, 4), columns=list('ABCD')).assign(E='a')

          cols = df.select_dtypes(np.number).columns
          df[cols] = df[cols].mask(np.isclose(df[cols].values, 0.997345))
          print(df.head(n=1))
                    A   B         C         D  E
          0 -1.085631 NaN  0.282978 -1.506295  a
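
The numeric-subset approach can be wrapped in a small reusable function. This is only a sketch: replace_close_with_nan is a hypothetical helper name, not part of pandas, and its rtol and atol parameters are passed straight through to numpy.isclose. Note that a column holding a mix of strings and floats has object dtype, so select_dtypes skips it entirely and any floats inside it are left alone.

```python
import numpy as np
import pandas as pd


def replace_close_with_nan(frame, value, rtol=1e-05, atol=1e-08):
    """Return a copy of frame with numeric cells close to value set to NaN.

    Hypothetical helper for illustration; non-numeric columns are untouched.
    """
    out = frame.copy()
    num_cols = out.select_dtypes(include=np.number).columns
    if len(num_cols) == 0:
        return out  # nothing numeric to compare
    close = np.isclose(out[num_cols].to_numpy(), value, rtol=rtol, atol=atol)
    out[num_cols] = out[num_cols].mask(close)
    return out


np.random.seed(123)
df = pd.DataFrame(np.random.randn(50, 4), columns=list('ABCD')).assign(E='a')
result = replace_close_with_nan(df, 0.997345)
print(result.head(n=1))
```

Returning a copy keeps the original DataFrame intact, which makes it easy to compare before and after while experimenting with tolerances.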




























          • Good options indeed, but both fail if the columns are not all numeric. If I set a value to a string with df.iloc[[0], [0]] = 'random_string' and then apply either method to the whole dataset, I get: TypeError: ufunc 'isfinite' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''
            – ralvarez
            Nov 22 at 9:43












          • Maybe I should have explained that earlier. The example I gave had only numeric values, but I'm looking for a method that works with features of any type.
            – ralvarez
            Nov 22 at 9:49










          • @ralvarez - Added a general solution: filter only the numeric columns and apply the solution to them.
            – jezrael
            Nov 22 at 9:53































          Just another trick, for specific indices:



          >>> print(df.head(n=1))
                    A         B         C         D
          0 -0.042839  1.701118  0.064779  1.513046

          >>> df.loc[0, 'A'] = np.nan  # .loc avoids the chained assignment df['A'][0] = np.nan

          >>> print(df.head(n=1))
              A         B         C         D
          0 NaN  1.701118  0.064779  1.513046
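
When you already know which cell to blank out, a single assignment by label or by position is enough. The sketch below uses a small made-up DataFrame with string row labels (an assumption for illustration) to show the difference between .loc and .iloc. Chained indexing such as df['A'][0] = np.nan can trigger SettingWithCopyWarning, and under pandas Copy-on-Write it may not modify the original at all, so a single .loc or .iloc call is the safer form.

```python
import numpy as np
import pandas as pd

# Made-up data for illustration, with non-default row labels.
df = pd.DataFrame(
    {'A': [1.5, 2.5, 3.5], 'B': [4.5, 5.5, 6.5]},
    index=['x', 'y', 'z'],
)

df.loc['x', 'A'] = np.nan   # by label: row 'x', column 'A'
df.iloc[2, 1] = np.nan      # by position: third row, second column

print(df)
```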





            Answer using numpy.isclose: answered Nov 22 at 8:38 by jezrael, edited Nov 22 at 9:50






















                Answer using direct assignment: answered Nov 22 at 9:52 by pygo





























