How to drop rows of Pandas DataFrame whose value in certain columns is NaN












527















I have a DataFrame:



>>> df
                  STK_ID  EPS  cash
STK_ID RPT_Date
601166 20111231   601166  NaN   NaN
600036 20111231   600036  NaN    12
600016 20111231   600016  4.3   NaN
601009 20111231   601009  NaN   NaN
601939 20111231   601939  2.5   NaN
000001 20111231   000001  NaN   NaN


I want to keep only the records whose EPS is not NaN; that is, something like df.drop(...) should return the DataFrame below:



                  STK_ID  EPS  cash
STK_ID RPT_Date
600016 20111231   600016  4.3   NaN
601939 20111231   601939  2.5   NaN


How do I do that?






























  • 19





    dropna: pandas.pydata.org/pandas-docs/stable/generated/…

    – Wouter Overmeire
    Nov 16 '12 at 9:29








  • 134





    df.dropna(subset = ['column1_name', 'column2_name', 'column3_name'])

    – osa
    Sep 5 '14 at 23:53


















python pandas dataframe






asked Nov 16 '12 at 9:17 by bigbug; edited Jan 5 '17 at 17:01 by Ninjakannon






















12 Answers


















418














Don't drop. Just take rows where EPS is finite:



df = df[np.isfinite(df['EPS'])]





answered Nov 16 '12 at 9:34 by eumiro



















  • 384





    I'd recommend using pandas.notnull instead of np.isfinite

    – Wes McKinney
    Nov 21 '12 at 3:08






  • 9





    Is there any advantage to indexing and copying over dropping?

    – Robert Muil
    Jul 31 '15 at 8:15






  • 9





    Creates Error: TypeError: ufunc 'isfinite' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''

    – Philipp Schwarz
    Oct 7 '16 at 13:18








  • 3





    @wes-mckinney Could you please let me know whether dropna() is a better choice than pandas.notnull in this case? If so, why?

    – stormfield
    Sep 7 '17 at 11:53






  • 4





    @PhilippSchwarz This error occurs if the column (EPS in the example) contains strings or other types that np.isfinite() cannot digest. I recommend using pandas.notnull(), which handles this more gracefully.

    – normanius
    Apr 5 '18 at 10:02
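
A minimal sketch (not from the answer above) illustrating the TypeError discussed in the comments; the mixed-type column is a made-up example:

import numpy as np
import pandas as pd

# A column mixing floats and a string is stored with object dtype.
s = pd.Series([4.3, np.nan, "n/a"])

# np.isfinite(s) raises TypeError here, because the ufunc cannot safely
# coerce object dtype; pd.notnull handles it element-wise instead:
mask = pd.notnull(s)
print(mask.tolist())  # [True, False, True] -- the string counts as "not null"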



















698














This question is already resolved, but...



...also consider the solution suggested by Wouter in his original comment. The ability to handle missing data, including dropna(), is built into pandas explicitly. Aside from potentially improved performance over doing it manually, these functions also come with a variety of options which may be useful.



In [24]: df = pd.DataFrame(np.random.randn(10,3))

In [25]: df.iloc[::2,0] = np.nan; df.iloc[::4,1] = np.nan; df.iloc[::3,2] = np.nan;

In [26]: df
Out[26]:
          0         1         2
0       NaN       NaN       NaN
1  2.677677 -1.466923 -0.750366
2       NaN  0.798002 -0.906038
3  0.672201  0.964789       NaN
4       NaN       NaN  0.050742
5 -1.250970  0.030561 -2.678622
6       NaN  1.036043       NaN
7  0.049896 -0.308003  0.823295
8       NaN       NaN  0.637482
9 -0.310130  0.078891       NaN


In [27]: df.dropna()     # drop all rows that have any NaN values
Out[27]:
          0         1         2
1  2.677677 -1.466923 -0.750366
5 -1.250970  0.030561 -2.678622
7  0.049896 -0.308003  0.823295


In [28]: df.dropna(how='all')     # drop only if ALL columns are NaN
Out[28]:
          0         1         2
1  2.677677 -1.466923 -0.750366
2       NaN  0.798002 -0.906038
3  0.672201  0.964789       NaN
4       NaN       NaN  0.050742
5 -1.250970  0.030561 -2.678622
6       NaN  1.036043       NaN
7  0.049896 -0.308003  0.823295
8       NaN       NaN  0.637482
9 -0.310130  0.078891       NaN


In [29]: df.dropna(thresh=2)   # drop a row if it does not have at least two values that are **not** NaN
Out[29]:
          0         1         2
1  2.677677 -1.466923 -0.750366
2       NaN  0.798002 -0.906038
3  0.672201  0.964789       NaN
5 -1.250970  0.030561 -2.678622
7  0.049896 -0.308003  0.823295
9 -0.310130  0.078891       NaN


In [30]: df.dropna(subset=[1])   # drop only if NaN in a specific column (as asked in the question)
Out[30]:
          0         1         2
1  2.677677 -1.466923 -0.750366
2       NaN  0.798002 -0.906038
3  0.672201  0.964789       NaN
5 -1.250970  0.030561 -2.678622
6       NaN  1.036043       NaN
7  0.049896 -0.308003  0.823295
9 -0.310130  0.078891       NaN


There are also other options (See docs at http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.dropna.html), including dropping columns instead of rows.



Pretty handy!
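
As a quick check against the question's own data, a sketch rebuilding a cut-down version of the frame (values copied from the post) and applying the subset option:

import numpy as np
import pandas as pd

# Rebuild a cut-down version of the frame from the question.
idx = pd.MultiIndex.from_tuples(
    [("601166", "20111231"), ("600016", "20111231"), ("601939", "20111231")],
    names=["STK_ID", "RPT_Date"],
)
df = pd.DataFrame({"STK_ID": ["601166", "600016", "601939"],
                   "EPS": [np.nan, 4.3, 2.5],
                   "cash": [np.nan, np.nan, np.nan]}, index=idx)

# Keep only rows where EPS is present -- the operation the question asks for.
print(df.dropna(subset=["EPS"]))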






answered Nov 17 '12 at 20:27 by Aman; edited Aug 14 '17 at 0:04 by ayhan





















  • 201





    you can also use df.dropna(subset = ['column_name']). Hope that saves at least one person the extra 5 seconds of 'what am I doing wrong'. Great answer, +1

    – James Tobin
    Jun 18 '14 at 14:07






  • 8





    @JamesTobin, I just spent 20 minutes writing a function for that! The official documentation was very cryptic: "Labels along other axis to consider, e.g. if you are dropping rows these would be a list of columns to include". I was unable to understand what they meant...

    – osa
    Sep 5 '14 at 23:52






  • 2





    This should be #1

    – Cord Kaldemeyer
    Oct 20 '17 at 13:10






  • 1





    isfinite() is probably more pythonic, but this answer is more elegant and in line with pandas principles. Great answer.

    – TheProletariat
    Mar 20 '18 at 21:51



















90














I know this has already been answered, but just for the sake of a purely pandas solution to this specific question (as opposed to Aman's general description, which was wonderful), in case anyone else happens upon this:



import pandas as pd
df = df[pd.notnull(df['EPS'])]





answered Apr 23 '14 at 5:37 by Kirk Hadley



















  • 7





    Actually, the specific answer would be: df.dropna(subset=['EPS']) (based on the general description of Aman, of course this does also work)

    – joris
    Apr 23 '14 at 12:53






  • 2





    notnull is also what Wes (author of Pandas) suggested in his comment on another answer.

    – fantabolous
    Jul 9 '14 at 3:24











  • This may be a noob question, but when I do df[pd.notnull(...)] or df.dropna, the index gets dropped. So if there was a null value in row-index 10 in a df of length 200, the dataframe after running the drop function has index values from 1 to 9 and then 11 to 200. Is there any way to "re-index" it?

    – Aakash Gupta
    Mar 4 '16 at 6:03
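
In reply to the re-indexing question above, a short sketch (not part of the original answer) using reset_index on toy data:

import numpy as np
import pandas as pd

df = pd.DataFrame({"EPS": [1.0, np.nan, 3.0]})
cleaned = df[pd.notnull(df["EPS"])]
print(cleaned.index.tolist())  # [0, 2] -- a gap where the NaN row was

# reset_index(drop=True) renumbers from 0 and discards the old index:
print(cleaned.reset_index(drop=True).index.tolist())  # [0, 1]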



















33














You can use this:



df.dropna(subset=['EPS'], how='all', inplace=True)





answered Aug 2 '17 at 16:28 by Joe; edited Aug 21 '17 at 9:49 by Mojtaba Khodadadi





















  • 10





    how='all' is redundant here: because you are subsetting the dataframe on a single field, 'all' and 'any' have the same effect.

    – Anton Protopopov
    Jan 16 '18 at 12:41
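
To make that comment concrete, a sketch (assumed toy data) where the subset has two columns and how='any' vs how='all' do differ:

import numpy as np
import pandas as pd

df = pd.DataFrame({"EPS": [np.nan, np.nan, 2.5],
                   "cash": [12.0, np.nan, np.nan]})

# 'any' (the default) drops a row if either subset column is NaN -- here, every row:
print(df.dropna(subset=["EPS", "cash"], how="any"))

# 'all' drops a row only if both subset columns are NaN -- here, just the middle row:
print(df.dropna(subset=["EPS", "cash"], how="all"))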





















25














Simplest of all solutions:



filtered_df = df[df['EPS'].notnull()]



The above solution is preferable to np.isfinite(), since notnull() also works on non-numeric (e.g. object-dtype) columns.







answered Nov 23 '17 at 12:08 by Gil Baggio; edited Aug 8 '18 at 15:17 by ayhan

































21


You could use the DataFrame method notnull, the inverse of isnull, or numpy.isnan:

In [332]: df[df.EPS.notnull()]
Out[332]:
   STK_ID  RPT_Date  STK_ID.1  EPS  cash
2  600016  20111231    600016  4.3   NaN
4  601939  20111231    601939  2.5   NaN


In [334]: df[~df.EPS.isnull()]
Out[334]:
   STK_ID  RPT_Date  STK_ID.1  EPS  cash
2  600016  20111231    600016  4.3   NaN
4  601939  20111231    601939  2.5   NaN


In [347]: df[~np.isnan(df.EPS)]
Out[347]:
   STK_ID  RPT_Date  STK_ID.1  EPS  cash
2  600016  20111231    600016  4.3   NaN
4  601939  20111231    601939  2.5   NaN
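
These masks also compose; a short sketch (not from the original answer, assumed toy data) requiring two columns to be present at once:

import numpy as np
import pandas as pd

df = pd.DataFrame({"EPS": [np.nan, 4.3, 2.5], "cash": [12.0, np.nan, 3.0]})

# Boolean masks combine with & (and) and | (or) like any boolean Series:
print(df[df.EPS.notnull() & df.cash.notnull()])  # only the last row survives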





answered Dec 4 '15 at 7:01 by Anton Protopopov































10


Yet another solution, which uses the fact that np.nan != np.nan:

In [149]: df.query("EPS == EPS")
Out[149]:
                 STK_ID  EPS  cash
STK_ID RPT_Date
600016 20111231  600016  4.3   NaN
601939 20111231  601939  2.5   NaN
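
The same trick extends to several columns inside one query string; a sketch with assumed column names and toy data:

import numpy as np
import pandas as pd

df = pd.DataFrame({"EPS": [np.nan, 4.3], "cash": [12.0, np.nan]})

# Each `col == col` clause is False exactly where that column is NaN:
print(df.query("EPS == EPS and cash == cash"))  # empty -- no fully populated row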





answered Apr 20 '17 at 21:15 by MaxU































8


You can use dropna.

Example

Drop the rows where at least one element is missing:

df = df.dropna()

Define in which columns to look for missing values:

df = df.dropna(subset=['column1', 'column2'])

See the dropna documentation for more examples.

Note: passing a tuple or list of axes to the axis parameter of dropna is deprecated since version 0.23.0; a single axis value is still supported.
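
For the columns direction, a sketch (not from the original answer) passing a single axis value, which remains supported:

import numpy as np
import pandas as pd

df = pd.DataFrame({"EPS": [4.3, np.nan], "cash": [12.0, 7.0]})

# Drop *columns* containing NaN; a single axis value is fine,
# only tuples/lists of axes were deprecated:
print(df.dropna(axis="columns"))  # keeps only 'cash'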






































6


Or check for NaNs with isnull, then use ~ to invert the mask and keep the rows with no NaNs:

df = df[~df['EPS'].isnull()]

Now:

print(df)

is:

                 STK_ID  EPS  cash
STK_ID RPT_Date
600016 20111231  600016  4.3   NaN
601939 20111231  601939  2.5   NaN
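
For reference, a one-line sketch (my assumption of toy data) confirming that ~isnull() and notnull() produce identical masks:

import numpy as np
import pandas as pd

s = pd.Series([4.3, np.nan, 2.5])
print((~s.isnull()).equals(s.notnull()))  # True -- the two spellings are equivalent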




































1


It may be added that '&' can be used to combine additional conditions, e.g.

df = df[(df.EPS > 2.0) & (df.EPS < 4.0)]

Notice that when evaluating the statements, pandas needs parentheses around each condition.
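
Worth noting: comparisons against NaN evaluate to False, so a range mask like this drops the NaN rows as a side effect. A small sketch with assumed data:

import numpy as np
import pandas as pd

df = pd.DataFrame({"EPS": [np.nan, 2.5, 4.3]})

# NaN > 2.0 and NaN < 4.0 are both False, so the NaN row falls out of the mask:
print(df[(df.EPS > 2.0) & (df.EPS < 4.0)])  # keeps only the 2.5 row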
































  • Sorry, but the OP wants something else. Btw, without parentheses your code is wrong and returns ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all(). You need to add parentheses, df = df[(df.EPS > 2.0) & (df.EPS < 4.0)], but it is also not an answer to this question.

    – jezrael
    Mar 16 '16 at 11:52





















0


A simple and easy way:

df.dropna(subset=['EPS'], inplace=True)

Source: https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.dropna.html
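
One pitfall worth a sketch (my addition, assumed toy data): with inplace=True the method mutates df and returns None, so don't assign the result back:

import numpy as np
import pandas as pd

df = pd.DataFrame({"EPS": [np.nan, 2.5]})

df.dropna(subset=["EPS"], inplace=True)          # correct: mutates df in place
# df = df.dropna(subset=["EPS"], inplace=True)   # wrong: would leave df as None
print(df)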







































-1


For some reason none of the previously submitted answers worked for me. This basic solution did:

df = df[df.EPS >= 0]

Of course, that also drops rows with negative numbers. If you want to keep those, don't chain a second filter afterwards (applying df = df[df.EPS <= 0] after the first would leave only EPS == 0); combine both conditions in one mask instead:

df = df[(df.EPS >= 0) | (df.EPS <= 0)]





























protected by jezrael Mar 16 '16 at 11:53














                12 Answers
                12






                active

                oldest

                votes








                12 Answers
                12






                active

                oldest

                votes









                active

                oldest

                votes






                active

                oldest

                votes









                418














                Don't drop. Just take rows where EPS is finite:



                df = df[np.isfinite(df['EPS'])]





                share|improve this answer



















                • 384





                  I'd recommend using pandas.notnull instead of np.isfinite

                  – Wes McKinney
                  Nov 21 '12 at 3:08






                • 9





                  Is there any advantage to indexing and copying over dropping?

                  – Robert Muil
                  Jul 31 '15 at 8:15






                • 9





                  Creates Error: TypeError: ufunc 'isfinite' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''

                  – Philipp Schwarz
                  Oct 7 '16 at 13:18








                • 3





                  @wes-mckinney could please let me know if dropna () is a better choice over pandas.notnull in this case ? If so, then why ?

                  – stormfield
                  Sep 7 '17 at 11:53






                • 4





                  @PhilippSchwarz This error occurs if the column (EPS in the example) contains strings or other types that cannot be digested by np.isfinite(). I recommend to use pandas.notnull() that will handle this more generously.

                  – normanius
                  Apr 5 '18 at 10:02
















                418














                Don't drop. Just take rows where EPS is finite:



                df = df[np.isfinite(df['EPS'])]





                share|improve this answer



















                • 384





                  I'd recommend using pandas.notnull instead of np.isfinite

                  – Wes McKinney
                  Nov 21 '12 at 3:08






                • 9





                  Is there any advantage to indexing and copying over dropping?

                  – Robert Muil
                  Jul 31 '15 at 8:15






                • 9





                  Creates Error: TypeError: ufunc 'isfinite' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''

                  – Philipp Schwarz
                  Oct 7 '16 at 13:18








                • 3





                  @wes-mckinney could please let me know if dropna () is a better choice over pandas.notnull in this case ? If so, then why ?

                  – stormfield
                  Sep 7 '17 at 11:53






                • 4





                  @PhilippSchwarz This error occurs if the column (EPS in the example) contains strings or other types that cannot be digested by np.isfinite(). I recommend to use pandas.notnull() that will handle this more generously.

                  – normanius
                  Apr 5 '18 at 10:02














                418












                418








                418







                Don't drop. Just take rows where EPS is finite:



                df = df[np.isfinite(df['EPS'])]





                share|improve this answer













                Don't drop. Just take rows where EPS is finite:



                df = df[np.isfinite(df['EPS'])]






                share|improve this answer












                share|improve this answer



                share|improve this answer










                answered Nov 16 '12 at 9:34









                eumiroeumiro

                131k19232230




                131k19232230








                • 384





                  I'd recommend using pandas.notnull instead of np.isfinite

                  – Wes McKinney
                  Nov 21 '12 at 3:08






                • 9





                  Is there any advantage to indexing and copying over dropping?

                  – Robert Muil
                  Jul 31 '15 at 8:15






                • 9





                  Creates Error: TypeError: ufunc 'isfinite' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''

                  – Philipp Schwarz
                  Oct 7 '16 at 13:18








                • 3





                  @wes-mckinney could please let me know if dropna () is a better choice over pandas.notnull in this case ? If so, then why ?

                  – stormfield
                  Sep 7 '17 at 11:53






                • 4





                  @PhilippSchwarz This error occurs if the column (EPS in the example) contains strings or other types that cannot be digested by np.isfinite(). I recommend to use pandas.notnull() that will handle this more generously.

                  – normanius
                  Apr 5 '18 at 10:02














                • 384





                  I'd recommend using pandas.notnull instead of np.isfinite

                  – Wes McKinney
                  Nov 21 '12 at 3:08






                • 9





                  Is there any advantage to indexing and copying over dropping?

                  – Robert Muil
                  Jul 31 '15 at 8:15






                • 9





                  Creates Error: TypeError: ufunc 'isfinite' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''

                  – Philipp Schwarz
                  Oct 7 '16 at 13:18








                • 3





                  @wes-mckinney could please let me know if dropna () is a better choice over pandas.notnull in this case ? If so, then why ?

                  – stormfield
                  Sep 7 '17 at 11:53






                • 4





                  @PhilippSchwarz This error occurs if the column (EPS in the example) contains strings or other types that cannot be digested by np.isfinite(). I recommend to use pandas.notnull() that will handle this more generously.

                  – normanius
                  Apr 5 '18 at 10:02








                384




                384





                I'd recommend using pandas.notnull instead of np.isfinite

                – Wes McKinney
                Nov 21 '12 at 3:08





                I'd recommend using pandas.notnull instead of np.isfinite

                – Wes McKinney
                Nov 21 '12 at 3:08




                9




                9





                Is there any advantage to indexing and copying over dropping?

                – Robert Muil
                Jul 31 '15 at 8:15





                Is there any advantage to indexing and copying over dropping?

                – Robert Muil
                Jul 31 '15 at 8:15




                9




                9





                Creates Error: TypeError: ufunc 'isfinite' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''

                – Philipp Schwarz
                Oct 7 '16 at 13:18







                Creates Error: TypeError: ufunc 'isfinite' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''

                – Philipp Schwarz
                Oct 7 '16 at 13:18






                3




                3





                @wes-mckinney could please let me know if dropna () is a better choice over pandas.notnull in this case ? If so, then why ?

                – stormfield
                Sep 7 '17 at 11:53





                @wes-mckinney could please let me know if dropna () is a better choice over pandas.notnull in this case ? If so, then why ?

                – stormfield
                Sep 7 '17 at 11:53




                4




                4





                @PhilippSchwarz This error occurs if the column (EPS in the example) contains strings or other types that cannot be digested by np.isfinite(). I recommend to use pandas.notnull() that will handle this more generously.

                – normanius
                Apr 5 '18 at 10:02





                @PhilippSchwarz This error occurs if the column (EPS in the example) contains strings or other types that cannot be digested by np.isfinite(). I recommend to use pandas.notnull() that will handle this more generously.

                – normanius
                Apr 5 '18 at 10:02













                698














                This question is already resolved, but...



                ...also consider the solution suggested by Wouter in his original comment. The ability to handle missing data, including dropna(), is built into pandas explicitly. Aside from potentially improved performance over doing it manually, these functions also come with a variety of options which may be useful.



                In [24]: df = pd.DataFrame(np.random.randn(10,3))

                In [25]: df.iloc[::2,0] = np.nan; df.iloc[::4,1] = np.nan; df.iloc[::3,2] = np.nan;

                In [26]: df
                Out[26]:
                0 1 2
                0 NaN NaN NaN
                1 2.677677 -1.466923 -0.750366
                2 NaN 0.798002 -0.906038
                3 0.672201 0.964789 NaN
                4 NaN NaN 0.050742
                5 -1.250970 0.030561 -2.678622
                6 NaN 1.036043 NaN
                7 0.049896 -0.308003 0.823295
                8 NaN NaN 0.637482
                9 -0.310130 0.078891 NaN




                In [27]: df.dropna()     #drop all rows that have any NaN values
                Out[27]:
                0 1 2
                1 2.677677 -1.466923 -0.750366
                5 -1.250970 0.030561 -2.678622
                7 0.049896 -0.308003 0.823295




                In [28]: df.dropna(how='all')     #drop only if ALL columns are NaN
                Out[28]:
                0 1 2
                1 2.677677 -1.466923 -0.750366
                2 NaN 0.798002 -0.906038
                3 0.672201 0.964789 NaN
                4 NaN NaN 0.050742
                5 -1.250970 0.030561 -2.678622
                6 NaN 1.036043 NaN
                7 0.049896 -0.308003 0.823295
                8 NaN NaN 0.637482
                9 -0.310130 0.078891 NaN




                In [29]: df.dropna(thresh=2)   #Drop row if it does not have at least two values that are **not** NaN
                Out[29]:
                0 1 2
                1 2.677677 -1.466923 -0.750366
                2 NaN 0.798002 -0.906038
                3 0.672201 0.964789 NaN
                5 -1.250970 0.030561 -2.678622
                7 0.049896 -0.308003 0.823295
                9 -0.310130 0.078891 NaN




                In [30]: df.dropna(subset=[1])   #Drop only if NaN in specific column (as asked in the question)
                Out[30]:
                0 1 2
                1 2.677677 -1.466923 -0.750366
                2 NaN 0.798002 -0.906038
                3 0.672201 0.964789 NaN
                5 -1.250970 0.030561 -2.678622
                6 NaN 1.036043 NaN
                7 0.049896 -0.308003 0.823295
                9 -0.310130 0.078891 NaN


                There are also other options (See docs at http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.dropna.html), including dropping columns instead of rows.



                Pretty handy!






                share|improve this answer





















                • 201





                  you can also use df.dropna(subset = ['column_name']). Hope that saves at least one person the extra 5 seconds of 'what am I doing wrong'. Great answer, +1

                  – James Tobin
                  Jun 18 '14 at 14:07






                • 8





                  @JamesTobin, I just spent 20 minutes to write a function for that! The official documentation was very cryptic: "Labels along other axis to consider, e.g. if you are dropping rows these would be a list of columns to include". I was unable to understand, what they meant...

                  – osa
                  Sep 5 '14 at 23:52






                • 2





                  This should be #1

                  – Cord Kaldemeyer
                  Oct 20 '17 at 13:10






                • 1





                  isfinite() is probably more pythonic, but this answer is more elegant and in line with pandas principles. Great answer.

                  – TheProletariat
                  Mar 20 '18 at 21:51
















                698














                This question is already resolved, but...



                ...also consider the solution suggested by Wouter in his original comment. The ability to handle missing data, including dropna(), is built into pandas explicitly. Aside from potentially improved performance over doing it manually, these functions also come with a variety of options which may be useful.



                In [24]: df = pd.DataFrame(np.random.randn(10,3))

                In [25]: df.iloc[::2,0] = np.nan; df.iloc[::4,1] = np.nan; df.iloc[::3,2] = np.nan;

                In [26]: df
                Out[26]:
                0 1 2
                0 NaN NaN NaN
                1 2.677677 -1.466923 -0.750366
                2 NaN 0.798002 -0.906038
                3 0.672201 0.964789 NaN
                4 NaN NaN 0.050742
                5 -1.250970 0.030561 -2.678622
                6 NaN 1.036043 NaN
                7 0.049896 -0.308003 0.823295
                8 NaN NaN 0.637482
                9 -0.310130 0.078891 NaN




                In [27]: df.dropna()     #drop all rows that have any NaN values
                Out[27]:
                0 1 2
                1 2.677677 -1.466923 -0.750366
                5 -1.250970 0.030561 -2.678622
                7 0.049896 -0.308003 0.823295




                In [28]: df.dropna(how='all')     #drop only if ALL columns are NaN
                Out[28]:
                0 1 2
                1 2.677677 -1.466923 -0.750366
                2 NaN 0.798002 -0.906038
                3 0.672201 0.964789 NaN
                4 NaN NaN 0.050742
                5 -1.250970 0.030561 -2.678622
                6 NaN 1.036043 NaN
                7 0.049896 -0.308003 0.823295
                8 NaN NaN 0.637482
                9 -0.310130 0.078891 NaN




                In [29]: df.dropna(thresh=2)   #Drop row if it does not have at least two values that are **not** NaN
                Out[29]:
                0 1 2
                1 2.677677 -1.466923 -0.750366
                2 NaN 0.798002 -0.906038
                3 0.672201 0.964789 NaN
                5 -1.250970 0.030561 -2.678622
                7 0.049896 -0.308003 0.823295
                9 -0.310130 0.078891 NaN




                In [30]: df.dropna(subset=[1])   #Drop only if NaN in specific column (as asked in the question)
                Out[30]:
                0 1 2
                1 2.677677 -1.466923 -0.750366
                2 NaN 0.798002 -0.906038
                3 0.672201 0.964789 NaN
                5 -1.250970 0.030561 -2.678622
                6 NaN 1.036043 NaN
                7 0.049896 -0.308003 0.823295
                9 -0.310130 0.078891 NaN


                There are also other options (See docs at http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.dropna.html), including dropping columns instead of rows.



                Pretty handy!






                share|improve this answer





















                • 201





                  you can also use df.dropna(subset = ['column_name']). Hope that saves at least one person the extra 5 seconds of 'what am I doing wrong'. Great answer, +1

                  – James Tobin
                  Jun 18 '14 at 14:07






                • 8





                  @JamesTobin, I just spent 20 minutes to write a function for that! The official documentation was very cryptic: "Labels along other axis to consider, e.g. if you are dropping rows these would be a list of columns to include". I was unable to understand, what they meant...

                  – osa
                  Sep 5 '14 at 23:52






                • 2





                  This should be #1

                  – Cord Kaldemeyer
                  Oct 20 '17 at 13:10






                • 1





                  isfinite() is probably more pythonic, but this answer is more elegant and in line with pandas principles. Great answer.

                  – TheProletariat
                  Mar 20 '18 at 21:51














                698












                698








                698







                This question is already resolved, but...



                ...also consider the solution suggested by Wouter in his original comment. The ability to handle missing data, including dropna(), is built into pandas explicitly. Aside from potentially improved performance over doing it manually, these functions also come with a variety of options which may be useful.



                In [24]: df = pd.DataFrame(np.random.randn(10,3))

                In [25]: df.iloc[::2,0] = np.nan; df.iloc[::4,1] = np.nan; df.iloc[::3,2] = np.nan;

                In [26]: df
                Out[26]:
                0 1 2
                0 NaN NaN NaN
                1 2.677677 -1.466923 -0.750366
                2 NaN 0.798002 -0.906038
                3 0.672201 0.964789 NaN
                4 NaN NaN 0.050742
                5 -1.250970 0.030561 -2.678622
                6 NaN 1.036043 NaN
                7 0.049896 -0.308003 0.823295
                8 NaN NaN 0.637482
                9 -0.310130 0.078891 NaN




                In [27]: df.dropna()     #drop all rows that have any NaN values
                Out[27]:
                0 1 2
                1 2.677677 -1.466923 -0.750366
                5 -1.250970 0.030561 -2.678622
                7 0.049896 -0.308003 0.823295




                In [28]: df.dropna(how='all')     #drop only if ALL columns are NaN
                Out[28]:
                0 1 2
                1 2.677677 -1.466923 -0.750366
                2 NaN 0.798002 -0.906038
                3 0.672201 0.964789 NaN
                4 NaN NaN 0.050742
                5 -1.250970 0.030561 -2.678622
                6 NaN 1.036043 NaN
                7 0.049896 -0.308003 0.823295
                8 NaN NaN 0.637482
                9 -0.310130 0.078891 NaN




                In [29]: df.dropna(thresh=2)   #Drop row if it does not have at least two values that are **not** NaN
                Out[29]:
                0 1 2
                1 2.677677 -1.466923 -0.750366
                2 NaN 0.798002 -0.906038
                3 0.672201 0.964789 NaN
                5 -1.250970 0.030561 -2.678622
                7 0.049896 -0.308003 0.823295
                9 -0.310130 0.078891 NaN




                In [30]: df.dropna(subset=[1])   #Drop only if NaN in specific column (as asked in the question)
                Out[30]:
                0 1 2
                1 2.677677 -1.466923 -0.750366
                2 NaN 0.798002 -0.906038
                3 0.672201 0.964789 NaN
                5 -1.250970 0.030561 -2.678622
                6 NaN 1.036043 NaN
                7 0.049896 -0.308003 0.823295
                9 -0.310130 0.078891 NaN


                There are also other options (See docs at http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.dropna.html), including dropping columns instead of rows.



                Pretty handy!






                share|improve this answer















                This question is already resolved, but...



                ...also consider the solution suggested by Wouter in his original comment. The ability to handle missing data, including dropna(), is built into pandas explicitly. Aside from potentially improved performance over doing it manually, these functions also come with a variety of options which may be useful.



                In [24]: df = pd.DataFrame(np.random.randn(10,3))

                In [25]: df.iloc[::2,0] = np.nan; df.iloc[::4,1] = np.nan; df.iloc[::3,2] = np.nan;

                In [26]: df
                Out[26]:
                0 1 2
                0 NaN NaN NaN
                1 2.677677 -1.466923 -0.750366
                2 NaN 0.798002 -0.906038
                3 0.672201 0.964789 NaN
                4 NaN NaN 0.050742
                5 -1.250970 0.030561 -2.678622
                6 NaN 1.036043 NaN
                7 0.049896 -0.308003 0.823295
                8 NaN NaN 0.637482
                9 -0.310130 0.078891 NaN




                In [27]: df.dropna()     #drop all rows that have any NaN values
                Out[27]:
                0 1 2
                1 2.677677 -1.466923 -0.750366
                5 -1.250970 0.030561 -2.678622
                7 0.049896 -0.308003 0.823295




                In [28]: df.dropna(how='all')     #drop only if ALL columns are NaN
                Out[28]:
                0 1 2
                1 2.677677 -1.466923 -0.750366
                2 NaN 0.798002 -0.906038
                3 0.672201 0.964789 NaN
                4 NaN NaN 0.050742
                5 -1.250970 0.030561 -2.678622
                6 NaN 1.036043 NaN
                7 0.049896 -0.308003 0.823295
                8 NaN NaN 0.637482
                9 -0.310130 0.078891 NaN




                In [29]: df.dropna(thresh=2)   #Drop row if it does not have at least two values that are **not** NaN
                Out[29]:
                0 1 2
                1 2.677677 -1.466923 -0.750366
                2 NaN 0.798002 -0.906038
                3 0.672201 0.964789 NaN
                5 -1.250970 0.030561 -2.678622
                7 0.049896 -0.308003 0.823295
                9 -0.310130 0.078891 NaN




                In [30]: df.dropna(subset=[1])   #Drop only if NaN in specific column (as asked in the question)
                Out[30]:
                0 1 2
                1 2.677677 -1.466923 -0.750366
                2 NaN 0.798002 -0.906038
                3 0.672201 0.964789 NaN
                5 -1.250970 0.030561 -2.678622
                6 NaN 1.036043 NaN
                7 0.049896 -0.308003 0.823295
                9 -0.310130 0.078891 NaN


                There are also other options (See docs at http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.dropna.html), including dropping columns instead of rows.



                Pretty handy!







                share|improve this answer














                share|improve this answer



                share|improve this answer








                edited Aug 14 '17 at 0:04









                ayhan

                38k671106




                38k671106










                answered Nov 17 '12 at 20:27









                AmanAman

                25.3k62535




                25.3k62535








                • 201





                  you can also use df.dropna(subset = ['column_name']). Hope that saves at least one person the extra 5 seconds of 'what am I doing wrong'. Great answer, +1

                  – James Tobin
                  Jun 18 '14 at 14:07






                • 8





                  @JamesTobin, I just spent 20 minutes to write a function for that! The official documentation was very cryptic: "Labels along other axis to consider, e.g. if you are dropping rows these would be a list of columns to include". I was unable to understand, what they meant...

                  – osa
                  Sep 5 '14 at 23:52






                • 2





                  This should be #1

                  – Cord Kaldemeyer
                  Oct 20 '17 at 13:10






                • 1





                  isfinite() is probably more pythonic, but this answer is more elegant and in line with pandas principles. Great answer.

                  – TheProletariat
                  Mar 20 '18 at 21:51














                • 201





                  you can also use df.dropna(subset = ['column_name']). Hope that saves at least one person the extra 5 seconds of 'what am I doing wrong'. Great answer, +1

                  – James Tobin
                  Jun 18 '14 at 14:07






                • 8





                  @JamesTobin, I just spent 20 minutes to write a function for that! The official documentation was very cryptic: "Labels along other axis to consider, e.g. if you are dropping rows these would be a list of columns to include". I was unable to understand, what they meant...

                  – osa
                  Sep 5 '14 at 23:52






                • 2





                  This should be #1

                  – Cord Kaldemeyer
                  Oct 20 '17 at 13:10






                • 1





                  isfinite() is probably more pythonic, but this answer is more elegant and in line with pandas principles. Great answer.

                  – TheProletariat
                  Mar 20 '18 at 21:51








                201




                201





                you can also use df.dropna(subset = ['column_name']). Hope that saves at least one person the extra 5 seconds of 'what am I doing wrong'. Great answer, +1

                – James Tobin
                Jun 18 '14 at 14:07





                you can also use df.dropna(subset = ['column_name']). Hope that saves at least one person the extra 5 seconds of 'what am I doing wrong'. Great answer, +1

                – James Tobin
                Jun 18 '14 at 14:07




                8




                8





                @JamesTobin, I just spent 20 minutes to write a function for that! The official documentation was very cryptic: "Labels along other axis to consider, e.g. if you are dropping rows these would be a list of columns to include". I was unable to understand, what they meant...

                – osa
                Sep 5 '14 at 23:52





                @JamesTobin, I just spent 20 minutes to write a function for that! The official documentation was very cryptic: "Labels along other axis to consider, e.g. if you are dropping rows these would be a list of columns to include". I was unable to understand, what they meant...

                – osa
                Sep 5 '14 at 23:52




                2




                2





                This should be #1

                – Cord Kaldemeyer
                Oct 20 '17 at 13:10





                This should be #1

                – Cord Kaldemeyer
                Oct 20 '17 at 13:10




                1




                1





                isfinite() is probably more pythonic, but this answer is more elegant and in line with pandas principles. Great answer.

                – TheProletariat
                Mar 20 '18 at 21:51





                isfinite() is probably more pythonic, but this answer is more elegant and in line with pandas principles. Great answer.

                – TheProletariat
                Mar 20 '18 at 21:51











                90














                I know this has already been answered, but just for the sake of a purely pandas solution to this specific question as opposed to the general description from Aman (which was wonderful) and in case anyone else happens upon this:



                import pandas as pd
                df = df[pd.notnull(df['EPS'])]





                share|improve this answer



















                • 7





                  Actually, the specific answer would be: df.dropna(subset=['EPS']) (based on the general description of Aman, of course this does also work)

                  – joris
                  Apr 23 '14 at 12:53






                • 2





                  notnull is also what Wes (author of Pandas) suggested in his comment on another answer.

                  – fantabolous
                  Jul 9 '14 at 3:24











                • This maybe a noob question. But when I do a df[pd.notnull(...) or df.dropna the index gets dropped. So if there was a null value in row-index 10 in a df of length 200. The dataframe after running the drop function has index values from 1 to 9 and then 11 to 200. Anyway to "re-index" it

                  – Aakash Gupta
                  Mar 4 '16 at 6:03
















                90














                I know this has already been answered, but just for the sake of a purely pandas solution to this specific question as opposed to the general description from Aman (which was wonderful) and in case anyone else happens upon this:



                import pandas as pd
                df = df[pd.notnull(df['EPS'])]





                share|improve this answer



















                • 7





                  Actually, the specific answer would be: df.dropna(subset=['EPS']) (based on the general description of Aman, of course this does also work)

                  – joris
                  Apr 23 '14 at 12:53






                • 2





                  notnull is also what Wes (author of Pandas) suggested in his comment on another answer.

                  – fantabolous
                  Jul 9 '14 at 3:24











                • This maybe a noob question. But when I do a df[pd.notnull(...) or df.dropna the index gets dropped. So if there was a null value in row-index 10 in a df of length 200. The dataframe after running the drop function has index values from 1 to 9 and then 11 to 200. Anyway to "re-index" it

                  – Aakash Gupta
                  Mar 4 '16 at 6:03














                90












                90








                90







                I know this has already been answered, but just for the sake of a purely pandas solution to this specific question as opposed to the general description from Aman (which was wonderful) and in case anyone else happens upon this:



                import pandas as pd
                df = df[pd.notnull(df['EPS'])]





                share|improve this answer













                I know this has already been answered, but just for the sake of a purely pandas solution to this specific question as opposed to the general description from Aman (which was wonderful) and in case anyone else happens upon this:



                import pandas as pd
                df = df[pd.notnull(df['EPS'])]






                share|improve this answer












                share|improve this answer



                share|improve this answer










                answered Apr 23 '14 at 5:37









                Kirk HadleyKirk Hadley

                1,04672




                1,04672








                • 7





                  Actually, the specific answer would be: df.dropna(subset=['EPS']) (based on the general description of Aman, of course this does also work)

                  – joris
                  Apr 23 '14 at 12:53






                • 2





                  notnull is also what Wes (author of Pandas) suggested in his comment on another answer.

                  – fantabolous
                  Jul 9 '14 at 3:24











                • This maybe a noob question. But when I do a df[pd.notnull(...) or df.dropna the index gets dropped. So if there was a null value in row-index 10 in a df of length 200. The dataframe after running the drop function has index values from 1 to 9 and then 11 to 200. Anyway to "re-index" it

                  – Aakash Gupta
                  Mar 4 '16 at 6:03














                • 7





                  Actually, the specific answer would be: df.dropna(subset=['EPS']) (based on the general description of Aman, of course this does also work)

                  – joris
                  Apr 23 '14 at 12:53






                • 2





                  notnull is also what Wes (author of Pandas) suggested in his comment on another answer.

                  – fantabolous
                  Jul 9 '14 at 3:24











                • This maybe a noob question. But when I do a df[pd.notnull(...) or df.dropna the index gets dropped. So if there was a null value in row-index 10 in a df of length 200. The dataframe after running the drop function has index values from 1 to 9 and then 11 to 200. Anyway to "re-index" it

                  – Aakash Gupta
                  Mar 4 '16 at 6:03








                7




                7





                Actually, the specific answer would be: df.dropna(subset=['EPS']) (based on the general description of Aman, of course this does also work)

                – joris
                Apr 23 '14 at 12:53





                Actually, the specific answer would be: df.dropna(subset=['EPS']) (based on the general description of Aman, of course this does also work)

                – joris
                Apr 23 '14 at 12:53




                2




                2





                notnull is also what Wes (author of Pandas) suggested in his comment on another answer.

                – fantabolous
                Jul 9 '14 at 3:24





                notnull is also what Wes (author of Pandas) suggested in his comment on another answer.

                – fantabolous
                Jul 9 '14 at 3:24













                This maybe a noob question. But when I do a df[pd.notnull(...) or df.dropna the index gets dropped. So if there was a null value in row-index 10 in a df of length 200. The dataframe after running the drop function has index values from 1 to 9 and then 11 to 200. Anyway to "re-index" it

                – Aakash Gupta
                Mar 4 '16 at 6:03





                This maybe a noob question. But when I do a df[pd.notnull(...) or df.dropna the index gets dropped. So if there was a null value in row-index 10 in a df of length 200. The dataframe after running the drop function has index values from 1 to 9 and then 11 to 200. Anyway to "re-index" it

                – Aakash Gupta
                Mar 4 '16 at 6:03











                33














                You can use this:



                df.dropna(subset=['EPS'], how='all', inplace = True)





                share|improve this answer





















                • 10





                  how='all' is redundant here, because you subsetting dataframe only with one field so both 'all' and 'any' will have the same effect.

                  – Anton Protopopov
                  Jan 16 '18 at 12:41


















                33














                You can use this:



                df.dropna(subset=['EPS'], how='all', inplace = True)





                share|improve this answer





















                • 10





                  how='all' is redundant here, because you subsetting dataframe only with one field so both 'all' and 'any' will have the same effect.

                  – Anton Protopopov
                  Jan 16 '18 at 12:41
















                33












                33








                33







                You can use this:



                df.dropna(subset=['EPS'], how='all', inplace = True)





                share|improve this answer















                You can use this:



                df.dropna(subset=['EPS'], how='all', inplace = True)






                share|improve this answer














                share|improve this answer



                share|improve this answer








                edited Aug 21 '17 at 9:49









                Mojtaba Khodadadi

                58457




                58457










                answered Aug 2 '17 at 16:28









                JoeJoe

                6,10421530




                6,10421530








                • 10





                  how='all' is redundant here, because you subsetting dataframe only with one field so both 'all' and 'any' will have the same effect.

                  – Anton Protopopov
                  Jan 16 '18 at 12:41
















                • 10





                  how='all' is redundant here, because you subsetting dataframe only with one field so both 'all' and 'any' will have the same effect.

                  – Anton Protopopov
                  Jan 16 '18 at 12:41










                10




                10





                how='all' is redundant here, because you subsetting dataframe only with one field so both 'all' and 'any' will have the same effect.

                – Anton Protopopov
                Jan 16 '18 at 12:41







                how='all' is redundant here, because you subsetting dataframe only with one field so both 'all' and 'any' will have the same effect.

                – Anton Protopopov
                Jan 16 '18 at 12:41













                25














                Simplest of all solutions:



                filtered_df = df[df['EPS'].notnull()]



                The above solution is way better than using np.isfinite()







                share|improve this answer






























                  25














                  Simplest of all solutions:



                  filtered_df = df[df['EPS'].notnull()]



                  The above solution is way better than using np.isfinite()







                  share|improve this answer




























                    25












                    25








                    25







                    Simplest of all solutions:



                    filtered_df = df[df['EPS'].notnull()]



                    The above solution is way better than using np.isfinite()







                    share|improve this answer















                    Simplest of all solutions:



                    filtered_df = df[df['EPS'].notnull()]



                    The above solution is way better than using np.isfinite()








                    share|improve this answer














                    share|improve this answer



                    share|improve this answer








                    edited Aug 8 '18 at 15:17









                    ayhan

                    38k671106




                    38k671106










                    answered Nov 23 '17 at 12:08









                    Gil BaggioGil Baggio

                    2,70711822




                    2,70711822























                        21














                        You could use dataframe method notnull or inverse of isnull, or numpy.isnan:



                        In [332]: df[df.EPS.notnull()]
                        Out[332]:
                        STK_ID RPT_Date STK_ID.1 EPS cash
                        2 600016 20111231 600016 4.3 NaN
                        4 601939 20111231 601939 2.5 NaN


                        In [334]: df[~df.EPS.isnull()]
                        Out[334]:
                        STK_ID RPT_Date STK_ID.1 EPS cash
                        2 600016 20111231 600016 4.3 NaN
                        4 601939 20111231 601939 2.5 NaN


                        In [347]: df[~np.isnan(df.EPS)]
                        Out[347]:
                        STK_ID RPT_Date STK_ID.1 EPS cash
                        2 600016 20111231 600016 4.3 NaN
                        4 601939 20111231 601939 2.5 NaN





                        share|improve this answer




























                          21














                          You could use dataframe method notnull or inverse of isnull, or numpy.isnan:



                          In [332]: df[df.EPS.notnull()]
                          Out[332]:
                          STK_ID RPT_Date STK_ID.1 EPS cash
                          2 600016 20111231 600016 4.3 NaN
                          4 601939 20111231 601939 2.5 NaN


                          In [334]: df[~df.EPS.isnull()]
                          Out[334]:
                          STK_ID RPT_Date STK_ID.1 EPS cash
                          2 600016 20111231 600016 4.3 NaN
                          4 601939 20111231 601939 2.5 NaN


                          In [347]: df[~np.isnan(df.EPS)]
                          Out[347]:
                          STK_ID RPT_Date STK_ID.1 EPS cash
                          2 600016 20111231 600016 4.3 NaN
                          4 601939 20111231 601939 2.5 NaN





answered Dec 4 '15 at 7:01 – Anton Protopopov
                                10














Yet another solution uses the fact that np.nan != np.nan:



In [149]: df.query("EPS == EPS")
Out[149]:
                 STK_ID  EPS  cash
STK_ID RPT_Date
600016 20111231  600016  4.3   NaN
601939 20111231  601939  2.5   NaN
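
A minimal sketch of why the trick works: NaN is the only value that compares unequal to itself, so EPS == EPS is False exactly on the missing rows (hypothetical column values):

import numpy as np
import pandas as pd

df = pd.DataFrame({'EPS': [np.nan, 4.3, 2.5]})

print(np.nan == np.nan)              # False - NaN never equals anything
print((df.EPS == df.EPS).tolist())   # [False, True, True]

# query("EPS == EPS") is equivalent to the boolean-mask spelling:
assert df.query("EPS == EPS").equals(df[df.EPS == df.EPS])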





answered Apr 20 '17 at 21:15 – MaxU
                                        8














You can use dropna:

Example

Drop the rows where at least one element is missing:

df = df.dropna()

Define in which columns to look for missing values:

df = df.dropna(subset=['column1', 'column2'])

See the pandas dropna documentation for more examples.

Note: the axis parameter of dropna has been deprecated since version 0.23.0.
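
For reference, a short sketch of the main dropna parameters (how, thresh, subset) on a hypothetical frame:

import numpy as np
import pandas as pd

df = pd.DataFrame({'EPS':  [np.nan, np.nan, 4.3, 2.5],
                   'cash': [np.nan, 12.0, np.nan, np.nan]})

df.dropna()                  # drop rows containing any NaN
df.dropna(how='all')         # drop only rows that are entirely NaN
df.dropna(thresh=2)          # keep rows with at least 2 non-NaN values
df.dropna(subset=['EPS'])    # consider NaNs only in the EPS column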







answered Oct 14 '18 at 19:26 – Umer
                                                6














Or check for NaNs with isnull, then use ~ to invert the mask so that only the non-NaN rows remain:

df = df[~df['EPS'].isnull()]


                                                Now:



                                                print(df)


                                                Is:



                                                                 STK_ID  EPS  cash
                                                STK_ID RPT_Date
                                                600016 20111231 600016 4.3 NaN
                                                601939 20111231 601939 2.5 NaN
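
As a side note, newer pandas (0.21+) also offers isna/notna as aliases for isnull/notnull, so the same filter can be spelled either way; a minimal sketch:

import numpy as np
import pandas as pd

df = pd.DataFrame({'EPS': [np.nan, 4.3, 2.5]})

# isna/notna and isnull/notnull produce identical masks
assert df['EPS'].isna().equals(df['EPS'].isnull())
print(df[~df['EPS'].isna()])   # same result as df[~df['EPS'].isnull()]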





answered Oct 18 '18 at 23:55 – U9-Forward
                                                        1














It may be added that '&' can be used to combine additional conditions, e.g.:

df = df[(df.EPS > 2.0) & (df.EPS < 4.0)]

Notice that when evaluating the statements, pandas needs parentheses around each condition.
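
A sketch tying this back to the NaN filter, with a hypothetical EPS column; the parentheses matter because & binds tighter than the comparison operators:

import numpy as np
import pandas as pd

df = pd.DataFrame({'EPS': [np.nan, 4.3, 2.5, 1.0]})

# Each comparison is parenthesized so & applies to the boolean masks
kept = df[df.EPS.notnull() & (df.EPS > 2.0) & (df.EPS < 4.0)]
print(kept)   # only the 2.5 row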






answered Mar 15 '16 at 15:33 – David (edited Jan 26 '17 at 23:12 by aesede)

• Sorry, but the OP wants something else. Btw, your code is wrong; it returns ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all(). You need to add parentheses - df = df[(df.EPS > 2.0) & (df.EPS < 4.0)] - but this is also not an answer to the question.
  – jezrael, Mar 16 '16 at 11:52
                                                        0














A simple and easy way:

df.dropna(subset=['EPS'], inplace=True)



                                                        source: https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.dropna.html
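
A small sketch contrasting the in-place call with plain reassignment; both leave you with the same frame:

import numpy as np
import pandas as pd

df = pd.DataFrame({'EPS': [np.nan, 4.3, 2.5]})

kept = df.dropna(subset=['EPS'])          # returns a new frame; df is untouched
df.dropna(subset=['EPS'], inplace=True)   # mutates df and returns None

assert df.equals(kept)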






answered Jan 22 at 8:26 – Nursnaaz (edited Jan 23 at 10:13)
                                                                -1














For some reason none of the previously submitted answers worked for me. This basic solution did:

df = df[df.EPS >= 0]

(It works because any comparison against NaN evaluates to False, so the NaN rows fail the filter.) Of course, that will also drop rows with negative numbers. If you want to keep those, combine both conditions with | rather than filtering twice, since applying the two filters in sequence would leave only EPS == 0:

df = df[(df.EPS >= 0) | (df.EPS <= 0)]
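
A quick sketch of the comparison semantics this relies on (hypothetical values): every comparison against NaN is False, so NaN rows fail any such filter:

import numpy as np
import pandas as pd

s = pd.Series([np.nan, 4.3, -2.5])

print((s >= 0).tolist())                 # [False, True, False] - NaN fails
print(((s >= 0) | (s <= 0)).tolist())    # [False, True, True]  - NaN still dropped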





answered Oct 9 '15 at 18:00 – samthebrand (edited Oct 9 '15 at 18:25)