How to drop rows of Pandas DataFrame whose value in certain columns is NaN












527 votes
I have a DataFrame:

>>> df
                  STK_ID   EPS  cash
STK_ID RPT_Date
601166 20111231   601166   NaN   NaN
600036 20111231   600036   NaN    12
600016 20111231   600016   4.3   NaN
601009 20111231   601009   NaN   NaN
601939 20111231   601939   2.5   NaN
000001 20111231   000001   NaN   NaN

I just want the records whose EPS is not NaN; that is, df.drop(....) should return the DataFrame below:

                  STK_ID  EPS  cash
STK_ID RPT_Date
600016 20111231   600016  4.3   NaN
601939 20111231   601939  2.5   NaN

How do I do that?










• 19  dropna: pandas.pydata.org/pandas-docs/stable/generated/… – Wouter Overmeire, Nov 16 '12 at 9:29
• 134  df.dropna(subset = ['column1_name', 'column2_name', 'column3_name']) – osa, Sep 5 '14 at 23:53
python pandas dataframe






asked Nov 16 '12 at 9:17 by bigbug, edited Jan 5 '17 at 17:01 by Ninjakannon








12 Answers
418 votes – answered Nov 16 '12 at 9:34 by eumiro
Don't drop. Just take rows where EPS is finite:



df = df[np.isfinite(df['EPS'])]





• 384  I'd recommend using pandas.notnull instead of np.isfinite – Wes McKinney, Nov 21 '12 at 3:08






• 9  Is there any advantage to indexing and copying over dropping? – Robert Muil, Jul 31 '15 at 8:15






• 9  Creates Error: TypeError: ufunc 'isfinite' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe'' – Philipp Schwarz, Oct 7 '16 at 13:18








• 3  @wes-mckinney could you please let me know if dropna() is a better choice over pandas.notnull in this case? If so, then why? – stormfield, Sep 7 '17 at 11:53






• 4  @PhilippSchwarz This error occurs if the column (EPS in the example) contains strings or other types that cannot be digested by np.isfinite(). I recommend using pandas.notnull(), which handles this more gracefully. – normanius, Apr 5 '18 at 10:02



















698 votes – answered Nov 17 '12 at 20:27 by Aman, edited Aug 14 '17 at 0:04 by ayhan
This question is already resolved, but...



...also consider the solution suggested by Wouter in his original comment. The ability to handle missing data, including dropna(), is built into pandas explicitly. Aside from potentially improved performance over doing it manually, these functions also come with a variety of options which may be useful.



In [24]: df = pd.DataFrame(np.random.randn(10,3))

In [25]: df.iloc[::2,0] = np.nan; df.iloc[::4,1] = np.nan; df.iloc[::3,2] = np.nan;

In [26]: df
Out[26]:
          0         1         2
0       NaN       NaN       NaN
1  2.677677 -1.466923 -0.750366
2       NaN  0.798002 -0.906038
3  0.672201  0.964789       NaN
4       NaN       NaN  0.050742
5 -1.250970  0.030561 -2.678622
6       NaN  1.036043       NaN
7  0.049896 -0.308003  0.823295
8       NaN       NaN  0.637482
9 -0.310130  0.078891       NaN

In [27]: df.dropna()     # drop all rows that have any NaN values
Out[27]:
          0         1         2
1  2.677677 -1.466923 -0.750366
5 -1.250970  0.030561 -2.678622
7  0.049896 -0.308003  0.823295

In [28]: df.dropna(how='all')     # drop only if ALL columns are NaN
Out[28]:
          0         1         2
1  2.677677 -1.466923 -0.750366
2       NaN  0.798002 -0.906038
3  0.672201  0.964789       NaN
4       NaN       NaN  0.050742
5 -1.250970  0.030561 -2.678622
6       NaN  1.036043       NaN
7  0.049896 -0.308003  0.823295
8       NaN       NaN  0.637482
9 -0.310130  0.078891       NaN

In [29]: df.dropna(thresh=2)   # drop a row if it does not have at least two values that are **not** NaN
Out[29]:
          0         1         2
1  2.677677 -1.466923 -0.750366
2       NaN  0.798002 -0.906038
3  0.672201  0.964789       NaN
5 -1.250970  0.030561 -2.678622
7  0.049896 -0.308003  0.823295
9 -0.310130  0.078891       NaN

In [30]: df.dropna(subset=[1])   # drop only if NaN in a specific column (as asked in the question)
Out[30]:
          0         1         2
1  2.677677 -1.466923 -0.750366
2       NaN  0.798002 -0.906038
3  0.672201  0.964789       NaN
5 -1.250970  0.030561 -2.678622
6       NaN  1.036043       NaN
7  0.049896 -0.308003  0.823295
9 -0.310130  0.078891       NaN

There are also other options (see the docs at http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.dropna.html), including dropping columns instead of rows.



Pretty handy!
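The paragraph above mentions dropping columns instead of rows without showing it; here is a minimal sketch (invented toy frame) of the column-wise variant using the axis argument:

import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1.0, 2.0], 'b': [np.nan, np.nan], 'c': [3.0, np.nan]})

print(df.dropna(axis=1))             # drops any column containing a NaN; only 'a' survives
print(df.dropna(axis=1, how='all'))  # drops only all-NaN columns; 'a' and 'c' survive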






• 201  you can also use df.dropna(subset = ['column_name']). Hope that saves at least one person the extra 5 seconds of 'what am I doing wrong'. Great answer, +1 – James Tobin, Jun 18 '14 at 14:07






• 8  @JamesTobin, I just spent 20 minutes writing a function for that! The official documentation was very cryptic: "Labels along other axis to consider, e.g. if you are dropping rows these would be a list of columns to include". I was unable to understand what they meant... – osa, Sep 5 '14 at 23:52






• 2  This should be #1 – Cord Kaldemeyer, Oct 20 '17 at 13:10






• 1  isfinite() is probably more pythonic, but this answer is more elegant and in line with pandas principles. Great answer. – TheProletariat, Mar 20 '18 at 21:51



















90 votes – answered Apr 23 '14 at 5:37 by Kirk Hadley
I know this has already been answered, but just for the sake of a purely pandas solution to this specific question (as opposed to Aman's general description, which was wonderful), and in case anyone else happens upon this:



import pandas as pd
df = df[pd.notnull(df['EPS'])]





• 7  Actually, the specific answer would be: df.dropna(subset=['EPS']) (based on the general description of Aman; of course this also works) – joris, Apr 23 '14 at 12:53






• 2  notnull is also what Wes (author of Pandas) suggested in his comment on another answer. – fantabolous, Jul 9 '14 at 3:24











• This may be a noob question, but when I do df[pd.notnull(...)] or df.dropna, the index gets dropped. So if there was a null value at row index 10 in a df of length 200, the dataframe after running the drop function has index values from 1 to 9 and then 11 to 200. Is there any way to "re-index" it? – Aakash Gupta, Mar 4 '16 at 6:03



















33 votes – answered Aug 2 '17 at 16:28 by Joe, edited Aug 21 '17 at 9:49 by Mojtaba Khodadadi
You can use this:



df.dropna(subset=['EPS'], how='all', inplace=True)





• 10  how='all' is redundant here, because you are subsetting the dataframe with only one field, so both 'all' and 'any' will have the same effect. – Anton Protopopov, Jan 16 '18 at 12:41





















25 votes – answered Nov 23 '17 at 12:08 by Gil Baggio, edited Aug 8 '18 at 15:17 by ayhan
Simplest of all solutions:



filtered_df = df[df['EPS'].notnull()]



The above is preferable to np.isfinite(), since notnull() also works on non-numeric (object) columns.







21 votes – answered Dec 4 '15 at 7:01 by Anton Protopopov
You could use the DataFrame method notnull, the inverse of isnull, or numpy.isnan:



In [332]: df[df.EPS.notnull()]
Out[332]:
   STK_ID  RPT_Date  STK_ID.1  EPS  cash
2  600016  20111231    600016  4.3   NaN
4  601939  20111231    601939  2.5   NaN

In [334]: df[~df.EPS.isnull()]
Out[334]:
   STK_ID  RPT_Date  STK_ID.1  EPS  cash
2  600016  20111231    600016  4.3   NaN
4  601939  20111231    601939  2.5   NaN

In [347]: df[~np.isnan(df.EPS)]
Out[347]:
   STK_ID  RPT_Date  STK_ID.1  EPS  cash
2  600016  20111231    600016  4.3   NaN
4  601939  20111231    601939  2.5   NaN





10 votes – answered Apr 20 '17 at 21:15 by MaxU
Yet another solution, which uses the fact that np.nan != np.nan:



      In [149]: df.query("EPS == EPS")
      Out[149]:
      STK_ID EPS cash
      STK_ID RPT_Date
      600016 20111231 600016 4.3 NaN
      601939 20111231 601939 2.5 NaN
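A quick demonstration of the property this relies on (plain NumPy, nothing pandas-specific):

import numpy as np

x = np.nan
print(x == x)  # False: NaN compares unequal to everything, including itself
print(x != x)  # True; "EPS == EPS" inside query() is therefore True exactly for non-NaN rows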





8 votes
You can use dropna.

Example

Drop the rows where at least one element is missing:

df = df.dropna()

Define in which columns to look for missing values:

df = df.dropna(subset=['column1', 'column2'])

See the dropna documentation for more examples.

Note: the axis parameter of dropna is deprecated since version 0.23.0.







6 votes
Or: check for NaNs with isnull, then use ~ to invert the mask, selecting the rows where EPS is not NaN:

df = df[~df['EPS'].isnull()]

Now print(df) gives:

                 STK_ID  EPS  cash
STK_ID RPT_Date
600016 20111231  600016  4.3   NaN
601939 20111231  601939  2.5   NaN





1 vote
It may be added that '&' can be used to chain additional conditions, e.g.:

df = df[(df.EPS > 2.0) & (df.EPS < 4.0)]

Notice that when evaluating the statements, pandas needs parentheses around each condition.






• Sorry, but the OP wants something else. Btw, your code is wrong without parentheses; it returns ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all(). You need to add parentheses, df = df[(df.EPS > 2.0) & (df.EPS < 4.0)], but even then it is not an answer to this question. – jezrael, Mar 16 '16 at 11:52





















0 votes
A simple and easy way:

df.dropna(subset=['EPS'], inplace=True)

source: https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.dropna.html






-1 votes
For some reason none of the previously submitted answers worked for me. This basic solution did:

df = df[df.EPS >= 0]

Though of course that will drop rows with negative numbers too, so if you want those, it's probably smart to add this after as well:

df = df[df.EPS <= 0]





protected by jezrael Mar 16 '16 at 11:53

















                12 Answers
                12






                active

                oldest

                votes








                12 Answers
                12






                active

                oldest

                votes









                active

                oldest

                votes






                active

                oldest

                votes









                418














                Don't drop. Just take rows where EPS is finite:



                df = df[np.isfinite(df['EPS'])]





                share|improve this answer



















                • 384





                  I'd recommend using pandas.notnull instead of np.isfinite

                  – Wes McKinney
                  Nov 21 '12 at 3:08






                • 9





                  Is there any advantage to indexing and copying over dropping?

                  – Robert Muil
                  Jul 31 '15 at 8:15






                • 9





                  Creates Error: TypeError: ufunc 'isfinite' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''

                  – Philipp Schwarz
                  Oct 7 '16 at 13:18








                • 3





                  @wes-mckinney could please let me know if dropna () is a better choice over pandas.notnull in this case ? If so, then why ?

                  – stormfield
                  Sep 7 '17 at 11:53






                • 4





                  @PhilippSchwarz This error occurs if the column (EPS in the example) contains strings or other types that cannot be digested by np.isfinite(). I recommend to use pandas.notnull() that will handle this more generously.

                  – normanius
                  Apr 5 '18 at 10:02
















                418














                Don't drop. Just take rows where EPS is finite:



                df = df[np.isfinite(df['EPS'])]





                share|improve this answer



















                • 384





                  I'd recommend using pandas.notnull instead of np.isfinite

                  – Wes McKinney
                  Nov 21 '12 at 3:08






                • 9





                  Is there any advantage to indexing and copying over dropping?

                  – Robert Muil
                  Jul 31 '15 at 8:15






                • 9





                  Creates Error: TypeError: ufunc 'isfinite' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''

                  – Philipp Schwarz
                  Oct 7 '16 at 13:18








                • 3





                  @wes-mckinney could please let me know if dropna () is a better choice over pandas.notnull in this case ? If so, then why ?

                  – stormfield
                  Sep 7 '17 at 11:53






                • 4





                  @PhilippSchwarz This error occurs if the column (EPS in the example) contains strings or other types that cannot be digested by np.isfinite(). I recommend to use pandas.notnull() that will handle this more generously.

                  – normanius
                  Apr 5 '18 at 10:02














                418












                418








                418







                Don't drop. Just take rows where EPS is finite:



                df = df[np.isfinite(df['EPS'])]





                share|improve this answer













                Don't drop. Just take rows where EPS is finite:



                df = df[np.isfinite(df['EPS'])]






                share|improve this answer












                share|improve this answer



                share|improve this answer










                answered Nov 16 '12 at 9:34









                eumiroeumiro

                131k19232230




                131k19232230








                • 384





                  I'd recommend using pandas.notnull instead of np.isfinite

                  – Wes McKinney
                  Nov 21 '12 at 3:08






                • 9





                  Is there any advantage to indexing and copying over dropping?

                  – Robert Muil
                  Jul 31 '15 at 8:15






                • 9





                  Creates Error: TypeError: ufunc 'isfinite' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''

                  – Philipp Schwarz
                  Oct 7 '16 at 13:18








                • 3





                  @wes-mckinney could please let me know if dropna () is a better choice over pandas.notnull in this case ? If so, then why ?

                  – stormfield
                  Sep 7 '17 at 11:53






                • 4





                  @PhilippSchwarz This error occurs if the column (EPS in the example) contains strings or other types that cannot be digested by np.isfinite(). I recommend to use pandas.notnull() that will handle this more generously.

                  – normanius
                  Apr 5 '18 at 10:02














                • 384





                  I'd recommend using pandas.notnull instead of np.isfinite

                  – Wes McKinney
                  Nov 21 '12 at 3:08






                • 9





                  Is there any advantage to indexing and copying over dropping?

                  – Robert Muil
                  Jul 31 '15 at 8:15






                • 9





                  Creates Error: TypeError: ufunc 'isfinite' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''

                  – Philipp Schwarz
                  Oct 7 '16 at 13:18








                • 3





                  @wes-mckinney could please let me know if dropna () is a better choice over pandas.notnull in this case ? If so, then why ?

                  – stormfield
                  Sep 7 '17 at 11:53






                • 4





                  @PhilippSchwarz This error occurs if the column (EPS in the example) contains strings or other types that cannot be digested by np.isfinite(). I recommend to use pandas.notnull() that will handle this more generously.

                  – normanius
                  Apr 5 '18 at 10:02








                384




                384





                I'd recommend using pandas.notnull instead of np.isfinite

                – Wes McKinney
                Nov 21 '12 at 3:08





                I'd recommend using pandas.notnull instead of np.isfinite

                – Wes McKinney
                Nov 21 '12 at 3:08




                9




                9





                Is there any advantage to indexing and copying over dropping?

                – Robert Muil
                Jul 31 '15 at 8:15





                Is there any advantage to indexing and copying over dropping?

                – Robert Muil
                Jul 31 '15 at 8:15




                9




                9





                Creates Error: TypeError: ufunc 'isfinite' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''

                – Philipp Schwarz
                Oct 7 '16 at 13:18







                Creates Error: TypeError: ufunc 'isfinite' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''

                – Philipp Schwarz
                Oct 7 '16 at 13:18






                3




                3





                @wes-mckinney could please let me know if dropna () is a better choice over pandas.notnull in this case ? If so, then why ?

                – stormfield
                Sep 7 '17 at 11:53





                @wes-mckinney could please let me know if dropna () is a better choice over pandas.notnull in this case ? If so, then why ?

                – stormfield
                Sep 7 '17 at 11:53




                4




                4





                @PhilippSchwarz This error occurs if the column (EPS in the example) contains strings or other types that cannot be digested by np.isfinite(). I recommend to use pandas.notnull() that will handle this more generously.

                – normanius
                Apr 5 '18 at 10:02





                @PhilippSchwarz This error occurs if the column (EPS in the example) contains strings or other types that cannot be digested by np.isfinite(). I recommend to use pandas.notnull() that will handle this more generously.

                – normanius
                Apr 5 '18 at 10:02













                698














                This question is already resolved, but...



                ...also consider the solution suggested by Wouter in his original comment. The ability to handle missing data, including dropna(), is built into pandas explicitly. Aside from potentially improved performance over doing it manually, these functions also come with a variety of options which may be useful.



                In [24]: df = pd.DataFrame(np.random.randn(10,3))

                In [25]: df.iloc[::2,0] = np.nan; df.iloc[::4,1] = np.nan; df.iloc[::3,2] = np.nan;

                In [26]: df
                Out[26]:
                0 1 2
                0 NaN NaN NaN
                1 2.677677 -1.466923 -0.750366
                2 NaN 0.798002 -0.906038
                3 0.672201 0.964789 NaN
                4 NaN NaN 0.050742
                5 -1.250970 0.030561 -2.678622
                6 NaN 1.036043 NaN
                7 0.049896 -0.308003 0.823295
                8 NaN NaN 0.637482
                9 -0.310130 0.078891 NaN




                In [27]: df.dropna()     #drop all rows that have any NaN values
                Out[27]:
                0 1 2
                1 2.677677 -1.466923 -0.750366
                5 -1.250970 0.030561 -2.678622
                7 0.049896 -0.308003 0.823295




                In [28]: df.dropna(how='all')     #drop only if ALL columns are NaN
                Out[28]:
                0 1 2
                1 2.677677 -1.466923 -0.750366
                2 NaN 0.798002 -0.906038
                3 0.672201 0.964789 NaN
                4 NaN NaN 0.050742
                5 -1.250970 0.030561 -2.678622
                6 NaN 1.036043 NaN
                7 0.049896 -0.308003 0.823295
                8 NaN NaN 0.637482
                9 -0.310130 0.078891 NaN




                In [29]: df.dropna(thresh=2)   #Drop row if it does not have at least two values that are **not** NaN
                Out[29]:
                0 1 2
                1 2.677677 -1.466923 -0.750366
                2 NaN 0.798002 -0.906038
                3 0.672201 0.964789 NaN
                5 -1.250970 0.030561 -2.678622
                7 0.049896 -0.308003 0.823295
                9 -0.310130 0.078891 NaN




                In [30]: df.dropna(subset=[1])   #Drop only if NaN in specific column (as asked in the question)
                Out[30]:
                0 1 2
                1 2.677677 -1.466923 -0.750366
                2 NaN 0.798002 -0.906038
                3 0.672201 0.964789 NaN
                5 -1.250970 0.030561 -2.678622
                6 NaN 1.036043 NaN
                7 0.049896 -0.308003 0.823295
                9 -0.310130 0.078891 NaN


                There are also other options (See docs at http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.dropna.html), including dropping columns instead of rows.



                Pretty handy!






                share|improve this answer





















                • 201





                  you can also use df.dropna(subset = ['column_name']). Hope that saves at least one person the extra 5 seconds of 'what am I doing wrong'. Great answer, +1

                  – James Tobin
                  Jun 18 '14 at 14:07






                • 8





                  @JamesTobin, I just spent 20 minutes to write a function for that! The official documentation was very cryptic: "Labels along other axis to consider, e.g. if you are dropping rows these would be a list of columns to include". I was unable to understand, what they meant...

                  – osa
                  Sep 5 '14 at 23:52






                • 2





                  This should be #1

                  – Cord Kaldemeyer
                  Oct 20 '17 at 13:10






                • 1





                  isfinite() is probably more pythonic, but this answer is more elegant and in line with pandas principles. Great answer.

                  – TheProletariat
                  Mar 20 '18 at 21:51
















                698














                This question is already resolved, but...



                ...also consider the solution suggested by Wouter in his original comment. The ability to handle missing data, including dropna(), is built into pandas explicitly. Aside from potentially improved performance over doing it manually, these functions also come with a variety of options which may be useful.



                In [24]: df = pd.DataFrame(np.random.randn(10,3))

                In [25]: df.iloc[::2,0] = np.nan; df.iloc[::4,1] = np.nan; df.iloc[::3,2] = np.nan;

                In [26]: df
                Out[26]:
                0 1 2
                0 NaN NaN NaN
                1 2.677677 -1.466923 -0.750366
                2 NaN 0.798002 -0.906038
                3 0.672201 0.964789 NaN
                4 NaN NaN 0.050742
                5 -1.250970 0.030561 -2.678622
                6 NaN 1.036043 NaN
                7 0.049896 -0.308003 0.823295
                8 NaN NaN 0.637482
                9 -0.310130 0.078891 NaN




                In [27]: df.dropna()     #drop all rows that have any NaN values
                Out[27]:
                0 1 2
                1 2.677677 -1.466923 -0.750366
                5 -1.250970 0.030561 -2.678622
                7 0.049896 -0.308003 0.823295




                In [28]: df.dropna(how='all')     #drop only if ALL columns are NaN
                Out[28]:
                0 1 2
                1 2.677677 -1.466923 -0.750366
                2 NaN 0.798002 -0.906038
                3 0.672201 0.964789 NaN
                4 NaN NaN 0.050742
                5 -1.250970 0.030561 -2.678622
                6 NaN 1.036043 NaN
                7 0.049896 -0.308003 0.823295
                8 NaN NaN 0.637482
                9 -0.310130 0.078891 NaN




                In [29]: df.dropna(thresh=2)   #Drop row if it does not have at least two values that are **not** NaN
                Out[29]:
                0 1 2
                1 2.677677 -1.466923 -0.750366
                2 NaN 0.798002 -0.906038
                3 0.672201 0.964789 NaN
                5 -1.250970 0.030561 -2.678622
                7 0.049896 -0.308003 0.823295
                9 -0.310130 0.078891 NaN




                In [30]: df.dropna(subset=[1])   #Drop only if NaN in specific column (as asked in the question)
                Out[30]:
                0 1 2
                1 2.677677 -1.466923 -0.750366
                2 NaN 0.798002 -0.906038
                3 0.672201 0.964789 NaN
                5 -1.250970 0.030561 -2.678622
                6 NaN 1.036043 NaN
                7 0.049896 -0.308003 0.823295
                9 -0.310130 0.078891 NaN


                There are also other options (See docs at http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.dropna.html), including dropping columns instead of rows.



                Pretty handy!






                share|improve this answer





















                • 201





                  you can also use df.dropna(subset = ['column_name']). Hope that saves at least one person the extra 5 seconds of 'what am I doing wrong'. Great answer, +1

                  – James Tobin
                  Jun 18 '14 at 14:07






                • 8





                  @JamesTobin, I just spent 20 minutes to write a function for that! The official documentation was very cryptic: "Labels along other axis to consider, e.g. if you are dropping rows these would be a list of columns to include". I was unable to understand, what they meant...

                  – osa
                  Sep 5 '14 at 23:52






                • 2





                  This should be #1

                  – Cord Kaldemeyer
                  Oct 20 '17 at 13:10






                • 1





                  isfinite() is probably more pythonic, but this answer is more elegant and in line with pandas principles. Great answer.

                  – TheProletariat
                  Mar 20 '18 at 21:51














                698












                698








                698







                This question is already resolved, but...



                ...also consider the solution suggested by Wouter in his original comment. The ability to handle missing data, including dropna(), is built into pandas explicitly. Aside from potentially improved performance over doing it manually, these functions also come with a variety of options which may be useful.



                In [24]: df = pd.DataFrame(np.random.randn(10,3))

                In [25]: df.iloc[::2,0] = np.nan; df.iloc[::4,1] = np.nan; df.iloc[::3,2] = np.nan;

                In [26]: df
                Out[26]:
                0 1 2
                0 NaN NaN NaN
                1 2.677677 -1.466923 -0.750366
                2 NaN 0.798002 -0.906038
                3 0.672201 0.964789 NaN
                4 NaN NaN 0.050742
                5 -1.250970 0.030561 -2.678622
                6 NaN 1.036043 NaN
                7 0.049896 -0.308003 0.823295
                8 NaN NaN 0.637482
                9 -0.310130 0.078891 NaN




                In [27]: df.dropna()     #drop all rows that have any NaN values
                Out[27]:
                0 1 2
                1 2.677677 -1.466923 -0.750366
                5 -1.250970 0.030561 -2.678622
                7 0.049896 -0.308003 0.823295




                In [28]: df.dropna(how='all')     #drop only if ALL columns are NaN
                Out[28]:
                0 1 2
                1 2.677677 -1.466923 -0.750366
                2 NaN 0.798002 -0.906038
                3 0.672201 0.964789 NaN
                4 NaN NaN 0.050742
                5 -1.250970 0.030561 -2.678622
                6 NaN 1.036043 NaN
                7 0.049896 -0.308003 0.823295
                8 NaN NaN 0.637482
                9 -0.310130 0.078891 NaN




                In [29]: df.dropna(thresh=2)   #Drop row if it does not have at least two values that are **not** NaN
                Out[29]:
                0 1 2
                1 2.677677 -1.466923 -0.750366
                2 NaN 0.798002 -0.906038
                3 0.672201 0.964789 NaN
                5 -1.250970 0.030561 -2.678622
                7 0.049896 -0.308003 0.823295
                9 -0.310130 0.078891 NaN




                In [30]: df.dropna(subset=[1])   #Drop only if NaN in specific column (as asked in the question)
                Out[30]:
                0 1 2
                1 2.677677 -1.466923 -0.750366
                2 NaN 0.798002 -0.906038
                3 0.672201 0.964789 NaN
                5 -1.250970 0.030561 -2.678622
                6 NaN 1.036043 NaN
                7 0.049896 -0.308003 0.823295
                9 -0.310130 0.078891 NaN


                There are also other options (See docs at http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.dropna.html), including dropping columns instead of rows.



                Pretty handy!






                share|improve this answer















                This question is already resolved, but...



                ...also consider the solution suggested by Wouter in his original comment. The ability to handle missing data, including dropna(), is built into pandas explicitly. Aside from potentially improved performance over doing it manually, these functions also come with a variety of options which may be useful.



                In [24]: df = pd.DataFrame(np.random.randn(10,3))

                In [25]: df.iloc[::2,0] = np.nan; df.iloc[::4,1] = np.nan; df.iloc[::3,2] = np.nan;

                In [26]: df
                Out[26]:
                0 1 2
                0 NaN NaN NaN
                1 2.677677 -1.466923 -0.750366
                2 NaN 0.798002 -0.906038
                3 0.672201 0.964789 NaN
                4 NaN NaN 0.050742
                5 -1.250970 0.030561 -2.678622
                6 NaN 1.036043 NaN
                7 0.049896 -0.308003 0.823295
                8 NaN NaN 0.637482
                9 -0.310130 0.078891 NaN




                In [27]: df.dropna()     #drop all rows that have any NaN values
                Out[27]:
                0 1 2
                1 2.677677 -1.466923 -0.750366
                5 -1.250970 0.030561 -2.678622
                7 0.049896 -0.308003 0.823295




                In [28]: df.dropna(how='all')     #drop only if ALL columns are NaN
                Out[28]:
                0 1 2
                1 2.677677 -1.466923 -0.750366
                2 NaN 0.798002 -0.906038
                3 0.672201 0.964789 NaN
                4 NaN NaN 0.050742
                5 -1.250970 0.030561 -2.678622
                6 NaN 1.036043 NaN
                7 0.049896 -0.308003 0.823295
                8 NaN NaN 0.637482
                9 -0.310130 0.078891 NaN




                In [29]: df.dropna(thresh=2)   #Drop row if it does not have at least two values that are **not** NaN
                Out[29]:
                0 1 2
                1 2.677677 -1.466923 -0.750366
                2 NaN 0.798002 -0.906038
                3 0.672201 0.964789 NaN
                5 -1.250970 0.030561 -2.678622
                7 0.049896 -0.308003 0.823295
                9 -0.310130 0.078891 NaN




                In [30]: df.dropna(subset=[1])   #Drop only if NaN in specific column (as asked in the question)
                Out[30]:
                0 1 2
                1 2.677677 -1.466923 -0.750366
                2 NaN 0.798002 -0.906038
                3 0.672201 0.964789 NaN
                5 -1.250970 0.030561 -2.678622
                6 NaN 1.036043 NaN
                7 0.049896 -0.308003 0.823295
                9 -0.310130 0.078891 NaN


                There are also other options (See docs at http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.dropna.html), including dropping columns instead of rows.



                Pretty handy!







                share|improve this answer














                share|improve this answer



                share|improve this answer








                edited Aug 14 '17 at 0:04









                ayhan

                38k671106




                38k671106










                answered Nov 17 '12 at 20:27









                AmanAman

                25.3k62535




                25.3k62535








                • 201





                  you can also use df.dropna(subset = ['column_name']). Hope that saves at least one person the extra 5 seconds of 'what am I doing wrong'. Great answer, +1

                  – James Tobin
                  Jun 18 '14 at 14:07






                • 8





                  @JamesTobin, I just spent 20 minutes to write a function for that! The official documentation was very cryptic: "Labels along other axis to consider, e.g. if you are dropping rows these would be a list of columns to include". I was unable to understand, what they meant...

                  – osa
                  Sep 5 '14 at 23:52






                • 2





                  This should be #1

                  – Cord Kaldemeyer
                  Oct 20 '17 at 13:10






                • 1





                  isfinite() is probably more pythonic, but this answer is more elegant and in line with pandas principles. Great answer.

                  – TheProletariat
                  Mar 20 '18 at 21:51














                • 201





                  you can also use df.dropna(subset = ['column_name']). Hope that saves at least one person the extra 5 seconds of 'what am I doing wrong'. Great answer, +1

                  – James Tobin
                  Jun 18 '14 at 14:07






                • 8





                  @JamesTobin, I just spent 20 minutes to write a function for that! The official documentation was very cryptic: "Labels along other axis to consider, e.g. if you are dropping rows these would be a list of columns to include". I was unable to understand, what they meant...

                  – osa
                  Sep 5 '14 at 23:52






                • 2





                  This should be #1

                  – Cord Kaldemeyer
                  Oct 20 '17 at 13:10






                • 1





                  isfinite() is probably more pythonic, but this answer is more elegant and in line with pandas principles. Great answer.

                  – TheProletariat
                  Mar 20 '18 at 21:51








                201




                201





                you can also use df.dropna(subset = ['column_name']). Hope that saves at least one person the extra 5 seconds of 'what am I doing wrong'. Great answer, +1

                – James Tobin
                Jun 18 '14 at 14:07





                you can also use df.dropna(subset = ['column_name']). Hope that saves at least one person the extra 5 seconds of 'what am I doing wrong'. Great answer, +1

                – James Tobin
                Jun 18 '14 at 14:07




                8




                8





                @JamesTobin, I just spent 20 minutes to write a function for that! The official documentation was very cryptic: "Labels along other axis to consider, e.g. if you are dropping rows these would be a list of columns to include". I was unable to understand, what they meant...

                – osa
                Sep 5 '14 at 23:52





                @JamesTobin, I just spent 20 minutes to write a function for that! The official documentation was very cryptic: "Labels along other axis to consider, e.g. if you are dropping rows these would be a list of columns to include". I was unable to understand, what they meant...

                – osa
                Sep 5 '14 at 23:52




                2




                2





                This should be #1

                – Cord Kaldemeyer
                Oct 20 '17 at 13:10





                This should be #1

                – Cord Kaldemeyer
                Oct 20 '17 at 13:10




                1




                1





                isfinite() is probably more pythonic, but this answer is more elegant and in line with pandas principles. Great answer.

                – TheProletariat
                Mar 20 '18 at 21:51





                isfinite() is probably more pythonic, but this answer is more elegant and in line with pandas principles. Great answer.

                – TheProletariat
                Mar 20 '18 at 21:51











                90














                I know this has already been answered, but just for the sake of a purely pandas solution to this specific question as opposed to the general description from Aman (which was wonderful) and in case anyone else happens upon this:



                import pandas as pd
                df = df[pd.notnull(df['EPS'])]





                share|improve this answer



















                • 7





                  Actually, the specific answer would be: df.dropna(subset=['EPS']) (based on the general description of Aman, of course this does also work)

                  – joris
                  Apr 23 '14 at 12:53






                • 2





                  notnull is also what Wes (author of Pandas) suggested in his comment on another answer.

                  – fantabolous
                  Jul 9 '14 at 3:24











                • This maybe a noob question. But when I do a df[pd.notnull(...) or df.dropna the index gets dropped. So if there was a null value in row-index 10 in a df of length 200. The dataframe after running the drop function has index values from 1 to 9 and then 11 to 200. Anyway to "re-index" it

                  – Aakash Gupta
                  Mar 4 '16 at 6:03
















                90














                I know this has already been answered, but just for the sake of a purely pandas solution to this specific question as opposed to the general description from Aman (which was wonderful) and in case anyone else happens upon this:



                import pandas as pd
                df = df[pd.notnull(df['EPS'])]





                share|improve this answer



















                • 7





                  Actually, the specific answer would be: df.dropna(subset=['EPS']) (based on the general description of Aman, of course this does also work)

                  – joris
                  Apr 23 '14 at 12:53






                • 2





                  notnull is also what Wes (author of Pandas) suggested in his comment on another answer.

                  – fantabolous
                  Jul 9 '14 at 3:24











                • This maybe a noob question. But when I do a df[pd.notnull(...) or df.dropna the index gets dropped. So if there was a null value in row-index 10 in a df of length 200. The dataframe after running the drop function has index values from 1 to 9 and then 11 to 200. Anyway to "re-index" it

                  – Aakash Gupta
                  Mar 4 '16 at 6:03














                90












                90








                90







                I know this has already been answered, but just for the sake of a purely pandas solution to this specific question as opposed to the general description from Aman (which was wonderful) and in case anyone else happens upon this:



                import pandas as pd
                df = df[pd.notnull(df['EPS'])]





                share|improve this answer













                I know this has already been answered, but just for the sake of a purely pandas solution to this specific question as opposed to the general description from Aman (which was wonderful) and in case anyone else happens upon this:



                import pandas as pd
                df = df[pd.notnull(df['EPS'])]






                share|improve this answer












                share|improve this answer



                share|improve this answer










                answered Apr 23 '14 at 5:37









                Kirk HadleyKirk Hadley

                1,04672




                1,04672








                • 7





                  Actually, the specific answer would be: df.dropna(subset=['EPS']) (based on the general description of Aman, of course this does also work)

                  – joris
                  Apr 23 '14 at 12:53






                • 2





                  notnull is also what Wes (author of Pandas) suggested in his comment on another answer.

                  – fantabolous
                  Jul 9 '14 at 3:24











                • This maybe a noob question. But when I do a df[pd.notnull(...) or df.dropna the index gets dropped. So if there was a null value in row-index 10 in a df of length 200. The dataframe after running the drop function has index values from 1 to 9 and then 11 to 200. Anyway to "re-index" it

                  – Aakash Gupta
                  Mar 4 '16 at 6:03














                • 7





                  Actually, the specific answer would be: df.dropna(subset=['EPS']) (based on the general description of Aman, of course this does also work)

                  – joris
                  Apr 23 '14 at 12:53






                • 2





                  notnull is also what Wes (author of Pandas) suggested in his comment on another answer.

                  – fantabolous
                  Jul 9 '14 at 3:24











                • This maybe a noob question. But when I do a df[pd.notnull(...) or df.dropna the index gets dropped. So if there was a null value in row-index 10 in a df of length 200. The dataframe after running the drop function has index values from 1 to 9 and then 11 to 200. Anyway to "re-index" it

                  – Aakash Gupta
                  Mar 4 '16 at 6:03








                7




                7





                Actually, the specific answer would be: df.dropna(subset=['EPS']) (based on the general description of Aman, of course this does also work)

                – joris
                Apr 23 '14 at 12:53





                Actually, the specific answer would be: df.dropna(subset=['EPS']) (based on the general description of Aman, of course this does also work)

                – joris
                Apr 23 '14 at 12:53




                2




                2





                notnull is also what Wes (author of Pandas) suggested in his comment on another answer.

                – fantabolous
                Jul 9 '14 at 3:24





                notnull is also what Wes (author of Pandas) suggested in his comment on another answer.

                – fantabolous
                Jul 9 '14 at 3:24













                This maybe a noob question. But when I do a df[pd.notnull(...) or df.dropna the index gets dropped. So if there was a null value in row-index 10 in a df of length 200. The dataframe after running the drop function has index values from 1 to 9 and then 11 to 200. Anyway to "re-index" it

                – Aakash Gupta
                Mar 4 '16 at 6:03





                This maybe a noob question. But when I do a df[pd.notnull(...) or df.dropna the index gets dropped. So if there was a null value in row-index 10 in a df of length 200. The dataframe after running the drop function has index values from 1 to 9 and then 11 to 200. Anyway to "re-index" it

                – Aakash Gupta
                Mar 4 '16 at 6:03











                33














                You can use this:



                df.dropna(subset=['EPS'], how='all', inplace = True)





                share|improve this answer





















                • 10





                  how='all' is redundant here, because you subsetting dataframe only with one field so both 'all' and 'any' will have the same effect.

                  – Anton Protopopov
                  Jan 16 '18 at 12:41


















                33














                You can use this:



                df.dropna(subset=['EPS'], how='all', inplace = True)





                share|improve this answer





















                • 10





                  how='all' is redundant here, because you subsetting dataframe only with one field so both 'all' and 'any' will have the same effect.

                  – Anton Protopopov
                  Jan 16 '18 at 12:41
















                33












                33








                33







                You can use this:



                df.dropna(subset=['EPS'], how='all', inplace = True)





                share|improve this answer















                You can use this:



                df.dropna(subset=['EPS'], how='all', inplace = True)






                share|improve this answer














                share|improve this answer



                share|improve this answer








                edited Aug 21 '17 at 9:49









                Mojtaba Khodadadi

                58457




                58457










                answered Aug 2 '17 at 16:28









                JoeJoe

                6,10421530




                6,10421530








                • 10





                  how='all' is redundant here, because you subsetting dataframe only with one field so both 'all' and 'any' will have the same effect.

                  – Anton Protopopov
                  Jan 16 '18 at 12:41
















                • 10





                  how='all' is redundant here, because you subsetting dataframe only with one field so both 'all' and 'any' will have the same effect.

                  – Anton Protopopov
                  Jan 16 '18 at 12:41










                10




                10





                how='all' is redundant here, because you subsetting dataframe only with one field so both 'all' and 'any' will have the same effect.

                – Anton Protopopov
                Jan 16 '18 at 12:41







                how='all' is redundant here, because you subsetting dataframe only with one field so both 'all' and 'any' will have the same effect.

                – Anton Protopopov
                Jan 16 '18 at 12:41













                25














                Simplest of all solutions:



                filtered_df = df[df['EPS'].notnull()]



                The above solution is way better than using np.isfinite()







                share|improve this answer






























                  25














                  Simplest of all solutions:



                  filtered_df = df[df['EPS'].notnull()]



                  The above solution is way better than using np.isfinite()







                  share|improve this answer




























                    25












                    25








                    25







                    Simplest of all solutions:



                    filtered_df = df[df['EPS'].notnull()]



                    The above solution is way better than using np.isfinite()







                    share|improve this answer















                    Simplest of all solutions:



                    filtered_df = df[df['EPS'].notnull()]



                    The above solution is way better than using np.isfinite()








                    share|improve this answer














                    share|improve this answer



                    share|improve this answer








                    edited Aug 8 '18 at 15:17









                    ayhan

                    38k671106




                    38k671106










                    answered Nov 23 '17 at 12:08









                    Gil BaggioGil Baggio

                    2,70711822




                    2,70711822























                        21














                        You could use dataframe method notnull or inverse of isnull, or numpy.isnan:



                        In [332]: df[df.EPS.notnull()]
                        Out[332]:
                        STK_ID RPT_Date STK_ID.1 EPS cash
                        2 600016 20111231 600016 4.3 NaN
                        4 601939 20111231 601939 2.5 NaN


                        In [334]: df[~df.EPS.isnull()]
                        Out[334]:
                        STK_ID RPT_Date STK_ID.1 EPS cash
                        2 600016 20111231 600016 4.3 NaN
                        4 601939 20111231 601939 2.5 NaN


                        In [347]: df[~np.isnan(df.EPS)]
                        Out[347]:
                        STK_ID RPT_Date STK_ID.1 EPS cash
                        2 600016 20111231 600016 4.3 NaN
                        4 601939 20111231 601939 2.5 NaN





                        share|improve this answer




























                          21














                          You could use dataframe method notnull or inverse of isnull, or numpy.isnan:



                          In [332]: df[df.EPS.notnull()]
                          Out[332]:
                          STK_ID RPT_Date STK_ID.1 EPS cash
                          2 600016 20111231 600016 4.3 NaN
                          4 601939 20111231 601939 2.5 NaN


                          In [334]: df[~df.EPS.isnull()]
                          Out[334]:
                          STK_ID RPT_Date STK_ID.1 EPS cash
                          2 600016 20111231 600016 4.3 NaN
                          4 601939 20111231 601939 2.5 NaN


                          In [347]: df[~np.isnan(df.EPS)]
                          Out[347]:
                          STK_ID RPT_Date STK_ID.1 EPS cash
                          2 600016 20111231 600016 4.3 NaN
                          4 601939 20111231 601939 2.5 NaN





answered Dec 4 '15 at 7:01
Anton Protopopov

                                10














Yet another solution, which uses the fact that np.nan != np.nan:



                                In [149]: df.query("EPS == EPS")
                                Out[149]:
                                STK_ID EPS cash
                                STK_ID RPT_Date
                                600016 20111231 600016 4.3 NaN
                                601939 20111231 601939 2.5 NaN
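
The same trick works outside query as a plain boolean mask (df here being the question's frame):

# NaN compares unequal even to itself, so EPS == EPS is False exactly on the NaN rows
df[df['EPS'] == df['EPS']]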





answered Apr 20 '17 at 21:15
MaxU

                                        8














You can use dropna.

Example

Drop the rows where at least one element is missing:

df = df.dropna()

Define in which columns to look for missing values:

df = df.dropna(subset=['column1', 'column2'])

See the dropna documentation for more examples.

Note: the axis parameter of dropna has been deprecated since version 0.23.0.
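
dropna also accepts how and thresh if you need finer control; a brief sketch (df being the question's frame):

df.dropna(how='all')       # drop only rows where every value is NaN
df.dropna(thresh=2)        # keep rows that have at least 2 non-NaN values
df.dropna(subset=['EPS'])  # drop rows where EPS specifically is NaN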







answered Oct 14 '18 at 19:26
Umer

                                                6














Or check for NaNs with isnull, then use ~ to invert the mask so that only the non-NaN rows remain:

df = df[~df['EPS'].isnull()]

Now print(df) gives:

                 STK_ID  EPS  cash
STK_ID RPT_Date
600016 20111231  600016  4.3   NaN
601939 20111231  601939  2.5   NaN
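
In recent pandas versions (0.21+) the same mask can be written with the notna alias, which reads a little more directly:

df = df[df['EPS'].notna()]   # notna() is an alias of notnull()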





answered Oct 18 '18 at 23:55
U9-Forward

                                                        1














It may be added that '&' can be used to chain on additional conditions, e.g.

df = df[(df.EPS > 2.0) & (df.EPS < 4.0)]

Note that when evaluating combined conditions, pandas needs parentheses around each one.
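
Tying this back to the question, a sketch combining an explicit NaN check with a range condition (the thresholds are illustrative):

# Comparisons with NaN are False anyway, so NaN rows are dropped automatically;
# the explicit notnull() just makes the intent obvious
df = df[df.EPS.notnull() & (df.EPS > 2.0) & (df.EPS < 4.0)]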






answered Mar 15 '16 at 15:33 (edited Jan 26 '17 at 23:12 by aesede)
David

• Sorry, but the OP wants something else. Btw, your code is wrong; it returns ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all(). You need to add parentheses - df = df[(df.EPS > 2.0) & (df.EPS < 4.0)] - but this is also not an answer to the question.
  – jezrael
  Mar 16 '16 at 11:52

                                                        0














A simple and easy way:

df.dropna(subset=['EPS'], inplace=True)

Source: https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.dropna.html
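
Equivalently, without mutating in place (plain reassignment avoids the inplace flag):

df = df.dropna(subset=['EPS'])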






answered Jan 22 at 8:26 (edited Jan 23 at 10:13)
Nursnaaz

                                                                -1














For some reason none of the previously submitted answers worked for me. This basic solution did:

df = df[df.EPS >= 0]

Though of course that will drop rows with negative numbers too. If you want to keep those, combine both sides with | instead of filtering twice (applying df = df[df.EPS <= 0] afterwards would leave only the zeros):

df = df[(df.EPS >= 0) | (df.EPS <= 0)]
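
This works only because every comparison with NaN evaluates to False, so NaN rows fail the mask; a quick check:

import numpy as np
print(np.nan >= 0)   # False: NaN fails every comparison, so those rows are filtered out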





answered Oct 9 '15 at 18:00 (edited Oct 9 '15 at 18:25)
samthebrand