How to drop rows of Pandas DataFrame whose value in certain columns is NaN
I have a DataFrame:
>>> df
STK_ID EPS cash
STK_ID RPT_Date
601166 20111231 601166 NaN NaN
600036 20111231 600036 NaN 12
600016 20111231 600016 4.3 NaN
601009 20111231 601009 NaN NaN
601939 20111231 601939 2.5 NaN
000001 20111231 000001 NaN NaN
Then I just want the records whose EPS is not NaN; that is, something like df.drop(...) should return the DataFrame below:
STK_ID EPS cash
STK_ID RPT_Date
600016 20111231 600016 4.3 NaN
601939 20111231 601939 2.5 NaN
How do I do that?
python pandas dataframe
asked Nov 16 '12 at 9:17 by bigbug; edited Jan 5 '17 at 17:01 by Ninjakannon

dropna: pandas.pydata.org/pandas-docs/stable/generated/… – Wouter Overmeire, Nov 16 '12 at 9:29

df.dropna(subset = ['column1_name', 'column2_name', 'column3_name']) – osa, Sep 5 '14 at 23:53
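For reference, a minimal sketch that reconstructs a DataFrame like the one printed above; the (STK_ID, RPT_Date) MultiIndex, the string dtype of STK_ID, and the repeated STK_ID column are assumptions read off that output:

import numpy as np
import pandas as pd

# Hypothetical reconstruction of the question's DataFrame.
idx = pd.MultiIndex.from_product(
    [["601166", "600036", "600016", "601009", "601939", "000001"], [20111231]],
    names=["STK_ID", "RPT_Date"],
)
df = pd.DataFrame(
    {
        "STK_ID": ["601166", "600036", "600016", "601009", "601939", "000001"],
        "EPS": [np.nan, np.nan, 4.3, np.nan, 2.5, np.nan],
        "cash": [np.nan, 12, np.nan, np.nan, np.nan, np.nan],
    },
    index=idx,
)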
12 Answers
Don't drop. Just take rows where EPS is finite:

df = df[np.isfinite(df['EPS'])]

– eumiro, Nov 16 '12 at 9:34
I'd recommend using pandas.notnull instead of np.isfinite – Wes McKinney, Nov 21 '12 at 3:08

Is there any advantage to indexing and copying over dropping? – Robert Muil, Jul 31 '15 at 8:15

Creates error: TypeError: ufunc 'isfinite' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe'' – Philipp Schwarz, Oct 7 '16 at 13:18

@wes-mckinney could you please let me know if dropna() is a better choice over pandas.notnull in this case? If so, then why? – stormfield, Sep 7 '17 at 11:53

@PhilippSchwarz This error occurs if the column (EPS in the example) contains strings or other types that cannot be digested by np.isfinite(). I recommend using pandas.notnull(), which handles this more gracefully. – normanius, Apr 5 '18 at 10:02
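A short sketch of the failure mode discussed above, assuming a hypothetical object-dtype column that mixes floats and strings:

import numpy as np
import pandas as pd

s = pd.Series([4.3, "n/a", 2.5, np.nan], dtype=object)

print(pd.notnull(s))  # True, True, True, False -- handles mixed types
# np.isfinite(s)      # would raise the TypeError quoted above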
This question is already resolved, but...
...also consider the solution suggested by Wouter in his original comment. The ability to handle missing data, including dropna(), is built into pandas explicitly. Aside from potentially improved performance over doing it manually, these functions also come with a variety of options which may be useful.
In [24]: df = pd.DataFrame(np.random.randn(10,3))
In [25]: df.iloc[::2,0] = np.nan; df.iloc[::4,1] = np.nan; df.iloc[::3,2] = np.nan;
In [26]: df
Out[26]:
0 1 2
0 NaN NaN NaN
1 2.677677 -1.466923 -0.750366
2 NaN 0.798002 -0.906038
3 0.672201 0.964789 NaN
4 NaN NaN 0.050742
5 -1.250970 0.030561 -2.678622
6 NaN 1.036043 NaN
7 0.049896 -0.308003 0.823295
8 NaN NaN 0.637482
9 -0.310130 0.078891 NaN
In [27]: df.dropna() #drop all rows that have any NaN values
Out[27]:
0 1 2
1 2.677677 -1.466923 -0.750366
5 -1.250970 0.030561 -2.678622
7 0.049896 -0.308003 0.823295
In [28]: df.dropna(how='all') #drop only if ALL columns are NaN
Out[28]:
0 1 2
1 2.677677 -1.466923 -0.750366
2 NaN 0.798002 -0.906038
3 0.672201 0.964789 NaN
4 NaN NaN 0.050742
5 -1.250970 0.030561 -2.678622
6 NaN 1.036043 NaN
7 0.049896 -0.308003 0.823295
8 NaN NaN 0.637482
9 -0.310130 0.078891 NaN
In [29]: df.dropna(thresh=2) #Drop row if it does not have at least two values that are **not** NaN
Out[29]:
0 1 2
1 2.677677 -1.466923 -0.750366
2 NaN 0.798002 -0.906038
3 0.672201 0.964789 NaN
5 -1.250970 0.030561 -2.678622
7 0.049896 -0.308003 0.823295
9 -0.310130 0.078891 NaN
In [30]: df.dropna(subset=[1]) #Drop only if NaN in specific column (as asked in the question)
Out[30]:
0 1 2
1 2.677677 -1.466923 -0.750366
2 NaN 0.798002 -0.906038
3 0.672201 0.964789 NaN
5 -1.250970 0.030561 -2.678622
6 NaN 1.036043 NaN
7 0.049896 -0.308003 0.823295
9 -0.310130 0.078891 NaN
There are also other options (see the docs at http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.dropna.html), including dropping columns instead of rows. Pretty handy!

– Aman, Nov 17 '12 at 20:27
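A follow-up sketch of the column-dropping option mentioned above. Note that on the toy frame from this answer every column contains at least one NaN, so column-wise dropping with the default how='any' would remove all three columns:

# Drop columns instead of rows; axis=1 or axis='columns' selects columns.
df.dropna(axis='columns', how='all')   # drop only columns that are all NaN
df.dropna(axis='columns')              # drop any column containing a NaN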
you can also use df.dropna(subset = ['column_name']). Hope that saves at least one person the extra 5 seconds of 'what am I doing wrong'. Great answer, +1 – James Tobin, Jun 18 '14 at 14:07

@JamesTobin, I just spent 20 minutes to write a function for that! The official documentation was very cryptic: "Labels along other axis to consider, e.g. if you are dropping rows these would be a list of columns to include". I was unable to understand what they meant... – osa, Sep 5 '14 at 23:52

This should be #1 – Cord Kaldemeyer, Oct 20 '17 at 13:10

isfinite() is probably more pythonic, but this answer is more elegant and in line with pandas principles. Great answer. – TheProletariat, Mar 20 '18 at 21:51
I know this has already been answered, but just for the sake of a purely pandas solution to this specific question, as opposed to the general description from Aman (which was wonderful), and in case anyone else happens upon this:

import pandas as pd
df = df[pd.notnull(df['EPS'])]

– Kirk Hadley, Apr 23 '14 at 5:37
Actually, the specific answer would be: df.dropna(subset=['EPS']) (based on the general description of Aman; of course this does also work) – joris, Apr 23 '14 at 12:53

notnull is also what Wes (author of Pandas) suggested in his comment on another answer. – fantabolous, Jul 9 '14 at 3:24

This may be a noob question, but when I do df[pd.notnull(...)] or df.dropna, the index gets dropped. So if there was a null value at row-index 10 in a df of length 200, the dataframe after running the drop function has index values from 1 to 9 and then 11 to 200. Any way to "re-index" it? – Aakash Gupta, Mar 4 '16 at 6:03
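A sketch addressing the re-indexing question in the last comment, assuming a plain 0..n-1 RangeIndex is wanted (note this discards the existing index labels):

# Filter, then renumber the surviving rows 0..n-1.
df = df[pd.notnull(df['EPS'])].reset_index(drop=True)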
You can use this:

df.dropna(subset=['EPS'], how='all', inplace=True)

– Joe, Aug 2 '17 at 16:28
how='all' is redundant here: because you are subsetting the DataFrame on a single column, both 'all' and 'any' have the same effect. – Anton Protopopov, Jan 16 '18 at 12:41
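A sketch of the case the comment contrasts with, where how does matter, using the question's two columns:

# With a multi-column subset the choice matters:
df.dropna(subset=['EPS', 'cash'], how='any')  # drop if EPS OR cash is NaN
df.dropna(subset=['EPS', 'cash'], how='all')  # drop only if BOTH are NaN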
Simplest of all solutions:

filtered_df = df[df['EPS'].notnull()]

The above is preferable to np.isfinite() because notnull() also copes with non-numeric (e.g. object-dtype) columns.

– Gil Baggio, Nov 23 '17 at 12:08
You could use the DataFrame method notnull, the inverse of isnull, or numpy.isnan (the last works only on numeric columns):
In [332]: df[df.EPS.notnull()]
Out[332]:
STK_ID RPT_Date STK_ID.1 EPS cash
2 600016 20111231 600016 4.3 NaN
4 601939 20111231 601939 2.5 NaN
In [334]: df[~df.EPS.isnull()]
Out[334]:
STK_ID RPT_Date STK_ID.1 EPS cash
2 600016 20111231 600016 4.3 NaN
4 601939 20111231 601939 2.5 NaN
In [347]: df[~np.isnan(df.EPS)]
Out[347]:
STK_ID RPT_Date STK_ID.1 EPS cash
2 600016 20111231 600016 4.3 NaN
4 601939 20111231 601939 2.5 NaN

– Anton Protopopov, Dec 4 '15 at 7:01
Yet another solution uses the fact that np.nan != np.nan:
In [149]: df.query("EPS == EPS")
Out[149]:
STK_ID EPS cash
STK_ID RPT_Date
600016 20111231 600016 4.3 NaN
601939 20111231 601939 2.5 NaN

– MaxU, Apr 20 '17 at 21:15
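The same trick without query(), as a plain boolean mask: NaN is the only value that is not equal to itself, so self-equality is True exactly on the non-NaN rows.

# Equivalent mask form of the query above.
df[df['EPS'] == df['EPS']]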
You can use dropna.

Example: drop the rows where at least one element is missing:

df = df.dropna()

Define in which columns to look for missing values:

df = df.dropna(subset=['column1', 'column2'])

See the documentation for more examples.

Note: the axis parameter of dropna is deprecated since version 0.23.0.

– Umer, Oct 14 '18 at 19:26
Or: check for NaNs with isnull, then use ~ to invert the mask (keeping the rows with no NaNs):

df = df[~df['EPS'].isnull()]

Now print(df) gives:

STK_ID EPS cash
STK_ID RPT_Date
600016 20111231 600016 4.3 NaN
601939 20111231 601939 2.5 NaN

– U9-Forward, Oct 18 '18 at 23:55
It may be added that '&' can be used to combine additional conditions, e.g.

df = df[(df.EPS > 2.0) & (df.EPS < 4.0)]

Notice that when evaluating the statements, pandas needs parentheses around each comparison.

– David, Mar 15 '16 at 15:33
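A sketch of why the parentheses are needed: in Python, & binds more tightly than the comparison operators, so without them the expression parses as a chained comparison and fails.

# OK: each comparison is evaluated first, then combined element-wise.
ok = df[(df.EPS > 2.0) & (df.EPS < 4.0)]
# Fails: parsed as df.EPS > (2.0 & df.EPS) < 4.0, which raises an error
# (a type error or the "truth value of a Series is ambiguous" ValueError).
# bad = df[df.EPS > 2.0 & df.EPS < 4.0]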
Sorry, but OP wants something else. Btw, your code is wrong, it returns ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all(). You need to add parentheses - df = df[(df.EPS > 2.0) & (df.EPS < 4.0)] - but also it is not an answer to this question. – jezrael, Mar 16 '16 at 11:52
A simple and easy way:

df.dropna(subset=['EPS'], inplace=True)

Source: https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.dropna.html

– Nursnaaz, Jan 22 at 8:26
For some reason none of the previously submitted answers worked for me. This basic solution did:

df = df[df.EPS >= 0]

Of course that will drop rows with negative numbers, too. If you want to keep those, combine both conditions with | rather than applying the filters one after the other (filtering >= 0 and then <= 0 in sequence would leave only zeros):

df = df[(df.EPS >= 0) | (df.EPS <= 0)]

– samthebrand, Oct 9 '15 at 18:00
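A quick sketch of why this comparison trick filters out NaNs at all, assuming a float column:

import numpy as np
import pandas as pd

s = pd.Series([4.3, np.nan, -1.0])
print(s >= 0)                 # True, False, False -- NaN compares as False
print((s >= 0) | (s <= 0))    # True, False, True  -- keeps every non-NaN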