How to replace a float value with NaN in pandas?
I'm aware of the replace function in pandas: https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.replace.html
But this simple test does not work as expected when I try to replace a float value:
import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.randn(50, 4), columns=list('ABCD'))
print(df.head(n=1))
          A         B        C         D
0  1.437202  1.919894 -1.40674 -0.316737
df = df.replace(1.437202, np.nan)
print(df.head(n=1))
          A         B        C         D
0  1.437202  1.919894 -1.40674 -0.316737
As you can see, the value at position [0, 0] is unchanged. Any idea what could be causing this?
python pandas replace nan
asked Nov 22 at 8:35
ralvarez
2 Answers
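The problem is float precision: pandas prints floats rounded to six decimals by default, so the value you typed is not exactly the stored one and replace finds no exact match. Compare with a tolerance instead, using numpy.isclose with mask: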
np.random.seed(123)
df = pd.DataFrame(np.random.randn(50, 4), columns=list('ABCD'))
print(df.head(n=1))
          A         B         C         D
0 -1.085631  0.997345  0.282978 -1.506295
df = df.mask(np.isclose(df.values, 0.997345))
Or use numpy.where:
arr = np.where(np.isclose(df.values, 0.997345), np.nan, df.values)
df = pd.DataFrame(arr, index=df.index, columns=df.columns)
print(df.head(n=1))
          A   B         C         D
0 -1.085631 NaN  0.282978 -1.506295
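To see why the exact match misses, here is a minimal sketch, assuming pandas' default display precision of six decimals. The names `stored` and `shown` are illustrative only, and `round(stored, 6)` is used as a stand-in for what the printed frame shows.

```python
import numpy as np
import pandas as pd

np.random.seed(123)
df = pd.DataFrame(np.random.randn(50, 4), columns=list('ABCD'))

stored = df.loc[0, 'B']    # full-precision float held in the frame
shown = round(stored, 6)   # stand-in for the rounded value that print(df) displays

# The rounded value differs from the stored one, so an exact comparison finds nothing.
exact_hits = int((df == shown).to_numpy().sum())

# np.isclose compares within a tolerance (defaults: rtol=1e-05, atol=1e-08).
close_hits = int(np.isclose(df.to_numpy(), shown).sum())

print(repr(stored), shown, exact_hits, close_hits)
```

If the default tolerance is too loose or too tight for your data, `np.isclose` accepts explicit `rtol` and `atol` arguments.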
EDIT: If the DataFrame also contains non-numeric columns, select only the numeric ones with select_dtypes and apply the mask to that subset:
np.random.seed(123)
df = pd.DataFrame(np.random.randn(50, 4), columns=list('ABCD')).assign(E='a')
cols = df.select_dtypes(np.number).columns
df[cols] = df[cols].mask(np.isclose(df[cols].values, 0.997345))
print(df.head(n=1))
          A   B         C         D  E
0 -1.085631 NaN  0.282978 -1.506295  a
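The numeric-subset approach can be wrapped in a small function. This is only a sketch: `replace_close_with_nan` is a hypothetical helper name, not a pandas API, and it assumes plain NumPy-backed numeric columns.

```python
import numpy as np
import pandas as pd


def replace_close_with_nan(df, value, rtol=1e-05, atol=1e-08):
    """Return a copy of `df` with numeric cells close to `value` set to NaN.

    Hypothetical helper for illustration; non-numeric columns are left untouched.
    """
    out = df.copy()
    cols = out.select_dtypes(include="number").columns
    if len(cols) == 0:
        return out  # nothing numeric to compare
    numeric = out[cols]
    # Boolean array marking cells within tolerance of `value`.
    hits = np.isclose(numeric.to_numpy(dtype=float), value, rtol=rtol, atol=atol)
    out[cols] = numeric.mask(hits)
    return out


np.random.seed(123)
mixed = pd.DataFrame(np.random.randn(50, 4), columns=list("ABCD")).assign(E="a")
cleaned = replace_close_with_nan(mixed, 0.997345)
print(cleaned.head(n=1))
```

Note that a column holding a string in even one cell has object dtype, so this sketch skips that column entirely rather than comparing its remaining numbers.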
edited Nov 22 at 9:50
answered Nov 22 at 8:38
jezrael
Good options indeed, but they both fail if the columns' data types are not all numeric. If I set a value to a string with
df.iloc[[0], [0]] = 'random_string'
and then apply either method to the whole dataset, it raises TypeError: ufunc 'isfinite' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''
– ralvarez
Nov 22 at 9:43
Maybe I should have explained that before. The example I gave had only numeric values, but I'm looking for a method that works with features of any type.
– ralvarez
Nov 22 at 9:49
@ralvarez - Added a general solution: filter only the numeric columns and apply the solution to them.
– jezrael
Nov 22 at 9:53
Just another trick, for when you already know the specific index:
>>> print(df.head(n=1))
          A         B         C         D
0 -0.042839  1.701118  0.064779  1.513046
>>> df.loc[0, 'A'] = np.nan
>>> print(df.head(n=1))
    A         B         C         D
0 NaN  1.701118  0.064779  1.513046
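When the position is known, a single label-based assignment avoids the chained-indexing form `df['A'][0] = ...`, which pandas warns about and which may not update the original frame in some versions. A minimal sketch, assuming the default integer index:

```python
import numpy as np
import pandas as pd

np.random.seed(123)
df = pd.DataFrame(np.random.randn(50, 4), columns=list('ABCD'))

# One indexing call: row label 0, column 'A'.
df.loc[0, 'A'] = np.nan

# .at is the scalar-only accessor for a single cell.
df.at[1, 'B'] = np.nan

print(df.head(n=2))
```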
answered Nov 22 at 9:52
pygo