Pandas Dataframe: for a given row, trying to assign value in a certain column based on a lookup of a value in...
up vote
0
down vote
favorite
Basically for a given row i, I am trying to assign i's value in the column 'Adj', to a certain value based on i's value in another column 'Local Max String'. Basically row i's value in 'Local Max String' needs to be searched up in another column of the DataFrame, 'Date String', and then the row that contains the value, row q, has it's value in the column 'Adj Close' be the value for row i's 'Adj' column.
Sorry if that is difficult to understand. The following for loop accomplished what I wanted to do, but I think there should be a better way to do it in Pandas. I tried using apply and lambda functions, but it said assignment wasn't possible, and I'm unsure if the way I was doing it was correct. The for loop also takes extremely long to complete.
Here's the code:
for x in range(0, len(df.index)):
df['Adj'][x] = df.loc[df['Date String'] == df['Local Max String'][x]]['Adj Close']
Here's a picture of the DF to get a better idea of what I mean. The value in the Adj column will look for the Adj Close value corresponding to the Date in Local Max String.
import numpy as np
import pandas as pd
pd.core.common.is_list_like = pd.api.types.is_list_like
from pandas_datareader import data as pdr
import matplotlib.pyplot as plt
import datetime
import fix_yahoo_finance as yf
yf.pdr_override() # <== that's all it takes :-)
# Dates for data
start_date = datetime.datetime(2017,11,1)
end_date = datetime.datetime(2018,11,1)
df = pdr.get_data_yahoo('SPY', start=start_date, end=end_date)
df.data = df['Adj Close']
df['Most Recent Local Max'] = np.nan
df['Date'] = df.index
local_maxes = list(df[(df.data.shift(1) < df.data) & (df.data.shift(-1) < df.data)].index)
local_maxes.append(df['Date'][0] - datetime.timedelta(days=1))
def nearest(items, pivot):
return min([d for d in items if d< pivot], key=lambda x: abs(x - pivot))
df['Most Recent Local Max'] = df['Date'].apply(lambda x: min([d for d in local_maxes if d < x], key=lambda y: abs(y - x)) )
df['Local Max String'] = df['Most Recent Local Max'].apply(lambda x: str(x))
df['Date String'] = df['Date'].apply(lambda x: str(x))
df.loc[df['Local Max String'] == str(df['Date'][0] - datetime.timedelta(days=1)), 'Local Max String'] = str(df['Date'][0])
df['Adj'] = np.nan
Thanks!
python pandas
add a comment |
up vote
0
down vote
favorite
Basically for a given row i, I am trying to assign i's value in the column 'Adj', to a certain value based on i's value in another column 'Local Max String'. Basically row i's value in 'Local Max String' needs to be searched up in another column of the DataFrame, 'Date String', and then the row that contains the value, row q, has it's value in the column 'Adj Close' be the value for row i's 'Adj' column.
Sorry if that is difficult to understand. The following for loop accomplished what I wanted to do, but I think there should be a better way to do it in Pandas. I tried using apply and lambda functions, but it said assignment wasn't possible, and I'm unsure if the way I was doing it was correct. The for loop also takes extremely long to complete.
Here's the code:
for x in range(0, len(df.index)):
df['Adj'][x] = df.loc[df['Date String'] == df['Local Max String'][x]]['Adj Close']
Here's a picture of the DF to get a better idea of what I mean. The value in the Adj column will look for the Adj Close value corresponding to the Date in Local Max String.
import numpy as np
import pandas as pd
pd.core.common.is_list_like = pd.api.types.is_list_like
from pandas_datareader import data as pdr
import matplotlib.pyplot as plt
import datetime
import fix_yahoo_finance as yf
yf.pdr_override() # <== that's all it takes :-)
# Dates for data
start_date = datetime.datetime(2017,11,1)
end_date = datetime.datetime(2018,11,1)
df = pdr.get_data_yahoo('SPY', start=start_date, end=end_date)
df.data = df['Adj Close']
df['Most Recent Local Max'] = np.nan
df['Date'] = df.index
local_maxes = list(df[(df.data.shift(1) < df.data) & (df.data.shift(-1) < df.data)].index)
local_maxes.append(df['Date'][0] - datetime.timedelta(days=1))
def nearest(items, pivot):
return min([d for d in items if d< pivot], key=lambda x: abs(x - pivot))
df['Most Recent Local Max'] = df['Date'].apply(lambda x: min([d for d in local_maxes if d < x], key=lambda y: abs(y - x)) )
df['Local Max String'] = df['Most Recent Local Max'].apply(lambda x: str(x))
df['Date String'] = df['Date'].apply(lambda x: str(x))
df.loc[df['Local Max String'] == str(df['Date'][0] - datetime.timedelta(days=1)), 'Local Max String'] = str(df['Date'][0])
df['Adj'] = np.nan
Thanks!
python pandas
1
Please add example of input data and expected output.
– user3471881
Nov 21 at 21:24
1
Hi! I just added an answer, but I was unable to test it because the input question don't have a Minimal, Complete, and Verifiable example. If it doesn't work, please, add simpledf
as code (and not as an image) so others can just reproduce your problem and help you more easily!
– Julian Peller
Nov 22 at 2:25
Hi. I put in the code. I'm not sure if it's really minimal, but I couldn't think of a better way to get the data to you guys, since the formatting kept getting messed up.
– Slade
Nov 22 at 18:00
I tried your code but it isn't working properly, even when I use forward fill to fill in the rows. Can you take another look? Thanks a lot!
– Slade
Nov 22 at 18:01
add a comment |
up vote
0
down vote
favorite
up vote
0
down vote
favorite
Basically for a given row i, I am trying to assign i's value in the column 'Adj', to a certain value based on i's value in another column 'Local Max String'. Basically row i's value in 'Local Max String' needs to be searched up in another column of the DataFrame, 'Date String', and then the row that contains the value, row q, has it's value in the column 'Adj Close' be the value for row i's 'Adj' column.
Sorry if that is difficult to understand. The following for loop accomplished what I wanted to do, but I think there should be a better way to do it in Pandas. I tried using apply and lambda functions, but it said assignment wasn't possible, and I'm unsure if the way I was doing it was correct. The for loop also takes extremely long to complete.
Here's the code:
for x in range(0, len(df.index)):
df['Adj'][x] = df.loc[df['Date String'] == df['Local Max String'][x]]['Adj Close']
Here's a picture of the DF to get a better idea of what I mean. The value in the Adj column will look for the Adj Close value corresponding to the Date in Local Max String.
import numpy as np
import pandas as pd
pd.core.common.is_list_like = pd.api.types.is_list_like
from pandas_datareader import data as pdr
import matplotlib.pyplot as plt
import datetime
import fix_yahoo_finance as yf
yf.pdr_override() # <== that's all it takes :-)
# Dates for data
start_date = datetime.datetime(2017,11,1)
end_date = datetime.datetime(2018,11,1)
df = pdr.get_data_yahoo('SPY', start=start_date, end=end_date)
df.data = df['Adj Close']
df['Most Recent Local Max'] = np.nan
df['Date'] = df.index
local_maxes = list(df[(df.data.shift(1) < df.data) & (df.data.shift(-1) < df.data)].index)
local_maxes.append(df['Date'][0] - datetime.timedelta(days=1))
def nearest(items, pivot):
return min([d for d in items if d< pivot], key=lambda x: abs(x - pivot))
df['Most Recent Local Max'] = df['Date'].apply(lambda x: min([d for d in local_maxes if d < x], key=lambda y: abs(y - x)) )
df['Local Max String'] = df['Most Recent Local Max'].apply(lambda x: str(x))
df['Date String'] = df['Date'].apply(lambda x: str(x))
df.loc[df['Local Max String'] == str(df['Date'][0] - datetime.timedelta(days=1)), 'Local Max String'] = str(df['Date'][0])
df['Adj'] = np.nan
Thanks!
python pandas
Basically for a given row i, I am trying to assign i's value in the column 'Adj', to a certain value based on i's value in another column 'Local Max String'. Basically row i's value in 'Local Max String' needs to be searched up in another column of the DataFrame, 'Date String', and then the row that contains the value, row q, has it's value in the column 'Adj Close' be the value for row i's 'Adj' column.
Sorry if that is difficult to understand. The following for loop accomplished what I wanted to do, but I think there should be a better way to do it in Pandas. I tried using apply and lambda functions, but it said assignment wasn't possible, and I'm unsure if the way I was doing it was correct. The for loop also takes extremely long to complete.
Here's the code:
for x in range(0, len(df.index)):
df['Adj'][x] = df.loc[df['Date String'] == df['Local Max String'][x]]['Adj Close']
Here's a picture of the DF to get a better idea of what I mean. The value in the Adj column will look for the Adj Close value corresponding to the Date in Local Max String.
import numpy as np
import pandas as pd
pd.core.common.is_list_like = pd.api.types.is_list_like
from pandas_datareader import data as pdr
import matplotlib.pyplot as plt
import datetime
import fix_yahoo_finance as yf
yf.pdr_override() # <== that's all it takes :-)
# Dates for data
start_date = datetime.datetime(2017,11,1)
end_date = datetime.datetime(2018,11,1)
df = pdr.get_data_yahoo('SPY', start=start_date, end=end_date)
df.data = df['Adj Close']
df['Most Recent Local Max'] = np.nan
df['Date'] = df.index
local_maxes = list(df[(df.data.shift(1) < df.data) & (df.data.shift(-1) < df.data)].index)
local_maxes.append(df['Date'][0] - datetime.timedelta(days=1))
def nearest(items, pivot):
return min([d for d in items if d< pivot], key=lambda x: abs(x - pivot))
df['Most Recent Local Max'] = df['Date'].apply(lambda x: min([d for d in local_maxes if d < x], key=lambda y: abs(y - x)) )
df['Local Max String'] = df['Most Recent Local Max'].apply(lambda x: str(x))
df['Date String'] = df['Date'].apply(lambda x: str(x))
df.loc[df['Local Max String'] == str(df['Date'][0] - datetime.timedelta(days=1)), 'Local Max String'] = str(df['Date'][0])
df['Adj'] = np.nan
Thanks!
python pandas
python pandas
edited Nov 22 at 17:59
asked Nov 21 at 20:31
Slade
3816
3816
1
Please add example of input data and expected output.
– user3471881
Nov 21 at 21:24
1
Hi! I just added an answer, but I was unable to test it because the input question don't have a Minimal, Complete, and Verifiable example. If it doesn't work, please, add simpledf
as code (and not as an image) so others can just reproduce your problem and help you more easily!
– Julian Peller
Nov 22 at 2:25
Hi. I put in the code. I'm not sure if it's really minimal, but I couldn't think of a better way to get the data to you guys, since the formatting kept getting messed up.
– Slade
Nov 22 at 18:00
I tried your code but it isn't working properly, even when I use forward fill to fill in the rows. Can you take another look? Thanks a lot!
– Slade
Nov 22 at 18:01
add a comment |
1
Please add example of input data and expected output.
– user3471881
Nov 21 at 21:24
1
Hi! I just added an answer, but I was unable to test it because the input question don't have a Minimal, Complete, and Verifiable example. If it doesn't work, please, add simpledf
as code (and not as an image) so others can just reproduce your problem and help you more easily!
– Julian Peller
Nov 22 at 2:25
Hi. I put in the code. I'm not sure if it's really minimal, but I couldn't think of a better way to get the data to you guys, since the formatting kept getting messed up.
– Slade
Nov 22 at 18:00
I tried your code but it isn't working properly, even when I use forward fill to fill in the rows. Can you take another look? Thanks a lot!
– Slade
Nov 22 at 18:01
1
1
Please add example of input data and expected output.
– user3471881
Nov 21 at 21:24
Please add example of input data and expected output.
– user3471881
Nov 21 at 21:24
1
1
Hi! I just added an answer, but I was unable to test it because the input question don't have a Minimal, Complete, and Verifiable example. If it doesn't work, please, add simple
df
as code (and not as an image) so others can just reproduce your problem and help you more easily!– Julian Peller
Nov 22 at 2:25
Hi! I just added an answer, but I was unable to test it because the input question don't have a Minimal, Complete, and Verifiable example. If it doesn't work, please, add simple
df
as code (and not as an image) so others can just reproduce your problem and help you more easily!– Julian Peller
Nov 22 at 2:25
Hi. I put in the code. I'm not sure if it's really minimal, but I couldn't think of a better way to get the data to you guys, since the formatting kept getting messed up.
– Slade
Nov 22 at 18:00
Hi. I put in the code. I'm not sure if it's really minimal, but I couldn't think of a better way to get the data to you guys, since the formatting kept getting messed up.
– Slade
Nov 22 at 18:00
I tried your code but it isn't working properly, even when I use forward fill to fill in the rows. Can you take another look? Thanks a lot!
– Slade
Nov 22 at 18:01
I tried your code but it isn't working properly, even when I use forward fill to fill in the rows. Can you take another look? Thanks a lot!
– Slade
Nov 22 at 18:01
add a comment |
2 Answers
2
active
oldest
votes
up vote
0
down vote
This solution still has a for, but it reduces the amount of iterations from df.shape[1]
to df['Local Max String'].nunique()
, so it may be fast enough:
for a_local_max in df['Local Max String'].unique():
df.loc[df['Date String'] == a_local_max, 'Adj'] = df.loc[df['Local Max String'] == a_local_max, 'Adj Close'].iloc[0]
add a comment |
up vote
0
down vote
Often you can skip the for loop by using apply-like function in pandas
. Hereafter, I define a wrapper
function which combines variables row-wisely.
Finally this function is applied on the data frame to create the result
variable. The key element here is to think on the row level within the wrapper
function and indicate this behaviour to the apply
function with the axis=1
argument.
import pandas as pd
import numpy as np
# Dummy data containing two columns with overlapping data
df = pd.DataFrame({'date': 100*np.random.sample(10000), 'string': 2500*['hello', 'world', '!', 'mars'], 'another_string': 10000*['hello']})
# Here you define the operation at the row level
def wrapper(row):
# uncomment if the transformation is to be applied to every column:
# return 2*row['date']
# if you need to first test some condition:
if row['string'] == row['another_string']:
return 2*row['date']
else:
return 0
# Finally you generate the new column using the operation defined above.
df['result'] = df.apply(wrapper, axis=1)
This code completes in 195 ms ± 1.96 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
add a comment |
2 Answers
2
active
oldest
votes
2 Answers
2
active
oldest
votes
active
oldest
votes
active
oldest
votes
up vote
0
down vote
This solution still has a for, but it reduces the amount of iterations from df.shape[1]
to df['Local Max String'].nunique()
, so it may be fast enough:
for a_local_max in df['Local Max String'].unique():
df.loc[df['Date String'] == a_local_max, 'Adj'] = df.loc[df['Local Max String'] == a_local_max, 'Adj Close'].iloc[0]
add a comment |
up vote
0
down vote
This solution still has a for, but it reduces the amount of iterations from df.shape[1]
to df['Local Max String'].nunique()
, so it may be fast enough:
for a_local_max in df['Local Max String'].unique():
df.loc[df['Date String'] == a_local_max, 'Adj'] = df.loc[df['Local Max String'] == a_local_max, 'Adj Close'].iloc[0]
add a comment |
up vote
0
down vote
up vote
0
down vote
This solution still has a for, but it reduces the amount of iterations from df.shape[1]
to df['Local Max String'].nunique()
, so it may be fast enough:
for a_local_max in df['Local Max String'].unique():
df.loc[df['Date String'] == a_local_max, 'Adj'] = df.loc[df['Local Max String'] == a_local_max, 'Adj Close'].iloc[0]
This solution still has a for, but it reduces the amount of iterations from df.shape[1]
to df['Local Max String'].nunique()
, so it may be fast enough:
for a_local_max in df['Local Max String'].unique():
df.loc[df['Date String'] == a_local_max, 'Adj'] = df.loc[df['Local Max String'] == a_local_max, 'Adj Close'].iloc[0]
answered Nov 22 at 2:24
Julian Peller
844511
844511
add a comment |
add a comment |
up vote
0
down vote
Often you can skip the for loop by using apply-like function in pandas
. Hereafter, I define a wrapper
function which combines variables row-wisely.
Finally this function is applied on the data frame to create the result
variable. The key element here is to think on the row level within the wrapper
function and indicate this behaviour to the apply
function with the axis=1
argument.
import pandas as pd
import numpy as np
# Dummy data containing two columns with overlapping data
df = pd.DataFrame({'date': 100*np.random.sample(10000), 'string': 2500*['hello', 'world', '!', 'mars'], 'another_string': 10000*['hello']})
# Here you define the operation at the row level
def wrapper(row):
# uncomment if the transformation is to be applied to every column:
# return 2*row['date']
# if you need to first test some condition:
if row['string'] == row['another_string']:
return 2*row['date']
else:
return 0
# Finally you generate the new column using the operation defined above.
df['result'] = df.apply(wrapper, axis=1)
This code completes in 195 ms ± 1.96 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
add a comment |
up vote
0
down vote
Often you can skip the for loop by using apply-like function in pandas
. Hereafter, I define a wrapper
function which combines variables row-wisely.
Finally this function is applied on the data frame to create the result
variable. The key element here is to think on the row level within the wrapper
function and indicate this behaviour to the apply
function with the axis=1
argument.
import pandas as pd
import numpy as np
# Dummy data containing two columns with overlapping data
df = pd.DataFrame({'date': 100*np.random.sample(10000), 'string': 2500*['hello', 'world', '!', 'mars'], 'another_string': 10000*['hello']})
# Here you define the operation at the row level
def wrapper(row):
# uncomment if the transformation is to be applied to every column:
# return 2*row['date']
# if you need to first test some condition:
if row['string'] == row['another_string']:
return 2*row['date']
else:
return 0
# Finally you generate the new column using the operation defined above.
df['result'] = df.apply(wrapper, axis=1)
This code completes in 195 ms ± 1.96 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
add a comment |
up vote
0
down vote
up vote
0
down vote
Often you can skip the for loop by using apply-like function in pandas
. Hereafter, I define a wrapper
function which combines variables row-wisely.
Finally this function is applied on the data frame to create the result
variable. The key element here is to think on the row level within the wrapper
function and indicate this behaviour to the apply
function with the axis=1
argument.
import pandas as pd
import numpy as np
# Dummy data containing two columns with overlapping data
df = pd.DataFrame({'date': 100*np.random.sample(10000), 'string': 2500*['hello', 'world', '!', 'mars'], 'another_string': 10000*['hello']})
# Here you define the operation at the row level
def wrapper(row):
# uncomment if the transformation is to be applied to every column:
# return 2*row['date']
# if you need to first test some condition:
if row['string'] == row['another_string']:
return 2*row['date']
else:
return 0
# Finally you generate the new column using the operation defined above.
df['result'] = df.apply(wrapper, axis=1)
This code completes in 195 ms ± 1.96 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
Often you can skip the for loop by using apply-like function in pandas
. Hereafter, I define a wrapper
function which combines variables row-wisely.
Finally this function is applied on the data frame to create the result
variable. The key element here is to think on the row level within the wrapper
function and indicate this behaviour to the apply
function with the axis=1
argument.
import pandas as pd
import numpy as np
# Dummy data containing two columns with overlapping data
df = pd.DataFrame({'date': 100*np.random.sample(10000), 'string': 2500*['hello', 'world', '!', 'mars'], 'another_string': 10000*['hello']})
# Here you define the operation at the row level
def wrapper(row):
# uncomment if the transformation is to be applied to every column:
# return 2*row['date']
# if you need to first test some condition:
if row['string'] == row['another_string']:
return 2*row['date']
else:
return 0
# Finally you generate the new column using the operation defined above.
df['result'] = df.apply(wrapper, axis=1)
This code completes in 195 ms ± 1.96 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
answered Nov 22 at 8:00
leoburgy
1086
1086
add a comment |
add a comment |
Thanks for contributing an answer to Stack Overflow!
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
To learn more, see our tips on writing great answers.
Some of your past answers have not been well-received, and you're in danger of being blocked from answering.
Please pay close attention to the following guidance:
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
To learn more, see our tips on writing great answers.
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53420052%2fpandas-dataframe-for-a-given-row-trying-to-assign-value-in-a-certain-column-ba%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
1
Please add example of input data and expected output.
– user3471881
Nov 21 at 21:24
1
Hi! I just added an answer, but I was unable to test it because the input question don't have a Minimal, Complete, and Verifiable example. If it doesn't work, please, add simple
df
as code (and not as an image) so others can just reproduce your problem and help you more easily!– Julian Peller
Nov 22 at 2:25
Hi. I put in the code. I'm not sure if it's really minimal, but I couldn't think of a better way to get the data to you guys, since the formatting kept getting messed up.
– Slade
Nov 22 at 18:00
I tried your code but it isn't working properly, even when I use forward fill to fill in the rows. Can you take another look? Thanks a lot!
– Slade
Nov 22 at 18:01