Pandas Dataframe: for a given row, trying to assign value in a certain column based on a lookup of a value in...

up vote
0
down vote

favorite

Basically for a given row i, I am trying to assign i's value in the column 'Adj', to a certain value based on i's value in another column 'Local Max String'. Basically row i's value in 'Local Max String' needs to be searched up in another column of the DataFrame, 'Date String', and then the row that contains the value, row q, has it's value in the column 'Adj Close' be the value for row i's 'Adj' column.

Sorry if that is difficult to understand. The following for loop accomplished what I wanted to do, but I think there should be a better way to do it in Pandas. I tried using apply and lambda functions, but it said assignment wasn't possible, and I'm unsure if the way I was doing it was correct. The for loop also takes extremely long to complete.

Here's the code:

for x in range(0, len(df.index)):

    df['Adj'][x] = df.loc[df['Date String'] == df['Local Max String'][x]]['Adj Close']

Here's a picture of the DF to get a better idea of what I mean. The value in the Adj column will look for the Adj Close value corresponding to the Date in Local Max String.

enter image description here

import numpy as np

import pandas as pd

pd.core.common.is_list_like = pd.api.types.is_list_like

from pandas_datareader import data as pdr

import matplotlib.pyplot as plt

import datetime

import fix_yahoo_finance as yf

yf.pdr_override() # <== that's all it takes :-)



# Dates for data

start_date = datetime.datetime(2017,11,1)

end_date = datetime.datetime(2018,11,1)



df = pdr.get_data_yahoo('SPY', start=start_date, end=end_date)



df.data = df['Adj Close']

df['Most Recent Local Max'] = np.nan

df['Date'] = df.index

local_maxes = list(df[(df.data.shift(1) < df.data) & (df.data.shift(-1) < df.data)].index)

local_maxes.append(df['Date'][0] - datetime.timedelta(days=1))



def nearest(items, pivot):

    return min([d for d in items if d< pivot], key=lambda x: abs(x - pivot))



df['Most Recent Local Max'] = df['Date'].apply(lambda x: min([d for d in local_maxes if d < x], key=lambda y: abs(y - x)) )



df['Local Max String'] = df['Most Recent Local Max'].apply(lambda x: str(x))



df['Date String'] = df['Date'].apply(lambda x: str(x))



df.loc[df['Local Max String'] == str(df['Date'][0] - datetime.timedelta(days=1)), 'Local Max String'] = str(df['Date'][0])



df['Adj'] = np.nan

Thanks!

edited Nov 22 at 17:59

asked Nov 21 at 20:31

Slade

3816

1

Please add example of input data and expected output.
– user3471881
Nov 21 at 21:24

1

Hi! I just added an answer, but I was unable to test it because the input question don't have a Minimal, Complete, and Verifiable example. If it doesn't work, please, add simple df as code (and not as an image) so others can just reproduce your problem and help you more easily!
– Julian Peller
Nov 22 at 2:25

Hi. I put in the code. I'm not sure if it's really minimal, but I couldn't think of a better way to get the data to you guys, since the formatting kept getting messed up.
– Slade
Nov 22 at 18:00

I tried your code but it isn't working properly, even when I use forward fill to fill in the rows. Can you take another look? Thanks a lot!
– Slade
Nov 22 at 18:01

add a comment |

up vote
0
down vote

favorite

Here's the code:

for x in range(0, len(df.index)):

    df['Adj'][x] = df.loc[df['Date String'] == df['Local Max String'][x]]['Adj Close']

Here's a picture of the DF to get a better idea of what I mean. The value in the Adj column will look for the Adj Close value corresponding to the Date in Local Max String.

enter image description here

import numpy as np

import pandas as pd

pd.core.common.is_list_like = pd.api.types.is_list_like

from pandas_datareader import data as pdr

import matplotlib.pyplot as plt

import datetime

import fix_yahoo_finance as yf

yf.pdr_override() # <== that's all it takes :-)



# Dates for data

start_date = datetime.datetime(2017,11,1)

end_date = datetime.datetime(2018,11,1)



df = pdr.get_data_yahoo('SPY', start=start_date, end=end_date)



df.data = df['Adj Close']

df['Most Recent Local Max'] = np.nan

df['Date'] = df.index

local_maxes = list(df[(df.data.shift(1) < df.data) & (df.data.shift(-1) < df.data)].index)

local_maxes.append(df['Date'][0] - datetime.timedelta(days=1))



def nearest(items, pivot):

    return min([d for d in items if d< pivot], key=lambda x: abs(x - pivot))



df['Most Recent Local Max'] = df['Date'].apply(lambda x: min([d for d in local_maxes if d < x], key=lambda y: abs(y - x)) )



df['Local Max String'] = df['Most Recent Local Max'].apply(lambda x: str(x))



df['Date String'] = df['Date'].apply(lambda x: str(x))



df.loc[df['Local Max String'] == str(df['Date'][0] - datetime.timedelta(days=1)), 'Local Max String'] = str(df['Date'][0])



df['Adj'] = np.nan

Thanks!

edited Nov 22 at 17:59

asked Nov 21 at 20:31

Slade

3816

1

Please add example of input data and expected output.
– user3471881
Nov 21 at 21:24

1

Hi! I just added an answer, but I was unable to test it because the input question don't have a Minimal, Complete, and Verifiable example. If it doesn't work, please, add simple df as code (and not as an image) so others can just reproduce your problem and help you more easily!
– Julian Peller
Nov 22 at 2:25

Hi. I put in the code. I'm not sure if it's really minimal, but I couldn't think of a better way to get the data to you guys, since the formatting kept getting messed up.
– Slade
Nov 22 at 18:00

I tried your code but it isn't working properly, even when I use forward fill to fill in the rows. Can you take another look? Thanks a lot!
– Slade
Nov 22 at 18:01

add a comment |

up vote
0
down vote

favorite

Here's the code:

for x in range(0, len(df.index)):

    df['Adj'][x] = df.loc[df['Date String'] == df['Local Max String'][x]]['Adj Close']

Here's a picture of the DF to get a better idea of what I mean. The value in the Adj column will look for the Adj Close value corresponding to the Date in Local Max String.

enter image description here

import numpy as np

import pandas as pd

pd.core.common.is_list_like = pd.api.types.is_list_like

from pandas_datareader import data as pdr

import matplotlib.pyplot as plt

import datetime

import fix_yahoo_finance as yf

yf.pdr_override() # <== that's all it takes :-)



# Dates for data

start_date = datetime.datetime(2017,11,1)

end_date = datetime.datetime(2018,11,1)



df = pdr.get_data_yahoo('SPY', start=start_date, end=end_date)



df.data = df['Adj Close']

df['Most Recent Local Max'] = np.nan

df['Date'] = df.index

local_maxes = list(df[(df.data.shift(1) < df.data) & (df.data.shift(-1) < df.data)].index)

local_maxes.append(df['Date'][0] - datetime.timedelta(days=1))



def nearest(items, pivot):

    return min([d for d in items if d< pivot], key=lambda x: abs(x - pivot))



df['Most Recent Local Max'] = df['Date'].apply(lambda x: min([d for d in local_maxes if d < x], key=lambda y: abs(y - x)) )



df['Local Max String'] = df['Most Recent Local Max'].apply(lambda x: str(x))



df['Date String'] = df['Date'].apply(lambda x: str(x))



df.loc[df['Local Max String'] == str(df['Date'][0] - datetime.timedelta(days=1)), 'Local Max String'] = str(df['Date'][0])



df['Adj'] = np.nan

Thanks!

edited Nov 22 at 17:59

asked Nov 21 at 20:31

Slade

3816

Here's the code:

for x in range(0, len(df.index)):

    df['Adj'][x] = df.loc[df['Date String'] == df['Local Max String'][x]]['Adj Close']

Here's a picture of the DF to get a better idea of what I mean. The value in the Adj column will look for the Adj Close value corresponding to the Date in Local Max String.

enter image description here

import numpy as np

import pandas as pd

pd.core.common.is_list_like = pd.api.types.is_list_like

from pandas_datareader import data as pdr

import matplotlib.pyplot as plt

import datetime

import fix_yahoo_finance as yf

yf.pdr_override() # <== that's all it takes :-)



# Dates for data

start_date = datetime.datetime(2017,11,1)

end_date = datetime.datetime(2018,11,1)



df = pdr.get_data_yahoo('SPY', start=start_date, end=end_date)



df.data = df['Adj Close']

df['Most Recent Local Max'] = np.nan

df['Date'] = df.index

local_maxes = list(df[(df.data.shift(1) < df.data) & (df.data.shift(-1) < df.data)].index)

local_maxes.append(df['Date'][0] - datetime.timedelta(days=1))



def nearest(items, pivot):

    return min([d for d in items if d< pivot], key=lambda x: abs(x - pivot))



df['Most Recent Local Max'] = df['Date'].apply(lambda x: min([d for d in local_maxes if d < x], key=lambda y: abs(y - x)) )



df['Local Max String'] = df['Most Recent Local Max'].apply(lambda x: str(x))



df['Date String'] = df['Date'].apply(lambda x: str(x))



df.loc[df['Local Max String'] == str(df['Date'][0] - datetime.timedelta(days=1)), 'Local Max String'] = str(df['Date'][0])



df['Adj'] = np.nan

Thanks!

python pandas

edited Nov 22 at 17:59

asked Nov 21 at 20:31

Slade

3816

edited Nov 22 at 17:59

asked Nov 21 at 20:31

Slade

3816

edited Nov 22 at 17:59

asked Nov 21 at 20:31

Slade

3816

asked Nov 21 at 20:31

Slade

3816

asked Nov 21 at 20:31

Slade

3816

1

Please add example of input data and expected output.
– user3471881
Nov 21 at 21:24

1

Hi! I just added an answer, but I was unable to test it because the input question don't have a Minimal, Complete, and Verifiable example. If it doesn't work, please, add simple df as code (and not as an image) so others can just reproduce your problem and help you more easily!
– Julian Peller
Nov 22 at 2:25

Hi. I put in the code. I'm not sure if it's really minimal, but I couldn't think of a better way to get the data to you guys, since the formatting kept getting messed up.
– Slade
Nov 22 at 18:00

I tried your code but it isn't working properly, even when I use forward fill to fill in the rows. Can you take another look? Thanks a lot!
– Slade
Nov 22 at 18:01

add a comment |

1

Please add example of input data and expected output.
– user3471881
Nov 21 at 21:24

1

Hi! I just added an answer, but I was unable to test it because the input question don't have a Minimal, Complete, and Verifiable example. If it doesn't work, please, add simple df as code (and not as an image) so others can just reproduce your problem and help you more easily!
– Julian Peller
Nov 22 at 2:25

Hi. I put in the code. I'm not sure if it's really minimal, but I couldn't think of a better way to get the data to you guys, since the formatting kept getting messed up.
– Slade
Nov 22 at 18:00

I tried your code but it isn't working properly, even when I use forward fill to fill in the rows. Can you take another look? Thanks a lot!
– Slade
Nov 22 at 18:01

Please add example of input data and expected output.
– user3471881
Nov 21 at 21:24

Hi! I just added an answer, but I was unable to test it because the input question don't have a Minimal, Complete, and Verifiable example. If it doesn't work, please, add simple df as code (and not as an image) so others can just reproduce your problem and help you more easily!
– Julian Peller
Nov 22 at 2:25

Hi. I put in the code. I'm not sure if it's really minimal, but I couldn't think of a better way to get the data to you guys, since the formatting kept getting messed up.
– Slade
Nov 22 at 18:00

I tried your code but it isn't working properly, even when I use forward fill to fill in the rows. Can you take another look? Thanks a lot!
– Slade
Nov 22 at 18:01

add a comment |

2 Answers
2

active

oldest

votes

up vote
0
down vote

This solution still has a for, but it reduces the amount of iterations from df.shape[1] to df['Local Max String'].nunique(), so it may be fast enough:

for a_local_max in df['Local Max String'].unique():

    df.loc[df['Date String'] == a_local_max, 'Adj'] = df.loc[df['Local Max String'] == a_local_max, 'Adj Close'].iloc[0]

answered Nov 22 at 2:24

Julian Peller

844511

add a comment |

up vote
0
down vote

Often you can skip the for loop by using apply-like function in pandas. Hereafter, I define a wrapper function which combines variables row-wisely.
Finally this function is applied on the data frame to create the result variable. The key element here is to think on the row level within the wrapper function and indicate this behaviour to the apply function with the axis=1 argument.

import pandas as pd

import numpy as np



# Dummy data containing two columns with overlapping data

df = pd.DataFrame({'date': 100*np.random.sample(10000), 'string': 2500*['hello', 'world', '!', 'mars'], 'another_string': 10000*['hello']})



# Here you define the operation at the row level

def wrapper(row):

#     uncomment if the transformation is to be applied to every column:

#     return 2*row['date']

#     if you need to first test some condition:

    if row['string'] == row['another_string']:

        return 2*row['date']

    else:

        return 0



# Finally you generate the new column using the operation defined above.

df['result'] = df.apply(wrapper, axis=1)

This code completes in 195 ms ± 1.96 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

answered Nov 22 at 8:00

leoburgy

1086

add a comment |

Your Answer

StackExchange.ifUsing("editor", function () {
StackExchange.using("externalEditor", function () {
StackExchange.using("snippets", function () {
StackExchange.snippets.init();
});
});
}, "code-snippets");

StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "1"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});

function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});

}
});

draft saved

draft discarded

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53420052%2fpandas-dataframe-for-a-given-row-trying-to-assign-value-in-a-certain-column-ba%23new-answer', 'question_page');
}
);

Post as a guest

Name

Required, but never shown

2 Answers
2

active

oldest

votes

2 Answers
2

active

oldest

votes

up vote
0
down vote

This solution still has a for, but it reduces the amount of iterations from df.shape[1] to df['Local Max String'].nunique(), so it may be fast enough:

for a_local_max in df['Local Max String'].unique():

    df.loc[df['Date String'] == a_local_max, 'Adj'] = df.loc[df['Local Max String'] == a_local_max, 'Adj Close'].iloc[0]

answered Nov 22 at 2:24

Julian Peller

844511

add a comment |

up vote
0
down vote

This solution still has a for, but it reduces the amount of iterations from df.shape[1] to df['Local Max String'].nunique(), so it may be fast enough:

for a_local_max in df['Local Max String'].unique():

    df.loc[df['Date String'] == a_local_max, 'Adj'] = df.loc[df['Local Max String'] == a_local_max, 'Adj Close'].iloc[0]

answered Nov 22 at 2:24

Julian Peller

844511

add a comment |

up vote
0
down vote

This solution still has a for, but it reduces the amount of iterations from df.shape[1] to df['Local Max String'].nunique(), so it may be fast enough:

for a_local_max in df['Local Max String'].unique():

    df.loc[df['Date String'] == a_local_max, 'Adj'] = df.loc[df['Local Max String'] == a_local_max, 'Adj Close'].iloc[0]

answered Nov 22 at 2:24

Julian Peller

844511

This solution still has a for, but it reduces the amount of iterations from df.shape[1] to df['Local Max String'].nunique(), so it may be fast enough:

for a_local_max in df['Local Max String'].unique():

    df.loc[df['Date String'] == a_local_max, 'Adj'] = df.loc[df['Local Max String'] == a_local_max, 'Adj Close'].iloc[0]

answered Nov 22 at 2:24

Julian Peller

844511

answered Nov 22 at 2:24

Julian Peller

844511

answered Nov 22 at 2:24

Julian Peller

844511

answered Nov 22 at 2:24

Julian Peller

844511

add a comment |

up vote
0
down vote

import pandas as pd

import numpy as np



# Dummy data containing two columns with overlapping data

df = pd.DataFrame({'date': 100*np.random.sample(10000), 'string': 2500*['hello', 'world', '!', 'mars'], 'another_string': 10000*['hello']})



# Here you define the operation at the row level

def wrapper(row):

#     uncomment if the transformation is to be applied to every column:

#     return 2*row['date']

#     if you need to first test some condition:

    if row['string'] == row['another_string']:

        return 2*row['date']

    else:

        return 0



# Finally you generate the new column using the operation defined above.

df['result'] = df.apply(wrapper, axis=1)

This code completes in 195 ms ± 1.96 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

answered Nov 22 at 8:00

leoburgy

1086

add a comment |

up vote
0
down vote

import pandas as pd

import numpy as np



# Dummy data containing two columns with overlapping data

df = pd.DataFrame({'date': 100*np.random.sample(10000), 'string': 2500*['hello', 'world', '!', 'mars'], 'another_string': 10000*['hello']})



# Here you define the operation at the row level

def wrapper(row):

#     uncomment if the transformation is to be applied to every column:

#     return 2*row['date']

#     if you need to first test some condition:

    if row['string'] == row['another_string']:

        return 2*row['date']

    else:

        return 0



# Finally you generate the new column using the operation defined above.

df['result'] = df.apply(wrapper, axis=1)

This code completes in 195 ms ± 1.96 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

answered Nov 22 at 8:00

leoburgy

1086

add a comment |

up vote
0
down vote

import pandas as pd

import numpy as np



# Dummy data containing two columns with overlapping data

df = pd.DataFrame({'date': 100*np.random.sample(10000), 'string': 2500*['hello', 'world', '!', 'mars'], 'another_string': 10000*['hello']})



# Here you define the operation at the row level

def wrapper(row):

#     uncomment if the transformation is to be applied to every column:

#     return 2*row['date']

#     if you need to first test some condition:

    if row['string'] == row['another_string']:

        return 2*row['date']

    else:

        return 0



# Finally you generate the new column using the operation defined above.

df['result'] = df.apply(wrapper, axis=1)

This code completes in 195 ms ± 1.96 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

answered Nov 22 at 8:00

leoburgy

1086

import pandas as pd

import numpy as np



# Dummy data containing two columns with overlapping data

df = pd.DataFrame({'date': 100*np.random.sample(10000), 'string': 2500*['hello', 'world', '!', 'mars'], 'another_string': 10000*['hello']})



# Here you define the operation at the row level

def wrapper(row):

#     uncomment if the transformation is to be applied to every column:

#     return 2*row['date']

#     if you need to first test some condition:

    if row['string'] == row['another_string']:

        return 2*row['date']

    else:

        return 0



# Finally you generate the new column using the operation defined above.

df['result'] = df.apply(wrapper, axis=1)

This code completes in 195 ms ± 1.96 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

answered Nov 22 at 8:00

leoburgy

1086

answered Nov 22 at 8:00

leoburgy

1086

answered Nov 22 at 8:00

leoburgy

1086

answered Nov 22 at 8:00

leoburgy

1086

add a comment |

draft saved

draft discarded

Thanks for contributing an answer to Stack Overflow!

Please be sure to answer the question. Provide details and share your research!

But avoid …

Asking for help, clarification, or responding to other answers.

Making statements based on opinion; back them up with references or personal experience.

To learn more, see our tips on writing great answers.

Some of your past answers have not been well-received, and you're in danger of being blocked from answering.

Please pay close attention to the following guidance:

Please be sure to answer the question. Provide details and share your research!

But avoid …

Asking for help, clarification, or responding to other answers.

Making statements based on opinion; back them up with references or personal experience.

To learn more, see our tips on writing great answers.

draft saved

draft discarded

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Name

Required, but never shown

Name

Required, but never shown

This page is only for reference, If you need detailed information, please check here

搜尋此網誌

Btukfyl