Lookup with Missing Labels
I have code that uses a DataFrame to look up a value (P) given its column label (X):
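import pandas as pd

df_1 = pd.DataFrame({'X': [1, 2, 3, 1, 1, 2, 1, 3, 2, 1]})
df_2 = pd.DataFrame({1: [1, 2, 3, 4, 1, 2, 3, 4, 1, 2],
                     2: [4, 1, 2, 3, 4, 1, 2, 1, 2, 3],
                     3: [2, 3, 4, 1, 2, 3, 4, 1, 2, 5]})
# note: DataFrame.lookup was deprecated in pandas 1.2 and removed in pandas 2.0
df_1['P'] = df_2.lookup(df_1.index, df_1['X'])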
When df_1 contains a label that is not a column of df_2, like this:
df_1 = pd.DataFrame({'X': [7,2,3,1,1,2,1,3,2,1]})
I get:
KeyError: 'One or more column labels was not found'
How can I skip those labels (leaving NaN in P) to get:
   X    P
0  7  NaN
1  2    1
2  3    4
3  1    4
4  1    1
5  2    1
6  1    3
7  3    1
8  2    2
9  1    2
pandas lookup missing-data
asked Nov 21 at 17:17 by R. Cox
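For context: the pandas 1.x documentation described lookup as roughly equivalent to a list comprehension over (row, col) label pairs. A minimal sketch of that strict behaviour, using a hypothetical helper named `strict_lookup` (not part of pandas), shows why a single unknown label such as 7 makes the whole call fail with a KeyError instead of producing NaN for just that row:

```python
import pandas as pd

df_1 = pd.DataFrame({'X': [7, 2, 3, 1, 1, 2, 1, 3, 2, 1]})
df_2 = pd.DataFrame({1: [1, 2, 3, 4, 1, 2, 3, 4, 1, 2],
                     2: [4, 1, 2, 3, 4, 1, 2, 1, 2, 3],
                     3: [2, 3, 4, 1, 2, 3, 4, 1, 2, 5]})


def strict_lookup(df, row_labels, col_labels):
    # Hypothetical helper: fetch one scalar per (row, column) label pair.
    # DataFrame.at raises KeyError for any label it cannot find, so one
    # missing column label aborts the whole list.
    return [df.at[r, c] for r, c in zip(row_labels, col_labels)]


try:
    strict_lookup(df_2, df_1.index, df_1['X'])
except KeyError as exc:
    print('KeyError for label', exc)
```

Every answer below avoids this by deciding per pair (or per mask) what to do when a label is missing.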
4 Answers
get and default values
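import numpy as np

def get_lu(df):
    # df.get(j, {}) returns column j, or an empty dict if j is not a column;
    # .get(i, np.nan) then returns the value at row i, or NaN if it is missing
    def lu(i, j):
        return df.get(j, {}).get(i, np.nan)
    return lu

[*map(get_lu(df_2), df_1.index, df_1.X)]
[nan, 1, 4, 4, 1, 1, 3, 1, 2, 2]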
Alternative
[df_2.get(j, {}).get(i, np.nan) for i, j in df_1.X.items()]
[nan, 1, 4, 4, 1, 1, 3, 1, 2, 2]
All together
df_1.assign(P=[df_2.get(j, {}).get(i, np.nan) for i, j in df_1.X.items()])
   X    P
0  7  NaN
1  2  1.0
2  3  4.0
3  1  4.0
4  1  1.0
5  2  1.0
6  1  3.0
7  3  1.0
8  2  2.0
9  1  2.0
Uglier version
stacked = df_2.rename_axis('X', axis=1).stack()  # Series keyed by (row, column)
df_1.assign(P=[stacked.get(x, np.nan) for x in df_1.X.items()])
   X    P
0  7  NaN
1  2  1.0
2  3  4.0
3  1  4.0
4  1  1.0
5  2  1.0
6  1  3.0
7  3  1.0
8  2  2.0
9  1  2.0
edited Nov 21 at 17:46
answered Nov 21 at 17:34 by piRSquared (accepted answer)
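piRSquared's "uglier" version already turns df_2 into a Series keyed by (row, column) pairs with stack(). As a sketch that is not part of any answer here, the same idea can be vectorized: build all (row, X) pairs at once with `pd.MultiIndex.from_arrays` and let `Series.reindex` fill NaN for the pair that has no match. It assumes df_1 uses the same row labels as df_2.

```python
import pandas as pd

df_1 = pd.DataFrame({'X': [7, 2, 3, 1, 1, 2, 1, 3, 2, 1]})
df_2 = pd.DataFrame({1: [1, 2, 3, 4, 1, 2, 3, 4, 1, 2],
                     2: [4, 1, 2, 3, 4, 1, 2, 1, 2, 3],
                     3: [2, 3, 4, 1, 2, 3, 4, 1, 2, 5]})

# Series with one value per (row label, column label) pair of df_2.
stacked = df_2.stack()

# The (row, X) pairs to fetch; (0, 7) has no match in `stacked`.
pairs = pd.MultiIndex.from_arrays([df_1.index, df_1['X']])

# reindex aligns on the pairs and fills any unmatched pair with NaN.
df_1['P'] = stacked.reindex(pairs).to_numpy()
print(df_1)
```

Compared with the list comprehension, this does a single alignment instead of one Python-level get per row, at the cost of materializing the full stacked Series.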
Starting from the loop that the lookup documentation gives as its equivalent, add try ... except:
import numpy as np

result = []
for row, col in zip(df_1.index, df_1.X):
    try:
        result.append(df_2.loc[row, col])
    except KeyError:  # row or column label missing from df_2
        result.append(np.nan)
result
Out[135]: [nan, 1, 4, 4, 1, 1, 3, 1, 2, 2]
answered Nov 21 at 17:29 by W-B
A tad slower than @piRSquared's answer, but using loc with a lambda:
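>>> import numpy as np
>>> # for each row: take df_2 at (row label, X) if X is a column of df_2, else NaN
>>> df_1['P'] = df_1.apply(lambda row: df_2.loc[row.name, row['X']] if row['X'] in df_2.columns else np.nan, axis=1)
>>> df_1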
   X    P
0  7  NaN
1  2  1.0
2  3  4.0
3  1  4.0
4  1  1.0
5  2  1.0
6  1  3.0
7  3  1.0
8  2  2.0
9  1  2.0
edited Nov 21 at 18:04
answered Nov 21 at 17:59 by user3471881
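The `row['X'] in df_2.columns` test in the lambda runs once per row. As a sketch that is not from the original answer, the same membership test can be done for all rows at once with `Index.get_indexer`, which returns -1 for labels that are not present. It assumes df_1 and df_2 line up row by row (same order and length) and that df_2 has unique column labels, which get_indexer requires.

```python
import numpy as np
import pandas as pd

df_1 = pd.DataFrame({'X': [7, 2, 3, 1, 1, 2, 1, 3, 2, 1]})
df_2 = pd.DataFrame({1: [1, 2, 3, 4, 1, 2, 3, 4, 1, 2],
                     2: [4, 1, 2, 3, 4, 1, 2, 1, 2, 3],
                     3: [2, 3, 4, 1, 2, 3, 4, 1, 2, 5]})

# Position of each X label among df_2's columns; -1 marks a missing label.
pos = df_2.columns.get_indexer(df_1['X'])
found = pos >= 0

# Start from all-NaN and fill only the rows whose label exists in df_2.
values = df_2.to_numpy(dtype=float)
result = np.full(len(df_1), np.nan)
rows = np.flatnonzero(found)  # row positions with a valid label
result[found] = values[rows, pos[found]]

df_1['P'] = result
print(df_1)
```

The per-row Python call in apply is replaced by one fancy-indexing operation on the underlying NumPy array.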
This answer uses NumPy and is fast: a boolean mask restricts lookup to the rows whose label exists in df_2.
import numpy as np
import pandas as pd

# set up dataframes
df_1 = pd.DataFrame({'X': [7, 2, 3, 1, 1, 2, 1, 3, 2, 1]})
df_2 = pd.DataFrame({1: [1, 2, 3, 4, 1, 2, 3, 4, 1, 2],
                     2: [4, 1, 2, 3, 4, 1, 2, 1, 2, 3],
                     3: [2, 3, 4, 1, 2, 3, 4, 1, 2, 5]})

# designate working columns
lookup_cols = [1, 2, 3]  # the column labels of df_2
key_col = 'X'
result_col = 'P'
# get key column values as an array
key = df_1[key_col].values
# make an array of NaNs to hold the lookup results
result = np.full(key.shape[0], np.nan)
# boolean mask: True where the key is a valid column label of df_2
b = np.isin(key, lookup_cols)
# filter df_1 and df_2 with mask b (assumes both frames share the same row order)
df_1b = df_1[b]
df_2b = df_2[b]
# look up values using the filtered dataframes (DataFrame.lookup needs pandas < 2.0)
lup = df_2b.lookup(df_1b.index, df_1b[key_col])
# put the results into the result array at the positions selected by b
result[b] = lup
# assign the result array to the dataframe's result column
df_1[result_col] = result
edited Nov 21 at 19:43
answered Nov 21 at 19:31 by b2002
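DataFrame.lookup, used in the question and in the answer above, was deprecated in pandas 1.2 and removed in 2.0. The pandas user guide suggests `pd.factorize` plus `reindex` followed by NumPy fancy indexing as a replacement; the sketch below applies that pattern to this data. Because `reindex` creates an all-NaN column for a label such as 7 that df_2 lacks, it also returns NaN instead of raising. It assumes df_1 and df_2 line up row by row and that X contains no NaN (factorize codes NaN as -1, which would silently pick the last column).

```python
import numpy as np
import pandas as pd

df_1 = pd.DataFrame({'X': [7, 2, 3, 1, 1, 2, 1, 3, 2, 1]})
df_2 = pd.DataFrame({1: [1, 2, 3, 4, 1, 2, 3, 4, 1, 2],
                     2: [4, 1, 2, 3, 4, 1, 2, 1, 2, 3],
                     3: [2, 3, 4, 1, 2, 3, 4, 1, 2, 5]})

# codes[i] is the position of df_1['X'].iloc[i] within `labels` (unique X values).
codes, labels = pd.factorize(df_1['X'])

# Reindexing the columns to `labels` adds an all-NaN column for label 7.
table = df_2.reindex(columns=labels).to_numpy()

# Pick one cell per row: row i, column codes[i].
df_1['P'] = table[np.arange(len(df_1)), codes]
print(df_1)
```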