Lookup with Missing Labels

I have code that uses a DataFrame to look up a value (P) given its column label (X):

import pandas as pd

df_1 = pd.DataFrame({'X': [1,2,3,1,1,2,1,3,2,1]})

df_2 = pd.DataFrame({ 1 : [1,2,3,4,1,2,3,4,1,2],
                      2 : [4,1,2,3,4,1,2,1,2,3],
                      3 : [2,3,4,1,2,3,4,1,2,5]})

df_1['P'] = df_2.lookup(df_1.index, df_1['X'])

When df_1 contains a label that is not a column of df_2, like this:

df_1 = pd.DataFrame({'X': [7,2,3,1,1,2,1,3,2,1]})

I get:

KeyError: 'One or more column labels was not found'

How can I skip those labels, please?

asked Nov 21 at 17:17

R. Cox

pandas lookup missing-data


4 Answers

accepted

`get` and default values

import numpy as np

def get_lu(df):
  def lu(i, j):
    # a missing column falls back to an empty dict, a missing row to NaN
    return df.get(j, {}).get(i, np.nan)
  return lu

[*map(get_lu(df_2), df_1.index, df_1.X)]

[nan, 1, 4, 4, 1, 1, 3, 1, 2, 2]

Alternative

[df_2.get(j, {}).get(i, np.nan) for i, j in df_1.X.items()]



[nan, 1, 4, 4, 1, 1, 3, 1, 2, 2]

All together

df_1.assign(P=[df_2.get(j, {}).get(i, np.nan) for i, j in df_1.X.items()])

   X    P
0  7  NaN
1  2  1.0
2  3  4.0
3  1  4.0
4  1  1.0
5  2  1.0
6  1  3.0
7  3  1.0
8  2  2.0
9  1  2.0

Uglier version

df_1.assign(P=[df_2.rename_axis('X', axis=1).stack().get(x, np.nan) for x in df_1.X.items()])

   X    P
0  7  NaN
1  2  1.0
2  3  4.0
3  1  4.0
4  1  1.0
5  2  1.0
6  1  3.0
7  3  1.0
8  2  2.0
9  1  2.0
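The stacking idea can also be written without a Python-level loop. The following is only a sketch of that variation, not code from the answer above: it stacks df_2 once and reindexes the result by (row label, column label) pairs, so pairs that do not exist come back as NaN. The names `stacked` and `pairs` are illustrative, and the sketch assumes the row and column labels of df_2 are unique.

```python
import pandas as pd

df_1 = pd.DataFrame({'X': [7, 2, 3, 1, 1, 2, 1, 3, 2, 1]})
df_2 = pd.DataFrame({1: [1, 2, 3, 4, 1, 2, 3, 4, 1, 2],
                     2: [4, 1, 2, 3, 4, 1, 2, 1, 2, 3],
                     3: [2, 3, 4, 1, 2, 3, 4, 1, 2, 5]})

# One long Series keyed by (row label, column label)
stacked = df_2.stack()

# The (row, column) pairs requested by df_1
pairs = pd.MultiIndex.from_arrays([df_1.index, df_1['X']])

# reindex fills pairs that are absent from df_2 with NaN
df_1['P'] = stacked.reindex(pairs).to_numpy()
print(df_1)
```

Stacking copies every cell of df_2, so for a very wide lookup table the per-row `get` versions may use less memory.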

edited Nov 21 at 17:46

answered Nov 21 at 17:34

piRSquared


Taking the loop from the documentation and adding try ... except:

result = []

for row, col in zip(df_1.index, df_1.X):
    try:
        result.append(df_2.loc[row, col])
    except KeyError:
        # the label is missing from df_2
        result.append(np.nan)

result

Out[135]: [nan, 1, 4, 4, 1, 1, 3, 1, 2, 2]

answered Nov 21 at 17:29

W-B


A tad slower than @piRSquared's answer, but using loc with a lambda:

>> df_1['P'] = df_1.apply(lambda x: df_2.loc[x.name, x.values[0]] if x.values[0] in df_2.columns else np.nan, axis=1)
>> df_1

    X   P
0   7   NaN
1   2   1.0
2   3   4.0
3   1   4.0
4   1   1.0
5   2   1.0
6   1   3.0
7   3   1.0
8   2   2.0
9   1   2.0

edited Nov 21 at 18:04

answered Nov 21 at 17:59

user3471881


This answer uses NumPy and is fast.

import numpy as np
import pandas as pd

# set up dataframes
df_1 = pd.DataFrame({'X': [7,2,3,1,1,2,1,3,2,1]})

df_2 = pd.DataFrame({ 1 : [1,2,3,4,1,2,3,4,1,2],
                      2 : [4,1,2,3,4,1,2,1,2,3],
                      3 : [2,3,4,1,2,3,4,1,2,5]})

# designate working columns
lookup_cols = [1, 2, 3]
key_col = 'X'
result_col = 'P'

# get key column values as an array
key = df_1[key_col].values

# make an array of nans to hold the lookup results
result = np.full(key.shape[0], np.nan)

# create a boolean array marking the rows whose key is a valid lookup column
b = np.isin(key, lookup_cols)

# filter df_1 and df_2 with boolean array b
df_1b = df_1[b]
df_2b = df_2[b]

# lookup values using filtered dataframes
lup = df_2b.lookup(df_1b.index, df_1b[key_col])

# put the results into the result array at proper index locations using b
result[b] = lup

# assign the result array to the dataframe result column
df_1[result_col] = result
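DataFrame.lookup was deprecated in pandas 1.2 and removed in 2.0, so the code above only runs on older versions. As a hedged sketch of the same mask-and-fill idea for newer versions, the lookup can be done with Index.get_indexer, which returns -1 for labels it cannot find, followed by plain NumPy indexing. The helper name `lookup_or_nan` is hypothetical and not part of pandas, and the sketch assumes the row and column labels of the lookup table are unique.

```python
import numpy as np
import pandas as pd

df_1 = pd.DataFrame({'X': [7, 2, 3, 1, 1, 2, 1, 3, 2, 1]})
df_2 = pd.DataFrame({1: [1, 2, 3, 4, 1, 2, 3, 4, 1, 2],
                     2: [4, 1, 2, 3, 4, 1, 2, 1, 2, 3],
                     3: [2, 3, 4, 1, 2, 3, 4, 1, 2, 5]})


def lookup_or_nan(table, row_labels, col_labels):
    """Hypothetical helper: label-based lookup that yields NaN for missing labels."""
    # get_indexer returns the integer position of each label, or -1 if absent
    rows = table.index.get_indexer(row_labels)
    cols = table.columns.get_indexer(col_labels)
    found = (rows >= 0) & (cols >= 0)

    # start from NaN and fill in only the positions that were found
    out = np.full(len(rows), np.nan)
    out[found] = table.to_numpy()[rows[found], cols[found]]
    return out


df_1['P'] = lookup_or_nan(df_2, df_1.index, df_1['X'])
print(df_1['P'].tolist())
```

Unlike the version above, this does not need a hand-written list of lookup columns, because the valid labels are taken from df_2 itself.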

edited Nov 21 at 19:43

answered Nov 21 at 19:31

b2002
