How to add a condition in split apply combine and repeat solution on each row?

up vote
0
down vote

favorite

I have the following pandas dataframe df:

cluster   tag   amount   name

1         0     200      Michael        

2         1     1200     John        

2         1     900      Daniel        

2         0     3000     David        

2         0     600      Jonny        

3         0     900      Denisse        

3         1     900      Mike        

3         1     3000     Kely        

3         0     2000     Devon

What I need to do is add another column in df that writes for each row, the name (from the name column) that has the highest amount where the tag is 1. In other words, the solution looks like this:

cluster   tag   amount   name     highest_amount

1         0     200      Michael  NaN      

2         1     1200     John     John   

2         1     900      Daniel   John     

2         0     3000     David    John    

2         0     600      Jonny    John    

3         0     900      Denisse  Kely      

3         1     900      Mike     Kely   

3         1     3000     Kely     Kely   

3         0     2000     Devon    Kely

I've tried something like this:

df.group('clusters')['name','amount'].transform('max')[df['tag']==1]

but the problem with this is that the name does note repeat on every row. It will look like this:

cluster   tag   amount   name     highest_amount

1         0     200      Michael  NaN      

2         1     1200     John     John   

2         1     900      Daniel   John     

2         0     3000     David    NaN    

2         0     600      Jonny    NaN    

3         0     900      Denisse  NaN      

3         1     900      Mike     Kely   

3         1     3000     Kely     Kely   

3         0     2000     Devon    NaN

Can someone please let me know how to add a condition with split apply combine, and have the solution repeated on each row?

edited Nov 21 at 21:43

asked Nov 21 at 21:31

callmeGuy

12118

I'm not sure that's valid pandas syntax? I can't test - does this work, as-posted?
– roganjosh
Nov 21 at 21:36

@roganjosh yes, it will work, it's just like doing df then applying a filer
– callmeGuy
Nov 21 at 21:38

with the . in df.group('clusters').['name','amount']? If so, I've learned something new.
– roganjosh
Nov 21 at 21:40

@roganjosh ups, my mistake, without the ".". I've edited my question
– callmeGuy
Nov 21 at 21:44

add a comment |

up vote
0
down vote

favorite

I have the following pandas dataframe df:

cluster   tag   amount   name

1         0     200      Michael        

2         1     1200     John        

2         1     900      Daniel        

2         0     3000     David        

2         0     600      Jonny        

3         0     900      Denisse        

3         1     900      Mike        

3         1     3000     Kely        

3         0     2000     Devon

cluster   tag   amount   name     highest_amount

1         0     200      Michael  NaN      

2         1     1200     John     John   

2         1     900      Daniel   John     

2         0     3000     David    John    

2         0     600      Jonny    John    

3         0     900      Denisse  Kely      

3         1     900      Mike     Kely   

3         1     3000     Kely     Kely   

3         0     2000     Devon    Kely

I've tried something like this:

df.group('clusters')['name','amount'].transform('max')[df['tag']==1]

but the problem with this is that the name does note repeat on every row. It will look like this:

cluster   tag   amount   name     highest_amount

1         0     200      Michael  NaN      

2         1     1200     John     John   

2         1     900      Daniel   John     

2         0     3000     David    NaN    

2         0     600      Jonny    NaN    

3         0     900      Denisse  NaN      

3         1     900      Mike     Kely   

3         1     3000     Kely     Kely   

3         0     2000     Devon    NaN

Can someone please let me know how to add a condition with split apply combine, and have the solution repeated on each row?

edited Nov 21 at 21:43

asked Nov 21 at 21:31

callmeGuy

12118

I'm not sure that's valid pandas syntax? I can't test - does this work, as-posted?
– roganjosh
Nov 21 at 21:36

@roganjosh yes, it will work, it's just like doing df then applying a filer
– callmeGuy
Nov 21 at 21:38

with the . in df.group('clusters').['name','amount']? If so, I've learned something new.
– roganjosh
Nov 21 at 21:40

@roganjosh ups, my mistake, without the ".". I've edited my question
– callmeGuy
Nov 21 at 21:44

add a comment |

up vote
0
down vote

favorite

I have the following pandas dataframe df:

cluster   tag   amount   name

1         0     200      Michael        

2         1     1200     John        

2         1     900      Daniel        

2         0     3000     David        

2         0     600      Jonny        

3         0     900      Denisse        

3         1     900      Mike        

3         1     3000     Kely        

3         0     2000     Devon

cluster   tag   amount   name     highest_amount

1         0     200      Michael  NaN      

2         1     1200     John     John   

2         1     900      Daniel   John     

2         0     3000     David    John    

2         0     600      Jonny    John    

3         0     900      Denisse  Kely      

3         1     900      Mike     Kely   

3         1     3000     Kely     Kely   

3         0     2000     Devon    Kely

I've tried something like this:

df.group('clusters')['name','amount'].transform('max')[df['tag']==1]

but the problem with this is that the name does note repeat on every row. It will look like this:

cluster   tag   amount   name     highest_amount

1         0     200      Michael  NaN      

2         1     1200     John     John   

2         1     900      Daniel   John     

2         0     3000     David    NaN    

2         0     600      Jonny    NaN    

3         0     900      Denisse  NaN      

3         1     900      Mike     Kely   

3         1     3000     Kely     Kely   

3         0     2000     Devon    NaN

Can someone please let me know how to add a condition with split apply combine, and have the solution repeated on each row?

edited Nov 21 at 21:43

asked Nov 21 at 21:31

callmeGuy

12118

I have the following pandas dataframe df:

cluster   tag   amount   name

1         0     200      Michael        

2         1     1200     John        

2         1     900      Daniel        

2         0     3000     David        

2         0     600      Jonny        

3         0     900      Denisse        

3         1     900      Mike        

3         1     3000     Kely        

3         0     2000     Devon

cluster   tag   amount   name     highest_amount

1         0     200      Michael  NaN      

2         1     1200     John     John   

2         1     900      Daniel   John     

2         0     3000     David    John    

2         0     600      Jonny    John    

3         0     900      Denisse  Kely      

3         1     900      Mike     Kely   

3         1     3000     Kely     Kely   

3         0     2000     Devon    Kely

I've tried something like this:

df.group('clusters')['name','amount'].transform('max')[df['tag']==1]

but the problem with this is that the name does note repeat on every row. It will look like this:

cluster   tag   amount   name     highest_amount

1         0     200      Michael  NaN      

2         1     1200     John     John   

2         1     900      Daniel   John     

2         0     3000     David    NaN    

2         0     600      Jonny    NaN    

3         0     900      Denisse  NaN      

3         1     900      Mike     Kely   

3         1     3000     Kely     Kely   

3         0     2000     Devon    NaN

Can someone please let me know how to add a condition with split apply combine, and have the solution repeated on each row?

python pandas pandas-groupby split-apply-combine

edited Nov 21 at 21:43

asked Nov 21 at 21:31

callmeGuy

12118

edited Nov 21 at 21:43

asked Nov 21 at 21:31

callmeGuy

12118

edited Nov 21 at 21:43

asked Nov 21 at 21:31

callmeGuy

12118

asked Nov 21 at 21:31

callmeGuy

12118

asked Nov 21 at 21:31

callmeGuy

12118

I'm not sure that's valid pandas syntax? I can't test - does this work, as-posted?
– roganjosh
Nov 21 at 21:36

@roganjosh yes, it will work, it's just like doing df then applying a filer
– callmeGuy
Nov 21 at 21:38

with the . in df.group('clusters').['name','amount']? If so, I've learned something new.
– roganjosh
Nov 21 at 21:40

@roganjosh ups, my mistake, without the ".". I've edited my question
– callmeGuy
Nov 21 at 21:44

add a comment |

I'm not sure that's valid pandas syntax? I can't test - does this work, as-posted?
– roganjosh
Nov 21 at 21:36

@roganjosh yes, it will work, it's just like doing df then applying a filer
– callmeGuy
Nov 21 at 21:38

with the . in df.group('clusters').['name','amount']? If so, I've learned something new.
– roganjosh
Nov 21 at 21:40

@roganjosh ups, my mistake, without the ".". I've edited my question
– callmeGuy
Nov 21 at 21:44

I'm not sure that's valid pandas syntax? I can't test - does this work, as-posted?
– roganjosh
Nov 21 at 21:36

@roganjosh yes, it will work, it's just like doing df then applying a filer
– callmeGuy
Nov 21 at 21:38

with the . in df.group('clusters').['name','amount']? If so, I've learned something new.
– roganjosh
Nov 21 at 21:40

@roganjosh ups, my mistake, without the ".". I've edited my question
– callmeGuy
Nov 21 at 21:44

add a comment |

1 Answer
1

active

oldest

votes

up vote
1
down vote

accepted

You can do this as a two-stage process. First calculate a mapping series, then map by cluster:

s = df.query('tag == 1')

      .sort_values('amount', ascending=False)

      .drop_duplicates('cluster')

      .set_index('cluster')['name']



df['highest_name'] = df['cluster'].map(s)



print(df)



   cluster  tag  amount     name highest_name

0        1    0     200  Michael          NaN

1        2    1    1200     John         John

2        2    1     900   Daniel         John

3        2    0    3000    David         John

4        2    0     600    Jonny         John

5        3    0     900  Denisse         Kely

6        3    1     900     Mike         Kely

7        3    1    3000     Kely         Kely

8        3    0    2000    Devon         Kely

If you want to use groupby, here's one way:

def func(x):

    names = x.query('tag == 1').sort_values('amount', ascending=False)['name']

    return names.iloc[0] if not names.empty else np.nan



df['highest_name'] = df['cluster'].map(df.groupby('cluster').apply(func))

edited Nov 21 at 21:54

answered Nov 21 at 21:46

jpp

86.2k194898

this works. I am wondering however if there's a way to do it with group and transform. Do you know if that's even possible?
– callmeGuy
Nov 21 at 21:51

@callmeGuy, I added a groupby solution, doesn't use transform though.
– jpp
Nov 21 at 21:55

add a comment |

Your Answer

StackExchange.ifUsing("editor", function () {
StackExchange.using("externalEditor", function () {
StackExchange.using("snippets", function () {
StackExchange.snippets.init();
});
});
}, "code-snippets");

StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "1"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});

function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});

}
});

draft saved

draft discarded

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53420738%2fhow-to-add-a-condition-in-split-apply-combine-and-repeat-solution-on-each-row%23new-answer', 'question_page');
}
);

Post as a guest

Name

Required, but never shown

1 Answer
1

active

oldest

votes

1 Answer
1

active

oldest

votes

up vote
1
down vote

accepted

You can do this as a two-stage process. First calculate a mapping series, then map by cluster:

s = df.query('tag == 1')

      .sort_values('amount', ascending=False)

      .drop_duplicates('cluster')

      .set_index('cluster')['name']



df['highest_name'] = df['cluster'].map(s)



print(df)



   cluster  tag  amount     name highest_name

0        1    0     200  Michael          NaN

1        2    1    1200     John         John

2        2    1     900   Daniel         John

3        2    0    3000    David         John

4        2    0     600    Jonny         John

5        3    0     900  Denisse         Kely

6        3    1     900     Mike         Kely

7        3    1    3000     Kely         Kely

8        3    0    2000    Devon         Kely

If you want to use groupby, here's one way:

def func(x):

    names = x.query('tag == 1').sort_values('amount', ascending=False)['name']

    return names.iloc[0] if not names.empty else np.nan



df['highest_name'] = df['cluster'].map(df.groupby('cluster').apply(func))

edited Nov 21 at 21:54

answered Nov 21 at 21:46

jpp

86.2k194898

this works. I am wondering however if there's a way to do it with group and transform. Do you know if that's even possible?
– callmeGuy
Nov 21 at 21:51

@callmeGuy, I added a groupby solution, doesn't use transform though.
– jpp
Nov 21 at 21:55

add a comment |

up vote
1
down vote

accepted

You can do this as a two-stage process. First calculate a mapping series, then map by cluster:

s = df.query('tag == 1')

      .sort_values('amount', ascending=False)

      .drop_duplicates('cluster')

      .set_index('cluster')['name']



df['highest_name'] = df['cluster'].map(s)



print(df)



   cluster  tag  amount     name highest_name

0        1    0     200  Michael          NaN

1        2    1    1200     John         John

2        2    1     900   Daniel         John

3        2    0    3000    David         John

4        2    0     600    Jonny         John

5        3    0     900  Denisse         Kely

6        3    1     900     Mike         Kely

7        3    1    3000     Kely         Kely

8        3    0    2000    Devon         Kely

If you want to use groupby, here's one way:

def func(x):

    names = x.query('tag == 1').sort_values('amount', ascending=False)['name']

    return names.iloc[0] if not names.empty else np.nan



df['highest_name'] = df['cluster'].map(df.groupby('cluster').apply(func))

edited Nov 21 at 21:54

answered Nov 21 at 21:46

jpp

86.2k194898

this works. I am wondering however if there's a way to do it with group and transform. Do you know if that's even possible?
– callmeGuy
Nov 21 at 21:51

@callmeGuy, I added a groupby solution, doesn't use transform though.
– jpp
Nov 21 at 21:55

add a comment |

up vote
1
down vote

accepted

You can do this as a two-stage process. First calculate a mapping series, then map by cluster:

s = df.query('tag == 1')

      .sort_values('amount', ascending=False)

      .drop_duplicates('cluster')

      .set_index('cluster')['name']



df['highest_name'] = df['cluster'].map(s)



print(df)



   cluster  tag  amount     name highest_name

0        1    0     200  Michael          NaN

1        2    1    1200     John         John

2        2    1     900   Daniel         John

3        2    0    3000    David         John

4        2    0     600    Jonny         John

5        3    0     900  Denisse         Kely

6        3    1     900     Mike         Kely

7        3    1    3000     Kely         Kely

8        3    0    2000    Devon         Kely

If you want to use groupby, here's one way:

def func(x):

    names = x.query('tag == 1').sort_values('amount', ascending=False)['name']

    return names.iloc[0] if not names.empty else np.nan



df['highest_name'] = df['cluster'].map(df.groupby('cluster').apply(func))

edited Nov 21 at 21:54

answered Nov 21 at 21:46

jpp

86.2k194898

You can do this as a two-stage process. First calculate a mapping series, then map by cluster:

s = df.query('tag == 1')

      .sort_values('amount', ascending=False)

      .drop_duplicates('cluster')

      .set_index('cluster')['name']



df['highest_name'] = df['cluster'].map(s)



print(df)



   cluster  tag  amount     name highest_name

0        1    0     200  Michael          NaN

1        2    1    1200     John         John

2        2    1     900   Daniel         John

3        2    0    3000    David         John

4        2    0     600    Jonny         John

5        3    0     900  Denisse         Kely

6        3    1     900     Mike         Kely

7        3    1    3000     Kely         Kely

8        3    0    2000    Devon         Kely

If you want to use groupby, here's one way:

def func(x):

    names = x.query('tag == 1').sort_values('amount', ascending=False)['name']

    return names.iloc[0] if not names.empty else np.nan



df['highest_name'] = df['cluster'].map(df.groupby('cluster').apply(func))

edited Nov 21 at 21:54

answered Nov 21 at 21:46

jpp

86.2k194898

edited Nov 21 at 21:54

answered Nov 21 at 21:46

jpp

86.2k194898

answered Nov 21 at 21:46

jpp

86.2k194898

answered Nov 21 at 21:46

jpp

86.2k194898

this works. I am wondering however if there's a way to do it with group and transform. Do you know if that's even possible?
– callmeGuy
Nov 21 at 21:51

@callmeGuy, I added a groupby solution, doesn't use transform though.
– jpp
Nov 21 at 21:55

add a comment |

this works. I am wondering however if there's a way to do it with group and transform. Do you know if that's even possible?
– callmeGuy
Nov 21 at 21:51

@callmeGuy, I added a groupby solution, doesn't use transform though.
– jpp
Nov 21 at 21:55

this works. I am wondering however if there's a way to do it with group and transform. Do you know if that's even possible?
– callmeGuy
Nov 21 at 21:51

@callmeGuy, I added a groupby solution, doesn't use transform though.
– jpp
Nov 21 at 21:55

add a comment |

draft saved

draft discarded

Thanks for contributing an answer to Stack Overflow!

Please be sure to answer the question. Provide details and share your research!

But avoid …

Asking for help, clarification, or responding to other answers.

Making statements based on opinion; back them up with references or personal experience.

To learn more, see our tips on writing great answers.

Some of your past answers have not been well-received, and you're in danger of being blocked from answering.

Please pay close attention to the following guidance:

Please be sure to answer the question. Provide details and share your research!

But avoid …

Asking for help, clarification, or responding to other answers.

Making statements based on opinion; back them up with references or personal experience.

To learn more, see our tips on writing great answers.

draft saved

draft discarded

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Name

Required, but never shown

Name

Required, but never shown

This page is only for reference, If you need detailed information, please check here

搜尋此網誌

Btukfyl