How to add a condition in split apply combine and repeat solution on each row?











up vote
0
down vote

favorite












I have the following pandas dataframe df:



cluster   tag   amount   name
1 0 200 Michael
2 1 1200 John
2 1 900 Daniel
2 0 3000 David
2 0 600 Jonny
3 0 900 Denisse
3 1 900 Mike
3 1 3000 Kely
3 0 2000 Devon


What I need to do is add another column in df that writes for each row, the name (from the name column) that has the highest amount where the tag is 1. In other words, the solution looks like this:



cluster   tag   amount   name     highest_amount
1 0 200 Michael NaN
2 1 1200 John John
2 1 900 Daniel John
2 0 3000 David John
2 0 600 Jonny John
3 0 900 Denisse Kely
3 1 900 Mike Kely
3 1 3000 Kely Kely
3 0 2000 Devon Kely


I've tried something like this:



df.group('clusters')['name','amount'].transform('max')[df['tag']==1]


but the problem with this is that the name does note repeat on every row. It will look like this:



cluster   tag   amount   name     highest_amount
1 0 200 Michael NaN
2 1 1200 John John
2 1 900 Daniel John
2 0 3000 David NaN
2 0 600 Jonny NaN
3 0 900 Denisse NaN
3 1 900 Mike Kely
3 1 3000 Kely Kely
3 0 2000 Devon NaN


Can someone please let me know how to add a condition with split apply combine, and have the solution repeated on each row?










share|improve this question
























  • I'm not sure that's valid pandas syntax? I can't test - does this work, as-posted?
    – roganjosh
    Nov 21 at 21:36










  • @roganjosh yes, it will work, it's just like doing df then applying a filer
    – callmeGuy
    Nov 21 at 21:38










  • with the . in df.group('clusters').['name','amount']? If so, I've learned something new.
    – roganjosh
    Nov 21 at 21:40










  • @roganjosh ups, my mistake, without the ".". I've edited my question
    – callmeGuy
    Nov 21 at 21:44















up vote
0
down vote

favorite












I have the following pandas dataframe df:



cluster   tag   amount   name
1 0 200 Michael
2 1 1200 John
2 1 900 Daniel
2 0 3000 David
2 0 600 Jonny
3 0 900 Denisse
3 1 900 Mike
3 1 3000 Kely
3 0 2000 Devon


What I need to do is add another column in df that writes for each row, the name (from the name column) that has the highest amount where the tag is 1. In other words, the solution looks like this:



cluster   tag   amount   name     highest_amount
1 0 200 Michael NaN
2 1 1200 John John
2 1 900 Daniel John
2 0 3000 David John
2 0 600 Jonny John
3 0 900 Denisse Kely
3 1 900 Mike Kely
3 1 3000 Kely Kely
3 0 2000 Devon Kely


I've tried something like this:



df.group('clusters')['name','amount'].transform('max')[df['tag']==1]


but the problem with this is that the name does note repeat on every row. It will look like this:



cluster   tag   amount   name     highest_amount
1 0 200 Michael NaN
2 1 1200 John John
2 1 900 Daniel John
2 0 3000 David NaN
2 0 600 Jonny NaN
3 0 900 Denisse NaN
3 1 900 Mike Kely
3 1 3000 Kely Kely
3 0 2000 Devon NaN


Can someone please let me know how to add a condition with split apply combine, and have the solution repeated on each row?










share|improve this question
























  • I'm not sure that's valid pandas syntax? I can't test - does this work, as-posted?
    – roganjosh
    Nov 21 at 21:36










  • @roganjosh yes, it will work, it's just like doing df then applying a filer
    – callmeGuy
    Nov 21 at 21:38










  • with the . in df.group('clusters').['name','amount']? If so, I've learned something new.
    – roganjosh
    Nov 21 at 21:40










  • @roganjosh ups, my mistake, without the ".". I've edited my question
    – callmeGuy
    Nov 21 at 21:44













up vote
0
down vote

favorite









up vote
0
down vote

favorite











I have the following pandas dataframe df:



cluster   tag   amount   name
1 0 200 Michael
2 1 1200 John
2 1 900 Daniel
2 0 3000 David
2 0 600 Jonny
3 0 900 Denisse
3 1 900 Mike
3 1 3000 Kely
3 0 2000 Devon


What I need to do is add another column in df that writes for each row, the name (from the name column) that has the highest amount where the tag is 1. In other words, the solution looks like this:



cluster   tag   amount   name     highest_amount
1 0 200 Michael NaN
2 1 1200 John John
2 1 900 Daniel John
2 0 3000 David John
2 0 600 Jonny John
3 0 900 Denisse Kely
3 1 900 Mike Kely
3 1 3000 Kely Kely
3 0 2000 Devon Kely


I've tried something like this:



df.group('clusters')['name','amount'].transform('max')[df['tag']==1]


but the problem with this is that the name does note repeat on every row. It will look like this:



cluster   tag   amount   name     highest_amount
1 0 200 Michael NaN
2 1 1200 John John
2 1 900 Daniel John
2 0 3000 David NaN
2 0 600 Jonny NaN
3 0 900 Denisse NaN
3 1 900 Mike Kely
3 1 3000 Kely Kely
3 0 2000 Devon NaN


Can someone please let me know how to add a condition with split apply combine, and have the solution repeated on each row?










share|improve this question















I have the following pandas dataframe df:



cluster   tag   amount   name
1 0 200 Michael
2 1 1200 John
2 1 900 Daniel
2 0 3000 David
2 0 600 Jonny
3 0 900 Denisse
3 1 900 Mike
3 1 3000 Kely
3 0 2000 Devon


What I need to do is add another column in df that writes for each row, the name (from the name column) that has the highest amount where the tag is 1. In other words, the solution looks like this:



cluster   tag   amount   name     highest_amount
1 0 200 Michael NaN
2 1 1200 John John
2 1 900 Daniel John
2 0 3000 David John
2 0 600 Jonny John
3 0 900 Denisse Kely
3 1 900 Mike Kely
3 1 3000 Kely Kely
3 0 2000 Devon Kely


I've tried something like this:



df.group('clusters')['name','amount'].transform('max')[df['tag']==1]


but the problem with this is that the name does note repeat on every row. It will look like this:



cluster   tag   amount   name     highest_amount
1 0 200 Michael NaN
2 1 1200 John John
2 1 900 Daniel John
2 0 3000 David NaN
2 0 600 Jonny NaN
3 0 900 Denisse NaN
3 1 900 Mike Kely
3 1 3000 Kely Kely
3 0 2000 Devon NaN


Can someone please let me know how to add a condition with split apply combine, and have the solution repeated on each row?







python pandas pandas-groupby split-apply-combine






share|improve this question















share|improve this question













share|improve this question




share|improve this question








edited Nov 21 at 21:43

























asked Nov 21 at 21:31









callmeGuy

12118




12118












  • I'm not sure that's valid pandas syntax? I can't test - does this work, as-posted?
    – roganjosh
    Nov 21 at 21:36










  • @roganjosh yes, it will work, it's just like doing df then applying a filer
    – callmeGuy
    Nov 21 at 21:38










  • with the . in df.group('clusters').['name','amount']? If so, I've learned something new.
    – roganjosh
    Nov 21 at 21:40










  • @roganjosh ups, my mistake, without the ".". I've edited my question
    – callmeGuy
    Nov 21 at 21:44


















  • I'm not sure that's valid pandas syntax? I can't test - does this work, as-posted?
    – roganjosh
    Nov 21 at 21:36










  • @roganjosh yes, it will work, it's just like doing df then applying a filer
    – callmeGuy
    Nov 21 at 21:38










  • with the . in df.group('clusters').['name','amount']? If so, I've learned something new.
    – roganjosh
    Nov 21 at 21:40










  • @roganjosh ups, my mistake, without the ".". I've edited my question
    – callmeGuy
    Nov 21 at 21:44
















I'm not sure that's valid pandas syntax? I can't test - does this work, as-posted?
– roganjosh
Nov 21 at 21:36




I'm not sure that's valid pandas syntax? I can't test - does this work, as-posted?
– roganjosh
Nov 21 at 21:36












@roganjosh yes, it will work, it's just like doing df then applying a filer
– callmeGuy
Nov 21 at 21:38




@roganjosh yes, it will work, it's just like doing df then applying a filer
– callmeGuy
Nov 21 at 21:38












with the . in df.group('clusters').['name','amount']? If so, I've learned something new.
– roganjosh
Nov 21 at 21:40




with the . in df.group('clusters').['name','amount']? If so, I've learned something new.
– roganjosh
Nov 21 at 21:40












@roganjosh ups, my mistake, without the ".". I've edited my question
– callmeGuy
Nov 21 at 21:44




@roganjosh ups, my mistake, without the ".". I've edited my question
– callmeGuy
Nov 21 at 21:44












1 Answer
1






active

oldest

votes

















up vote
1
down vote



accepted










You can do this as a two-stage process. First calculate a mapping series, then map by cluster:



s = df.query('tag == 1')
.sort_values('amount', ascending=False)
.drop_duplicates('cluster')
.set_index('cluster')['name']

df['highest_name'] = df['cluster'].map(s)

print(df)

cluster tag amount name highest_name
0 1 0 200 Michael NaN
1 2 1 1200 John John
2 2 1 900 Daniel John
3 2 0 3000 David John
4 2 0 600 Jonny John
5 3 0 900 Denisse Kely
6 3 1 900 Mike Kely
7 3 1 3000 Kely Kely
8 3 0 2000 Devon Kely




If you want to use groupby, here's one way:



def func(x):
names = x.query('tag == 1').sort_values('amount', ascending=False)['name']
return names.iloc[0] if not names.empty else np.nan

df['highest_name'] = df['cluster'].map(df.groupby('cluster').apply(func))





share|improve this answer























  • this works. I am wondering however if there's a way to do it with group and transform. Do you know if that's even possible?
    – callmeGuy
    Nov 21 at 21:51










  • @callmeGuy, I added a groupby solution, doesn't use transform though.
    – jpp
    Nov 21 at 21:55











Your Answer






StackExchange.ifUsing("editor", function () {
StackExchange.using("externalEditor", function () {
StackExchange.using("snippets", function () {
StackExchange.snippets.init();
});
});
}, "code-snippets");

StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "1"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});

function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});


}
});














draft saved

draft discarded


















StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53420738%2fhow-to-add-a-condition-in-split-apply-combine-and-repeat-solution-on-each-row%23new-answer', 'question_page');
}
);

Post as a guest















Required, but never shown

























1 Answer
1






active

oldest

votes








1 Answer
1






active

oldest

votes









active

oldest

votes






active

oldest

votes








up vote
1
down vote



accepted










You can do this as a two-stage process. First calculate a mapping series, then map by cluster:



s = df.query('tag == 1')
.sort_values('amount', ascending=False)
.drop_duplicates('cluster')
.set_index('cluster')['name']

df['highest_name'] = df['cluster'].map(s)

print(df)

cluster tag amount name highest_name
0 1 0 200 Michael NaN
1 2 1 1200 John John
2 2 1 900 Daniel John
3 2 0 3000 David John
4 2 0 600 Jonny John
5 3 0 900 Denisse Kely
6 3 1 900 Mike Kely
7 3 1 3000 Kely Kely
8 3 0 2000 Devon Kely




If you want to use groupby, here's one way:



def func(x):
names = x.query('tag == 1').sort_values('amount', ascending=False)['name']
return names.iloc[0] if not names.empty else np.nan

df['highest_name'] = df['cluster'].map(df.groupby('cluster').apply(func))





share|improve this answer























  • this works. I am wondering however if there's a way to do it with group and transform. Do you know if that's even possible?
    – callmeGuy
    Nov 21 at 21:51










  • @callmeGuy, I added a groupby solution, doesn't use transform though.
    – jpp
    Nov 21 at 21:55















up vote
1
down vote



accepted










You can do this as a two-stage process. First calculate a mapping series, then map by cluster:



s = df.query('tag == 1')
.sort_values('amount', ascending=False)
.drop_duplicates('cluster')
.set_index('cluster')['name']

df['highest_name'] = df['cluster'].map(s)

print(df)

cluster tag amount name highest_name
0 1 0 200 Michael NaN
1 2 1 1200 John John
2 2 1 900 Daniel John
3 2 0 3000 David John
4 2 0 600 Jonny John
5 3 0 900 Denisse Kely
6 3 1 900 Mike Kely
7 3 1 3000 Kely Kely
8 3 0 2000 Devon Kely




If you want to use groupby, here's one way:



def func(x):
names = x.query('tag == 1').sort_values('amount', ascending=False)['name']
return names.iloc[0] if not names.empty else np.nan

df['highest_name'] = df['cluster'].map(df.groupby('cluster').apply(func))





share|improve this answer























  • this works. I am wondering however if there's a way to do it with group and transform. Do you know if that's even possible?
    – callmeGuy
    Nov 21 at 21:51










  • @callmeGuy, I added a groupby solution, doesn't use transform though.
    – jpp
    Nov 21 at 21:55













up vote
1
down vote



accepted







up vote
1
down vote



accepted






You can do this as a two-stage process. First calculate a mapping series, then map by cluster:



s = df.query('tag == 1')
.sort_values('amount', ascending=False)
.drop_duplicates('cluster')
.set_index('cluster')['name']

df['highest_name'] = df['cluster'].map(s)

print(df)

cluster tag amount name highest_name
0 1 0 200 Michael NaN
1 2 1 1200 John John
2 2 1 900 Daniel John
3 2 0 3000 David John
4 2 0 600 Jonny John
5 3 0 900 Denisse Kely
6 3 1 900 Mike Kely
7 3 1 3000 Kely Kely
8 3 0 2000 Devon Kely




If you want to use groupby, here's one way:



def func(x):
names = x.query('tag == 1').sort_values('amount', ascending=False)['name']
return names.iloc[0] if not names.empty else np.nan

df['highest_name'] = df['cluster'].map(df.groupby('cluster').apply(func))





share|improve this answer














You can do this as a two-stage process. First calculate a mapping series, then map by cluster:



s = df.query('tag == 1')
.sort_values('amount', ascending=False)
.drop_duplicates('cluster')
.set_index('cluster')['name']

df['highest_name'] = df['cluster'].map(s)

print(df)

cluster tag amount name highest_name
0 1 0 200 Michael NaN
1 2 1 1200 John John
2 2 1 900 Daniel John
3 2 0 3000 David John
4 2 0 600 Jonny John
5 3 0 900 Denisse Kely
6 3 1 900 Mike Kely
7 3 1 3000 Kely Kely
8 3 0 2000 Devon Kely




If you want to use groupby, here's one way:



def func(x):
names = x.query('tag == 1').sort_values('amount', ascending=False)['name']
return names.iloc[0] if not names.empty else np.nan

df['highest_name'] = df['cluster'].map(df.groupby('cluster').apply(func))






share|improve this answer














share|improve this answer



share|improve this answer








edited Nov 21 at 21:54

























answered Nov 21 at 21:46









jpp

86.2k194898




86.2k194898












  • this works. I am wondering however if there's a way to do it with group and transform. Do you know if that's even possible?
    – callmeGuy
    Nov 21 at 21:51










  • @callmeGuy, I added a groupby solution, doesn't use transform though.
    – jpp
    Nov 21 at 21:55


















  • this works. I am wondering however if there's a way to do it with group and transform. Do you know if that's even possible?
    – callmeGuy
    Nov 21 at 21:51










  • @callmeGuy, I added a groupby solution, doesn't use transform though.
    – jpp
    Nov 21 at 21:55
















this works. I am wondering however if there's a way to do it with group and transform. Do you know if that's even possible?
– callmeGuy
Nov 21 at 21:51




this works. I am wondering however if there's a way to do it with group and transform. Do you know if that's even possible?
– callmeGuy
Nov 21 at 21:51












@callmeGuy, I added a groupby solution, doesn't use transform though.
– jpp
Nov 21 at 21:55




@callmeGuy, I added a groupby solution, doesn't use transform though.
– jpp
Nov 21 at 21:55


















draft saved

draft discarded




















































Thanks for contributing an answer to Stack Overflow!


  • Please be sure to answer the question. Provide details and share your research!

But avoid



  • Asking for help, clarification, or responding to other answers.

  • Making statements based on opinion; back them up with references or personal experience.


To learn more, see our tips on writing great answers.





Some of your past answers have not been well-received, and you're in danger of being blocked from answering.


Please pay close attention to the following guidance:


  • Please be sure to answer the question. Provide details and share your research!

But avoid



  • Asking for help, clarification, or responding to other answers.

  • Making statements based on opinion; back them up with references or personal experience.


To learn more, see our tips on writing great answers.




draft saved


draft discarded














StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53420738%2fhow-to-add-a-condition-in-split-apply-combine-and-repeat-solution-on-each-row%23new-answer', 'question_page');
}
);

Post as a guest















Required, but never shown





















































Required, but never shown














Required, but never shown












Required, but never shown







Required, but never shown

































Required, but never shown














Required, but never shown












Required, but never shown







Required, but never shown







Popular posts from this blog

A CLEAN and SIMPLE way to add appendices to Table of Contents and bookmarks

Calculate evaluation metrics using cross_val_predict sklearn

Insert data from modal to MySQL (multiple modal on website)