How to add a condition in split apply combine and repeat solution on each row?
up vote
0
down vote
favorite
I have the following pandas
dataframe df
:
cluster tag amount name
1 0 200 Michael
2 1 1200 John
2 1 900 Daniel
2 0 3000 David
2 0 600 Jonny
3 0 900 Denisse
3 1 900 Mike
3 1 3000 Kely
3 0 2000 Devon
What I need to do is add another column in df
that writes for each row
, the name
(from the name column) that has the highest amount
where the tag
is 1. In other words, the solution looks like this:
cluster tag amount name highest_amount
1 0 200 Michael NaN
2 1 1200 John John
2 1 900 Daniel John
2 0 3000 David John
2 0 600 Jonny John
3 0 900 Denisse Kely
3 1 900 Mike Kely
3 1 3000 Kely Kely
3 0 2000 Devon Kely
I've tried something like this:
df.group('clusters')['name','amount'].transform('max')[df['tag']==1]
but the problem with this is that the name does note repeat on every row. It will look like this:
cluster tag amount name highest_amount
1 0 200 Michael NaN
2 1 1200 John John
2 1 900 Daniel John
2 0 3000 David NaN
2 0 600 Jonny NaN
3 0 900 Denisse NaN
3 1 900 Mike Kely
3 1 3000 Kely Kely
3 0 2000 Devon NaN
Can someone please let me know how to add a condition with split apply combine, and have the solution repeated on each row?
python pandas pandas-groupby split-apply-combine
add a comment |
up vote
0
down vote
favorite
I have the following pandas
dataframe df
:
cluster tag amount name
1 0 200 Michael
2 1 1200 John
2 1 900 Daniel
2 0 3000 David
2 0 600 Jonny
3 0 900 Denisse
3 1 900 Mike
3 1 3000 Kely
3 0 2000 Devon
What I need to do is add another column in df
that writes for each row
, the name
(from the name column) that has the highest amount
where the tag
is 1. In other words, the solution looks like this:
cluster tag amount name highest_amount
1 0 200 Michael NaN
2 1 1200 John John
2 1 900 Daniel John
2 0 3000 David John
2 0 600 Jonny John
3 0 900 Denisse Kely
3 1 900 Mike Kely
3 1 3000 Kely Kely
3 0 2000 Devon Kely
I've tried something like this:
df.group('clusters')['name','amount'].transform('max')[df['tag']==1]
but the problem with this is that the name does note repeat on every row. It will look like this:
cluster tag amount name highest_amount
1 0 200 Michael NaN
2 1 1200 John John
2 1 900 Daniel John
2 0 3000 David NaN
2 0 600 Jonny NaN
3 0 900 Denisse NaN
3 1 900 Mike Kely
3 1 3000 Kely Kely
3 0 2000 Devon NaN
Can someone please let me know how to add a condition with split apply combine, and have the solution repeated on each row?
python pandas pandas-groupby split-apply-combine
I'm not sure that's valid pandas syntax? I can't test - does this work, as-posted?
– roganjosh
Nov 21 at 21:36
@roganjosh yes, it will work, it's just like doing df then applying a filer
– callmeGuy
Nov 21 at 21:38
with the.
indf.group('clusters').['name','amount']
? If so, I've learned something new.
– roganjosh
Nov 21 at 21:40
@roganjosh ups, my mistake, without the ".". I've edited my question
– callmeGuy
Nov 21 at 21:44
add a comment |
up vote
0
down vote
favorite
up vote
0
down vote
favorite
I have the following pandas
dataframe df
:
cluster tag amount name
1 0 200 Michael
2 1 1200 John
2 1 900 Daniel
2 0 3000 David
2 0 600 Jonny
3 0 900 Denisse
3 1 900 Mike
3 1 3000 Kely
3 0 2000 Devon
What I need to do is add another column in df
that writes for each row
, the name
(from the name column) that has the highest amount
where the tag
is 1. In other words, the solution looks like this:
cluster tag amount name highest_amount
1 0 200 Michael NaN
2 1 1200 John John
2 1 900 Daniel John
2 0 3000 David John
2 0 600 Jonny John
3 0 900 Denisse Kely
3 1 900 Mike Kely
3 1 3000 Kely Kely
3 0 2000 Devon Kely
I've tried something like this:
df.group('clusters')['name','amount'].transform('max')[df['tag']==1]
but the problem with this is that the name does note repeat on every row. It will look like this:
cluster tag amount name highest_amount
1 0 200 Michael NaN
2 1 1200 John John
2 1 900 Daniel John
2 0 3000 David NaN
2 0 600 Jonny NaN
3 0 900 Denisse NaN
3 1 900 Mike Kely
3 1 3000 Kely Kely
3 0 2000 Devon NaN
Can someone please let me know how to add a condition with split apply combine, and have the solution repeated on each row?
python pandas pandas-groupby split-apply-combine
I have the following pandas
dataframe df
:
cluster tag amount name
1 0 200 Michael
2 1 1200 John
2 1 900 Daniel
2 0 3000 David
2 0 600 Jonny
3 0 900 Denisse
3 1 900 Mike
3 1 3000 Kely
3 0 2000 Devon
What I need to do is add another column in df
that writes for each row
, the name
(from the name column) that has the highest amount
where the tag
is 1. In other words, the solution looks like this:
cluster tag amount name highest_amount
1 0 200 Michael NaN
2 1 1200 John John
2 1 900 Daniel John
2 0 3000 David John
2 0 600 Jonny John
3 0 900 Denisse Kely
3 1 900 Mike Kely
3 1 3000 Kely Kely
3 0 2000 Devon Kely
I've tried something like this:
df.group('clusters')['name','amount'].transform('max')[df['tag']==1]
but the problem with this is that the name does note repeat on every row. It will look like this:
cluster tag amount name highest_amount
1 0 200 Michael NaN
2 1 1200 John John
2 1 900 Daniel John
2 0 3000 David NaN
2 0 600 Jonny NaN
3 0 900 Denisse NaN
3 1 900 Mike Kely
3 1 3000 Kely Kely
3 0 2000 Devon NaN
Can someone please let me know how to add a condition with split apply combine, and have the solution repeated on each row?
python pandas pandas-groupby split-apply-combine
python pandas pandas-groupby split-apply-combine
edited Nov 21 at 21:43
asked Nov 21 at 21:31
callmeGuy
12118
12118
I'm not sure that's valid pandas syntax? I can't test - does this work, as-posted?
– roganjosh
Nov 21 at 21:36
@roganjosh yes, it will work, it's just like doing df then applying a filer
– callmeGuy
Nov 21 at 21:38
with the.
indf.group('clusters').['name','amount']
? If so, I've learned something new.
– roganjosh
Nov 21 at 21:40
@roganjosh ups, my mistake, without the ".". I've edited my question
– callmeGuy
Nov 21 at 21:44
add a comment |
I'm not sure that's valid pandas syntax? I can't test - does this work, as-posted?
– roganjosh
Nov 21 at 21:36
@roganjosh yes, it will work, it's just like doing df then applying a filer
– callmeGuy
Nov 21 at 21:38
with the.
indf.group('clusters').['name','amount']
? If so, I've learned something new.
– roganjosh
Nov 21 at 21:40
@roganjosh ups, my mistake, without the ".". I've edited my question
– callmeGuy
Nov 21 at 21:44
I'm not sure that's valid pandas syntax? I can't test - does this work, as-posted?
– roganjosh
Nov 21 at 21:36
I'm not sure that's valid pandas syntax? I can't test - does this work, as-posted?
– roganjosh
Nov 21 at 21:36
@roganjosh yes, it will work, it's just like doing df then applying a filer
– callmeGuy
Nov 21 at 21:38
@roganjosh yes, it will work, it's just like doing df then applying a filer
– callmeGuy
Nov 21 at 21:38
with the
.
in df.group('clusters').['name','amount']
? If so, I've learned something new.– roganjosh
Nov 21 at 21:40
with the
.
in df.group('clusters').['name','amount']
? If so, I've learned something new.– roganjosh
Nov 21 at 21:40
@roganjosh ups, my mistake, without the ".". I've edited my question
– callmeGuy
Nov 21 at 21:44
@roganjosh ups, my mistake, without the ".". I've edited my question
– callmeGuy
Nov 21 at 21:44
add a comment |
1 Answer
1
active
oldest
votes
up vote
1
down vote
accepted
You can do this as a two-stage process. First calculate a mapping series, then map by cluster:
s = df.query('tag == 1')
.sort_values('amount', ascending=False)
.drop_duplicates('cluster')
.set_index('cluster')['name']
df['highest_name'] = df['cluster'].map(s)
print(df)
cluster tag amount name highest_name
0 1 0 200 Michael NaN
1 2 1 1200 John John
2 2 1 900 Daniel John
3 2 0 3000 David John
4 2 0 600 Jonny John
5 3 0 900 Denisse Kely
6 3 1 900 Mike Kely
7 3 1 3000 Kely Kely
8 3 0 2000 Devon Kely
If you want to use groupby
, here's one way:
def func(x):
names = x.query('tag == 1').sort_values('amount', ascending=False)['name']
return names.iloc[0] if not names.empty else np.nan
df['highest_name'] = df['cluster'].map(df.groupby('cluster').apply(func))
this works. I am wondering however if there's a way to do it with group and transform. Do you know if that's even possible?
– callmeGuy
Nov 21 at 21:51
@callmeGuy, I added agroupby
solution, doesn't usetransform
though.
– jpp
Nov 21 at 21:55
add a comment |
1 Answer
1
active
oldest
votes
1 Answer
1
active
oldest
votes
active
oldest
votes
active
oldest
votes
up vote
1
down vote
accepted
You can do this as a two-stage process. First calculate a mapping series, then map by cluster:
s = df.query('tag == 1')
.sort_values('amount', ascending=False)
.drop_duplicates('cluster')
.set_index('cluster')['name']
df['highest_name'] = df['cluster'].map(s)
print(df)
cluster tag amount name highest_name
0 1 0 200 Michael NaN
1 2 1 1200 John John
2 2 1 900 Daniel John
3 2 0 3000 David John
4 2 0 600 Jonny John
5 3 0 900 Denisse Kely
6 3 1 900 Mike Kely
7 3 1 3000 Kely Kely
8 3 0 2000 Devon Kely
If you want to use groupby
, here's one way:
def func(x):
names = x.query('tag == 1').sort_values('amount', ascending=False)['name']
return names.iloc[0] if not names.empty else np.nan
df['highest_name'] = df['cluster'].map(df.groupby('cluster').apply(func))
this works. I am wondering however if there's a way to do it with group and transform. Do you know if that's even possible?
– callmeGuy
Nov 21 at 21:51
@callmeGuy, I added agroupby
solution, doesn't usetransform
though.
– jpp
Nov 21 at 21:55
add a comment |
up vote
1
down vote
accepted
You can do this as a two-stage process. First calculate a mapping series, then map by cluster:
s = df.query('tag == 1')
.sort_values('amount', ascending=False)
.drop_duplicates('cluster')
.set_index('cluster')['name']
df['highest_name'] = df['cluster'].map(s)
print(df)
cluster tag amount name highest_name
0 1 0 200 Michael NaN
1 2 1 1200 John John
2 2 1 900 Daniel John
3 2 0 3000 David John
4 2 0 600 Jonny John
5 3 0 900 Denisse Kely
6 3 1 900 Mike Kely
7 3 1 3000 Kely Kely
8 3 0 2000 Devon Kely
If you want to use groupby
, here's one way:
def func(x):
names = x.query('tag == 1').sort_values('amount', ascending=False)['name']
return names.iloc[0] if not names.empty else np.nan
df['highest_name'] = df['cluster'].map(df.groupby('cluster').apply(func))
this works. I am wondering however if there's a way to do it with group and transform. Do you know if that's even possible?
– callmeGuy
Nov 21 at 21:51
@callmeGuy, I added agroupby
solution, doesn't usetransform
though.
– jpp
Nov 21 at 21:55
add a comment |
up vote
1
down vote
accepted
up vote
1
down vote
accepted
You can do this as a two-stage process. First calculate a mapping series, then map by cluster:
s = df.query('tag == 1')
.sort_values('amount', ascending=False)
.drop_duplicates('cluster')
.set_index('cluster')['name']
df['highest_name'] = df['cluster'].map(s)
print(df)
cluster tag amount name highest_name
0 1 0 200 Michael NaN
1 2 1 1200 John John
2 2 1 900 Daniel John
3 2 0 3000 David John
4 2 0 600 Jonny John
5 3 0 900 Denisse Kely
6 3 1 900 Mike Kely
7 3 1 3000 Kely Kely
8 3 0 2000 Devon Kely
If you want to use groupby
, here's one way:
def func(x):
names = x.query('tag == 1').sort_values('amount', ascending=False)['name']
return names.iloc[0] if not names.empty else np.nan
df['highest_name'] = df['cluster'].map(df.groupby('cluster').apply(func))
You can do this as a two-stage process. First calculate a mapping series, then map by cluster:
s = df.query('tag == 1')
.sort_values('amount', ascending=False)
.drop_duplicates('cluster')
.set_index('cluster')['name']
df['highest_name'] = df['cluster'].map(s)
print(df)
cluster tag amount name highest_name
0 1 0 200 Michael NaN
1 2 1 1200 John John
2 2 1 900 Daniel John
3 2 0 3000 David John
4 2 0 600 Jonny John
5 3 0 900 Denisse Kely
6 3 1 900 Mike Kely
7 3 1 3000 Kely Kely
8 3 0 2000 Devon Kely
If you want to use groupby
, here's one way:
def func(x):
names = x.query('tag == 1').sort_values('amount', ascending=False)['name']
return names.iloc[0] if not names.empty else np.nan
df['highest_name'] = df['cluster'].map(df.groupby('cluster').apply(func))
edited Nov 21 at 21:54
answered Nov 21 at 21:46
jpp
86.2k194898
86.2k194898
this works. I am wondering however if there's a way to do it with group and transform. Do you know if that's even possible?
– callmeGuy
Nov 21 at 21:51
@callmeGuy, I added agroupby
solution, doesn't usetransform
though.
– jpp
Nov 21 at 21:55
add a comment |
this works. I am wondering however if there's a way to do it with group and transform. Do you know if that's even possible?
– callmeGuy
Nov 21 at 21:51
@callmeGuy, I added agroupby
solution, doesn't usetransform
though.
– jpp
Nov 21 at 21:55
this works. I am wondering however if there's a way to do it with group and transform. Do you know if that's even possible?
– callmeGuy
Nov 21 at 21:51
this works. I am wondering however if there's a way to do it with group and transform. Do you know if that's even possible?
– callmeGuy
Nov 21 at 21:51
@callmeGuy, I added a
groupby
solution, doesn't use transform
though.– jpp
Nov 21 at 21:55
@callmeGuy, I added a
groupby
solution, doesn't use transform
though.– jpp
Nov 21 at 21:55
add a comment |
Thanks for contributing an answer to Stack Overflow!
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
To learn more, see our tips on writing great answers.
Some of your past answers have not been well-received, and you're in danger of being blocked from answering.
Please pay close attention to the following guidance:
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
To learn more, see our tips on writing great answers.
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53420738%2fhow-to-add-a-condition-in-split-apply-combine-and-repeat-solution-on-each-row%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
I'm not sure that's valid pandas syntax? I can't test - does this work, as-posted?
– roganjosh
Nov 21 at 21:36
@roganjosh yes, it will work, it's just like doing df then applying a filer
– callmeGuy
Nov 21 at 21:38
with the
.
indf.group('clusters').['name','amount']
? If so, I've learned something new.– roganjosh
Nov 21 at 21:40
@roganjosh ups, my mistake, without the ".". I've edited my question
– callmeGuy
Nov 21 at 21:44