I need to group by and get the rank in python
I have a DataFrame; the code below generates it:
import pandas as pd

df = pd.DataFrame({'customer': [1,2,1,3,1,2,3],
                   "group_code": ['111', '111', '222', '111', '111', '111', '333'],
                   "ind_code": ['A', 'B', 'AA', 'A', 'AAA', 'C', 'BBB'],
                   "amount": [100, 200, 140, 400, 225, 125, 600],
                   "card": ['XXX', 'YYY', 'YYY', 'XXX', 'XXX', 'YYY', 'XXX']})
I want to group it by card and find, for each card, which group code has the highest amount, then create a new DataFrame containing each card and its group code with the highest amount.
Any help is appreciated.
python pandas-groupby
asked 2 days ago
Sheriff
1 Answer
You could do:
import pandas as pd
df = pd.DataFrame({'customer': [1,2,1,3,1,2,3],
"group_code": ['111', '111', '222', '111', '111', '111', '333'],
"ind_code": ['A', 'B', 'AA', 'A', 'AAA', 'C', 'BBB'],
"amount": [100, 200, 140, 400, 225, 125, 600],
"card": ['XXX', 'YYY', 'YYY', 'XXX', 'XXX', 'YYY', 'XXX']})
# True for rows whose amount equals the largest single amount within their card
mask = df.groupby('card')['amount'].transform('max') == df['amount']
result = df.loc[mask, ['card', 'group_code', 'amount']]
print(result)
Output
  card group_code  amount
1  YYY        111     200
6  XXX        333     600
UPDATE
import pandas as pd
df = pd.DataFrame({'customer': [1,2,1,3,1,2,3],
"group_code": ['111', '111', '222', '111', '111', '111', '333'],
"ind_code": ['A', 'B', 'AA', 'A', 'AAA', 'C', 'BBB'],
"amount": [100, 200, 140, 400, 225, 125, 600],
"card": ['XXX', 'YYY', 'YYY', 'XXX', 'XXX', 'YYY', 'XXX']})
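# Sum the amounts for each (card, group_code) pair
agg = df.groupby(['card', 'group_code']).agg({'amount':'sum'}).reset_index()
# Keep the pair(s) with the largest summed amount within each card
mask = agg.groupby('card')['amount'].transform('max') == agg['amount']
result = agg[mask]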
print(result)
Output
  card group_code  amount
0  XXX        111     725
2  YYY        111     325
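As a variation on the update above, here is a minimal sketch that picks exactly one row per card with `idxmax` instead of a boolean mask. It is not part of the original answer; the names `totals` and `best` are made up for illustration. The practical difference is in ties: the mask keeps every tied group code, while `idxmax` keeps only the first one it encounters.

```python
import pandas as pd

df = pd.DataFrame({'customer': [1, 2, 1, 3, 1, 2, 3],
                   'group_code': ['111', '111', '222', '111', '111', '111', '333'],
                   'ind_code': ['A', 'B', 'AA', 'A', 'AAA', 'C', 'BBB'],
                   'amount': [100, 200, 140, 400, 225, 125, 600],
                   'card': ['XXX', 'YYY', 'YYY', 'XXX', 'XXX', 'YYY', 'XXX']})

# Sum the amounts for each (card, group_code) pair.
totals = df.groupby(['card', 'group_code'], as_index=False)['amount'].sum()

# idxmax returns, per card, the row label of the largest summed amount.
best = totals.loc[totals.groupby('card')['amount'].idxmax()].reset_index(drop=True)
print(best)
```

On the sample data this selects group code 111 for both cards, with summed amounts 725 (XXX) and 325 (YYY).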
Thanks for helping, but I think this is not quite right. In the DataFrame, card XXX has two group codes, 111 and 333. Amount for group 111: 100 + 400 + 225 = 725. Amount for group 333: 600. So for card XXX the result should be group code 111 with amount 725.
– Sheriff
2 days ago
@Sheriff see the update.
– Daniel Mesejo
2 days ago
Great, thanks. I need a bit more here, though. Instead of only the maximum sum: in the bigger picture, I have a huge dataset of 14 GB. In that case, can you help me get the top 3 group codes for a particular card, based on the sum of amount?
– Sheriff
2 days ago
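The top-3 request above was not answered in the thread. One possible sketch, under stated assumptions: the data can be read in pieces (for example with `pd.read_csv(..., chunksize=...)`), and the per-pair totals fit in memory even when the raw rows do not. The function names `summed_amounts` and `top_n_groups` and the file name in the comment are hypothetical; the chunks are simulated here by slicing the sample DataFrame.

```python
import pandas as pd

def summed_amounts(chunks):
    """Total amount per (card, group_code) over an iterable of DataFrames."""
    partials = [
        chunk.groupby(['card', 'group_code'])['amount'].sum()
        for chunk in chunks
    ]
    # The same pair can appear in several chunks, so add the partial sums.
    return pd.concat(partials).groupby(level=['card', 'group_code']).sum()

def top_n_groups(totals, n=3):
    """Keep the n group codes with the largest totals for each card."""
    ranked = totals.reset_index().sort_values(
        ['card', 'amount'], ascending=[True, False])
    top = ranked.groupby('card').head(n).copy()
    # Rank 1 is the largest total within each card.
    top['rank'] = top.groupby('card').cumcount() + 1
    return top.reset_index(drop=True)

df = pd.DataFrame({'customer': [1, 2, 1, 3, 1, 2, 3],
                   'group_code': ['111', '111', '222', '111', '111', '111', '333'],
                   'ind_code': ['A', 'B', 'AA', 'A', 'AAA', 'C', 'BBB'],
                   'amount': [100, 200, 140, 400, 225, 125, 600],
                   'card': ['XXX', 'YYY', 'YYY', 'XXX', 'XXX', 'YYY', 'XXX']})

# Stand-in for a chunked reader; with a real file this might instead be
# pd.read_csv('transactions.csv', chunksize=1_000_000)  (hypothetical file name).
chunks = [df.iloc[:4], df.iloc[4:]]

result = top_n_groups(summed_amounts(chunks), n=3)
print(result)
```

Because each chunk is reduced to its per-pair sums before being combined, only the aggregates are held in memory at once. To look at a single card, filter the result, e.g. `result[result['card'] == 'XXX']`.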
edited 2 days ago
answered 2 days ago
Daniel Mesejo