How to use Pandas to get the count of every combination inclusive
.everyoneloves__top-leaderboard:empty,.everyoneloves__mid-leaderboard:empty,.everyoneloves__bot-mid-leaderboard:empty{ height:90px;width:728px;box-sizing:border-box;
}
I am trying to figure out what combination of clothing customers are buying together. I can figure out the exact combination, but the problem I can't figure out is the count that includes the combination + others.
For example, I have:
Cust_num Item Rev
Cust1 Shirt1 $40
Cust1 Shirt2 $40
Cust1 Shorts1 $40
Cust2 Shirt1 $40
Cust2 Shorts1 $40
This should result in:
Combo Count
Shirt1,Shirt2,Shorts1 1
Shirt1,Shorts1 2
The best I can do is unique combinations:
Combo Count
Shirt1,Shirt2,Shorts1 1
Shirt1,Shorts1 1
I tried:
df = df.pivot(index='Cust_num',columns='Item').sum()
df[df.notnull()] = "x"
df = df.loc[:,"Shirt1":].replace("x", pd.Series(df.columns, df.columns))
col = df.stack().groupby(level=0).apply(','.join)
df2 = pd.DataFrame(col)
df2.groupby([0]).size().reset_index(name='counts')
But that is just the unique counts.
python pandas
New contributor
add a comment |
I am trying to figure out what combination of clothing customers are buying together. I can figure out the exact combination, but the problem I can't figure out is the count that includes the combination + others.
For example, I have:
Cust_num Item Rev
Cust1 Shirt1 $40
Cust1 Shirt2 $40
Cust1 Shorts1 $40
Cust2 Shirt1 $40
Cust2 Shorts1 $40
This should result in:
Combo Count
Shirt1,Shirt2,Shorts1 1
Shirt1,Shorts1 2
The best I can do is unique combinations:
Combo Count
Shirt1,Shirt2,Shorts1 1
Shirt1,Shorts1 1
I tried:
df = df.pivot(index='Cust_num',columns='Item').sum()
df[df.notnull()] = "x"
df = df.loc[:,"Shirt1":].replace("x", pd.Series(df.columns, df.columns))
col = df.stack().groupby(level=0).apply(','.join)
df2 = pd.DataFrame(col)
df2.groupby([0]).size().reset_index(name='counts')
But that is just the unique counts.
python pandas
New contributor
2
I feel like this is one sort of problem pandas would not be suitable for.
– coldspeed
1 hour ago
add a comment |
I am trying to figure out what combination of clothing customers are buying together. I can figure out the exact combination, but the problem I can't figure out is the count that includes the combination + others.
For example, I have:
Cust_num Item Rev
Cust1 Shirt1 $40
Cust1 Shirt2 $40
Cust1 Shorts1 $40
Cust2 Shirt1 $40
Cust2 Shorts1 $40
This should result in:
Combo Count
Shirt1,Shirt2,Shorts1 1
Shirt1,Shorts1 2
The best I can do is unique combinations:
Combo Count
Shirt1,Shirt2,Shorts1 1
Shirt1,Shorts1 1
I tried:
df = df.pivot(index='Cust_num',columns='Item').sum()
df[df.notnull()] = "x"
df = df.loc[:,"Shirt1":].replace("x", pd.Series(df.columns, df.columns))
col = df.stack().groupby(level=0).apply(','.join)
df2 = pd.DataFrame(col)
df2.groupby([0]).size().reset_index(name='counts')
But that is just the unique counts.
python pandas
New contributor
I am trying to figure out what combination of clothing customers are buying together. I can figure out the exact combination, but the problem I can't figure out is the count that includes the combination + others.
For example, I have:
Cust_num Item Rev
Cust1 Shirt1 $40
Cust1 Shirt2 $40
Cust1 Shorts1 $40
Cust2 Shirt1 $40
Cust2 Shorts1 $40
This should result in:
Combo Count
Shirt1,Shirt2,Shorts1 1
Shirt1,Shorts1 2
The best I can do is unique combinations:
Combo Count
Shirt1,Shirt2,Shorts1 1
Shirt1,Shorts1 1
I tried:
df = df.pivot(index='Cust_num',columns='Item').sum()
df[df.notnull()] = "x"
df = df.loc[:,"Shirt1":].replace("x", pd.Series(df.columns, df.columns))
col = df.stack().groupby(level=0).apply(','.join)
df2 = pd.DataFrame(col)
df2.groupby([0]).size().reset_index(name='counts')
But that is just the unique counts.
python pandas
python pandas
New contributor
New contributor
New contributor
asked 2 hours ago
Taylor SmithTaylor Smith
442
442
New contributor
New contributor
2
I feel like this is one sort of problem pandas would not be suitable for.
– coldspeed
1 hour ago
add a comment |
2
I feel like this is one sort of problem pandas would not be suitable for.
– coldspeed
1 hour ago
2
2
I feel like this is one sort of problem pandas would not be suitable for.
– coldspeed
1 hour ago
I feel like this is one sort of problem pandas would not be suitable for.
– coldspeed
1 hour ago
add a comment |
4 Answers
4
active
oldest
votes
Using pandas.DataFrame.groupby
:
grouped_item = df.groupby('Cust_num')['Item']
subsets = grouped_item.apply(lambda x: set(x)).tolist()
Count = [sum(s2.issubset(s1) for s1 in subsets) for s2 in subsets]
combo = grouped_item.apply(lambda x:','.join(x))
combo = combo.reset_index()
combo['Count']=Count
Output:
Cust_num Item Count
0 Cust1 Shirt1,Shirt2,Shorts1 1
1 Cust2 Shirt1,Shorts1 2
add a comment |
Late answer, but you can use:
df = df.groupby(['Cust_num'], as_index=False).agg(','.join).drop(columns=['Rev']).set_index(['Item']).rename_axis("combo").rename(columns={"Cust_num": "Count"})
df['Count'] = df['Count'].str.replace(r'Cust','')
combo Count
Shirt1,Shirt2,Shorts1 1
Shirt1,Shorts1 2
add a comment |
I think you need to create a combination of items first.
How to get all possible combinations of a list’s elements?
I used the function from Dan H's answer.
from itertools import chain, combinations
def all_subsets(ss):
return chain(*map(lambda x: combinations(ss, x), range(0, len(ss)+1)))
Then get the unique items.
uq_items = df.Item.unique()
list(all_subsets(uq_items))
[(),
('Shirt1',),
('Shirt2',),
('Shorts1',),
('Shirt1', 'Shirt2'),
('Shirt1', 'Shorts1'),
('Shirt2', 'Shorts1'),
('Shirt1', 'Shirt2', 'Shorts1')]
And use groupby
each customer to get their items combination.
ls =
for _, d in df.groupby('Cust_num', group_keys=False):
# Get all possible subset of items
pi = np.array(list(all_subsets(d.Item)))
# Fliter only > 1
ls.append(pi[[len(l) > 1 for l in pi]])
Then convert to Series
and use value_counts()
.
pd.Series(np.concatenate(ls)).value_counts()
(Shirt1, Shorts1) 2
(Shirt2, Shorts1) 1
(Shirt1, Shirt2, Shorts1) 1
(Shirt1, Shirt2) 1
add a comment |
My version which I believe is easier to understand
new_df = df.groupby("Cust_num").agg({lambda x: ''.join(x.unique())})
new_df ['count'] = range(1, len(new_df ) + 1)
Output:
Item Rev count
<lambda> <lambda>
Cust_num
Cust1 Shirt1 Shirt2 Shorts1 $40 1
Cust2 Shirt1 Shorts1 $40 2
Since you do not need the Rev
column, you can drop it:
new_df = new_df = new_df.drop(columns=["Rev"]).reset_index()
new_df
Output:
Cust_num Item count
<lambda>
0 Cust1 Shirt1 Shirt2 Shorts1 1
1 Cust2 Shirt1 Shorts1 2
How is thecount
in your answer the count of inclusive combination ofdf['Item']
? Making new column withrange
is not an answer.
– Chris
44 mins ago
add a comment |
Your Answer
StackExchange.ifUsing("editor", function () {
StackExchange.using("externalEditor", function () {
StackExchange.using("snippets", function () {
StackExchange.snippets.init();
});
});
}, "code-snippets");
StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "1"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);
StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});
function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});
}
});
Taylor Smith is a new contributor. Be nice, and check out our Code of Conduct.
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f55565916%2fhow-to-use-pandas-to-get-the-count-of-every-combination-inclusive%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
4 Answers
4
active
oldest
votes
4 Answers
4
active
oldest
votes
active
oldest
votes
active
oldest
votes
Using pandas.DataFrame.groupby
:
grouped_item = df.groupby('Cust_num')['Item']
subsets = grouped_item.apply(lambda x: set(x)).tolist()
Count = [sum(s2.issubset(s1) for s1 in subsets) for s2 in subsets]
combo = grouped_item.apply(lambda x:','.join(x))
combo = combo.reset_index()
combo['Count']=Count
Output:
Cust_num Item Count
0 Cust1 Shirt1,Shirt2,Shorts1 1
1 Cust2 Shirt1,Shorts1 2
add a comment |
Using pandas.DataFrame.groupby
:
grouped_item = df.groupby('Cust_num')['Item']
subsets = grouped_item.apply(lambda x: set(x)).tolist()
Count = [sum(s2.issubset(s1) for s1 in subsets) for s2 in subsets]
combo = grouped_item.apply(lambda x:','.join(x))
combo = combo.reset_index()
combo['Count']=Count
Output:
Cust_num Item Count
0 Cust1 Shirt1,Shirt2,Shorts1 1
1 Cust2 Shirt1,Shorts1 2
add a comment |
Using pandas.DataFrame.groupby
:
grouped_item = df.groupby('Cust_num')['Item']
subsets = grouped_item.apply(lambda x: set(x)).tolist()
Count = [sum(s2.issubset(s1) for s1 in subsets) for s2 in subsets]
combo = grouped_item.apply(lambda x:','.join(x))
combo = combo.reset_index()
combo['Count']=Count
Output:
Cust_num Item Count
0 Cust1 Shirt1,Shirt2,Shorts1 1
1 Cust2 Shirt1,Shorts1 2
Using pandas.DataFrame.groupby
:
grouped_item = df.groupby('Cust_num')['Item']
subsets = grouped_item.apply(lambda x: set(x)).tolist()
Count = [sum(s2.issubset(s1) for s1 in subsets) for s2 in subsets]
combo = grouped_item.apply(lambda x:','.join(x))
combo = combo.reset_index()
combo['Count']=Count
Output:
Cust_num Item Count
0 Cust1 Shirt1,Shirt2,Shorts1 1
1 Cust2 Shirt1,Shorts1 2
answered 1 hour ago
ChrisChris
3,710422
3,710422
add a comment |
add a comment |
Late answer, but you can use:
df = df.groupby(['Cust_num'], as_index=False).agg(','.join).drop(columns=['Rev']).set_index(['Item']).rename_axis("combo").rename(columns={"Cust_num": "Count"})
df['Count'] = df['Count'].str.replace(r'Cust','')
combo Count
Shirt1,Shirt2,Shorts1 1
Shirt1,Shorts1 2
add a comment |
Late answer, but you can use:
df = df.groupby(['Cust_num'], as_index=False).agg(','.join).drop(columns=['Rev']).set_index(['Item']).rename_axis("combo").rename(columns={"Cust_num": "Count"})
df['Count'] = df['Count'].str.replace(r'Cust','')
combo Count
Shirt1,Shirt2,Shorts1 1
Shirt1,Shorts1 2
add a comment |
Late answer, but you can use:
df = df.groupby(['Cust_num'], as_index=False).agg(','.join).drop(columns=['Rev']).set_index(['Item']).rename_axis("combo").rename(columns={"Cust_num": "Count"})
df['Count'] = df['Count'].str.replace(r'Cust','')
combo Count
Shirt1,Shirt2,Shorts1 1
Shirt1,Shorts1 2
Late answer, but you can use:
df = df.groupby(['Cust_num'], as_index=False).agg(','.join).drop(columns=['Rev']).set_index(['Item']).rename_axis("combo").rename(columns={"Cust_num": "Count"})
df['Count'] = df['Count'].str.replace(r'Cust','')
combo Count
Shirt1,Shirt2,Shorts1 1
Shirt1,Shorts1 2
edited 42 mins ago
answered 1 hour ago
Pedro LobitoPedro Lobito
50.5k16138172
50.5k16138172
add a comment |
add a comment |
I think you need to create a combination of items first.
How to get all possible combinations of a list’s elements?
I used the function from Dan H's answer.
from itertools import chain, combinations
def all_subsets(ss):
return chain(*map(lambda x: combinations(ss, x), range(0, len(ss)+1)))
Then get the unique items.
uq_items = df.Item.unique()
list(all_subsets(uq_items))
[(),
('Shirt1',),
('Shirt2',),
('Shorts1',),
('Shirt1', 'Shirt2'),
('Shirt1', 'Shorts1'),
('Shirt2', 'Shorts1'),
('Shirt1', 'Shirt2', 'Shorts1')]
And use groupby
each customer to get their items combination.
ls =
for _, d in df.groupby('Cust_num', group_keys=False):
# Get all possible subset of items
pi = np.array(list(all_subsets(d.Item)))
# Fliter only > 1
ls.append(pi[[len(l) > 1 for l in pi]])
Then convert to Series
and use value_counts()
.
pd.Series(np.concatenate(ls)).value_counts()
(Shirt1, Shorts1) 2
(Shirt2, Shorts1) 1
(Shirt1, Shirt2, Shorts1) 1
(Shirt1, Shirt2) 1
add a comment |
I think you need to create a combination of items first.
How to get all possible combinations of a list’s elements?
I used the function from Dan H's answer.
from itertools import chain, combinations
def all_subsets(ss):
return chain(*map(lambda x: combinations(ss, x), range(0, len(ss)+1)))
Then get the unique items.
uq_items = df.Item.unique()
list(all_subsets(uq_items))
[(),
('Shirt1',),
('Shirt2',),
('Shorts1',),
('Shirt1', 'Shirt2'),
('Shirt1', 'Shorts1'),
('Shirt2', 'Shorts1'),
('Shirt1', 'Shirt2', 'Shorts1')]
And use groupby
each customer to get their items combination.
ls =
for _, d in df.groupby('Cust_num', group_keys=False):
# Get all possible subset of items
pi = np.array(list(all_subsets(d.Item)))
# Fliter only > 1
ls.append(pi[[len(l) > 1 for l in pi]])
Then convert to Series
and use value_counts()
.
pd.Series(np.concatenate(ls)).value_counts()
(Shirt1, Shorts1) 2
(Shirt2, Shorts1) 1
(Shirt1, Shirt2, Shorts1) 1
(Shirt1, Shirt2) 1
add a comment |
I think you need to create a combination of items first.
How to get all possible combinations of a list’s elements?
I used the function from Dan H's answer.
from itertools import chain, combinations
def all_subsets(ss):
return chain(*map(lambda x: combinations(ss, x), range(0, len(ss)+1)))
Then get the unique items.
uq_items = df.Item.unique()
list(all_subsets(uq_items))
[(),
('Shirt1',),
('Shirt2',),
('Shorts1',),
('Shirt1', 'Shirt2'),
('Shirt1', 'Shorts1'),
('Shirt2', 'Shorts1'),
('Shirt1', 'Shirt2', 'Shorts1')]
And use groupby
each customer to get their items combination.
ls =
for _, d in df.groupby('Cust_num', group_keys=False):
# Get all possible subset of items
pi = np.array(list(all_subsets(d.Item)))
# Fliter only > 1
ls.append(pi[[len(l) > 1 for l in pi]])
Then convert to Series
and use value_counts()
.
pd.Series(np.concatenate(ls)).value_counts()
(Shirt1, Shorts1) 2
(Shirt2, Shorts1) 1
(Shirt1, Shirt2, Shorts1) 1
(Shirt1, Shirt2) 1
I think you need to create a combination of items first.
How to get all possible combinations of a list’s elements?
I used the function from Dan H's answer.
from itertools import chain, combinations
def all_subsets(ss):
return chain(*map(lambda x: combinations(ss, x), range(0, len(ss)+1)))
Then get the unique items.
uq_items = df.Item.unique()
list(all_subsets(uq_items))
[(),
('Shirt1',),
('Shirt2',),
('Shorts1',),
('Shirt1', 'Shirt2'),
('Shirt1', 'Shorts1'),
('Shirt2', 'Shorts1'),
('Shirt1', 'Shirt2', 'Shorts1')]
And use groupby
each customer to get their items combination.
ls =
for _, d in df.groupby('Cust_num', group_keys=False):
# Get all possible subset of items
pi = np.array(list(all_subsets(d.Item)))
# Fliter only > 1
ls.append(pi[[len(l) > 1 for l in pi]])
Then convert to Series
and use value_counts()
.
pd.Series(np.concatenate(ls)).value_counts()
(Shirt1, Shorts1) 2
(Shirt2, Shorts1) 1
(Shirt1, Shirt2, Shorts1) 1
(Shirt1, Shirt2) 1
answered 1 hour ago
ResidentSleeperResidentSleeper
36210
36210
add a comment |
add a comment |
My version which I believe is easier to understand
new_df = df.groupby("Cust_num").agg({lambda x: ''.join(x.unique())})
new_df ['count'] = range(1, len(new_df ) + 1)
Output:
Item Rev count
<lambda> <lambda>
Cust_num
Cust1 Shirt1 Shirt2 Shorts1 $40 1
Cust2 Shirt1 Shorts1 $40 2
Since you do not need the Rev
column, you can drop it:
new_df = new_df = new_df.drop(columns=["Rev"]).reset_index()
new_df
Output:
Cust_num Item count
<lambda>
0 Cust1 Shirt1 Shirt2 Shorts1 1
1 Cust2 Shirt1 Shorts1 2
How is thecount
in your answer the count of inclusive combination ofdf['Item']
? Making new column withrange
is not an answer.
– Chris
44 mins ago
add a comment |
My version which I believe is easier to understand
new_df = df.groupby("Cust_num").agg({lambda x: ''.join(x.unique())})
new_df ['count'] = range(1, len(new_df ) + 1)
Output:
Item Rev count
<lambda> <lambda>
Cust_num
Cust1 Shirt1 Shirt2 Shorts1 $40 1
Cust2 Shirt1 Shorts1 $40 2
Since you do not need the Rev
column, you can drop it:
new_df = new_df = new_df.drop(columns=["Rev"]).reset_index()
new_df
Output:
Cust_num Item count
<lambda>
0 Cust1 Shirt1 Shirt2 Shorts1 1
1 Cust2 Shirt1 Shorts1 2
How is thecount
in your answer the count of inclusive combination ofdf['Item']
? Making new column withrange
is not an answer.
– Chris
44 mins ago
add a comment |
My version which I believe is easier to understand
new_df = df.groupby("Cust_num").agg({lambda x: ''.join(x.unique())})
new_df ['count'] = range(1, len(new_df ) + 1)
Output:
Item Rev count
<lambda> <lambda>
Cust_num
Cust1 Shirt1 Shirt2 Shorts1 $40 1
Cust2 Shirt1 Shorts1 $40 2
Since you do not need the Rev
column, you can drop it:
new_df = new_df = new_df.drop(columns=["Rev"]).reset_index()
new_df
Output:
Cust_num Item count
<lambda>
0 Cust1 Shirt1 Shirt2 Shorts1 1
1 Cust2 Shirt1 Shorts1 2
My version which I believe is easier to understand
new_df = df.groupby("Cust_num").agg({lambda x: ''.join(x.unique())})
new_df ['count'] = range(1, len(new_df ) + 1)
Output:
Item Rev count
<lambda> <lambda>
Cust_num
Cust1 Shirt1 Shirt2 Shorts1 $40 1
Cust2 Shirt1 Shorts1 $40 2
Since you do not need the Rev
column, you can drop it:
new_df = new_df = new_df.drop(columns=["Rev"]).reset_index()
new_df
Output:
Cust_num Item count
<lambda>
0 Cust1 Shirt1 Shirt2 Shorts1 1
1 Cust2 Shirt1 Shorts1 2
edited 41 mins ago
answered 51 mins ago
Lee MtotiLee Mtoti
13110
13110
How is thecount
in your answer the count of inclusive combination ofdf['Item']
? Making new column withrange
is not an answer.
– Chris
44 mins ago
add a comment |
How is thecount
in your answer the count of inclusive combination ofdf['Item']
? Making new column withrange
is not an answer.
– Chris
44 mins ago
How is the
count
in your answer the count of inclusive combination of df['Item']
? Making new column with range
is not an answer.– Chris
44 mins ago
How is the
count
in your answer the count of inclusive combination of df['Item']
? Making new column with range
is not an answer.– Chris
44 mins ago
add a comment |
Taylor Smith is a new contributor. Be nice, and check out our Code of Conduct.
Taylor Smith is a new contributor. Be nice, and check out our Code of Conduct.
Taylor Smith is a new contributor. Be nice, and check out our Code of Conduct.
Taylor Smith is a new contributor. Be nice, and check out our Code of Conduct.
Thanks for contributing an answer to Stack Overflow!
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
To learn more, see our tips on writing great answers.
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f55565916%2fhow-to-use-pandas-to-get-the-count-of-every-combination-inclusive%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
2
I feel like this is one sort of problem pandas would not be suitable for.
– coldspeed
1 hour ago