Pandas: cumulative sum every n rows
I have a dataframe with a column "date" of type dtype M8[ns] and another "expected_response". Then, there is a column "cumulative_expected" which does the cumulative sum of the expected_response among the rows with the same date. The dataframe has a row for each second of the month. Like below:
date Expected_response cumulative_expected
0 2018-03-01 0.270 0.270
1 2018-03-01 0.260 0.530
2 2018-03-01 0.240 0.770
3 2018-03-01 0.224 0.994
4 2018-03-01 0.204 1.198
5 2018-03-01 0.194 1.392
6 2018-03-01 0.190 1.582
... ... ... ...
2678395 2018-03-31 0.164 -7533.464
2678396 2018-03-31 0.164 -7533.300
2678397 2018-03-31 0.160 -7533.140
2678398 2018-03-31 0.154 -7532.986
2678399 2018-03-31 0.150 -7532.836
as you can see there is an error: the cumulative sum does not recognise the Change of the date and the cumulative sum does not restart each time the date changes.
The code is:
df['cumulative_expected']=df.groupby(df['date']!=df['date'])['Expected_response'].cumsum()
Maybe an Option could be to create a Counter that increases by 1 each 86400 rows (seconds in a day) and then groupby the Counter. But I don't know how to do it.
Is there any other solution?
thank you in advance
python pandas
add a comment |
I have a dataframe with a column "date" of type dtype M8[ns] and another "expected_response". Then, there is a column "cumulative_expected" which does the cumulative sum of the expected_response among the rows with the same date. The dataframe has a row for each second of the month. Like below:
date Expected_response cumulative_expected
0 2018-03-01 0.270 0.270
1 2018-03-01 0.260 0.530
2 2018-03-01 0.240 0.770
3 2018-03-01 0.224 0.994
4 2018-03-01 0.204 1.198
5 2018-03-01 0.194 1.392
6 2018-03-01 0.190 1.582
... ... ... ...
2678395 2018-03-31 0.164 -7533.464
2678396 2018-03-31 0.164 -7533.300
2678397 2018-03-31 0.160 -7533.140
2678398 2018-03-31 0.154 -7532.986
2678399 2018-03-31 0.150 -7532.836
as you can see there is an error: the cumulative sum does not recognise the Change of the date and the cumulative sum does not restart each time the date changes.
The code is:
df['cumulative_expected']=df.groupby(df['date']!=df['date'])['Expected_response'].cumsum()
Maybe an Option could be to create a Counter that increases by 1 each 86400 rows (seconds in a day) and then groupby the Counter. But I don't know how to do it.
Is there any other solution?
thank you in advance
python pandas
add a comment |
I have a dataframe with a column "date" of type dtype M8[ns] and another "expected_response". Then, there is a column "cumulative_expected" which does the cumulative sum of the expected_response among the rows with the same date. The dataframe has a row for each second of the month. Like below:
date Expected_response cumulative_expected
0 2018-03-01 0.270 0.270
1 2018-03-01 0.260 0.530
2 2018-03-01 0.240 0.770
3 2018-03-01 0.224 0.994
4 2018-03-01 0.204 1.198
5 2018-03-01 0.194 1.392
6 2018-03-01 0.190 1.582
... ... ... ...
2678395 2018-03-31 0.164 -7533.464
2678396 2018-03-31 0.164 -7533.300
2678397 2018-03-31 0.160 -7533.140
2678398 2018-03-31 0.154 -7532.986
2678399 2018-03-31 0.150 -7532.836
as you can see there is an error: the cumulative sum does not recognise the Change of the date and the cumulative sum does not restart each time the date changes.
The code is:
df['cumulative_expected']=df.groupby(df['date']!=df['date'])['Expected_response'].cumsum()
Maybe an Option could be to create a Counter that increases by 1 each 86400 rows (seconds in a day) and then groupby the Counter. But I don't know how to do it.
Is there any other solution?
thank you in advance
python pandas
I have a dataframe with a column "date" of type dtype M8[ns] and another "expected_response". Then, there is a column "cumulative_expected" which does the cumulative sum of the expected_response among the rows with the same date. The dataframe has a row for each second of the month. Like below:
date Expected_response cumulative_expected
0 2018-03-01 0.270 0.270
1 2018-03-01 0.260 0.530
2 2018-03-01 0.240 0.770
3 2018-03-01 0.224 0.994
4 2018-03-01 0.204 1.198
5 2018-03-01 0.194 1.392
6 2018-03-01 0.190 1.582
... ... ... ...
2678395 2018-03-31 0.164 -7533.464
2678396 2018-03-31 0.164 -7533.300
2678397 2018-03-31 0.160 -7533.140
2678398 2018-03-31 0.154 -7532.986
2678399 2018-03-31 0.150 -7532.836
as you can see there is an error: the cumulative sum does not recognise the Change of the date and the cumulative sum does not restart each time the date changes.
The code is:
df['cumulative_expected']=df.groupby(df['date']!=df['date'])['Expected_response'].cumsum()
Maybe an Option could be to create a Counter that increases by 1 each 86400 rows (seconds in a day) and then groupby the Counter. But I don't know how to do it.
Is there any other solution?
thank you in advance
python pandas
python pandas
asked Nov 28 '18 at 9:34
Luca91Luca91
1858
1858
add a comment |
add a comment |
2 Answers
2
active
oldest
votes
There is default index, so you can use floor division:
df['cumulative_expected'] = df['Expected_response'].groupby(df.index // 86400).cumsum()
Generally solution is create np.arange
with floor division:
arr = np.arange(len(df)) // 86400
df['cumulative_expected'] = df['Expected_response'].groupby(arr).cumsum()
Your solution should be changed with comparing shift
ed values with cumsum
:
s = (df['date']!=df['date'].shift()).cumsum()
df['cumulative_expected'] = df['Expected_response'].groupby(s).cumsum()
Test with changed sample data:
print (df)
date Expected_response
0 2018-03-01 0.270
1 2018-03-01 0.260
2 2018-03-02 0.240
3 2018-03-02 0.224
4 2018-03-02 0.204
5 2018-03-01 0.194
6 2018-03-01 0.190
s = (df['date']!=df['date'].shift()).cumsum()
print (s)
0 1
1 1
2 2
3 2
4 2
5 3
6 3
Name: date, dtype: int32
df['cumulative_expected'] = df['Expected_response'].groupby(s).cumsum()
print (df)
date Expected_response cumulative_expected
0 2018-03-01 0.270 0.270
1 2018-03-01 0.260 0.530
2 2018-03-02 0.240 0.240
3 2018-03-02 0.224 0.464
4 2018-03-02 0.204 0.668
5 2018-03-01 0.194 0.194
6 2018-03-01 0.190 0.384
add a comment |
You can take the first difference of the date using diff
to see were the changes occur, and use this as a reference to take the cumulative sum.
Here I use a slightly modified df
to see how works:
print(df)
date Expected_response
0 2018-03-01 0.270
1 2018-03-01 0.260
2 2018-03-01 0.240
3 2018-03-01 0.224
4 2018-03-02 0.204
5 2018-03-02 0.194
6 2018-03-02 0.190
df['change'] = df.date.diff().abs().fillna(0).cumsum()
print(df)
date Expected_response change
0 2018-03-01 0.270 0 days
1 2018-03-01 0.260 0 days
2 2018-03-01 0.240 0 days
3 2018-03-01 0.224 0 days
4 2018-03-02 0.204 1 days
5 2018-03-02 0.194 1 days
6 2018-03-02 0.190 1 days
df['cumulative_expected'] = df.groupby('change').cumsum()
print(df.drop(['change'], axis = 1))
date Expected_response cumulative_expected
0 2018-03-01 0.270 0.270
1 2018-03-01 0.260 0.530
2 2018-03-01 0.240 0.770
3 2018-03-01 0.224 0.994
4 2018-03-02 0.204 0.204
5 2018-03-02 0.194 0.398
6 2018-03-02 0.190 0.588
add a comment |
Your Answer
StackExchange.ifUsing("editor", function () {
StackExchange.using("externalEditor", function () {
StackExchange.using("snippets", function () {
StackExchange.snippets.init();
});
});
}, "code-snippets");
StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "1"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);
StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});
function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});
}
});
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53516252%2fpandas-cumulative-sum-every-n-rows%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
2 Answers
2
active
oldest
votes
2 Answers
2
active
oldest
votes
active
oldest
votes
active
oldest
votes
There is default index, so you can use floor division:
df['cumulative_expected'] = df['Expected_response'].groupby(df.index // 86400).cumsum()
Generally solution is create np.arange
with floor division:
arr = np.arange(len(df)) // 86400
df['cumulative_expected'] = df['Expected_response'].groupby(arr).cumsum()
Your solution should be changed with comparing shift
ed values with cumsum
:
s = (df['date']!=df['date'].shift()).cumsum()
df['cumulative_expected'] = df['Expected_response'].groupby(s).cumsum()
Test with changed sample data:
print (df)
date Expected_response
0 2018-03-01 0.270
1 2018-03-01 0.260
2 2018-03-02 0.240
3 2018-03-02 0.224
4 2018-03-02 0.204
5 2018-03-01 0.194
6 2018-03-01 0.190
s = (df['date']!=df['date'].shift()).cumsum()
print (s)
0 1
1 1
2 2
3 2
4 2
5 3
6 3
Name: date, dtype: int32
df['cumulative_expected'] = df['Expected_response'].groupby(s).cumsum()
print (df)
date Expected_response cumulative_expected
0 2018-03-01 0.270 0.270
1 2018-03-01 0.260 0.530
2 2018-03-02 0.240 0.240
3 2018-03-02 0.224 0.464
4 2018-03-02 0.204 0.668
5 2018-03-01 0.194 0.194
6 2018-03-01 0.190 0.384
add a comment |
There is default index, so you can use floor division:
df['cumulative_expected'] = df['Expected_response'].groupby(df.index // 86400).cumsum()
Generally solution is create np.arange
with floor division:
arr = np.arange(len(df)) // 86400
df['cumulative_expected'] = df['Expected_response'].groupby(arr).cumsum()
Your solution should be changed with comparing shift
ed values with cumsum
:
s = (df['date']!=df['date'].shift()).cumsum()
df['cumulative_expected'] = df['Expected_response'].groupby(s).cumsum()
Test with changed sample data:
print (df)
date Expected_response
0 2018-03-01 0.270
1 2018-03-01 0.260
2 2018-03-02 0.240
3 2018-03-02 0.224
4 2018-03-02 0.204
5 2018-03-01 0.194
6 2018-03-01 0.190
s = (df['date']!=df['date'].shift()).cumsum()
print (s)
0 1
1 1
2 2
3 2
4 2
5 3
6 3
Name: date, dtype: int32
df['cumulative_expected'] = df['Expected_response'].groupby(s).cumsum()
print (df)
date Expected_response cumulative_expected
0 2018-03-01 0.270 0.270
1 2018-03-01 0.260 0.530
2 2018-03-02 0.240 0.240
3 2018-03-02 0.224 0.464
4 2018-03-02 0.204 0.668
5 2018-03-01 0.194 0.194
6 2018-03-01 0.190 0.384
add a comment |
There is default index, so you can use floor division:
df['cumulative_expected'] = df['Expected_response'].groupby(df.index // 86400).cumsum()
Generally solution is create np.arange
with floor division:
arr = np.arange(len(df)) // 86400
df['cumulative_expected'] = df['Expected_response'].groupby(arr).cumsum()
Your solution should be changed with comparing shift
ed values with cumsum
:
s = (df['date']!=df['date'].shift()).cumsum()
df['cumulative_expected'] = df['Expected_response'].groupby(s).cumsum()
Test with changed sample data:
print (df)
date Expected_response
0 2018-03-01 0.270
1 2018-03-01 0.260
2 2018-03-02 0.240
3 2018-03-02 0.224
4 2018-03-02 0.204
5 2018-03-01 0.194
6 2018-03-01 0.190
s = (df['date']!=df['date'].shift()).cumsum()
print (s)
0 1
1 1
2 2
3 2
4 2
5 3
6 3
Name: date, dtype: int32
df['cumulative_expected'] = df['Expected_response'].groupby(s).cumsum()
print (df)
date Expected_response cumulative_expected
0 2018-03-01 0.270 0.270
1 2018-03-01 0.260 0.530
2 2018-03-02 0.240 0.240
3 2018-03-02 0.224 0.464
4 2018-03-02 0.204 0.668
5 2018-03-01 0.194 0.194
6 2018-03-01 0.190 0.384
There is default index, so you can use floor division:
df['cumulative_expected'] = df['Expected_response'].groupby(df.index // 86400).cumsum()
Generally solution is create np.arange
with floor division:
arr = np.arange(len(df)) // 86400
df['cumulative_expected'] = df['Expected_response'].groupby(arr).cumsum()
Your solution should be changed with comparing shift
ed values with cumsum
:
s = (df['date']!=df['date'].shift()).cumsum()
df['cumulative_expected'] = df['Expected_response'].groupby(s).cumsum()
Test with changed sample data:
print (df)
date Expected_response
0 2018-03-01 0.270
1 2018-03-01 0.260
2 2018-03-02 0.240
3 2018-03-02 0.224
4 2018-03-02 0.204
5 2018-03-01 0.194
6 2018-03-01 0.190
s = (df['date']!=df['date'].shift()).cumsum()
print (s)
0 1
1 1
2 2
3 2
4 2
5 3
6 3
Name: date, dtype: int32
df['cumulative_expected'] = df['Expected_response'].groupby(s).cumsum()
print (df)
date Expected_response cumulative_expected
0 2018-03-01 0.270 0.270
1 2018-03-01 0.260 0.530
2 2018-03-02 0.240 0.240
3 2018-03-02 0.224 0.464
4 2018-03-02 0.204 0.668
5 2018-03-01 0.194 0.194
6 2018-03-01 0.190 0.384
edited Nov 28 '18 at 9:53
answered Nov 28 '18 at 9:36
jezraeljezrael
349k26310384
349k26310384
add a comment |
add a comment |
You can take the first difference of the date using diff
to see were the changes occur, and use this as a reference to take the cumulative sum.
Here I use a slightly modified df
to see how works:
print(df)
date Expected_response
0 2018-03-01 0.270
1 2018-03-01 0.260
2 2018-03-01 0.240
3 2018-03-01 0.224
4 2018-03-02 0.204
5 2018-03-02 0.194
6 2018-03-02 0.190
df['change'] = df.date.diff().abs().fillna(0).cumsum()
print(df)
date Expected_response change
0 2018-03-01 0.270 0 days
1 2018-03-01 0.260 0 days
2 2018-03-01 0.240 0 days
3 2018-03-01 0.224 0 days
4 2018-03-02 0.204 1 days
5 2018-03-02 0.194 1 days
6 2018-03-02 0.190 1 days
df['cumulative_expected'] = df.groupby('change').cumsum()
print(df.drop(['change'], axis = 1))
date Expected_response cumulative_expected
0 2018-03-01 0.270 0.270
1 2018-03-01 0.260 0.530
2 2018-03-01 0.240 0.770
3 2018-03-01 0.224 0.994
4 2018-03-02 0.204 0.204
5 2018-03-02 0.194 0.398
6 2018-03-02 0.190 0.588
add a comment |
You can take the first difference of the date using diff
to see were the changes occur, and use this as a reference to take the cumulative sum.
Here I use a slightly modified df
to see how works:
print(df)
date Expected_response
0 2018-03-01 0.270
1 2018-03-01 0.260
2 2018-03-01 0.240
3 2018-03-01 0.224
4 2018-03-02 0.204
5 2018-03-02 0.194
6 2018-03-02 0.190
df['change'] = df.date.diff().abs().fillna(0).cumsum()
print(df)
date Expected_response change
0 2018-03-01 0.270 0 days
1 2018-03-01 0.260 0 days
2 2018-03-01 0.240 0 days
3 2018-03-01 0.224 0 days
4 2018-03-02 0.204 1 days
5 2018-03-02 0.194 1 days
6 2018-03-02 0.190 1 days
df['cumulative_expected'] = df.groupby('change').cumsum()
print(df.drop(['change'], axis = 1))
date Expected_response cumulative_expected
0 2018-03-01 0.270 0.270
1 2018-03-01 0.260 0.530
2 2018-03-01 0.240 0.770
3 2018-03-01 0.224 0.994
4 2018-03-02 0.204 0.204
5 2018-03-02 0.194 0.398
6 2018-03-02 0.190 0.588
add a comment |
You can take the first difference of the date using diff
to see were the changes occur, and use this as a reference to take the cumulative sum.
Here I use a slightly modified df
to see how works:
print(df)
date Expected_response
0 2018-03-01 0.270
1 2018-03-01 0.260
2 2018-03-01 0.240
3 2018-03-01 0.224
4 2018-03-02 0.204
5 2018-03-02 0.194
6 2018-03-02 0.190
df['change'] = df.date.diff().abs().fillna(0).cumsum()
print(df)
date Expected_response change
0 2018-03-01 0.270 0 days
1 2018-03-01 0.260 0 days
2 2018-03-01 0.240 0 days
3 2018-03-01 0.224 0 days
4 2018-03-02 0.204 1 days
5 2018-03-02 0.194 1 days
6 2018-03-02 0.190 1 days
df['cumulative_expected'] = df.groupby('change').cumsum()
print(df.drop(['change'], axis = 1))
date Expected_response cumulative_expected
0 2018-03-01 0.270 0.270
1 2018-03-01 0.260 0.530
2 2018-03-01 0.240 0.770
3 2018-03-01 0.224 0.994
4 2018-03-02 0.204 0.204
5 2018-03-02 0.194 0.398
6 2018-03-02 0.190 0.588
You can take the first difference of the date using diff
to see were the changes occur, and use this as a reference to take the cumulative sum.
Here I use a slightly modified df
to see how works:
print(df)
date Expected_response
0 2018-03-01 0.270
1 2018-03-01 0.260
2 2018-03-01 0.240
3 2018-03-01 0.224
4 2018-03-02 0.204
5 2018-03-02 0.194
6 2018-03-02 0.190
df['change'] = df.date.diff().abs().fillna(0).cumsum()
print(df)
date Expected_response change
0 2018-03-01 0.270 0 days
1 2018-03-01 0.260 0 days
2 2018-03-01 0.240 0 days
3 2018-03-01 0.224 0 days
4 2018-03-02 0.204 1 days
5 2018-03-02 0.194 1 days
6 2018-03-02 0.190 1 days
df['cumulative_expected'] = df.groupby('change').cumsum()
print(df.drop(['change'], axis = 1))
date Expected_response cumulative_expected
0 2018-03-01 0.270 0.270
1 2018-03-01 0.260 0.530
2 2018-03-01 0.240 0.770
3 2018-03-01 0.224 0.994
4 2018-03-02 0.204 0.204
5 2018-03-02 0.194 0.398
6 2018-03-02 0.190 0.588
edited Nov 28 '18 at 10:23
answered Nov 28 '18 at 9:45
yatuyatu
13.8k31441
13.8k31441
add a comment |
add a comment |
Thanks for contributing an answer to Stack Overflow!
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
To learn more, see our tips on writing great answers.
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53516252%2fpandas-cumulative-sum-every-n-rows%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown