Pandas: cumulative sum every n rows

I have a dataframe with a column "date" of dtype M8[ns] and another column "Expected_response". A third column, "cumulative_expected", should hold the cumulative sum of Expected_response over the rows that share the same date. The dataframe has one row for each second of the month, like below:



               date  Expected_response  cumulative_expected
      0  2018-03-01              0.270                0.270
      1  2018-03-01              0.260                0.530
      2  2018-03-01              0.240                0.770
      3  2018-03-01              0.224                0.994
      4  2018-03-01              0.204                1.198
      5  2018-03-01              0.194                1.392
      6  2018-03-01              0.190                1.582
    ...         ...                ...                  ...
2678395  2018-03-31              0.164            -7533.464
2678396  2018-03-31              0.164            -7533.300
2678397  2018-03-31              0.160            -7533.140
2678398  2018-03-31              0.154            -7532.986
2678399  2018-03-31              0.150            -7532.836


As you can see there is an error: the cumulative sum does not recognise the change of date, so it never restarts when the date changes (the grouping key in the code below, df['date'] != df['date'], is False for every row, so all rows end up in a single group).



The code is:



df['cumulative_expected']=df.groupby(df['date']!=df['date'])['Expected_response'].cumsum()



Maybe an option could be to create a counter that increases by 1 every 86400 rows (the number of seconds in a day) and then group by that counter, but I don't know how to do it.



Is there any other solution?
Thank you in advance.

python pandas

asked Nov 28 '18 at 9:34 by Luca91

2 Answers

There is a default index, so you can use floor division:

df['cumulative_expected'] = df['Expected_response'].groupby(df.index // 86400).cumsum()

A more general solution is to create an np.arange and floor-divide it:

arr = np.arange(len(df)) // 86400
df['cumulative_expected'] = df['Expected_response'].groupby(arr).cumsum()

Your solution should be changed to compare each date with the shifted (previous) value and take the cumulative sum of that comparison:

s = (df['date'] != df['date'].shift()).cumsum()
df['cumulative_expected'] = df['Expected_response'].groupby(s).cumsum()
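As an aside: if every row with a given date should feed one running total no matter where it appears, grouping directly on the column also works, and it does not depend on the index or on there being exactly 86400 rows per day. A minimal sketch, assuming the "date" column is truncated to day precision as in the sample:

# groups all rows sharing the same date value, even when they are not adjacent,
# so a date that reappears later continues its earlier running total
df['cumulative_expected'] = df.groupby('date')['Expected_response'].cumsum()

The shift-based key above behaves differently: it starts a new group whenever the value changes between consecutive rows, as the test below shows for rows 5 and 6.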


Test with changed sample data:

print (df)
         date  Expected_response
0  2018-03-01              0.270
1  2018-03-01              0.260
2  2018-03-02              0.240
3  2018-03-02              0.224
4  2018-03-02              0.204
5  2018-03-01              0.194
6  2018-03-01              0.190

s = (df['date'] != df['date'].shift()).cumsum()
print (s)
0    1
1    1
2    2
3    2
4    2
5    3
6    3
Name: date, dtype: int32

df['cumulative_expected'] = df['Expected_response'].groupby(s).cumsum()
print (df)
         date  Expected_response  cumulative_expected
0  2018-03-01              0.270                0.270
1  2018-03-01              0.260                0.530
2  2018-03-02              0.240                0.240
3  2018-03-02              0.224                0.464
4  2018-03-02              0.204                0.668
5  2018-03-01              0.194                0.194
6  2018-03-01              0.190                0.384

answered Nov 28 '18 at 9:36 by jezrael (edited Nov 28 '18 at 9:53)





You can take the first difference of the date using diff to see where the changes occur, and use this as a reference for the cumulative sum. Here I use a slightly modified df to show how it works:

print(df)
         date  Expected_response
0  2018-03-01              0.270
1  2018-03-01              0.260
2  2018-03-01              0.240
3  2018-03-01              0.224
4  2018-03-02              0.204
5  2018-03-02              0.194
6  2018-03-02              0.190

df['change'] = df.date.diff().abs().fillna(0).cumsum()
print(df)
         date  Expected_response  change
0  2018-03-01              0.270  0 days
1  2018-03-01              0.260  0 days
2  2018-03-01              0.240  0 days
3  2018-03-01              0.224  0 days
4  2018-03-02              0.204  1 days
5  2018-03-02              0.194  1 days
6  2018-03-02              0.190  1 days

df['cumulative_expected'] = df.groupby('change').cumsum()
print(df.drop(['change'], axis=1))
         date  Expected_response  cumulative_expected
0  2018-03-01              0.270                0.270
1  2018-03-01              0.260                0.530
2  2018-03-01              0.240                0.770
3  2018-03-01              0.224                0.994
4  2018-03-02              0.204                0.204
5  2018-03-02              0.194                0.398
6  2018-03-02              0.190                0.588

answered Nov 28 '18 at 9:45 by yatu (edited Nov 28 '18 at 10:23)
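Two small caveats about the code in this answer, depending on the pandas version: fillna(0) on a timedelta column may not be accepted by newer releases (an explicit pd.Timedelta(0) is safer), and df.groupby('change').cumsum() applies the cumulative sum to every numeric column rather than only the one of interest, so selecting the column explicitly is clearer. A minimal sketch of the same idea with those two points adjusted, assuming the same example df:

import pandas as pd

# flag each change of date between consecutive rows; the cumulative sum of the
# absolute differences gives every block of equal consecutive dates its own key
change = df['date'].diff().fillna(pd.Timedelta(0)).abs().cumsum()
df['cumulative_expected'] = df.groupby(change)['Expected_response'].cumsum()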




