Pandas: cumulative sum every n rows



























I have a dataframe with a column "date" of dtype M8[ns] and another column "expected_response". A third column, "cumulative_expected", should contain the cumulative sum of expected_response over the rows that share the same date. The dataframe has one row for each second of the month, like below:



                 date  Expected_response  cumulative_expected
0          2018-03-01              0.270                0.270
1          2018-03-01              0.260                0.530
2          2018-03-01              0.240                0.770
3          2018-03-01              0.224                0.994
4          2018-03-01              0.204                1.198
5          2018-03-01              0.194                1.392
6          2018-03-01              0.190                1.582
...               ...                ...                  ...
2678395    2018-03-31              0.164            -7533.464
2678396    2018-03-31              0.164            -7533.300
2678397    2018-03-31              0.160            -7533.140
2678398    2018-03-31              0.154            -7532.986
2678399    2018-03-31              0.150            -7532.836


As you can see, there is an error: the cumulative sum does not recognise the change of date, so it does not restart each time the date changes.



The code is:



df['cumulative_expected']=df.groupby(df['date']!=df['date'])['Expected_response'].cumsum()



Maybe an option could be to create a counter that increases by 1 every 86400 rows (the number of seconds in a day) and then group by that counter, but I don't know how to do it.



Is there any other solution?
Thank you in advance.










python pandas

asked Nov 28 '18 at 9:34

Luca91
          2 Answers














The dataframe has the default integer index, so you can group by the index with floor division:



          df['cumulative_expected'] = df['Expected_response'].groupby(df.index // 86400).cumsum()


More generally, you can create the grouping array with np.arange and floor division:



          arr = np.arange(len(df)) // 86400
          df['cumulative_expected'] = df['Expected_response'].groupby(arr).cumsum()
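As a quick sanity check, the floor-division counter can be run on a toy frame (a sketch with the day length shrunk to 3 rows instead of 86400, so the restart is easy to see; the data is made up):

```python
import numpy as np
import pandas as pd

# Toy frame: 6 "seconds", with a day length of 3 rows instead of 86400.
df = pd.DataFrame({'Expected_response': [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]})

# Row position // rows-per-day gives the day counter: 0, 0, 0, 1, 1, 1.
counter = np.arange(len(df)) // 3
df['cumulative_expected'] = df['Expected_response'].groupby(counter).cumsum()

# The running sum restarts at row 3, the start of the second "day".
print(df['cumulative_expected'].round(3).tolist())  # -> [0.1, 0.3, 0.6, 0.4, 0.9, 1.5]
```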


Your solution can be fixed by comparing the dates with their shifted values and taking the cumulative sum of the result:



          s = (df['date']!=df['date'].shift()).cumsum()
          df['cumulative_expected'] = df['Expected_response'].groupby(s).cumsum()


Test with modified sample data:



print (df)

        date  Expected_response
0 2018-03-01              0.270
1 2018-03-01              0.260
2 2018-03-02              0.240
3 2018-03-02              0.224
4 2018-03-02              0.204
5 2018-03-01              0.194
6 2018-03-01              0.190

s = (df['date']!=df['date'].shift()).cumsum()
print (s)
0    1
1    1
2    2
3    2
4    2
5    3
6    3
Name: date, dtype: int32

df['cumulative_expected'] = df['Expected_response'].groupby(s).cumsum()
print (df)

        date  Expected_response  cumulative_expected
0 2018-03-01              0.270                0.270
1 2018-03-01              0.260                0.530
2 2018-03-02              0.240                0.240
3 2018-03-02              0.224                0.464
4 2018-03-02              0.204                0.668
5 2018-03-01              0.194                0.194
6 2018-03-01              0.190                0.384
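A side note, assuming the date column really holds only day values (midnight-normalized timestamps): you can also group by the column directly. A minimal sketch on made-up data; unlike the shift-based key, equal dates form a single group even when they are not adjacent:

```python
import pandas as pd

df = pd.DataFrame({
    'date': pd.to_datetime(['2018-03-01', '2018-03-01',
                            '2018-03-02', '2018-03-02']),
    'Expected_response': [0.270, 0.260, 0.240, 0.224],
})

# Grouping by the column restarts the sum whenever the date value changes.
df['cumulative_expected'] = df.groupby('date')['Expected_response'].cumsum()
print(df['cumulative_expected'].round(3).tolist())  # -> [0.27, 0.53, 0.24, 0.464]
```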





You can take the first difference of the date column using diff to see where the changes occur, and use its cumulative sum as a grouping key.
Here I use a slightly modified df to show how it works:



print(df)

        date  Expected_response
0 2018-03-01              0.270
1 2018-03-01              0.260
2 2018-03-01              0.240
3 2018-03-01              0.224
4 2018-03-02              0.204
5 2018-03-02              0.194
6 2018-03-02              0.190

df['change'] = df.date.diff().abs().fillna(pd.Timedelta(0)).cumsum()
print(df)

        date  Expected_response  change
0 2018-03-01              0.270  0 days
1 2018-03-01              0.260  0 days
2 2018-03-01              0.240  0 days
3 2018-03-01              0.224  0 days
4 2018-03-02              0.204  1 days
5 2018-03-02              0.194  1 days
6 2018-03-02              0.190  1 days

df['cumulative_expected'] = df.groupby('change')['Expected_response'].cumsum()
print(df.drop(['change'], axis=1))

        date  Expected_response  cumulative_expected
0 2018-03-01              0.270                0.270
1 2018-03-01              0.260                0.530
2 2018-03-01              0.240                0.770
3 2018-03-01              0.224                0.994
4 2018-03-02              0.204                0.204
5 2018-03-02              0.194                0.398
6 2018-03-02              0.190                0.588
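The diff-based approach, condensed into a self-contained sketch on made-up data (note that `pd.Timedelta(0)` is used as the fill value, since filling a timedelta column with a plain `0` is rejected by newer pandas versions):

```python
import pandas as pd

df = pd.DataFrame({
    'date': pd.to_datetime(['2018-03-01', '2018-03-01',
                            '2018-03-02', '2018-03-02']),
    'Expected_response': [0.270, 0.260, 0.204, 0.194],
})

# Cumulative sum of the absolute day-to-day differences: constant within
# a day, it steps up at every date change, so it works as a group key.
change = df['date'].diff().abs().fillna(pd.Timedelta(0)).cumsum()

df['cumulative_expected'] = df.groupby(change)['Expected_response'].cumsum()
print(df['cumulative_expected'].round(3).tolist())  # -> [0.27, 0.53, 0.204, 0.398]
```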





jezrael: answered Nov 28 '18 at 9:36, edited Nov 28 '18 at 9:53
yatu: answered Nov 28 '18 at 9:45, edited Nov 28 '18 at 10:23





























