Python dataframe group by following values

I am looking for the best pythonic way to group the sum of values that follow each other from hour to hour. Example: I have the following Dataframe with the columns 'date' and'rainfall' and I have added an example of the desired result :



        date        rainfall    RE

    31/12/17 23:00  0.88    None

    01/01/18 00:00  0.38    1.26

    01/01/18 01:00  0       None

    01/01/18 02:00  0.22    0.22

    01/01/18 03:00  0       None

    01/01/18 04:00  0       None

    01/01/18 13:00  0       None

    01/01/18 14:00  0       None

    01/01/18 15:00  0.55    0.55

    01/01/18 16:00  0       None

    01/01/18 17:00  1.31    1.31

    01/01/18 18:00  0       None

    01/01/18 19:00  0.49    0.49

    01/01/18 20:00  0       None

    01/01/18 21:00  0       None

    01/01/18 22:00  0       None

    01/01/18 23:00  0       None

    02/01/18 00:00  0.7     None

    02/01/18 01:00  0.22    None

    02/01/18 02:00  0.61    None

    02/01/18 03:00  0.42    1.95

    02/01/18 04:00  0       None

    02/01/18 05:00  1.69    1.69

    02/01/18 06:00  0       None

    02/01/18 07:00  0       None

    02/01/18 08:00  0       None

I hope to be clear,

thanks a lot for your help,

Rémy

asked Nov 26 '18 at 8:32

R. Claverie

Think he's meaning a 'rolling_sum'

– Eran Moshe
Nov 26 '18 at 8:49

add a comment |



        date        rainfall    RE

    31/12/17 23:00  0.88    None

    01/01/18 00:00  0.38    1.26

    01/01/18 01:00  0       None

    01/01/18 02:00  0.22    0.22

    01/01/18 03:00  0       None

    01/01/18 04:00  0       None

    01/01/18 13:00  0       None

    01/01/18 14:00  0       None

    01/01/18 15:00  0.55    0.55

    01/01/18 16:00  0       None

    01/01/18 17:00  1.31    1.31

    01/01/18 18:00  0       None

    01/01/18 19:00  0.49    0.49

    01/01/18 20:00  0       None

    01/01/18 21:00  0       None

    01/01/18 22:00  0       None

    01/01/18 23:00  0       None

    02/01/18 00:00  0.7     None

    02/01/18 01:00  0.22    None

    02/01/18 02:00  0.61    None

    02/01/18 03:00  0.42    1.95

    02/01/18 04:00  0       None

    02/01/18 05:00  1.69    1.69

    02/01/18 06:00  0       None

    02/01/18 07:00  0       None

    02/01/18 08:00  0       None

I hope to be clear,

thanks a lot for your help,

Rémy

asked Nov 26 '18 at 8:32

R. Claverie

Think he's meaning a 'rolling_sum'

– Eran Moshe
Nov 26 '18 at 8:49

add a comment |



        date        rainfall    RE

    31/12/17 23:00  0.88    None

    01/01/18 00:00  0.38    1.26

    01/01/18 01:00  0       None

    01/01/18 02:00  0.22    0.22

    01/01/18 03:00  0       None

    01/01/18 04:00  0       None

    01/01/18 13:00  0       None

    01/01/18 14:00  0       None

    01/01/18 15:00  0.55    0.55

    01/01/18 16:00  0       None

    01/01/18 17:00  1.31    1.31

    01/01/18 18:00  0       None

    01/01/18 19:00  0.49    0.49

    01/01/18 20:00  0       None

    01/01/18 21:00  0       None

    01/01/18 22:00  0       None

    01/01/18 23:00  0       None

    02/01/18 00:00  0.7     None

    02/01/18 01:00  0.22    None

    02/01/18 02:00  0.61    None

    02/01/18 03:00  0.42    1.95

    02/01/18 04:00  0       None

    02/01/18 05:00  1.69    1.69

    02/01/18 06:00  0       None

    02/01/18 07:00  0       None

    02/01/18 08:00  0       None

I hope to be clear,

thanks a lot for your help,

Rémy

asked Nov 26 '18 at 8:32

R. Claverie



        date        rainfall    RE

    31/12/17 23:00  0.88    None

    01/01/18 00:00  0.38    1.26

    01/01/18 01:00  0       None

    01/01/18 02:00  0.22    0.22

    01/01/18 03:00  0       None

    01/01/18 04:00  0       None

    01/01/18 13:00  0       None

    01/01/18 14:00  0       None

    01/01/18 15:00  0.55    0.55

    01/01/18 16:00  0       None

    01/01/18 17:00  1.31    1.31

    01/01/18 18:00  0       None

    01/01/18 19:00  0.49    0.49

    01/01/18 20:00  0       None

    01/01/18 21:00  0       None

    01/01/18 22:00  0       None

    01/01/18 23:00  0       None

    02/01/18 00:00  0.7     None

    02/01/18 01:00  0.22    None

    02/01/18 02:00  0.61    None

    02/01/18 03:00  0.42    1.95

    02/01/18 04:00  0       None

    02/01/18 05:00  1.69    1.69

    02/01/18 06:00  0       None

    02/01/18 07:00  0       None

    02/01/18 08:00  0       None

I hope to be clear,

thanks a lot for your help,

Rémy

python datetime dataframe pandas-groupby

asked Nov 26 '18 at 8:32

R. Claverie

asked Nov 26 '18 at 8:32

R. Claverie

asked Nov 26 '18 at 8:32

R. Claverie

asked Nov 26 '18 at 8:32

R. Claverie

asked Nov 26 '18 at 8:32

R. Claverie

Think he's meaning a 'rolling_sum'

– Eran Moshe
Nov 26 '18 at 8:49

add a comment |

Think he's meaning a 'rolling_sum'

– Eran Moshe
Nov 26 '18 at 8:49

Think he's meaning a 'rolling_sum'

– Eran Moshe
Nov 26 '18 at 8:49

add a comment |

2 Answers
2

active

oldest

votes

If I understand correctly, what you want is a rolling sum.
The following will output a Pandas Series of the rolling sums of 'rainfall' over 2 periods

df['rainfall'].rolling(2).sum()

Ofcourse, your DataFrame must stay sorted as you show in your example.

adding it as a part of your DataFrame would be:

df['rainfall_rolling_sum'] = df['rainfall'].rolling(2).sum()

EDIT1:

If you mean you just want to accumulate the rainfall use:

df['rainfall_cumsum'] = df['rainfall'].cumsum()

as the function cumsum() is accumulative sum

EDIT2:

rf_not_zero = df['rainfall'] != 0

df['rainfall_accum'] = df['rainfall'].cumsum()-df['rainfall'].cumsum().where(~df_not_zero).ffill().fillna(0).astype(int)

edited Nov 26 '18 at 10:51

answered Nov 26 '18 at 8:54

Eran Moshe

1,371722

Thanks a lot Eran for this quick answer, but it does not work as expected. If you look at from the date "02/01/18 00:00", there is 4 values to add and not only 2. Thanks again, Rémy

– R. Claverie
Nov 26 '18 at 9:11

Do you mean just accumulate the rainfall ?

– Eran Moshe
Nov 26 '18 at 9:20

Yes, but only when I have a rainfall event over several consecutive hours. Thanks. Rémy

– R. Claverie
Nov 26 '18 at 10:24

@R.Claverie Only took the whole day to understand :>

– Eran Moshe
Nov 26 '18 at 10:51

add a comment |

Okey it is not the best, but when you have big dataframes this could be a simple approach. (If it rains at the beginning or at the end, this will be ignored in this example. See Warning)

import pandas as pd

import numpy as np



# generate pseudo dataframe

rng = pd.date_range('1/1/2012', periods=20, freq='H')

rain = np.random.choice([0,0.5,1,2], size=20, p=[0.4,0.2,0.2,0.2])



df = pd.DataFrame()

df['data'] = rng

df['rain'] = rain



# convert rain to boolean

df['is_rain'] = df['rain'] > 0



# shift rain for one slot to recognice if the state has changed

data = list(df['is_rain'].values[0:-1])

shifted = [data[0]]

shifted.extend(data)

df['is_rain_next'] = shifted



# get start and endpoints of rain (the end has to be excluded! 

# it is always the next period, when it isn't raining anymore)

df['rain_start'] = df['is_rain'] < df['is_rain_next']

df['rain_end'] = df['is_rain'] > df['is_rain_next']



# this are the starts and ends you can use them for getting the groups from dataframe

df[df['rain_start']].index

df[df['rain_end']].index

Warning: The arrays can have different sizes, thats caused by the unknown state at the end and at the beginning. So check if the first element in rain_end smaller than the first element in rain_start than you can delete it. Also if the last element of rain_start is greater than the last element of rain_end (sound logical i think)

edited Nov 26 '18 at 11:38

answered Nov 26 '18 at 11:13

MisterMonk

1549

add a comment |

Your Answer

StackExchange.ifUsing("editor", function () {
StackExchange.using("externalEditor", function () {
StackExchange.using("snippets", function () {
StackExchange.snippets.init();
});
});
}, "code-snippets");

StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "1"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});

function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});

}
});

draft saved

draft discarded

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53477230%2fpython-dataframe-group-by-following-values%23new-answer', 'question_page');
}
);

Post as a guest

Name

Required, but never shown

2 Answers
2

active

oldest

votes

2 Answers
2

active

oldest

votes

If I understand correctly, what you want is a rolling sum.
The following will output a Pandas Series of the rolling sums of 'rainfall' over 2 periods

df['rainfall'].rolling(2).sum()

Ofcourse, your DataFrame must stay sorted as you show in your example.

adding it as a part of your DataFrame would be:

df['rainfall_rolling_sum'] = df['rainfall'].rolling(2).sum()

EDIT1:

If you mean you just want to accumulate the rainfall use:

df['rainfall_cumsum'] = df['rainfall'].cumsum()

as the function cumsum() is accumulative sum

EDIT2:

rf_not_zero = df['rainfall'] != 0

df['rainfall_accum'] = df['rainfall'].cumsum()-df['rainfall'].cumsum().where(~df_not_zero).ffill().fillna(0).astype(int)

edited Nov 26 '18 at 10:51

answered Nov 26 '18 at 8:54

Eran Moshe

1,371722

Thanks a lot Eran for this quick answer, but it does not work as expected. If you look at from the date "02/01/18 00:00", there is 4 values to add and not only 2. Thanks again, Rémy

– R. Claverie
Nov 26 '18 at 9:11

Do you mean just accumulate the rainfall ?

– Eran Moshe
Nov 26 '18 at 9:20

Yes, but only when I have a rainfall event over several consecutive hours. Thanks. Rémy

– R. Claverie
Nov 26 '18 at 10:24

@R.Claverie Only took the whole day to understand :>

– Eran Moshe
Nov 26 '18 at 10:51

add a comment |

If I understand correctly, what you want is a rolling sum.
The following will output a Pandas Series of the rolling sums of 'rainfall' over 2 periods

df['rainfall'].rolling(2).sum()

Ofcourse, your DataFrame must stay sorted as you show in your example.

adding it as a part of your DataFrame would be:

df['rainfall_rolling_sum'] = df['rainfall'].rolling(2).sum()

EDIT1:

If you mean you just want to accumulate the rainfall use:

df['rainfall_cumsum'] = df['rainfall'].cumsum()

as the function cumsum() is accumulative sum

EDIT2:

rf_not_zero = df['rainfall'] != 0

df['rainfall_accum'] = df['rainfall'].cumsum()-df['rainfall'].cumsum().where(~df_not_zero).ffill().fillna(0).astype(int)

edited Nov 26 '18 at 10:51

answered Nov 26 '18 at 8:54

Eran Moshe

1,371722

Thanks a lot Eran for this quick answer, but it does not work as expected. If you look at from the date "02/01/18 00:00", there is 4 values to add and not only 2. Thanks again, Rémy

– R. Claverie
Nov 26 '18 at 9:11

Do you mean just accumulate the rainfall ?

– Eran Moshe
Nov 26 '18 at 9:20

Yes, but only when I have a rainfall event over several consecutive hours. Thanks. Rémy

– R. Claverie
Nov 26 '18 at 10:24

@R.Claverie Only took the whole day to understand :>

– Eran Moshe
Nov 26 '18 at 10:51

add a comment |

If I understand correctly, what you want is a rolling sum.
The following will output a Pandas Series of the rolling sums of 'rainfall' over 2 periods

df['rainfall'].rolling(2).sum()

Ofcourse, your DataFrame must stay sorted as you show in your example.

adding it as a part of your DataFrame would be:

df['rainfall_rolling_sum'] = df['rainfall'].rolling(2).sum()

EDIT1:

If you mean you just want to accumulate the rainfall use:

df['rainfall_cumsum'] = df['rainfall'].cumsum()

as the function cumsum() is accumulative sum

EDIT2:

rf_not_zero = df['rainfall'] != 0

df['rainfall_accum'] = df['rainfall'].cumsum()-df['rainfall'].cumsum().where(~df_not_zero).ffill().fillna(0).astype(int)

edited Nov 26 '18 at 10:51

answered Nov 26 '18 at 8:54

Eran Moshe

1,371722

If I understand correctly, what you want is a rolling sum.
The following will output a Pandas Series of the rolling sums of 'rainfall' over 2 periods

df['rainfall'].rolling(2).sum()

Ofcourse, your DataFrame must stay sorted as you show in your example.

adding it as a part of your DataFrame would be:

df['rainfall_rolling_sum'] = df['rainfall'].rolling(2).sum()

EDIT1:

If you mean you just want to accumulate the rainfall use:

df['rainfall_cumsum'] = df['rainfall'].cumsum()

as the function cumsum() is accumulative sum

EDIT2:

rf_not_zero = df['rainfall'] != 0

df['rainfall_accum'] = df['rainfall'].cumsum()-df['rainfall'].cumsum().where(~df_not_zero).ffill().fillna(0).astype(int)

edited Nov 26 '18 at 10:51

answered Nov 26 '18 at 8:54

Eran Moshe

1,371722

edited Nov 26 '18 at 10:51

answered Nov 26 '18 at 8:54

Eran Moshe

1,371722

answered Nov 26 '18 at 8:54

Eran Moshe

1,371722

answered Nov 26 '18 at 8:54

Eran Moshe

1,371722

Thanks a lot Eran for this quick answer, but it does not work as expected. If you look at from the date "02/01/18 00:00", there is 4 values to add and not only 2. Thanks again, Rémy

– R. Claverie
Nov 26 '18 at 9:11

Do you mean just accumulate the rainfall ?

– Eran Moshe
Nov 26 '18 at 9:20

Yes, but only when I have a rainfall event over several consecutive hours. Thanks. Rémy

– R. Claverie
Nov 26 '18 at 10:24

@R.Claverie Only took the whole day to understand :>

– Eran Moshe
Nov 26 '18 at 10:51

add a comment |

Thanks a lot Eran for this quick answer, but it does not work as expected. If you look at from the date "02/01/18 00:00", there is 4 values to add and not only 2. Thanks again, Rémy

– R. Claverie
Nov 26 '18 at 9:11

Do you mean just accumulate the rainfall ?

– Eran Moshe
Nov 26 '18 at 9:20

Yes, but only when I have a rainfall event over several consecutive hours. Thanks. Rémy

– R. Claverie
Nov 26 '18 at 10:24

@R.Claverie Only took the whole day to understand :>

– Eran Moshe
Nov 26 '18 at 10:51

Thanks a lot Eran for this quick answer, but it does not work as expected. If you look at from the date "02/01/18 00:00", there is 4 values to add and not only 2. Thanks again, Rémy

– R. Claverie
Nov 26 '18 at 9:11

Do you mean just accumulate the rainfall ?

– Eran Moshe
Nov 26 '18 at 9:20

Yes, but only when I have a rainfall event over several consecutive hours. Thanks. Rémy

– R. Claverie
Nov 26 '18 at 10:24

@R.Claverie Only took the whole day to understand :>

– Eran Moshe
Nov 26 '18 at 10:51

add a comment |

Okey it is not the best, but when you have big dataframes this could be a simple approach. (If it rains at the beginning or at the end, this will be ignored in this example. See Warning)

import pandas as pd

import numpy as np



# generate pseudo dataframe

rng = pd.date_range('1/1/2012', periods=20, freq='H')

rain = np.random.choice([0,0.5,1,2], size=20, p=[0.4,0.2,0.2,0.2])



df = pd.DataFrame()

df['data'] = rng

df['rain'] = rain



# convert rain to boolean

df['is_rain'] = df['rain'] > 0



# shift rain for one slot to recognice if the state has changed

data = list(df['is_rain'].values[0:-1])

shifted = [data[0]]

shifted.extend(data)

df['is_rain_next'] = shifted



# get start and endpoints of rain (the end has to be excluded! 

# it is always the next period, when it isn't raining anymore)

df['rain_start'] = df['is_rain'] < df['is_rain_next']

df['rain_end'] = df['is_rain'] > df['is_rain_next']



# this are the starts and ends you can use them for getting the groups from dataframe

df[df['rain_start']].index

df[df['rain_end']].index

edited Nov 26 '18 at 11:38

answered Nov 26 '18 at 11:13

MisterMonk

1549

add a comment |

Okey it is not the best, but when you have big dataframes this could be a simple approach. (If it rains at the beginning or at the end, this will be ignored in this example. See Warning)

import pandas as pd

import numpy as np



# generate pseudo dataframe

rng = pd.date_range('1/1/2012', periods=20, freq='H')

rain = np.random.choice([0,0.5,1,2], size=20, p=[0.4,0.2,0.2,0.2])



df = pd.DataFrame()

df['data'] = rng

df['rain'] = rain



# convert rain to boolean

df['is_rain'] = df['rain'] > 0



# shift rain for one slot to recognice if the state has changed

data = list(df['is_rain'].values[0:-1])

shifted = [data[0]]

shifted.extend(data)

df['is_rain_next'] = shifted



# get start and endpoints of rain (the end has to be excluded! 

# it is always the next period, when it isn't raining anymore)

df['rain_start'] = df['is_rain'] < df['is_rain_next']

df['rain_end'] = df['is_rain'] > df['is_rain_next']



# this are the starts and ends you can use them for getting the groups from dataframe

df[df['rain_start']].index

df[df['rain_end']].index

edited Nov 26 '18 at 11:38

answered Nov 26 '18 at 11:13

MisterMonk

1549

add a comment |

Okey it is not the best, but when you have big dataframes this could be a simple approach. (If it rains at the beginning or at the end, this will be ignored in this example. See Warning)

import pandas as pd

import numpy as np



# generate pseudo dataframe

rng = pd.date_range('1/1/2012', periods=20, freq='H')

rain = np.random.choice([0,0.5,1,2], size=20, p=[0.4,0.2,0.2,0.2])



df = pd.DataFrame()

df['data'] = rng

df['rain'] = rain



# convert rain to boolean

df['is_rain'] = df['rain'] > 0



# shift rain for one slot to recognice if the state has changed

data = list(df['is_rain'].values[0:-1])

shifted = [data[0]]

shifted.extend(data)

df['is_rain_next'] = shifted



# get start and endpoints of rain (the end has to be excluded! 

# it is always the next period, when it isn't raining anymore)

df['rain_start'] = df['is_rain'] < df['is_rain_next']

df['rain_end'] = df['is_rain'] > df['is_rain_next']



# this are the starts and ends you can use them for getting the groups from dataframe

df[df['rain_start']].index

df[df['rain_end']].index

edited Nov 26 '18 at 11:38

answered Nov 26 '18 at 11:13

MisterMonk

1549

Okey it is not the best, but when you have big dataframes this could be a simple approach. (If it rains at the beginning or at the end, this will be ignored in this example. See Warning)

import pandas as pd

import numpy as np



# generate pseudo dataframe

rng = pd.date_range('1/1/2012', periods=20, freq='H')

rain = np.random.choice([0,0.5,1,2], size=20, p=[0.4,0.2,0.2,0.2])



df = pd.DataFrame()

df['data'] = rng

df['rain'] = rain



# convert rain to boolean

df['is_rain'] = df['rain'] > 0



# shift rain for one slot to recognice if the state has changed

data = list(df['is_rain'].values[0:-1])

shifted = [data[0]]

shifted.extend(data)

df['is_rain_next'] = shifted



# get start and endpoints of rain (the end has to be excluded! 

# it is always the next period, when it isn't raining anymore)

df['rain_start'] = df['is_rain'] < df['is_rain_next']

df['rain_end'] = df['is_rain'] > df['is_rain_next']



# this are the starts and ends you can use them for getting the groups from dataframe

df[df['rain_start']].index

df[df['rain_end']].index

edited Nov 26 '18 at 11:38

answered Nov 26 '18 at 11:13

MisterMonk

1549

edited Nov 26 '18 at 11:38

answered Nov 26 '18 at 11:13

MisterMonk

1549

answered Nov 26 '18 at 11:13

MisterMonk

1549

answered Nov 26 '18 at 11:13

MisterMonk

1549

add a comment |

draft saved

draft discarded

Thanks for contributing an answer to Stack Overflow!

Please be sure to answer the question. Provide details and share your research!

But avoid …

Asking for help, clarification, or responding to other answers.

Making statements based on opinion; back them up with references or personal experience.

To learn more, see our tips on writing great answers.

draft saved

draft discarded

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Name

Required, but never shown

Name

Required, but never shown

This page is only for reference, If you need detailed information, please check here

搜尋此網誌

Btukfyl