Counting the number of people in a building over time
I'm struggling in finding a "simple" way to perform this analysis with Pandas:
I have xlsx files that record people's transits into and out of a building.
Below is a simplified version of my raw data.
Full Name Time Direction
0 Uncle Scrooge 08-10-2018 09:16:52 In
1 Uncle Scrooge 08-10-2018 16:42:40 Out
2 Donald Duck 08-10-2018 15:04:07 In
3 Donald Duck 08-10-2018 15:06:42 Out
4 Donald Duck 08-10-2018 15:15:49 In
5 Donald Duck 08-10-2018 16:07:57 Out
My ideal final result would show (in a tabular or, better, graphical way) how the total number of people in the building changes over time.
So going back to the sample data I provided, I'd like to show that during the day 08-10-2018:
- before 09:16:52 there is no one in the building
- from 09:16:52 to 15:04:06 there is 1 person (Uncle Scrooge)
- from 15:04:07 to 15:06:42 there are 2 people (Uncle Scrooge and Donald Duck)
- from 15:06:42 to 15:15:48 there is 1 person
- from 15:15:49 to 16:07:57 there are 2 again
- from 16:07:58 to 16:42:40 there is 1 again
- from 16:42:41 to the end of the day there are none
I used real data for that example, so you can see the timestamps are accurate to the second, but I don't need that level of accuracy, since the analysis has to be performed over a two-month range of data.
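For reference, this is roughly how I load the data (a minimal sketch with a placeholder file name; my real workbooks differ):
import pandas as pd

# Placeholder file name; the real xlsx files cover the whole 2-month period.
df = pd.read_excel('transits.xlsx', usecols=['Full Name', 'Time', 'Direction'])

# Parse the timestamps so they can be sorted and grouped later.
df['Time'] = pd.to_datetime(df['Time'])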
Any help is appreciated
thanks a lot
giorgio
=============== UPDATE ===============
@nixon and @ALollz thanks a lot, you're awesome.
It works perfectly, apart from a detail I didn't think about in my original question.
In fact, as I mentioned, I'm working with data spanning a period of 2 months.
Moreover, for some reason, it seems that not all the people entering the building have been tracked when exiting it.
So with the cumsum() function, the total number of people for a given day is influenced by the people of the day before, and so on.
This shows an unjustifiably high number of people in the building during the early and late hours of every day apart from the very first one.
So I was thinking it could be solved by first performing a groupby on days and then applying your suggestion.
Could you help me put it all together?
Thanks a lot
giorgio
python pandas datetime
Please share the actual dataframe instead of an image
– yatu
Nov 27 '18 at 14:36
The solution can be simple if your DataFrame is clean. You'd just need to sort by time, map in and out to +1 and -1 and then cumsum. But if you have issues where people leave the building before they enter, or enter multiple times before they leave then it's trickier. Do you have any of those odd cases that you need to protect against?
– ALollz
Nov 27 '18 at 14:50
1 Answer
You can start by setting the Time column as the index and sorting it using sort_index:
df = df.set_index('Time').sort_index()
print(df)
Direction Full Name
Time
2018-08-10 09:16:52 In Uncle Scrooge
2018-08-10 15:04:07 In Donald Duck
2018-08-10 15:06:42 Out Donald Duck
2018-08-10 15:15:49 In Donald Duck
2018-08-10 16:07:57 Out Donald Duck
2018-08-10 16:42:40 Out Uncle Scrooge
And create a mapping (as @ALollz suggests) of {'In': 1, 'Out': -1}:
mapper = {'In':1, 'Out':-1}
df = df.assign(Direction_mapped = df.Direction.map(mapper))
Which would give you:
Direction Full Name Direction_mapped
Time
2018-08-10 09:16:52 In Uncle Scrooge 1
2018-08-10 15:04:07 In Donald Duck 1
2018-08-10 15:06:42 Out Donald Duck -1
2018-08-10 15:15:49 In Donald Duck 1
2018-08-10 16:07:57 Out Donald Duck -1
2018-08-10 16:42:40 Out Uncle Scrooge -1
Having mapped the Direction column, you can simply apply cumsum to the result, which will give you the number of people in the building from a specific time onwards:
df = df.assign(n_people = df.Direction_mapped.cumsum()).drop(['Direction_mapped'], axis = 1)
Which yields:
Direction Full Name n_people
Time
2018-08-10 09:16:52 In Uncle Scrooge 1
2018-08-10 15:04:07 In Donald Duck 2
2018-08-10 15:06:42 Out Donald Duck 1
2018-08-10 15:15:49 In Donald Duck 2
2018-08-10 16:07:57 Out Donald Duck 1
2018-08-10 16:42:40 Out Uncle Scrooge 0
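If you also want a graphical view, one possible sketch (assuming matplotlib is available) is a step plot of the running count:
import matplotlib.pyplot as plt

# The count stays constant between events and jumps at each In/Out,
# so a step plot reflects the occupancy correctly.
df['n_people'].plot(drawstyle='steps-post')
plt.ylabel('People in the building')
plt.show()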
General solution
Here is a more general solution for the case where not everyone is tracked leaving the building. Let's try a new df that spans more than one day, and let's also simulate that this time Donald Duck enters twice but is not tracked leaving the second time:
df = pd.DataFrame({'Full Name': ['Uncle Scrooge','Uncle Scrooge', 'Donald Duck', 'Donald Duck', 'Donald Duck',
'Someone else', 'Someone else'],
'Time': ['08-10-2018 09:16:52','08-10-2018 16:42:40', '08-10-2018 15:04:07', '08-10-2018 15:06:42', '08-10-2018 15:15:49',
'08-11-2018 10:42:40', '08-11-2018 10:48:40'],
'Direction': ['In','Out','In','Out', 'In','In', 'Out']})
print(df)
Full Name Time Direction
0 Uncle Scrooge 08-10-2018 09:16:52 In
1 Uncle Scrooge 08-10-2018 16:42:40 Out
2 Donald Duck 08-10-2018 15:04:07 In
3 Donald Duck 08-10-2018 15:06:42 Out
4 Donald Duck 08-10-2018 15:15:49 In
5 Someone else 08-11-2018 10:42:40 In
6 Someone else 08-11-2018 10:48:40 Out
First, the previous functionality can be encapsulated in a function:
def apply_by_day(x):
    mapper = {'In': 1, 'Out': -1}
    x = x.assign(Direction_mapped=x.Direction.map(mapper))
    x = (x.assign(n_people=x.Direction_mapped.cumsum())
          .drop(['Direction_mapped'], axis=1))
    return x
And then apply_by_day can be applied on daily groups using pandas.Grouper:
df.Time = pd.to_datetime(df.Time)
df = df.set_index('Time').sort_index()
df.groupby(pd.Grouper(freq='D')).apply(lambda x: apply_by_day(x))
Full Name Direction n_people
Time Time
2018-08-10 2018-08-10 09:16:52 Uncle Scrooge In 1
2018-08-10 15:04:07 Donald Duck In 2
2018-08-10 15:06:42 Donald Duck Out 1
2018-08-10 15:15:49 Donald Duck In 2
2018-08-10 16:42:40 Uncle Scrooge Out 1
2018-08-11 2018-08-11 10:42:40 Someone else In 1
2018-08-11 10:48:40 Someone else Out 0
As the resulting dataframe shows, even though Donald Duck was not tracked leaving the building on 2018-08-10, n_people starts from 0 on the following day, since the function is applied to each day separately.
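Putting everything together over the full two-month range, a possible end-to-end sketch (the file names are placeholders) would be:
import pandas as pd

# Placeholder file names.
df = pd.read_excel('transits.xlsx')
df['Time'] = pd.to_datetime(df['Time'])
df = df.set_index('Time').sort_index()

# Restart the running count at the beginning of every day.
dfOut = df.groupby(pd.Grouper(freq='D')).apply(lambda x: apply_by_day(x))

# Export the result to Excel, or plot/resample it as needed.
dfOut.to_excel('Output.xlsx', engine='xlsxwriter')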
Thanks nixon and ALollz for your great help. I updated the question with a "little" problem with my raw data that is preventing me from fully taking advantage of your suggestions. Could you please help me one more time?
– GioCo
Nov 28 '18 at 20:35
Okay @GioCo , this should do!
– yatu
Nov 28 '18 at 21:02
Unfortunately it's not working: first, I received this error: TypeError: Only valid with DatetimeIndex, TimedeltaIndex or PeriodIndex, but got an instance of 'Index', so I think I solved it using pd.to_datetime() (but this problem didn't arise with the first solution). Second, it seems the function apply_by_day(x) doesn't work at all, since the output file I exported is identical to the input file: no n_people column, nor data grouped by time. Any idea? Thanks
– GioCo
Nov 29 '18 at 16:36
It works for me @GioCo , make sure you are using the latest version, I added a change earlier today. I just double checked and it works with this example.
– yatu
Nov 29 '18 at 16:36
Sorry guys, it was totally my fault... I forgot to set this assignment: dfOut = df.groupby(pd.Grouper(freq='D')).apply(lambda x: apply_by_day(x)) and then just dfOut.to_excel('Output.xlsx', engine='xlsxwriter'). Sorry to have bugged you with my last comments, and thanks again for your help
– GioCo
Dec 3 '18 at 11:01