Counting the number of people in a building over time
I'm struggling to find a "simple" way to perform this analysis with Pandas.


I have xlsx files that record people's transits into and out of a building.
Below is a simplified version of my raw data:



       Full Name                 Time Direction
0  Uncle Scrooge  08-10-2018 09:16:52        In
1  Uncle Scrooge  08-10-2018 16:42:40       Out
2    Donald Duck  08-10-2018 15:04:07        In
3    Donald Duck  08-10-2018 15:06:42       Out
4    Donald Duck  08-10-2018 15:15:49        In
5    Donald Duck  08-10-2018 16:07:57       Out


My ideal final result would show (in tabular or, better, graphical form) how the total number of people inside the building changes over time.
So, going back to the sample data above, I'd like to show that during the day 08-10-2018:




  • before 09:16:52 there is no one in the building

  • from 09:16:52 to 15:04:06 there is 1 person (Uncle Scrooge)

  • from 15:04:07 to 15:06:42 there are 2 people (Uncle Scrooge and Donald Duck)

  • from 15:06:42 to 15:15:48 there is 1 person

  • from 15:15:49 to 16:07:57 there are 2 again

  • from 16:07:58 to 16:42:40 there is 1 again

  • from 16:42:41 to the end of the day there is no one


I used real data for that example, so you can see the timestamps are accurate to the second, but I don't need that much precision, since the analysis has to be performed over a two-month range of data.
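
Just for reference, a minimal sketch of one way to coarsen the timestamps, assuming df holds the raw data shown above (this is only an option, not something required by the question):

import pandas as pd

# Round every event down to the minute, since second-level precision
# is not needed for a two-month analysis.
df['Time'] = pd.to_datetime(df['Time']).dt.floor('min')
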



Any help is appreciated



thanks a lot



giorgio



=============== UPDATE ===============



@nixon and @ALollz thanks a lot, you're awesome.
It works perfectly, apart from a detail I didn't think about in my original question.



In fact, as I mentioned, I'm working with data spanning a period of 2 months.
Moreover, for some reason, it seems that not all the people entering the building were tracked when exiting it.
So with the cumsum() function, the total number of people on a given day is influenced by the people of the previous days, and so on.
That produces an unjustifiably high number of people in the building during the early and late hours of every day, apart from the very first ones.



So I was thinking it could be solved by first performing a groupby on days and then applying your suggestion.



Could you help me put it all together?
Thanks a lot



giorgio

python pandas datetime


asked Nov 27 '18 at 14:35 by GioCo, edited Nov 28 '18 at 11:48

  • Please share the actual dataframe instead of an image

    – yatu
    Nov 27 '18 at 14:36






  • The solution can be simple if your DataFrame is clean. You'd just need to sort by time, map In and Out to +1 and -1 and then cumsum. But if you have issues where people leave the building before they enter, or enter multiple times before they leave then it's trickier. Do you have any of those odd cases that you need to protect against?

    – ALollz
    Nov 27 '18 at 14:50
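
Regarding the odd cases raised in that comment, one possible guard is sketched below. It is not taken from the thread, and it only covers the "enters (or exits) multiple times in a row" case: any event that repeats the same person's previous direction is dropped.

import pandas as pd

# Assumes df is the raw dataframe with columns 'Full Name', 'Time', 'Direction'.
df['Time'] = pd.to_datetime(df['Time'])
df = df.sort_values('Time')

# Keep an event only if it differs from the same person's previous direction,
# so a second consecutive 'In' (or 'Out') for one person is ignored.
same_as_previous = df.groupby('Full Name')['Direction'].shift() == df['Direction']
df = df[~same_as_previous]
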


1 Answer
You can start by setting the Time column as index, and sorting it using sort_index:



df = df.set_index('Time').sort_index()
print(df)

                     Direction      Full Name
Time
2018-08-10 09:16:52         In  Uncle Scrooge
2018-08-10 15:04:07         In    Donald Duck
2018-08-10 15:06:42        Out    Donald Duck
2018-08-10 15:15:49         In    Donald Duck
2018-08-10 16:07:57        Out    Donald Duck
2018-08-10 16:42:40        Out  Uncle Scrooge


And create a mapping (as @ALollz suggests) of {'In':1, 'Out':-1}:



mapper = {'In':1, 'Out':-1}
df = df.assign(Direction_mapped = df.Direction.map(mapper))


Which would give you:



                     Direction      Full Name  Direction_mapped
Time
2018-08-10 09:16:52         In  Uncle Scrooge                 1
2018-08-10 15:04:07         In    Donald Duck                 1
2018-08-10 15:06:42        Out    Donald Duck                -1
2018-08-10 15:15:49         In    Donald Duck                 1
2018-08-10 16:07:57        Out    Donald Duck                -1
2018-08-10 16:42:40        Out  Uncle Scrooge                -1


Having mapped the Direction column, you can simply apply cumsum to the result, which gives you the number of people in the building from each point in time onwards:



df = df.assign(n_people = df.Direction_mapped.cumsum()).drop(['Direction_mapped'], axis = 1)


Which yields:



                     Direction      Full Name  n_people
Time
2018-08-10 09:16:52         In  Uncle Scrooge         1
2018-08-10 15:04:07         In    Donald Duck         2
2018-08-10 15:06:42        Out    Donald Duck         1
2018-08-10 15:15:49         In    Donald Duck         2
2018-08-10 16:07:57        Out    Donald Duck         1
2018-08-10 16:42:40        Out  Uncle Scrooge         0
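
Since the question also asks for a graphical view, here is a minimal plotting sketch (not part of the original answer, and it assumes matplotlib is installed). Because the count only changes at each event, a step plot represents the occupancy between events faithfully:

import matplotlib.pyplot as plt

# Occupancy is piecewise constant, so draw it as a post-step line.
ax = df['n_people'].plot(drawstyle='steps-post', figsize=(10, 4))
ax.set_ylabel('People in the building')
plt.show()
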




General solution



A more general solution for the case where not everyone is tracked leaving the building. Let's try with a new df which spans more than one day, and let's also simulate that this time Donald Duck does get in twice but is not tracked getting out the second time:



import pandas as pd

df = pd.DataFrame({'Full Name': ['Uncle Scrooge', 'Uncle Scrooge', 'Donald Duck', 'Donald Duck',
                                 'Donald Duck', 'Someone else', 'Someone else'],
                   'Time': ['08-10-2018 09:16:52', '08-10-2018 16:42:40', '08-10-2018 15:04:07',
                            '08-10-2018 15:06:42', '08-10-2018 15:15:49', '08-11-2018 10:42:40',
                            '08-11-2018 10:48:40'],
                   'Direction': ['In', 'Out', 'In', 'Out', 'In', 'In', 'Out']})
print(df)

       Full Name                 Time Direction
0  Uncle Scrooge  08-10-2018 09:16:52        In
1  Uncle Scrooge  08-10-2018 16:42:40       Out
2    Donald Duck  08-10-2018 15:04:07        In
3    Donald Duck  08-10-2018 15:06:42       Out
4    Donald Duck  08-10-2018 15:15:49        In
5   Someone else  08-11-2018 10:42:40        In
6   Someone else  08-11-2018 10:48:40       Out


First, the previous functionality can be encapsulated in a function:



def apply_by_day(x):
    # Map each transit to +1 ('In') or -1 ('Out'), take the running sum,
    # then drop the helper column.
    mapper = {'In': 1, 'Out': -1}
    x = x.assign(Direction_mapped=x.Direction.map(mapper))
    x = (x.assign(n_people=x.Direction_mapped.cumsum())
          .drop(['Direction_mapped'], axis=1))
    return x


And then apply_by_day can be applied on daily groups using pandas.Grouper:



df.Time = pd.to_datetime(df.Time)
df = df.set_index('Time').sort_index()
df.groupby(pd.Grouper(freq='D')).apply(lambda x: apply_by_day(x))

                                    Full Name Direction  n_people
Time       Time
2018-08-10 2018-08-10 09:16:52  Uncle Scrooge        In         1
           2018-08-10 15:04:07    Donald Duck        In         2
           2018-08-10 15:06:42    Donald Duck       Out         1
           2018-08-10 15:15:49    Donald Duck        In         2
           2018-08-10 16:42:40  Uncle Scrooge       Out         1
2018-08-11 2018-08-11 10:42:40   Someone else        In         1
           2018-08-11 10:48:40   Someone else       Out         0


As the resulting dataframe shows, even though Donald Duck was not tracked leaving the building on 2018-08-10, n_people starts from 0 again on the following day, because the function is applied to each day separately.






answered Nov 27 '18 at 16:46 by yatu, edited Nov 29 '18 at 11:13

  • Tnx nixon and ALollz for your great help. I updated the question with a "little" problem with my raw data that is preventing me from fully taking advantage of your suggestions. Could you please help me another time?

    – GioCo
    Nov 28 '18 at 20:35

  • Okay @GioCo, this should do!

    – yatu
    Nov 28 '18 at 21:02

  • Unfortunately it's not working: 1st I received this error TypeError: Only valid with DatetimeIndex, TimedeltaIndex or PeriodIndex, but got an instance of 'Index', so I think I solved it using pd.to_datetime() (but this problem didn't arise with the first solution). 2nd it seems the function apply_by_day(x) doesn't work at all, since the output file I exported is identical to the input file: no column n_people, nor data grouped by time.. any idea? tnx

    – GioCo
    Nov 29 '18 at 16:36

  • It works for me @GioCo, make sure you are using the latest version, I added a change earlier today. I just double checked and it works with this example.

    – yatu
    Nov 29 '18 at 16:36

  • Sorry guys, it was totally my fault... I forgot to set this assignment dfOut = df.groupby(pd.Grouper(freq='D')).apply(lambda x: apply_by_day(x)) and then just dfOut.to_excel('Output.xlsx', engine='xlsxwriter'). Sorry to have bugged you with my last comments, and thanks again for your help.

    – GioCo
    Dec 3 '18 at 11:01
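
Putting together the steps from the answer and the last comment, an end-to-end sketch could look like the following. Here transits.xlsx is a placeholder for the real input file, and reading and writing Excel files assumes the usual engines (e.g. openpyxl and xlsxwriter) are installed:

import pandas as pd


def apply_by_day(x):
    # Same helper as in the answer: +1 per 'In', -1 per 'Out', running total per day.
    mapper = {'In': 1, 'Out': -1}
    x = x.assign(Direction_mapped=x.Direction.map(mapper))
    x = (x.assign(n_people=x.Direction_mapped.cumsum())
          .drop(['Direction_mapped'], axis=1))
    return x


# 'transits.xlsx' is a hypothetical file name standing in for the real export.
df = pd.read_excel('transits.xlsx')
df['Time'] = pd.to_datetime(df['Time'])
df = df.set_index('Time').sort_index()

# Restart the count every day, so missing 'Out' events do not leak into the next day.
dfOut = df.groupby(pd.Grouper(freq='D')).apply(lambda x: apply_by_day(x))
dfOut.to_excel('Output.xlsx', engine='xlsxwriter')
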










