Counting the number of people in a building over time
I'm struggling in finding a "simple" way to perform this analysis with Pandas:
I have xlsx files that record people's transits into and out of a building.
Below is a simplified version of my raw data.
Full Name Time Direction
0 Uncle Scrooge 08-10-2018 09:16:52 In
1 Uncle Scrooge 08-10-2018 16:42:40 Out
2 Donald Duck 08-10-2018 15:04:07 In
3 Donald Duck 08-10-2018 15:06:42 Out
4 Donald Duck 08-10-2018 15:15:49 In
5 Donald Duck 08-10-2018 16:07:57 Out
My ideal final result would show (in a tabular or, better, graphical way) how the total number of people in the building changes over time.
So going back to the sample data I provided, I'd like to show that during the day 08-10-2018:
- before 09:16:52 there is no one in the building
- from 09:16:52 to 15:04:06 there is 1 person (Uncle Scrooge)
- from 15:04:07 to 15:06:42 there are 2 people (Uncle Scrooge and Donald Duck)
- from 15:06:42 to 15:15:48 there is 1 person
- from 15:15:49 to 16:07:57 there are 2 again
- from 16:07:58 to 16:42:40 there is 1 again
- from 16:42:41 to the end of the day there are none
I used real data for that example, so you can see the timestamps are accurate to the second, but I don't need that level of accuracy, since the analysis has to be performed over a two-month range of data.
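For reference, this is roughly how I load the data (a minimal sketch with a placeholder file name; my real workbooks differ):
import pandas as pd

# Placeholder file name; the real xlsx files cover the whole 2-month period.
df = pd.read_excel('transits.xlsx', usecols=['Full Name', 'Time', 'Direction'])

# Parse the timestamps so they can be sorted and grouped later.
df['Time'] = pd.to_datetime(df['Time'])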
Any help is appreciated
thanks a lot
giorgio
=============== UPDATE ===============
@nixon and @ALollz thanks a lot, you're awesome.
It works perfectly, apart from a detail I didn't think about in my original question.
In fact, as I mentioned, I'm working with data spanning a period of 2 months.
Moreover, for some reason, it seems that not all the people entering the building have been tracked when exiting it.
So with the cumsum() function, the total number of people for a given day is influenced by the people of the day before, and so on.
This shows an unjustifiably high number of people in the building during the early and late hours of every day apart from the very first one.
So I was thinking it could be solved by first performing a groupby on days and then applying your suggestion.
Could you help me put it all together?
Thanks a lot
giorgio
python pandas datetime
Please share the actual dataframe instead of an image
– yatu
Nov 27 '18 at 14:36
The solution can be simple if your DataFrame is clean. You'd just need to sort by time, map in and out to +1 and -1 and then cumsum. But if you have issues where people leave the building before they enter, or enter multiple times before they leave then it's trickier. Do you have any of those odd cases that you need to protect against?
– ALollz
Nov 27 '18 at 14:50
1 Answer
You can start by setting the Time column as the index and sorting it using sort_index:
df = df.set_index('Time').sort_index()
print(df)
Direction Full Name
Time
2018-08-10 09:16:52 In Uncle Scrooge
2018-08-10 15:04:07 In Donald Duck
2018-08-10 15:06:42 Out Donald Duck
2018-08-10 15:15:49 In Donald Duck
2018-08-10 16:07:57 Out Donald Duck
2018-08-10 16:42:40 Out Uncle Scrooge
And create a mapping (as @ALollz suggests) of {'In': 1, 'Out': -1}:
mapper = {'In':1, 'Out':-1}
df = df.assign(Direction_mapped = df.Direction.map(mapper))
Which would give you:
Direction Full Name Direction_mapped
Time
2018-08-10 09:16:52 In Uncle Scrooge 1
2018-08-10 15:04:07 In Donald Duck 1
2018-08-10 15:06:42 Out Donald Duck -1
2018-08-10 15:15:49 In Donald Duck 1
2018-08-10 16:07:57 Out Donald Duck -1
2018-08-10 16:42:40 Out Uncle Scrooge -1
Having mapped the Direction column, you can simply apply cumsum to the result, which will give you the number of people in the building from a specific time onwards:
df = df.assign(n_people = df.Direction_mapped.cumsum()).drop(['Direction_mapped'], axis = 1)
Which yields:
Direction Full Name n_people
Time
2018-08-10 09:16:52 In Uncle Scrooge 1
2018-08-10 15:04:07 In Donald Duck 2
2018-08-10 15:06:42 Out Donald Duck 1
2018-08-10 15:15:49 In Donald Duck 2
2018-08-10 16:07:57 Out Donald Duck 1
2018-08-10 16:42:40 Out Uncle Scrooge 0
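If you also want a graphical view, one possible sketch (assuming matplotlib is available) is a step plot of the running count:
import matplotlib.pyplot as plt

# The count stays constant between events and jumps at each In/Out,
# so a step plot reflects the occupancy correctly.
df['n_people'].plot(drawstyle='steps-post')
plt.ylabel('People in the building')
plt.show()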
General solution
Here is a more general solution for the case where not everyone is tracked leaving the building. Let's try a new df that spans more than one day, and let's also simulate that this time Donald Duck enters twice but is not tracked leaving the second time:
df = pd.DataFrame({'Full Name': ['Uncle Scrooge','Uncle Scrooge', 'Donald Duck', 'Donald Duck', 'Donald Duck',
'Someone else', 'Someone else'],
'Time': ['08-10-2018 09:16:52','08-10-2018 16:42:40', '08-10-2018 15:04:07', '08-10-2018 15:06:42', '08-10-2018 15:15:49',
'08-11-2018 10:42:40', '08-11-2018 10:48:40'],
'Direction': ['In','Out','In','Out', 'In','In', 'Out']})
print(df)
Full Name Time Direction
0 Uncle Scrooge 08-10-2018 09:16:52 In
1 Uncle Scrooge 08-10-2018 16:42:40 Out
2 Donald Duck 08-10-2018 15:04:07 In
3 Donald Duck 08-10-2018 15:06:42 Out
4 Donald Duck 08-10-2018 15:15:49 In
5 Someone else 08-11-2018 10:42:40 In
6 Someone else 08-11-2018 10:48:40 Out
First, the previous functionality can be encapsulated in a function:
def apply_by_day(x):
    mapper = {'In': 1, 'Out': -1}
    x = x.assign(Direction_mapped=x.Direction.map(mapper))
    x = (x.assign(n_people=x.Direction_mapped.cumsum())
          .drop(['Direction_mapped'], axis=1))
    return x
And then apply_by_day can be applied on daily groups using pandas.Grouper:
df.Time = pd.to_datetime(df.Time)
df = df.set_index('Time').sort_index()
df.groupby(pd.Grouper(freq='D')).apply(lambda x: apply_by_day(x))
Full Name Direction n_people
Time Time
2018-08-10 2018-08-10 09:16:52 Uncle Scrooge In 1
2018-08-10 15:04:07 Donald Duck In 2
2018-08-10 15:06:42 Donald Duck Out 1
2018-08-10 15:15:49 Donald Duck In 2
2018-08-10 16:42:40 Uncle Scrooge Out 1
2018-08-11 2018-08-11 10:42:40 Someone else In 1
2018-08-11 10:48:40 Someone else Out 0
As the resulting dataframe shows, even though Donald Duck was not tracked leaving the building on 2018-08-10, n_people starts from 0 on the following day, since the function is applied to each day separately.
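Putting everything together over the full two-month range, a possible end-to-end sketch (the file names are placeholders) would be:
import pandas as pd

# Placeholder file names.
df = pd.read_excel('transits.xlsx')
df['Time'] = pd.to_datetime(df['Time'])
df = df.set_index('Time').sort_index()

# Restart the running count at the beginning of every day.
dfOut = df.groupby(pd.Grouper(freq='D')).apply(lambda x: apply_by_day(x))

# Export the result to Excel, or plot/resample it as needed.
dfOut.to_excel('Output.xlsx', engine='xlsxwriter')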
Thanks nixon and ALollz for your great help. I updated the question with a "little" problem with my raw data that is preventing me from fully taking advantage of your suggestions. Could you please help me one more time?
– GioCo
Nov 28 '18 at 20:35
Okay @GioCo , this should do!
– yatu
Nov 28 '18 at 21:02
Unfortunately it's not working: first, I received this error: TypeError: Only valid with DatetimeIndex, TimedeltaIndex or PeriodIndex, but got an instance of 'Index', so I think I solved it using pd.to_datetime() (but this problem didn't arise with the first solution). Second, it seems the function apply_by_day(x) doesn't work at all, since the output file I exported is identical to the input file: no n_people column, nor data grouped by time. Any idea? Thanks
– GioCo
Nov 29 '18 at 16:36
It works for me @GioCo , make sure you are using the latest version, I added a change earlier today. I just double checked and it works with this example.
– yatu
Nov 29 '18 at 16:36
Sorry guys, it was totally my fault... I forgot to set this assignment: dfOut = df.groupby(pd.Grouper(freq='D')).apply(lambda x: apply_by_day(x)) and then just dfOut.to_excel('Output.xlsx', engine='xlsxwriter'). Sorry to have bugged you with my last comments, and thanks again for your help
– GioCo
Dec 3 '18 at 11:01