This question is related to rostering or staffing. I'm trying to assign various jobs to individuals (employees). Using the df
below,
`[Person]` = Individuals (employees)
`[Area]` and `[Place]` = unique jobs
`[On]` = How many unique jobs are occurring at each point in time
So [Area]
and [Place]
together will make up unique
values that are different jobs. These values will be assigned to individuals with the overall aim to use the least amount of individuals possible. The most unique values assigned
to any one individual is 3. [On]
displays how many current unique
values for [Place]
and [Area]
are occurring. So this provides a concrete guide on how many individuals I need. For example,
1-3 unique values occurring = 1 individual
4-6 unique values occurring = 2 individuals
7-9 unique values occurring = 3 individuals etc
Question:
Where the amount of unique
values in [Area]
and [Place]
is greater than 3 is causing me trouble. I can't do a groupby
where I assign
the first 3 unique values
to individual 1
and the next 3 unique
values to individual 2
etc. I want to group unique values in [Area]
and [Place]
by [Area]
. So look to assign
same values in [Area]
to an individual (up to 3). Then, if there are leftover values (<3), they should be combined to make a group of 3, where possible.
The way I envisage this working is: see into the future by an hour
. For each new row
of values the script
should see how many values will be [On]
(this provides an indication of how many total individuals are required). Where unique
values are >3, they should be assigned
by grouping
the same value in [Area]
. If there are leftover values they should be combined anyhow to make up to a group of 3.
Putting that into a step by step process:
1) Use the [On]
Column
to determine how many individuals are required by looking into the future for an hour
2) Where there are more than 3 unique
values occurring assign the identical values in [Area]
first.
3) If there are any leftover values then look to combine anyway possible.
For the df
below, there are 9 unique
values occurring for [Place]
and [Area]
with an hour
. So we should have 3 individuals assigned
. When unique
values >3 it should be assigned by [Area]
and seeing if the same value occurs. The leftover values should be combined with other individuals that have less than 3 unique
values.
import pandas as pd
import numpy as np
d = ({
'Time' : ['8:03:00','8:17:00','8:20:00','8:28:00','8:35:00','08:40:00','08:42:00','08:45:00','08:50:00'],
'Place' : ['House 1','House 2','House 3','House 4','House 5','House 1','House 2','House 3','House 2'],
'Area' : ['A','B','C','D','E','D','E','F','G'],
'On' : ['1','2','3','4','5','6','7','8','9'],
'Person' : ['Person 1','Person 2','Person 3','Person 4','Person 5','Person 4','Person 5','Person 6','Person 7'],
})
df = pd.DataFrame(data=d)
This is my attempt:
def reduce_df(df):
values = df['Area'] + df['Place']
df1 = df.loc[~values.duplicated(),:] # ignore duplicate values for this part..
person_count = df1.groupby('Person')['Person'].agg('count')
leftover_count = person_count[person_count < 3] # the 'leftovers'
# try merging pairs together
nleft = leftover_count.shape[0]
to_try = np.arange(nleft - 1)
to_merge = (leftover_count.values[to_try] +
leftover_count.values[to_try + 1]) <= 3
to_merge[1:] = to_merge[1:] & ~to_merge[:-1]
to_merge = to_try[to_merge]
merge_dict = dict(zip(leftover_count.index.values[to_merge+1],
leftover_count.index.values[to_merge]))
def change_person(p):
if p in merge_dict.keys():
return merge_dict[p]
return p
reduced_df = df.copy()
# update df with the merges you found
reduced_df['Person'] = reduced_df['Person'].apply(change_person)
return reduced_df
df1 = (reduce_df(reduce_df(df)))
This is the Output:
Time Place Area On Person
0 8:03:00 House 1 A 1 Person 1
1 8:17:00 House 2 B 2 Person 1
2 8:20:00 House 3 C 3 Person 1
3 8:28:00 House 4 D 4 Person 4
4 8:35:00 House 5 E 5 Person 5
5 8:40:00 House 1 D 6 Person 4
6 8:42:00 House 2 E 7 Person 5
7 8:45:00 House 3 F 8 Person 5
8 8:50:00 House 2 G 9 Person 7
This is my Intended Output:
Time Place Area On Person
0 8:03:00 House 1 A 1 Person 1
1 8:17:00 House 2 B 2 Person 1
2 8:20:00 House 3 C 3 Person 1
3 8:28:00 House 4 D 4 Person 2
4 8:35:00 House 5 E 5 Person 3
5 8:40:00 House 6 D 6 Person 2
6 8:42:00 House 2 E 7 Person 3
7 8:45:00 House 3 F 8 Person 2
8 8:50:00 House 2 G 9 Person 3
Description on how I want to get this output:
Index 0: One `unique` value occurring. So `assign` to individual 1
Index 1: Two `unique` values occurring. So `assign` to individual 1
Index 2: Three `unique` values occurring. So `assign` to individual 1
Index 3: Four `unique` values on. So `assign` to individual 2
Index 4: Five `unique` values on. This one is a bit tricky and hard to conceptualise. But there is another `E` within an `hour`. So `assign` to a new individual so it can be combined with the other `E`
Index 5: Six `unique` values on. Should be `assigned` with the other `D`. So individual 2
Index 6: Seven `unique` values on. Should be `assigned` with other `E`. So individual 3
Index 7: Eight `unique` values on. New value in `[Area]`, which is a _leftover_. `Assign` to either individual 2 or 3
Index 8: Nine `unique` values on. New value in `[Area]`, which is a _leftover_. `Assign` to either individual 2 or 3
Example No2:
d = ({
'Time' : ['8:03:00','8:17:00','8:20:00','8:28:00','8:35:00','8:40:00','8:42:00','8:45:00','8:50:00'],
'Place' : ['House 1','House 2','House 3','House 1','House 2','House 3','House 1','House 2','House 3'],
'Area' : ['X','X','X','X','X','X','X','X','X'],
'On' : ['1','2','3','3','3','3','3','3','3'],
'Person' : ['Person 1','Person 1','Person 1','Person 1','Person 1','Person 1','Person 1','Person 1','Person 1'],
})
df = pd.DataFrame(data=d)
I am getting an error:
IndexError: index 1 is out of bounds for axis 1 with size 1
On this line:
df.loc[:,'Person'] = df['Person'].unique()[assignedPeople]
However, if I change the Person to 1,2,3 repeating, it returns the following:
'Person' : ['Person 1','Person 2','Person 3','Person 1','Person 2','Person 3','Person 1','Person 2','Person 3'],
Time Place Area On Person
0 8:03:00 House 1 X 1 Person 1
1 8:17:00 House 2 X 2 Person 1
2 8:20:00 House 3 X 3 Person 1
3 8:28:00 House 1 X 3 Person 2
4 8:35:00 House 2 X 3 Person 2
5 8:40:00 House 3 X 3 Person 2
6 8:42:00 House 1 X 3 Person 3
7 8:45:00 House 2 X 3 Person 3
8 8:50:00 House 3 X 3 Person 3
Intended Output:
Time Place Area On Person
0 8:03:00 House 1 X 1 Person 1
1 8:17:00 House 2 X 2 Person 1
2 8:20:00 House 3 X 3 Person 1
3 8:28:00 House 1 X 3 Person 1
4 8:35:00 House 2 X 3 Person 1
5 8:40:00 House 3 X 3 Person 1
6 8:42:00 House 1 X 3 Person 1
7 8:45:00 House 2 X 3 Person 1
8 8:50:00 House 3 X 3 Person 1
The main takeaway from Example 2 is:
1) There are <3 unique values on so assign to individual 1