I have a dataframe of individuals who each have multiple records. I want to enumerate the record in the sequence for each individual in python. Essentially I would like to create the 'sequence' column in the following table:
patient date sequence
145 20Jun2009 1
145 24Jun2009 2
145 15Jul2009 3
582 09Feb2008 1
582 21Feb2008 2
987 14Mar2010 1
987 02May2010 2
987 12May2010 3
This is essentially the same question as here, but I am working in python and unable to implement the sql solution. I suspect I can use a groupby statement with an iterable count, but have so far been unsuccessful. Thanks!
The question is how do I sort on multiple columns of data.
One simple trick is to use the
key
parameter to the sorted function.You'll be sorting by a string built from the columns of the array.
Firstly you want to convert the date column to be a pandas datetime (rather than strings):
Note: see docs for possible format options.
If this isn't in date order (for each patient), I would sort it first:
Now you can groupby and cumcount:
Which is what you want (althout it's out of order):
To rearrange (though you may not need to) use
sort_index
(or we could reindex if we saved the initial DataFrame's index):*I stumbled upon the answer which was embarrassingly simple. The groupby statement has a 'cumcount()' option which will enumerate group items.
The caveat is that the records have to be in the order you want them enumerated.