How can we generate unique id numbers within each group of a dataframe? Here's some data grouped by "personid":
personid date measurement
1 x 23
1 x 32
2 y 21
3 x 23
3 z 23
3 y 23
I wish to add an id column with a unique value for each row within each subset defined by "personid", always starting with 1. This is my desired output:
personid date measurement id
1 x 23 1
1 x 32 2
2 y 21 1
3 x 23 1
3 z 23 2
3 y 23 3
I appreciate any help.
The misleadingly named `ave()` function, with argument `FUN = seq_along`, will accomplish this nicely, even if your `personid` column is not strictly ordered.
Using `data.table`, and assuming you wish to order by `date` within the `personid` subset:
If you do not wish to order by `date`:
Any of the following would also work:
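As a minimal sketch of the `ave()` and `data.table` suggestions above, assuming the question's data sits in a data frame named `Data`. The column names `id_dt` and `id_by_date` are hypothetical, chosen only so the variants can be compared side by side, and the exact calls are one plausible reading of the suggestions rather than a definitive version:

```r
library(data.table)

# The example data from the question
Data <- data.frame(
  personid    = c(1, 1, 2, 3, 3, 3),
  date        = c("x", "x", "y", "x", "z", "y"),
  measurement = c(23, 32, 21, 23, 23, 23)
)

# Base R: ave() applies seq_along() within each personid group
# and puts the results back in the original row positions
Data$id <- ave(seq_along(Data$personid), Data$personid, FUN = seq_along)
Data$id
# 1 2 1 1 2 3

DT <- as.data.table(Data)

# Number rows in their existing order within each personid
DT[, id_dt := seq_len(.N), by = personid]

# Number rows in date order within each personid
DT[order(date), id_by_date := seq_len(.N), by = personid]
```

With this data, `id_by_date` differs from `id_dt` only for `personid` 3, where the dates x, z, y are not already sorted.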
The equivalent commands using `plyr`:
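A sketch of what a `plyr` version might look like, again assuming a data frame named `Data` holding the question's data. The name `res` is hypothetical:

```r
library(plyr)

# The example data from the question
Data <- data.frame(
  personid    = c(1, 1, 2, 3, 3, 3),
  date        = c("x", "x", "y", "x", "z", "y"),
  measurement = c(23, 32, 21, 23, 23, 23)
)

# ddply() splits Data by personid, applies mutate() to each piece
# to add the counter, and binds the pieces back together
res <- ddply(Data, .(personid), mutate, id = seq_along(date))
```

Note that `ddply()` returns the rows grouped by `personid`, so the row order can differ from the input if the input was not already sorted by group.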
Some `dplyr` alternatives, using the convenience functions `row_number` and `n`:
You may also use `getanID` from package `splitstackshape`. Note that the input dataset is returned as a `data.table`.
Assuming your data are in a data.frame named `Data`, this will do the trick:
You can use `sqldf`:
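For the `dplyr` and `sqldf` suggestions above, something along these lines should work, assuming the data frame is named `Data`. The result names (`by_row_number`, `by_n`, `via_sql`) are hypothetical, and the SQL is only one way to phrase the query; it relies on SQLite's implicit `rowid` to count how many rows of the same person come at or before the current row:

```r
library(dplyr)
library(sqldf)

# The example data from the question
Data <- data.frame(
  personid    = c(1, 1, 2, 3, 3, 3),
  date        = c("x", "x", "y", "x", "z", "y"),
  measurement = c(23, 32, 21, 23, 23, 23)
)

# dplyr: row_number() and n() are evaluated within each group
by_row_number <- Data %>% group_by(personid) %>% mutate(id = row_number())
by_n          <- Data %>% group_by(personid) %>% mutate(id = 1:n())

# sqldf: a correlated subquery counts the rows with the same personid
# up to and including the current one
via_sql <- sqldf("
  SELECT a.*,
         (SELECT COUNT(*)
            FROM Data b
           WHERE b.personid = a.personid
             AND b.rowid <= a.rowid) AS id
    FROM Data a
   ORDER BY a.rowid
")
```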
I think there's a canned command for this, but I can't remember it. So here's one way:
This works because `duplicated` returns a logical vector. `cumsum` evaluates numeric vectors, so the logical gets coerced to numeric.
You can store the result to your data.frame as a new column if you want:
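One way to put the `duplicated`/`cumsum` idea to work, as a sketch rather than the exact code originally posted. It assumes a data frame named `Data` and uses `ave()` to do the per-group application, which is my choice here:

```r
# The example data from the question
Data <- data.frame(
  personid    = c(1, 1, 2, 3, 3, 3),
  date        = c("x", "x", "y", "x", "z", "y"),
  measurement = c(23, 32, 21, 23, 23, 23)
)

# Within one group every personid is the same value, so duplicated() is
# FALSE for the first row and TRUE afterwards; cumsum() coerces that to
# 0, 1, 2, ... and adding 1 makes the counter start at 1
counter <- function(x) cumsum(duplicated(x)) + 1

# Store the per-group counter as a new column
Data$id <- ave(Data$personid, Data$personid, FUN = counter)
Data$id
# 1 2 1 1 2 3
```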