How to make your ranking values show integer (with

Posted 2019-08-21 11:41

Question:

I have to rank a data set ordered by several of its variables and grouped by another one. When I use ranking methods on a data.table, the ranking values are decimals. I need them to be integers, with no decimal part.

Below is a summary of what I need. I'm copying somebody else's example from another question on this website, which is also about ranking methods. I found the answer to that question useful, but it doesn't show how to make the ranking outcome an integer without decimals. That's why I'm copying it here and using it as the starting point for this question (asking a different question under an answer is not allowed).

I need to rank based on several variables, grouped by one or more variables, and get an integer ranking without decimals.

Here's this other person's example:

He creates the data table:

library(data.table)

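# The five dates are repeated explicitly (rather than relying on recycling)
# so that date has one value for each of the 10 ids.
dt1 <- data.table(id = c('11', '11', '11', '22', '22',
                         '88', '99', '44', '44', '55'),
                  date = rep(as.Date(c("01-01-2016",
                                       "01-02-2016",
                                       "01-02-2016",
                                       "02-01-2016",
                                       "02-02-2016"),
                                     format = "%m-%d-%Y"),
                             times = 2))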


setkey(dt1, date)
setkey(dt1, id)
dt1
    id       date
 1: 11 2016-01-01
 2: 11 2016-01-02
 3: 11 2016-01-02
 4: 22 2016-02-01
 5: 22 2016-02-02
 6: 44 2016-01-02
 7: 44 2016-02-01
 8: 55 2016-02-02
 9: 88 2016-01-01
10: 99 2016-01-02

And here he ranks based on the variable date and grouped by id:

dt1[, rank := frank(date), by = list(id)]
dt1

    id       date  rank
 1: 11 2016-01-01   1.0
 2: 11 2016-01-02   2.5
 3: 11 2016-01-02   2.5
 4: 22 2016-02-01   1.0
 5: 22 2016-02-02   2.0
 6: 44 2016-01-02   1.0
 7: 44 2016-02-01   2.0
 8: 55 2016-02-02   1.0
 9: 88 2016-01-01   1.0
10: 99 2016-01-02   1.0

The result should look like this instead:

    id       date  rank
 1: 11 2016-01-01   1
 2: 11 2016-01-02   2
 3: 11 2016-01-02   2
 4: 22 2016-02-01   1
 5: 22 2016-02-02   2
 6: 44 2016-01-02   1
 7: 44 2016-02-01   2
 8: 55 2016-02-02   1
 9: 88 2016-01-01   1
10: 99 2016-01-02   1

Answer 1:

You can specify how frank handles ties. Its ties.method argument defaults to "average", which gives tied values the mean of their positions and therefore produces decimal ranks. See ?frank for details.

For example, you could set

dt1[, rank := frank(date, ties.method = "min"), by = list(id)]

to get integer ranks.
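
As a rough sketch of how this plays out on the question's data, the snippet below compares the default with two integer-valued tie methods and then ranks by more than one column. The score column and the rank_* column names are made up for illustration and are not part of the original example. Passing .SD to frankv together with .SDcols is just one possible way to rank on several columns.

```r
library(data.table)

# Same data as in the question; the five dates are repeated explicitly.
dt1 <- data.table(
  id   = c("11", "11", "11", "22", "22", "88", "99", "44", "44", "55"),
  date = rep(as.Date(c("01-01-2016", "01-02-2016", "01-02-2016",
                       "02-01-2016", "02-02-2016"),
                     format = "%m-%d-%Y"), times = 2)
)
setkey(dt1, id, date)

# The default ("average") yields doubles; "min" and "dense" yield integers.
dt1[, rank_avg   := frank(date), by = id]
dt1[, rank_min   := frank(date, ties.method = "min"), by = id]
dt1[, rank_dense := frank(date, ties.method = "dense"), by = id]

# Hypothetical second ordering column, added only for illustration.
dt1[, score := c(3, 1, 2, 5, 4, 9, 8, 7, 6, 0)]

# Rank within each id by date first, then by score (assumed column order).
dt1[, rank_multi := frankv(.SD, ties.method = "min"),
    by = id, .SDcols = c("date", "score")]

# Inspect the storage type of each rank column.
sapply(dt1[, .(rank_avg, rank_min, rank_dense, rank_multi)], class)
```

Both "min" and "dense" return integers. The difference is that "min" leaves a gap after a tie (1, 1, 3), while "dense" does not (1, 1, 2), so the choice depends on which numbering you need.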