top_n function returning more rows than expected

Posted 2020-04-12 09:36

Question:

I'm pretty new to R (and pretty tired; I imagine my brain is just not working properly), but to me the code below should return only 10 rows. It returns 66. Why is this?

library(dplyr)

a <- structure(list(calls_in_range = c(17, 14, 6, 4, 1, 1, 1, 1, 1, 
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1), mid_point = c(-20680, 
-20660, -20640, -20700, -36900, -36500, -36380, -36020, -35960, 
-35260, -35140, -34640, -33060, -32600, -30920, -29340, -29100, 
-28780, -27980, -27640, -27220, -27160, -26980, -26800, -26740, 
-26500, -25640, -25540, -24840, -24820, -24800, -24380, -23880, 
-23820, -23720, -23320, -22220, -21920, -21860, -21760, -21060, 
-20240, -19780, -18700, -18500, -17500, -16740, -16500, -14260, 
-14200, -14120, -13860, -13420, -13120, -12780, -12740, -12460, 
-12420, -12280, -12260, -11720, -10660, -10060, -9960, -6380, 
-5520), lower_range = c(-20690, -20670, -20650, -20710, -36910, 
-36510, -36390, -36030, -35970, -35270, -35150, -34650, -33070, 
-32610, -30930, -29350, -29110, -28790, -27990, -27650, -27230, 
-27170, -26990, -26810, -26750, -26510, -25650, -25550, -24850, 
-24830, -24810, -24390, -23890, -23830, -23730, -23330, -22230, 
-21930, -21870, -21770, -21070, -20250, -19790, -18710, -18510, 
-17510, -16750, -16510, -14270, -14210, -14130, -13870, -13430, 
-13130, -12790, -12750, -12470, -12430, -12290, -12270, -11730, 
-10670, -10070, -9970, -6390, -5530), upper_range = c(-20670, 
-20650, -20630, -20690, -36890, -36490, -36370, -36010, -35950, 
-35250, -35130, -34630, -33050, -32590, -30910, -29330, -29090, 
-28770, -27970, -27630, -27210, -27150, -26970, -26790, -26730, 
-26490, -25630, -25530, -24830, -24810, -24790, -24370, -23870, 
-23810, -23710, -23310, -22210, -21910, -21850, -21750, -21050, 
-20230, -19770, -18690, -18490, -17490, -16730, -16490, -14250, 
-14190, -14110, -13850, -13410, -13110, -12770, -12730, -12450, 
-12410, -12270, -12250, -11710, -10650, -10050, -9950, -6370, 
-5510)), class = "data.frame", row.names = c(NA, -66L), .Names = c("calls_in_range", 
"mid_point", "lower_range", "upper_range"))

top_n(a, 10, calls_in_range)

Answer 1:

If you inspect the calls_in_range column, which is the variable used for ordering, you can see that it contains ties. According to the documentation for the n argument of the top_n function:

number of rows to return. If x is grouped, this is the number of rows per group. Will include more than n rows if there are ties. If n is positive, selects the top n rows. If negative, selects the bottom n rows.

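This is why it returns more rows than expected: the 10th-largest value of calls_in_range is 1, and 62 of the 66 rows share that value, so every row ties into the top 10 and all 66 are returned.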
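
If exactly 10 rows are wanted, the ties have to be broken explicitly. The following is a minimal sketch of two ways to do that, not the only ones. It assumes dplyr 1.0.0 or later (for slice_max, which supersedes top_n), and it uses a small stand-in data frame, a_small, with the same calls_in_range values as the question; the id column is added here purely for illustration.

```r
library(dplyr)

# Stand-in for the question's data (an assumption for this sketch):
# four distinct counts followed by 62 rows that all tie at 1.
a_small <- data.frame(
  id = seq_len(66),
  calls_in_range = c(17, 14, 6, 4, rep(1, 62))
)

# top_n keeps every row that ties with the 10th-ranked value,
# so all 66 rows come back.
with_ties <- top_n(a_small, 10, calls_in_range)
nrow(with_ties)

# Option 1: slice_max with with_ties = FALSE cuts off at exactly n rows.
exactly_ten <- slice_max(a_small, calls_in_range, n = 10, with_ties = FALSE)
nrow(exactly_ten)

# Option 2: sort in descending order and take the first 10 rows.
# A second sort key (here id) makes the choice among tied rows explicit.
sorted_ten <- a_small %>%
  arrange(desc(calls_in_range), id) %>%
  head(10)
nrow(sorted_ten)
```

In both options the rows kept from the tied group depend on row order (or on the second sort key), so it is worth choosing a tie-breaker that is meaningful for the data, such as mid_point in the original data frame.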



Tags: r dplyr