subset() of a vector in R

I've written the following function based on subset(), which I find handy:

ss <- function (x, subset, ...) 
{
    r <- eval(substitute(subset), data.frame(.=x), parent.frame())
    if (!is.logical(r)) 
        stop("'subset' must be logical")
    x[r & !is.na(r)]
}

So, I can write:

ss(myDataFrame$MyVariableName, 500 < . & . < 1500)

instead of

myDataFrame$MyVariableName[ 500 < myDataFrame$MyVariableName 
                                & myDataFrame$MyVariableName < 1500]

This seems like something other people might have developed solutions for, though - including something in core R I might have missed. Anything already out there?

标签： r subset

2条回答

看我几分像从前

2楼-- · 2019-03-16 16:42

I realize that the solution Ken offers is more general than just selecting items within ranges (since it should work on any logical expression) but this did remind me that Greg Snow has comparison infix operators in his Teaching Demos package:

library(TeachingDemos)
x0 <- rnorm(100)
x0[ 0 %<% x0 %<% 1.5 ]

0人赞添加讨论(0) 举报

成全新的幸福

3楼-- · 2019-03-16 16:44

Thanks for sharing Ken.

You could use:

x <- myDataFrame$MyVariableName; x[x > 100 & x < 180]

Yours may require less typing but the code is less generalizable to others if you're sharing code. I have a few time saver functions like that myself but use them sparingly because they may be slowing down your code (extra steps) and requires you to also include that code for that function when ever you share the file with someone else.

Compare writing length. Almost the same length:

ss(mtcars$hp, 100 < . & . < 180)
x <- mtcars$hp; x[x > 100 & x < 180]

Compare time on 1000 replications.

library(rbenchmark)
benchmark(
       tyler = x[x > 100 & x < 180],
       ken = ss(mtcars$hp, 100 <. & . < 180),
 replications=1000)

   test replications elapsed relative user.self sys.self user.child sys.child
2   ken         1000    0.56 18.66667      0.36     0.03         NA        NA
1 tyler         1000    0.03  1.00000      0.03     0.00         NA        NA

So I guess it depends on if you need speed and/or sharability vs convenience. If it's just for you on a small data set I'd say it's valuable.

EDIT: NEW BENCHMARKING

> benchmark(
+     tyler = {x <- mtcars$hp; x[x > 100 & x < 180]}, 
+     ken = ss(mtcars$hp, 100 <. & . < 180), 
+     ken2 = ss2(mtcars$hp, 100 <. & . < 180),
+     joran = with(mtcars,hp[hp>100 & hp< 180 ]), 
+  replications=10000)

   test replications elapsed  relative user.self sys.self user.child sys.child
4 joran        10000    0.83  2.677419      0.69     0.00         NA        NA
2   ken        10000    3.79 12.225806      3.45     0.02         NA        NA
3  ken2        10000    0.67  2.161290      0.35     0.00         NA        NA
1 tyler        10000    0.31  1.000000      0.20     0.00         NA        NA

0人赞添加讨论(0) 举报

subset() of a vector in R

采纳回答

编辑标签

举报内容

检举类型

检举原因

检举说明(必填)

打开微信“扫一扫”，打开网页后点击屏幕右上角分享按钮

付费偷看金额在0.1-10元之间