R: random sample of columns excluding one column

Posted 2019-08-05 08:10

Question:

I may have found one of the problems in the code I posted previously in "R: using foreach() with sample() procedures in randomForest() call". It relates to the script I was using to draw a random subsample of columns from a data frame. The fake data (below) has 19 columns, "A" through "S". I want to draw a random subset of 5 columns but exclude the third column, "C", from the draw. Simply excluding the third column from the first argument of the sample() call does not work (i.e., some of the samples still contain the "C" column). I'm hoping someone has a suggestion on how to do this. This is the script that does not work:

randsCOLs= sample(1:dim(FAKEinput[,c(1:2,4:19)])[2], 5, replace=FALSE) 
#randsCOLs= sample(dim(FAKEinput[,c(1:2,4:19)])[2], 5, replace=FALSE) - also doesn't work
out <- FAKEinput[,randsCOLs]

FAKEinput <- data.frame(
    A=sample(25:75,20,replace=TRUE), B=sample(1:2,20,replace=TRUE), C=as.factor(sample(0:1,20,replace=TRUE,prob=c(0.3,0.7))),
    D=sample(200:350,20,replace=TRUE), E=sample(2300:2500,20,replace=TRUE), F=sample(92000:105000,20,replace=TRUE),
    G=sample(280:475,20,replace=TRUE), H=sample(470:550,20,replace=TRUE), I=sample(2537:2723,20,replace=TRUE),
    J=sample(2984:4199,20,replace=TRUE), K=sample(222:301,20,replace=TRUE), L=sample(28:53,20,replace=TRUE),
    M=sample(3:9,20,replace=TRUE), N=sample(0:2,20,replace=TRUE), O=sample(0:5,20,replace=TRUE), P=sample(0:2,20,replace=TRUE),
    Q=sample(0:2,20,replace=TRUE), R=sample(0:2,20,replace=TRUE), S=sample(0:7,20,replace=TRUE))

Answer 1:

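Dropping the dim() call should work, if I'm not mistaken. Sampling directly from FAKEinput[-3] draws from the columns themselves, so the result is already the 5-column data frame rather than a vector of column indices: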

randsCOLs = sample(FAKEinput[-3], 5, replace=FALSE) 
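
To illustrate what that call returns, here is a minimal sketch on a smaller stand-in data frame (the name df and its six columns are made up for this example and are not the OP's data). It relies on sample() treating a data frame as a list of columns, so the draw comes back as a data frame:

```r
# Hypothetical stand-in for FAKEinput: 6 columns instead of 19
set.seed(1)
df <- data.frame(A = 1:4, B = 5:8, C = factor(c(0, 1, 1, 0)),
                 D = 9:12, E = 13:16, F = 17:20)

# df[-3] drops the third column ("C"); sample() then draws 3 of the
# remaining 5 columns and returns them as a data frame
picked <- sample(df[-3], 3, replace = FALSE)

is.data.frame(picked)    # the result is a data frame, not an index vector
ncol(picked)             # 3 columns were drawn
"C" %in% names(picked)   # "C" cannot be among them
```

Because the result is already the subset, there is no need for a second indexing step such as FAKEinput[, randsCOLs].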


Answer 2:

Here is a more general approach (in case the C column is not the 3rd column). It samples from the positions of all columns whose name is not 'C':

FAKEinput[sample(which(names(FAKEinput) !='C'),5, replace=FALSE)]

Or you could use setdiff() to sample from the column names directly:

FAKEinput[sample(setdiff(names(FAKEinput),'C'), 5, replace=FALSE)]

Or, adapting the OP's 1:dim code and assuming that C is column 3, drop index 3 from the full set of column positions before sampling:

FAKEinput[sample((1:dim(FAKEinput)[2])[-3], 5, replace=FALSE)]
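
As a rough check of why the original script fails while the versions above do not, the sketch below repeats both draws many times on a stand-in data frame (df, broken() and fixed() are names invented for this example). The original code samples from 1:18, the positions within the reduced 18-column frame, but then applies those positions to the full 19-column frame, where position 3 is still "C":

```r
set.seed(42)
# Hypothetical stand-in for FAKEinput: 20 rows, 19 columns named A..S
df <- as.data.frame(matrix(sample(0:9, 20 * 19, replace = TRUE), nrow = 20))
names(df) <- LETTERS[1:19]

# Original approach: indices are drawn from 1:18 and used on the full frame,
# so index 3 ("C") can be drawn and index 19 ("S") never can
broken <- function() {
  idx <- sample(1:dim(df[, c(1:2, 4:19)])[2], 5, replace = FALSE)
  names(df[, idx])
}

# Fixed approach: draw from the names of the full frame minus "C"
fixed <- function() {
  names(df[sample(setdiff(names(df), "C"), 5, replace = FALSE)])
}

broken_hits <- sum(replicate(1000, "C" %in% broken()))
fixed_hits  <- sum(replicate(1000, "C" %in% fixed()))

broken_hits   # roughly 5/18 of the 1000 draws are expected to include "C"
fixed_hits    # 0: "C" is never eligible
```

The same check should pass for the which() and [-3] variants, since each of them builds the candidate set from the full data frame before sampling.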


Tags: r sample