subsetting a dataframe in R - unexpected results

Posted 2020-04-30 18:46

Question:

OK, couldn't find a better title

Let's say I have my_dataframe:

Name Value1 Value2
AA    10     20
BB    15     30

If I do: nrow(my_dataframe[my_dataframe$Value2>20,]) I get '1' as the result.

I want to create my_second_dataframe, such that it contains only the column 'Value2':

my_second_dataframe<- my_dataframe[,'Value2', drop=FALSE]

let me check it out:

class(my_second_dataframe)
[1] "data.frame"
class(my_second_dataframe$Value2)
[1] "numeric"

but then:

nrow(my_second_dataframe[my_second_dataframe$Value2>20,])
NULL

????? This would be part of a function, in which I want to isolate a column of choice and also get number of rows of that column based on a threshold number. What am I doing wrong?

Thanks

Answer 1:

Based on the documentation in ?Extract:

drop : For matrices and arrays. If TRUE the result is coerced to the lowest possible dimension (see the examples). This only works for extracting elements, not for the replacement. See drop for further details.

Also, by default it is drop = TRUE for [

x[i, j, ... , drop = TRUE]

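So, we need to specify drop = FALSE to avoid coercion to the lowest possible dimension. For a data frame, that coercion happens by default whenever the result has only a single column, so subsetting the rows of a one-column data frame returns a plain vector, and nrow() of a vector is NULL.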
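To see the difference, here is a small sketch that rebuilds the data frame from the question (the values are copied from the post; the names one_col, dropped and kept are only illustrative) and compares the two forms of subsetting:

```r
# Rebuild the data frame shown in the question
my_dataframe <- data.frame(Name   = c("AA", "BB"),
                           Value1 = c(10, 15),
                           Value2 = c(20, 30))

# Keep only 'Value2', still as a data frame
one_col <- my_dataframe[, "Value2", drop = FALSE]

# Default drop: the one-column result is reduced to a plain vector,
# which has no dimensions
dropped <- one_col[one_col$Value2 > 20, ]
is.data.frame(dropped)   # FALSE
nrow(dropped)            # NULL
length(dropped)          # 1

# drop = FALSE: the result stays a data frame with one row
kept <- one_col[one_col$Value2 > 20, , drop = FALSE]
is.data.frame(kept)      # TRUE
nrow(kept)               # 1
```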

In the OP's example

my_second_dataframe[my_second_dataframe$Value2>20,, drop=FALSE]
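
Since the OP wants this inside a function, one possible way to wrap it is sketched below. The helper name count_above and its arguments are hypothetical, not something from the question, and the sketch assumes the chosen column is numeric with no missing values:

```r
# Hypothetical helper: count the rows of one column above a threshold
count_above <- function(df, column, threshold) {
  # Isolate the column of choice, keeping it as a data frame
  one_col <- df[, column, drop = FALSE]
  # Filter rows; drop = FALSE keeps the one-column result a data frame
  hits <- one_col[one_col[[column]] > threshold, , drop = FALSE]
  nrow(hits)
}

# Data frame from the question
my_dataframe <- data.frame(Name   = c("AA", "BB"),
                           Value1 = c(10, 15),
                           Value2 = c(20, 30))

count_above(my_dataframe, "Value2", 20)    # 1
count_above(my_dataframe, "Value1", 5)     # 2
count_above(my_dataframe, "Value2", 100)   # 0
```

If only the count is needed, sum(df[[column]] > threshold) should give the same number for data without missing values, and it avoids the drop issue altogether.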