可以将文章内容翻译成中文,广告屏蔽插件可能会导致该功能失效(如失效,请关闭广告屏蔽插件后再试):
问题:
I'm trying to test whether all elements of a vector are equal to one another. The solutions I have come up with seem somewhat roundabout, both involving checking length()
.
x <- c(1, 2, 3, 4, 5, 6, 1) # FALSE
y <- rep(2, times = 7) # TRUE
With unique()
:
length(unique(x)) == 1
length(unique(y)) == 1
With rle()
:
length(rle(x)$values) == 1
length(rle(y)$values) == 1
A solution that would let me include a tolerance value for assessing 'equality' among elements would be ideal to avoid FAQ 7.31 issues.
Is there a built-in function for type of test that I have completely overlooked? identical()
and all.equal()
compare two R objects, so they won't work here.
Edit 1
Here are some benchmarking results. Using the code:
library(rbenchmark)
John <- function() all( abs(x - mean(x)) < .Machine$double.eps ^ 0.5 )
DWin <- function() {diff(range(x)) < .Machine$double.eps ^ 0.5}
zero_range <- function() {
if (length(x) == 1) return(TRUE)
x <- range(x) / mean(x)
isTRUE(all.equal(x[1], x[2], tolerance = .Machine$double.eps ^ 0.5))
}
x <- runif(500000);
benchmark(John(), DWin(), zero_range(),
columns=c("test", "replications", "elapsed", "relative"),
order="relative", replications = 10000)
With the results:
test replications elapsed relative
2 DWin() 10000 109.415 1.000000
3 zero_range() 10000 126.912 1.159914
1 John() 10000 208.463 1.905251
So it looks like diff(range(x)) < .Machine$double.eps ^ 0.5
is fastest.
回答1:
I use this method, which compares the min and the max, after dividing by the mean:
# Determine if range of vector is FP 0.
zero_range <- function(x, tol = .Machine$double.eps ^ 0.5) {
if (length(x) == 1) return(TRUE)
x <- range(x) / mean(x)
isTRUE(all.equal(x[1], x[2], tolerance = tol))
}
If you were using this more seriously, you'd probably want to remove missing values before computing the range and mean.
回答2:
If they're all numeric values then if tol is your tolerance then...
all( abs(y - mean(y)) < tol )
is the solution to your problem.
EDIT:
After looking at this, and other answers, and benchmarking a few things the following comes out over twice as fast as the DWin answer.
abs(max(x) - min(x)) < tol
This is a bit surprisingly faster than diff(range(x))
since diff
shouldn't be much different than -
and abs
with two numbers. Requesting the range should optimize getting the minimum and maximum. Both diff
and range
are primitive functions. But the timing doesn't lie.
回答3:
Why not simply using the variance:
var(x) == 0
If all the elements of x
are equal, you will get a variance of 0
.
回答4:
> isTRUE(all.equal( max(y) ,min(y)) )
[1] TRUE
> isTRUE(all.equal( max(x) ,min(x)) )
[1] FALSE
Another along the same lines:
> diff(range(x)) < .Machine$double.eps ^ 0.5
[1] FALSE
> diff(range(y)) < .Machine$double.eps ^ 0.5
[1] TRUE
回答5:
You can use identical()
and all.equal()
by comparing the first element to all others, effectively sweeping the comparison across:
R> compare <- function(v) all(sapply( as.list(v[-1]),
+ FUN=function(z) {identical(z, v[1])}))
R> compare(x)
[1] FALSE
R> compare(y)
[1] TRUE
R>
That way you can add any epsilon to identical()
as needed.
回答6:
Since I keep coming back to this question over and over, here's an Rcpp
solution that will generally be much much faster than any of the R
solutions if the answer is actually FALSE
(because it will stop the moment it encounters a mismatch) and will have the same speed as the fastest R solution if the answer is TRUE
. For example for the OP benchmark, system.time
clocks in at exactly 0 using this function.
library(inline)
library(Rcpp)
fast_equal = cxxfunction(signature(x = 'numeric', y = 'numeric'), '
NumericVector var(x);
double precision = as<double>(y);
for (int i = 0, size = var.size(); i < size; ++i) {
if (var[i] - var[0] > precision || var[0] - var[i] > precision)
return Rcpp::wrap(false);
}
return Rcpp::wrap(true);
', plugin = 'Rcpp')
fast_equal(c(1,2,3), 0.1)
#[1] FALSE
fast_equal(c(1,2,3), 2)
#[2] TRUE
回答7:
I wrote a function specifically for this, which can check not only elements in a vector, but also capable of checking if all elements in a list are identical. Of course it as well handle character vectors and all other types of vector well. It also has appropriate error handling.
all_identical <- function(x) {
if (length(x) == 1L) {
warning("'x' has a length of only 1")
return(TRUE)
} else if (length(x) == 0L) {
warning("'x' has a length of 0")
return(logical(0))
} else {
TF <- vapply(1:(length(x)-1),
function(n) identical(x[[n]], x[[n+1]]),
logical(1))
if (all(TF)) TRUE else FALSE
}
}
Now try some examples.
x <- c(1, 1, 1, NA, 1, 1, 1)
all_identical(x) ## Return FALSE
all_identical(x[-4]) ## Return TRUE
y <- list(fac1 = factor(c("A", "B")),
fac2 = factor(c("A", "B"), levels = c("B", "A"))
)
all_identical(y) ## Return FALSE as fac1 and fac2 have different level order
回答8:
You do not actually need to use min, mean, or max.
Based on John's answer:
all(abs(x - x[[1]]) < tolerance)
回答9:
Here an alternative using the min, max trick but for a data frame. In the example I am comparing columns but the margin parameter from apply
can be changed to 1 for rows.
valid = sum(!apply(your_dataframe, 2, function(x) diff(c(min(x), max(x)))) == 0)
If valid == 0
then all the elements are the same