可以将文章内容翻译成中文,广告屏蔽插件可能会导致该功能失效(如失效,请关闭广告屏蔽插件后再试):
问题:
Goal
I am trying to build a function that draws a specific number of random numbers from an an "incomplete uniform distribution".
What do I call an incomplete uniform distribution?
I call an incomplete uniform distribution a probability distribution where each value of X
within a series of boundaries have an equal probability to be picked. In other words it is a uniform distribution with holes (where the probability is zero) as represented below
x = list(12:25, 34:54, 67:90, 93:115)
y = 1/sum(25-12, 54-34, 90-67, 115-93)
plot(y=rep(y, length(unlist(x))), x=unlist(x), type="n", ylab="Probability", xlab="X")
for (xi in x)
{
points(xi,rep(y, length(xi)), type="l", lwd=4)
}
Ugly Solution
Here is a slow and ugly solution
IncompleteUnif = function(n,b)
{
#################
# "n" is the desired number of random numbers
# "b" is a list describing the boundaries within which a random number can possibly be drawn.
#################
r = c() # Series of random numbers to return
for (ni in n)
{
while (length(r) < n) # loop will continue until we have the "n" random numbers we need
{
ub = unlist(b)
x = runif(1,min(ub), max(ub)) # one random number taken over the whole range
for (bi in b) # The following loop test if the random number is withinn the ranges specified by "b"
{
if (min(bi) < x & max(bi) > x) # if found in one range then just add "x" to "r" and break
{
r = append(r,x)
break
}
}
}
}
return (r)
}
b = list(c(5,94),c(100,198),c(220,292), c(300,350))
set.seed(12)
IncompleteUnif(10,b)
[1] 28.929516 287.132444 330.204498 63.425103 16.693990 66.680826 226.374551 12.892821 7.872065 140.480533
回答1:
Slightly convoluted version of the @Gregor's solution.
mix_unif <- function(n, b){
x <- c()
ns <- rmultinom(1, n, sapply(b, diff))
for (i in seq_along(ns)) {
x <- c(x, runif(ns[i], b[[i]][1], b[[i]][2]))
}
x
}
microbenchmark(mix_unif(1e5, b),
rmixunif(1e5, b),
IncompleteUnif(1e5, b),
unit="relative")
Unit: relative
expr min lq mean median uq max neval
mix_unif(1e+05, b) 1.000000 1.000000 1.000000 1.000000 1.000000 1.00000 100
rmixunif(1e+05, b) 3.123515 3.235961 3.750369 3.496843 3.462529 15.73449 100
IncompleteUnif(1e+05, b) 6.806916 7.247425 6.926282 7.188556 7.093928 18.20041 100
回答2:
Your incomplete uniform distribution can be represented as a mixture of four ordinary uniform distributions, with mixing weight for each segment proportional to its length (i.e., so that the longer the segment, the more weight it has).
To sample from such a distribution, first choose a segment (taking the weights into account). Then from the chosen segment, choose an element.
回答3:
I believe this works, using the algorithm suggested by Robert Dodier:
rmixunif = function(n, b) {
subdists = sample(seq_along(b), size = n, replace = T, prob = sapply(b, diff))
subdists_n = tabulate(subdists)
draw = numeric(n)
for (i in unique(subdists)) {
draw[subdists == i] = runif(subdists_n[i], min = b[[i]][1], max = b[[i]][2])
}
return(draw)
}
rmixunif(10, b = list(c(5,94),c(100,198),c(220,292), c(300,350)))
# [1] 64.85989 85.33292 235.39607 233.40133 240.28686 67.21626 237.60248 11.80377 151.65365 306.44473
I like Sam Dickson's histogram visual check, here's my version:
x <- rmixunif(10000,list(c(0,1),c(2.5,3),c(6,10)))
hist(x,breaks=20)
It could be fancied up with some input-checking (and maybe an mapply
like suggested in the comments), but I'll leave that to others.
Thanks to alexis_iaz for the tabulate()
suggestion!
回答4:
Another solution is to transform the output. The idea is to sample from a random uniform distribution, then apply a conditional transformation so that the numbers fall only in the selected ranges:
IncompleteUnif = function(n,b) {
widths <- cumsum(sapply(b,diff))
x <- runif(n,0,tail(widths,1))
out <- x
out[x<=widths[1]] <- x[x<=widths[1]] + b[[1]][1]
for(i in 2:length(b)) {
out[widths[i-1]<x & x<=widths[i]] <- x[widths[i-1]<x & x<=widths[i]] - widths[i-1] + b[[i]][1]
}
return(out)
}
x <- IncompleteUnif(10000,list(c(0,1),c(2.5,3),c(6,10)))
hist(x,breaks=20)
回答5:
I'm a few years late to the party, but seeing that there wasn't a solution without an explicit loop yet, here's one such implementation (following @RobertDodier's approach):
rmunif <- function(n, b) {
runifb <- function(n, b) runif(n, b[1], b[2])
ns <- rmultinom(1, n, vapply(b, diff, 1))
unlist(Map(runifb, ns, b), use.names = FALSE)
}
hist(rmunif(1e5, list(0:1, c(5, 8), 9:10)))
library(microbenchmark)
set.seed(2018)
n <- 1e5
microbenchmark(
rmunif(n, b),
mix_unif(n, b),
rmixunif(n, b),
IncompleteUnif(n, b),
unit = "relative"
) -> mb
print(mb, signif = 5)
#> Unit: relative
#> expr min lq mean median uq max neval
#> rmunif(n, b) 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 100
#> mix_unif(n, b) 1.1181 1.1256 1.1281 1.1728 1.1236 1.0476 100
#> rmixunif(n, b) 2.7822 2.8982 2.9899 2.7850 2.8345 1.3970 100
#> IncompleteUnif(n, b) 4.4922 4.7089 5.2732 4.5764 8.4317 2.4364 100
Created on 2018-03-11 by the reprex package (v0.2.0).