Draw random numbers from the mixture of uniform di

2019-07-03 22:30发布

Goal

I am trying to build a function that draws a specific number of random numbers from an an "incomplete uniform distribution".

What do I call an incomplete uniform distribution?

I call an incomplete uniform distribution a probability distribution where each value of X within a series of boundaries have an equal probability to be picked. In other words it is a uniform distribution with holes (where the probability is zero) as represented below

x = list(12:25, 34:54, 67:90, 93:115)
y = 1/sum(25-12, 54-34, 90-67, 115-93)
plot(y=rep(y, length(unlist(x))), x=unlist(x), type="n", ylab="Probability", xlab="X")
for (xi in x)
{
    points(xi,rep(y, length(xi)), type="l", lwd=4)
}

enter image description here

Ugly Solution

Here is a slow and ugly solution

IncompleteUnif = function(n,b)
{
    #################
    # "n" is the desired number of random numbers
    # "b" is a list describing the boundaries within which a random number can possibly be drawn.
    #################
    r = c() # Series of random numbers to return
    for (ni in n)
    {
        while (length(r) < n) # loop will continue until we have the "n" random numbers we need
        {
            ub = unlist(b)
            x = runif(1,min(ub), max(ub)) # one random number taken over the whole range
            for (bi in b) # The following loop test if the random number is withinn the ranges specified by "b"
            {
                if (min(bi) < x & max(bi) > x) # if found in one range then just add "x" to "r" and break
                {
                    r = append(r,x)
                    break
                }
            }
        }
    }
    return (r)
}

b = list(c(5,94),c(100,198),c(220,292), c(300,350))
set.seed(12)
IncompleteUnif(10,b)
[1]  28.929516 287.132444 330.204498  63.425103  16.693990  66.680826 226.374551  12.892821   7.872065 140.480533

5条回答
放我归山
2楼-- · 2019-07-03 22:39

Another solution is to transform the output. The idea is to sample from a random uniform distribution, then apply a conditional transformation so that the numbers fall only in the selected ranges:

IncompleteUnif = function(n,b) {
  widths <- cumsum(sapply(b,diff))
  x <- runif(n,0,tail(widths,1))
  out <- x
  out[x<=widths[1]] <- x[x<=widths[1]] + b[[1]][1]
  for(i in 2:length(b)) {
    out[widths[i-1]<x & x<=widths[i]] <- x[widths[i-1]<x & x<=widths[i]] - widths[i-1] + b[[i]][1]
  }
  return(out)
}

x <- IncompleteUnif(10000,list(c(0,1),c(2.5,3),c(6,10)))
hist(x,breaks=20)

enter image description here

查看更多
祖国的老花朵
3楼-- · 2019-07-03 22:40

I'm a few years late to the party, but seeing that there wasn't a solution without an explicit loop yet, here's one such implementation (following @RobertDodier's approach):

rmunif <- function(n, b) {
  runifb <- function(n, b) runif(n, b[1], b[2])
  ns <- rmultinom(1, n, vapply(b, diff, 1))
  unlist(Map(runifb, ns, b), use.names = FALSE)
}

hist(rmunif(1e5, list(0:1, c(5, 8), 9:10)))

library(microbenchmark)

set.seed(2018)
n <- 1e5

microbenchmark(
  rmunif(n, b),
  mix_unif(n, b),
  rmixunif(n, b),
  IncompleteUnif(n, b),
  unit = "relative"
) -> mb

print(mb, signif = 5)
#> Unit: relative
#>                  expr    min     lq   mean median     uq    max neval
#>          rmunif(n, b) 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000   100
#>        mix_unif(n, b) 1.1181 1.1256 1.1281 1.1728 1.1236 1.0476   100
#>        rmixunif(n, b) 2.7822 2.8982 2.9899 2.7850 2.8345 1.3970   100
#>  IncompleteUnif(n, b) 4.4922 4.7089 5.2732 4.5764 8.4317 2.4364   100

Created on 2018-03-11 by the reprex package (v0.2.0).

查看更多
Luminary・发光体
4楼-- · 2019-07-03 22:45

I believe this works, using the algorithm suggested by Robert Dodier:

rmixunif = function(n, b) {
    subdists = sample(seq_along(b), size = n, replace = T, prob = sapply(b, diff))
    subdists_n = tabulate(subdists)
    draw = numeric(n)
    for (i in unique(subdists)) {
        draw[subdists == i] = runif(subdists_n[i], min = b[[i]][1], max = b[[i]][2])
    }
    return(draw)
}

rmixunif(10, b = list(c(5,94),c(100,198),c(220,292), c(300,350)))
#  [1]  64.85989  85.33292 235.39607 233.40133 240.28686  67.21626 237.60248  11.80377 151.65365 306.44473

I like Sam Dickson's histogram visual check, here's my version:

x <- rmixunif(10000,list(c(0,1),c(2.5,3),c(6,10)))
hist(x,breaks=20)

enter image description here

It could be fancied up with some input-checking (and maybe an mapply like suggested in the comments), but I'll leave that to others.

Thanks to alexis_iaz for the tabulate() suggestion!

查看更多
混吃等死
5楼-- · 2019-07-03 22:56

Slightly convoluted version of the @Gregor's solution.

mix_unif <- function(n, b){
  x <- c()
  ns <- rmultinom(1, n, sapply(b, diff))
  for (i in seq_along(ns)) {
    x <- c(x, runif(ns[i], b[[i]][1], b[[i]][2]))
  }
  x
}

microbenchmark(mix_unif(1e5, b),
               rmixunif(1e5, b),
               IncompleteUnif(1e5, b), 
               unit="relative")
Unit: relative
                     expr      min       lq     mean   median       uq      max neval
       mix_unif(1e+05, b) 1.000000 1.000000 1.000000 1.000000 1.000000  1.00000   100
       rmixunif(1e+05, b) 3.123515 3.235961 3.750369 3.496843 3.462529 15.73449   100
 IncompleteUnif(1e+05, b) 6.806916 7.247425 6.926282 7.188556 7.093928 18.20041   100
查看更多
闹够了就滚
6楼-- · 2019-07-03 23:00

Your incomplete uniform distribution can be represented as a mixture of four ordinary uniform distributions, with mixing weight for each segment proportional to its length (i.e., so that the longer the segment, the more weight it has).

To sample from such a distribution, first choose a segment (taking the weights into account). Then from the chosen segment, choose an element.

查看更多
登录 后发表回答