Sample() in R returning non-random sample after po

Posted 2019-01-29 00:00

Question:

This question already has an answer here:

  • Histogram of uniform distribution not plotted correctly in R

The following code will return a perfectly sound sample:

b <- sample(c(0,1,2,3,4,5,6,7,8,9,10,11,12), 100000, replace=TRUE)
hist(b)

Increasing the number of elements by one, to 14, results in this:

b <- sample(c(0,1,2,3,4,5,6,7,8,9,10,11,12,13), 100000, replace=TRUE)
hist(b)

That's clearly not correct: zero occurs more often than it should. Is there a reason for this?

Answer 1:

The problem lies in hist, not in sample.

You can check this by tabulating the sample directly:

> table(sample(0:15, 10000, replace=T))

  0   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15 
634 642 664 654 628 598 633 642 647 625 587 577 618 645 615 591 
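To go a step beyond eyeballing the counts, a chi-squared goodness-of-fit test can be run on the same kind of table. This is a sketch that is not part of the original answer: the seed is arbitrary and only makes the run repeatable, and one p-value is evidence, not proof, of uniformity. When given only observed counts, chisq.test() tests against equal probabilities by default.

```r
# Draw the same kind of sample as above; the seed is arbitrary.
set.seed(42)
x <- sample(0:15, 10000, replace = TRUE)

# Count each value; factor() with explicit levels keeps a value that never
# appears as a zero count instead of silently dropping it.
counts <- table(factor(x, levels = 0:15))
print(counts)

# With only observed counts, chisq.test() tests H0: all 16 values are equally
# likely. A large p-value means no evidence against uniformity.
res <- chisq.test(counts)
print(res)
```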

From the hist help page (?hist):

If right = TRUE (default), the histogram cells are intervals of the form (a, b], i.e., they include their right-hand endpoint, but not their left one, with the exception of the first cell when include.lowest is TRUE.

For right = FALSE, the intervals are of the form [a, b), and include.lowest means ‘include highest’.
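Calling hist() with plot = FALSE returns the breaks and per-cell counts without drawing anything, which makes the effect of the (a, b] rule visible. Here is a minimal sketch using the question's 0:13 sample (the seed is arbitrary). For this data the default Sturges breaks should come out as the integers 0 to 13. That gives 13 cells for 14 distinct values, so the closed first cell [0, 1] absorbs both 0 and 1.

```r
# The question's second sample; set.seed() only makes the run repeatable.
set.seed(1)
b <- sample(0:13, 100000, replace = TRUE)

# plot = FALSE returns the "histogram" object instead of drawing it.
h <- hist(b, plot = FALSE)
print(h$breaks)  # default (Sturges) cell boundaries
print(h$counts)  # one count per cell

# With right = TRUE and include.lowest = TRUE the first cell is the closed
# interval [breaks[1], breaks[2]]; every later cell is (a, b].
in_first <- sum(b >= h$breaks[1] & b <= h$breaks[2])
cat("first cell count:", h$counts[1], "\n")
cat("values inside [", h$breaks[1], ",", h$breaks[2], "]:", in_first, "\n")
```

If the breaks are 0, 1, ..., 13, the first count is roughly double the others because it holds every 0 and every 1.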

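With integer-valued data and default breaks that fall on the integers (as they do here), the first cell [0, 1] contains both 0 and 1, so the first bar ends up roughly twice as tall as the others. If you instead supply breaks that give each integer its own cell:

hist(sample(0:15, 10000, replace=TRUE), breaks=-1:15)

the results will look correct.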
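The following sketch applies the suggested fix, written with the full argument name breaks, and checks it against table(). With breaks = -1:15, every cell (k - 1, k] holds exactly one integer k, so the histogram counts should match the tabulated counts one for one. Half-integer breaks, seq(-0.5, 15.5, by = 1), are an alternative not mentioned in the answer. They give the same counts but centre each bar on its integer. The seed is arbitrary.

```r
set.seed(7)  # arbitrary seed, only for repeatability
x <- sample(0:15, 10000, replace = TRUE)

# One cell per integer: [-1, 0], (0, 1], ..., (14, 15].
h1 <- hist(x, breaks = -1:15, plot = FALSE)

# Alternative: half-integer breaks put each integer in the middle of its bar.
h2 <- hist(x, breaks = seq(-0.5, 15.5, by = 1), plot = FALSE)

# Compare both sets of histogram counts with a plain tabulation.
tab <- as.vector(table(factor(x, levels = 0:15)))
print(rbind(table = tab, breaks_int = h1$counts, breaks_half = h2$counts))

# Draw the centred version.
plot(h2, main = "sample(0:15, 10000, replace = TRUE)", xlab = "value")
```

Either set of breaks fixes the counts. The half-integer version only changes where the bars sit on the x axis.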