Random sampling from data quantiles, while preserv

Following my previous question titled: "Random sampling from a dataset, while preserving original probability distribution", I want to sample from a set of >2000 numbers, gathered from measurement. I want to perform several tests (I take a maximum of 10 samples in each test) while preserving the probability distribution over the whole testing process and, as far as possible, within each test. Now, instead of completely random sampling, I partition the data into 5 quantiles, and in each of 10 tests I sample 2 data elements from each quantile, drawing uniformly at random from the array of data in each quantile.

The problem with completely random sampling was that, because the distribution of the data is long-tailed, I was getting almost the same values in each test. I want some small-value samples, some middle-value samples, and some big-value samples in each test. So I sampled as described.

Fig 1. Density plot of ~2k elements of data.

This is the R code for calculating quantiles:

# by = 0.2 gives the 6 break points that bound 5 equal-probability groups
q <- quantile(data, probs = seq(0, 1, by = 0.2))

And then I partition data into 5 quantiles (each one as an array) and sample from each partition. For example, I do this in Java:

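import java.util.Random;

public int getRandomData(int quantile) {
    // One row per quantile partition (placeholder values).
    int[][] data = {{1,2,3,4,5}
                  ,{6,7,8,9,10}
                  ,{11,12,13,14,15}
                  ,{16,17,18,19,20}
                  ,{21,22,23,24,25}};
    int length = data[quantile].length;
    Random r = new Random();
    // Pick a uniformly random index within the chosen partition.
    int randomInt = r.nextInt(length);
    return data[quantile][randomInt];
}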
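
The scheme described above (sort the data, cut it into 5 equal-count groups, draw 2 values from each group per test) could be sketched roughly as follows. The class name QuantileSampler and the methods partition and sample are hypothetical names made up for this illustration, the exponential data in main is only a stand-in for the real measurements, and drawing without replacement inside each group is a choice of this sketch rather than something the code above does.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Random;

class QuantileSampler {

    /** Sorts a copy of the data and splits it into `parts` contiguous, near-equal-sized groups. */
    static double[][] partition(double[] data, int parts) {
        double[] sorted = data.clone();
        Arrays.sort(sorted);
        double[][] groups = new double[parts][];
        for (int i = 0; i < parts; i++) {
            // Integer arithmetic spreads any remainder across the groups.
            int from = (int) ((long) i * sorted.length / parts);
            int to = (int) ((long) (i + 1) * sorted.length / parts);
            groups[i] = Arrays.copyOfRange(sorted, from, to);
        }
        return groups;
    }

    /** Draws up to `perGroup` values without replacement from each group. */
    static List<Double> sample(double[][] groups, int perGroup, Random rng) {
        List<Double> out = new ArrayList<>();
        for (double[] group : groups) {
            double[] pool = group.clone();
            int n = Math.min(perGroup, pool.length);
            // Partial Fisher-Yates shuffle: the first n slots hold a uniform random subset.
            for (int i = 0; i < n; i++) {
                int j = i + rng.nextInt(pool.length - i);
                double tmp = pool[i];
                pool[i] = pool[j];
                pool[j] = tmp;
                out.add(pool[i]);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        double[] data = new double[2000];
        Random rng = new Random(42);
        for (int i = 0; i < data.length; i++) {
            // Placeholder long-tailed data (exponential), standing in for the measurements.
            data[i] = -Math.log(1.0 - rng.nextDouble());
        }
        double[][] groups = partition(data, 5);
        // One test: 2 values from each of the 5 groups, 10 values in total.
        System.out.println(sample(groups, 2, rng));
    }
}
```

Calling sample once per test gives 10 values that always span the small, middle, and large ranges of the data.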

So, do the samples for each test, and for all tests overall, preserve the characteristics of the original distribution, for example mean and variance? If not, how should I arrange the sampling to achieve this goal?

Tags: java r sampling probability-density

1 Answer

贼婆χ

Reply #2 · 2019-09-09 07:59

preserve the characteristics of the original distribution, for example mean and variance?

Sampling this way will give a similar distribution. You might want to add a check that each sample meets your requirement, and redraw if it does not, but this will get you close.
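
One way such a check could look, as a minimal sketch: compare the sample's mean and variance with those of the full data set and redraw when either is too far off. The class SampleCheck, its method names, and the relative tolerance relTol are assumptions made for this illustration, and the tolerance value itself is something you would have to choose.

```java
import java.util.function.Supplier;

class SampleCheck {

    static double mean(double[] x) {
        double sum = 0.0;
        for (double v : x) sum += v;
        return sum / x.length;
    }

    /** Unbiased sample variance (divides by n - 1); needs at least two values. */
    static double variance(double[] x) {
        double m = mean(x);
        double ss = 0.0;
        for (double v : x) ss += (v - m) * (v - m);
        return ss / (x.length - 1);
    }

    /** True if the sample's mean and variance are each within relTol (relative) of the data's. */
    static boolean closeEnough(double[] data, double[] sample, double relTol) {
        double dm = mean(data);
        double dv = variance(data);
        return Math.abs(mean(sample) - dm) <= relTol * Math.abs(dm)
            && Math.abs(variance(sample) - dv) <= relTol * dv;
    }

    /** Calls the sampler up to maxTries times; returns the first acceptable sample, or null. */
    static double[] drawUntilClose(double[] data, Supplier<double[]> sampler,
                                   double relTol, int maxTries) {
        for (int i = 0; i < maxTries; i++) {
            double[] s = sampler.get();
            if (closeEnough(data, s, relTol)) {
                return s;
            }
        }
        return null; // no draw met the tolerance
    }
}
```

Bear in mind that rejecting and redrawing is itself a form of selection, so the accepted samples are no longer simple random draws, and with only 10 values per test a tight tolerance may rarely be met.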

If not, how to arrange sampling to achieve this goal?

To get exactly the same distribution, the sample has to contain every value in the data set once (or every value the same number of times, if you duplicate everything). That is the only way to reproduce it exactly; any smaller sample can only approximate it.
