In MATLAB, I would like to generate n
pairs of random integers in the range [1, m]
, where each pair is unique. For uniqueness, I consider the order of the numbers in the pair to be irrelevant such that [3, 10]
is equal to [10, 3]
.
Also, each pair should consist of two distinct integers; i.e. [3, 4]
is ok but [3, 3]
would be rejected.
EDIT: Each possible pair should be chosen with equal likelihood.
(Obviously a constraint on the parameters is that n <= m(m-1)/2
.)
I have been able to successfully do this when m
is small, like so:
m = 500; n = 10; % setting parameters
A = ((1:m)'*ones(1, m)); % each column has the numbers 1 -> m
idxs1 = squareform(tril(A', -1))';
idxs2 = squareform(tril(A, -1))';
all_pairs = [idxs1, idxs2]; % this contains all possible pairs
idx_to_use = randperm( size(all_pairs, 1), n ); % choosing random n pairs
pairs = all_pairs(idx_to_use, :)
pairs =
254 414
247 334
111 146
207 297
45 390
229 411
9 16
75 395
12 338
25 442
However, the matrix A
is of size m x m
, meaning when m
becomes large (e.g. upwards of 10,000), MATLAB runs out of memory.
I considered generating a load of random numbers randi(m, [n, 2])
, and repeatedly rejecting the rows which repeated, but I was concerned about getting stuck in a loop when n
was close to m(m-1)/2
.
Is there an easier, cleaner way of generating unique pairs of distinct integers?
Easy, peasy, when viewed in the proper way.
You wish to generate n pairs of integers, [p,q], such that p and q lie in the interval [1,m], and p
How many possible pairs are there? The total number of pairs is just m*(m-1)/2. (I.e., the sum of the numbers from 1 to m-1.)
So we could generate n random integers in the range [1,m*(m-1)/2]. Randperm does this nicely. (Older matlab releases do not allow the second argument to randperm.)
k = randperm(m/2*(m-1),n);
(Note that I've written this expression with m in a funny way, dividing by 2 in perhaps a strange place. This avoids precision problems for some values of m near the upper limits.)
Now, if we associate each possible pair [p,q] with one of the integers in k, we can work backwards, from the integers generated in k, to a pair [p,q]. Thus the first few pairs in that list are:
{[1,2], [1,3], [2,3], [1,4], [2,4], [3,4], ..., [m-1,m]}
We can think of them as the elements in a strictly upper triangular array of size m by m, thus those elements above the main diagonal.
q = floor(sqrt(8*(k-1) + 1)/2 + 1/2);
p = k - q.*(q-1)/2;
See that these formulas recover p and q from the unrolled elements in k. We can convince ourselves that this does indeed work, but perhaps a simple way here is just this test:
k = 1:21;
q = floor(sqrt(8*(k-1) + 1)/2 + 3/2);
p = k - (q-1).*(q-2)/2;
[k;p;q]'
ans =
1 1 2
2 1 3
3 2 3
4 1 4
5 2 4
6 3 4
7 1 5
8 2 5
9 3 5
10 4 5
11 1 6
12 2 6
13 3 6
14 4 6
15 5 6
16 1 7
17 2 7
18 3 7
19 4 7
20 5 7
21 6 7
Another way of testing it is to show that all pairs get generated for a small case.
m = 5;
n = 10;
k = randperm(m/2*(m-1),n);
q = floor(sqrt(8*(k-1) + 1)/2 + 3/2);
p = k - (q-1).*(q-2)/2;
sortrows([p;q]',[2 1])
ans =
1 2
1 3
2 3
1 4
2 4
3 4
1 5
2 5
3 5
4 5
Yup, it looks like everything works perfectly. Now try it for some large numbers for m and n to test the time used.
tic
m = 1e6;
n = 100000;
k = randperm(m/2*(m-1),n);
q = floor(sqrt(8*(k-1) + 1)/2 + 3/2);
p = k - (q-1).*(q-2)/2;
toc
Elapsed time is 0.014689 seconds.
This scheme will work for m as large as roughly 1e8, before it fails due to precision errors in double precision. The exact limit should be m no larger than 134217728 before m/2*(m-1) exceeds 2^53. A nice feature is that no rejection for repeat pairs need be done.
This is more of a general approach rather then a matlab solution.
How about you do the following first you fill a vector like the following.
x[n] = rand()
x[n + 1] = x[n] + rand() %% where rand can be equal to 0.
Then you do the following again
x[n][y] = x[n][y] + rand() + 1
And if
x[n] == x[n+1]
You would make sure that the same pair is not already selected.
After you are done you can run a permutation algorithm on the matrix if you want them to be randomly spaced.
This approach will give you all the possibility or 2 integer pairs, and it runs in O(n) where n is the height of the matrix.
The following code does what you need:
n = 10000;
m = 500;
my_list = unique(sort(round(rand(n,2)*m),2),'rows');
my_list = my_list(find((my_list(:,1)==my_list(:,2))==0),:);
%temp = my_list; %In case you want to check what you initially generated.
while(size(my_list,1)~=n)
%my_list = unique([my_list;sort(round(rand(1,2)*m),2)],'rows');
%Changed as per @jucestain's suggestion.
my_list = unique([my_list;sort(round(rand((n-size(my_list,1)),2)*m),2)],'rows');
my_list = my_list(find((my_list(:,1)==my_list(:,2))==0),:);
end