Randomize matrix elements between two values while

2019-05-10 01:22发布

问题:

I have a bit of a technical issue, but I feel like it should be possible with MATLAB's powerful toolset.

What I have is a random n by n matrix of 0's and w's, say generated with

A=w*(rand(n,n)<p);

A typical value of w would be 3000, but that should not matter too much.

Now, this matrix has two important quantities, the vectors

c = sum(A,1); 
r = sum(A,2)';

These are two row vectors, the first denotes the sum of each column and the second the sum of each row.

What I want to do next is randomize each value of w, for example between 0.5 and 2. This I would do as

rand_M = (0.5-2).*rand(n,n) + 0.5
A_rand = rand_M.*A;

However, I don't want to just pick these random numbers: I want them to be such that for every column and row, the sums are still equal to the elements of c and r. So to clean up the notation a bit, say we define

A_rand_c = sum(A_rand,1);
A_rand_r = sum(A_rand,2)';

I want that for all j = 1:n, A_rand_c(j) = c(j) and A_rand_r(j) = r(j).

What I'm looking for is a way to redraw the elements of rand_M in a sort of algorithmic fashion I suppose, so that these demands are finally satisfied.

Now of course, unless I have infinite amounts of time this might not really happen. I therefore accept these quantities to fall into a specific range: A_rand_c(j) has to be an element of [(1-e)*c(j),(1+e)*c(j)] and A_rand_r(j) of [(1-e)*r(j),(1+e)*r(j)]. This e I define beforehand, say like 0.001 or something.

Would anyone be able to help me in the process of finding a way to do this? I've tried an approach where I just randomly repick the numbers, but this really isn't getting me anywhere. It does not have to be crazy efficient either, I just need it to work in finite time for networks of size, say, n = 50.

To be clear, the final output is the matrix A_rand that satisfies these constraints.

Edit: Alright, so after thinking a bit I suppose it might be doable with some while statement, that goes through every element of the matrix. The difficult part is that there are four possibilities: if you are in a specific element A_rand(i,j), it could be that A_rand_c(j) and A_rand_r(i) are both too small, both too large, or opposite. The first two cases are good, because then you can just redraw the random number until it is smaller than the current value and improve the situation. But the other two cases are problematic, as you will improve one situation but not the other. I guess it would have to look at which criteria is less satisfied, so that it tries to fix the one that is worse. But this is not trivial I would say..

回答1:

You can take advantage of the fact that rows/columns with a single non-zero entry in A automatically give you results for that same entry in A_rand. If A(2,5) = w and it is the only non-zero entry in its column, then A_rand(2,5) = w as well. What else could it be?

You can alternate between finding these single-entry rows/cols, and assigning random numbers to entries where the value doesn't matter.

Here's a skeleton for the process:

  • A_rand=zeros(size(A)) is the matrix you are going to fill
  • entries_left = A>0 is a binary matrix showing which entries in A_rand you still need to fill
  • col_totals=sum(A,1) is the amount you still need to add in every column of A_rand
  • row_totals=sum(A,2) is the amount you still need to add in every row of A_rand


while sum( entries_left(:) ) > 0

% STEP 1:
% function to fill entries in A_rand if entries_left has rows/cols with one nonzero entry
% you will need to keep looping over this function until nothing changes
% update() A_rand, entries_left, row_totals, col_totals every time you loop

% STEP 2:
% let (i,j) be the indeces of the next non-zero entry in entries_left
% assign a random number to A_rand(i,j) <= col_totals(j) and <= row_totals(i)
% update() A_rand, entries_left, row_totals, col_totals

end

update()
    A_rand(i,j) = random_value;
    entries_left(i,j) = 0;
    col_totals(j) = col_totals(j) - random_value;
    row_totals(i) = row_totals(i) - random_value;
end

Picking the range for random_value might be a little tricky. The best I can think of is to draw it from a relatively narrow distribution centered around N*w*p where p is the probability of an entry in A being nonzero (this would be the average value of row/column totals).

This doesn't scale well to large matrices as it will grow with n^2 complexity. I tested it for a 200 by 200 matrix and it worked in about 20 seconds.