I am working in SAS trying to create a conditional probability table.
The current structure of the table is: 5 columns x 10 rows --> the value in each cell is binary. Current Data Table
col1 col2 col3 col4 col5
1 0 1 0 0
0 0 0 1 1
0 0 0 0 0
1 0 0 0 0
1 0 0 0 1
0 1 0 0 0
0 1 0 1 0
1 1 1 1 0
1 0 1 0 1
1 0 1 0 0
I would like to create a table with the conditional probability for every column vs every other column. Ideal Output
--- col1 col2 col3 col4 col5
col1 1.0 0.3 1.0 0.3 0.7
col2 0.2 1.0 0.3 0.7 0.0
col3 0.7 0.3 1.0 0.3 0.3
col4 0.2 0.7 0.3 1.0 0.3
col5 0.3 0.0 0.3 0.3 1.0
This is a much simpler version of the actual problem I am working on (100s of rows & millions of columns, so I'd ideally have a solution which could adjust based on the size of the table).
I've been working with the array and do loop, but haven't been able to get very far.
My current code looks like this (not close to complete):
data ideal_output;
set binary_table;
array obs(10,5);
array output(5,5);
do i=1 to 5;
do j=1 to 5;
do k=1 to 10;
do l=1 to 10;
output(m,n) = sum(obs(k,i)*obs(l,j))/sum(obs(k,i));
end;end;end;end;
run;
You can probably do something equivalent with a summarization proc. It will be a bit messy as you'll have to do some transposing probably and get rid of the '0' rows, but this will start you off perhaps?
Then you can use which columns of col1-col5 are populated to generate column/row names and transpose the dataset.
You have the right sort of idea - the tricky part is loading all your variables into the appropriate arrays. If your full dataset is too large to fit into memory you may need to process one subset of it at a time.