PROC SQL in SAS - All Pairs of Items

2020-06-17 14:42发布

问题:

I have a dataset in which I need to look at all pairs of items that are together from within another group. I've created a toy example below to further explain.

BUNCH    FRUITS
1        apples
1        bananas
1        mangos
2        apples
3        bananas
3        apples
4        bananas
4        apples

What I want is a listing of all possible pairs and sum the frequency they occur together within a bunch. My output would ideally look like this:

FRUIT1    FRUIT2     FREQUENCY
APPLES    BANANAS    3
APPLES    MANGOS     1

My end goal is to make something that I'll eventually be able to import into Gephi for a network analysis. For this I need a Source and Target column (aka FRUIT1 and FRUIT2 above).

I think there are a few other ways to approach this as well without using PROC SQL (Maybe using PROC TRANSPOSE) but this is where I've started.


SOLUTION

Thanks for the help. Sample code below for anyone interested in something similar:

proc sql;
    create table fruit_combo as
    select a.FRUIT as FRUIT1, b.FRUIT as FRUIT2, count(*) as FREQUENCY
    from FRUITS a, FRUITS b
    where a.BUNCH=b.BUNCH and and not a.FRUIT= b.FRUIT
    group by FRUIT1, FRUIT2;
    quit;

回答1:

Simplest approach is to do a cartesian (full) join of the table to itself, on t1.ID=t2.ID and t1.FRUIT ne t2.FRUIT. That will generate the full combination set, which you could then summarize.



回答2:

Here's the copy/paste version of above. A simple reading shows errors - duplicate rows of counts for banana-apple and apple-banana. To get to the desired result an additional restriction was required (a.FRUIT gt b.FRUIT).

data FRUITS ; 
input  BUNCH    FRUIT $;
cards;
1        apples
1        bananas
1        mangos
2        apples
3        bananas
3        apples
4        bananas
4        apples
;
run;


proc freq data=have ;
tables fruits; 
run;


proc sql;
    create table fruit_combo as
    select a.FRUIT as FRUIT1, b.FRUIT as FRUIT2, count(*) as FREQUENCY
    from FRUITS a, FRUITS b
    where a.BUNCH=b.BUNCH 
     and a.FRUIT ne b.FRUIT
     and a.FRUIT gt b.FRUIT
    group by FRUIT1, FRUIT2;
    quit;

proc print ; run;


标签: sql sas