From edge or arc list to clusters in Stata

Posted 2019-09-07 11:55

Question:

I have a Stata dataset that represents connections between users that looks like this:

src_user  linked_user
1         2
2         3
3         5
1         4
6         7

I would like to get something like this:

user  cluster
1     1
2     1
3     1
4     1
5     1
6     2
7     2

where isid user runs without error (that is, user uniquely identifies the observations) and every user is assigned to exactly one of a set of disjoint clusters. I have tried treating this as a reshape problem, without much success. As far as I can tell, none of the user-written SNA commands do this.

What is the most efficient way to do this in Stata without looping, which I am eager to avoid?

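Answer 1:

If you reshape the data to long form, so that each edge contributes one observation per endpoint, you can use group_id (from SSC) to get what you want. It merges the clusters of any observations that share the same user.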

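* group_id is user-written; install it once with: ssc install group_id
clear
input user1 user2
1         2
2         3
3         5
1         4
6         7
end

* one observation per edge endpoint: id identifies the edge, user the endpoint
gen id = _n
reshape long user, i(id) j(n)

* start with every edge in its own cluster
clonevar cluster = id
list, sepby(cluster)

* merge clusters whose observations share a value of user
group_id cluster, match(user)

* keep one observation per user within each cluster
bysort cluster user (id): keep if _n == 1
list, sepby(cluster)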
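
For readers who want to see what this kind of merging amounts to, here is a minimal sketch in base Stata that does not use group_id. It is not how group_id is implemented and it does loop, so it is only meant as an illustration: each edge starts with its own label, and labels are repeatedly replaced by the smallest label reachable through a shared user until nothing changes. The variable names umin, emin and newcluster are made up for this example.

```stata
* Sketch only: minimum-label propagation in base Stata (assumed approach,
* not the group_id algorithm)
clear
input user1 user2
1 2
2 3
3 5
1 4
6 7
end

gen long id = _n
reshape long user, i(id) j(n)

* every edge starts with its own label
gen long cluster = id

local changed = 1
while `changed' {
    * smallest label currently attached to each user
    bysort user: egen long umin = min(cluster)
    * smallest of those labels across the two endpoints of each edge
    bysort id: egen long emin = min(umin)
    * number of observations whose label still changes in this pass
    count if emin != cluster
    local changed = r(N)
    replace cluster = emin
    drop umin emin
}

* one observation per user, clusters renumbered 1, 2, ...
bysort user: keep if _n == 1
egen long newcluster = group(cluster)
drop cluster
rename newcluster cluster
keep user cluster
list, sepby(cluster)
```

On the example data this should put users 1 to 5 in cluster 1 and users 6 and 7 in cluster 2, with isid user satisfied. The number of passes grows with the length of the longest chain of links, so on large data group_id is likely the better choice.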