Find social network components in Stata

Posted 2019-07-23 16:10

Question:

[I copied part of the example below from a separate post and adapted it to my needs.]

pos_1    pos_2
  2        4
  2        5
  1        2
  3        9
  4        2
  9        3

Each row is a link between two people, so the data above read as: person_2 is connected to person_4, ..., person_4 is connected to person_2, and person_9 is connected to person_3.

I want to create a third, categorical [edited] variable, component, that tells me which connected component (subnetwork) of this network each observed link belongs to. In this case, the network has two connected components:

pos_1    pos_2    component
  2        4        1
  2        5        1
  1        2        1
  3        9        2
  4        2        1
  9        3        2

All nodes in component 1 are connected to each other, directly or indirectly, but not to any node in component 2, and vice versa. Is there a way to generate this component variable in Stata? I know other programs can do this, but my code would be more seamless if I could keep everything in Stata.

Answer 1:

If you reshape the data to long form, you can use group_id (from SSC) to get what you want:

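* group_id is user-written; install it once with: ssc install group_id
clear
input pos_1    pos_2
  2        4
  2        5
  1        2
  3        9
  4        2
  9        3
end

* one identifier per link, then one observation per person in each link
gen id = _n
reshape long pos_, i(id) j(n)

* start with each link as its own group
clonevar comp = id
list, sepby(comp)

* combine groups that share a person (pos_)
group_id comp, match(pos_)

* back to one row per link; comp is constant within id, so this works
reshape wide pos_, i(id) j(n)

* renumber the combined groups as consecutive integers 1, 2, ...
egen component = group(comp)
list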
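If installing from SSC is not an option, the same result can be sketched in plain Stata by iterative minimum-label propagation. This is a swapped-in technique, not necessarily what group_id does internally, and the variable names id, comp and newcomp and the local changed are illustrative choices. Each link starts with its own row number as a label; each pass first gives every person the smallest label among the links they appear in, then gives every link the smallest label of its two people, and the loop stops once no label changes.

```stata
* Sketch (assumption): connected components without SSC, via minimum-label propagation
clear
input pos_1 pos_2
  2 4
  2 5
  1 2
  3 9
  4 2
  9 3
end

* one identifier per link, then one observation per person in each link
gen long id = _n
reshape long pos_, i(id) j(n)

* every link starts with its own label
gen long comp = id

local changed = 1
while `changed' {
    quietly {
        * person step: each person takes the smallest label of any link they are in
        bysort pos_ (comp): gen long newcomp = comp[1]
        * link step: each link takes the smallest label of its two persons
        bysort id (newcomp): replace newcomp = newcomp[1]
        * keep looping only if some label changed in this pass
        count if newcomp != comp
        local changed = r(N) > 0
        replace comp = newcomp
        drop newcomp
    }
}

* back to one row per link; comp is constant within id
reshape wide pos_, i(id) j(n)
egen component = group(comp)
sort id
list id pos_1 pos_2 component
```

Labels can only decrease, so the loop terminates; at that point every link in a component carries the smallest id in that component, and egen group() renumbers those labels as 1, 2, .... The number of passes grows roughly with how long the chains of links in a component are, so very large, sparse networks may need many passes.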