Geomnet package - error with user-defined coordinates

Posted 2019-05-04 10:02

Question:

I'm trying to create a network plot with user-defined coordinates for 11 nodes using the geomnet package. After some quick research I found that this can be done by setting layout.alg to NULL and passing x and y coordinates to the plot, but it isn't specified whether the coordinates should belong to the 'from' or the 'to' node. I've tried both approaches, and both produce an error:

Error in scale_apply(layer_data, x_vars, "train", SCALE_X, x_scales) :  

Maybe I'm doing something obviously wrong, but to be honest I have no idea what. I would be very grateful for any help.

My data:

# A tibble: 11 x 4
    from    to     x     y
   <chr> <chr> <chr> <chr>
 1    z1    z3    40    30
 2    z2    z4    20    30
 3    z3    z5    15    30
 4    z4    z8    60    30
 5    z5    z9    10    50
 6    z6    z4    30    60
 7    z7    z4    50    50
 8    z8    z4    65    50
 9    z9    z5    20    80
10   z10    z8    40    90
11   z11    z7    60    70

My code:

p <- ggplot(data=fromto, aes(from_id = from, to_id = to, x=x,y=y)) 
p + geom_net(directed = TRUE,
               layout.alg = NULL,
               size = 6, arrowsize = 0.5,
               curvature = 0.05,
               arrowgap = 0.02,
               linewidth = 0.5)

With a smaller dataset it seems to work fine:

  from to  x  y
1   z1 z3 10 25
2   z2 z1 20 30
3   z3 z2 40 30

p <- ggplot(data=fromto, aes(from_id = from, to_id = to, x=x,y=y)) 

p + geom_net(directed = TRUE,
           layout.alg = NULL,
           size = 6, arrowsize = 0.5,
           curvature = 0.05,
           arrowgap = 0.02,
           linewidth = 0.5)

Outcome - simple plot

Answer 1:

After a careful read of the geom_net code, I found that understanding how geomnet:::StatNet$compute_network works is the key to solving your problem.

Typing

geomnet:::StatNet$compute_network(data=fromto, layout.alg =NULL)

gives the (wrong) output:

# A tibble: 21 x 7
# Groups:   from, to [21]
     from     to     x     y  xend  yend weight
   <fctr> <fctr> <int> <int> <int> <int>  <int>
 1     z1     z3    NA    NA    NA    NA      1
 2     z1     z5    40    30    10    50      1
 3    z10     z6    40    90    30    60      1
 4    z10     z8    NA    NA    NA    NA      1
 5    z11     z7    60    70    50    50      1
 6     z2    z10    20    30    40    90      1
 7     z2     z4    NA    NA    NA    NA      1
 8     z3     z5    NA    NA    NA    NA      1
 9     z3     z9    15    30    20    80      1
10     z4     z6    60    30    30    60      1
# ... with 11 more rows

Inside geomnet:::StatNet$compute_network an important step is the construction of the edgelist matrix:

net <- network::as.network(na.omit(fromto[, 1:2]), matrix.type = "edgelist")
summary(net)
( edgelist <- sna::as.edgelist.sna(net) )

The output is:

### summary(net)
Network adjacency matrix:
    z1 z10 z11 z2 z3 z4 z5 z6 z7 z8 z9
z1   0   0   0  0  1  0  0  0  0  0  0
z10  0   0   0  0  0  0  0  0  0  1  0
z11  0   0   0  0  0  0  0  0  1  0  0
z2   0   0   0  0  0  1  0  0  0  0  0
z3   0   0   0  0  0  0  1  0  0  0  0
z4   0   0   0  0  0  0  0  0  0  1  0
z5   0   0   0  0  0  0  0  0  0  0  1
z6   0   0   0  0  0  1  0  0  0  0  0
z7   0   0   0  0  0  1  0  0  0  0  0
z8   0   0   0  0  0  1  0  0  0  0  0
z9   0   0   0  0  0  0  1  0  0  0  0

### edgelist
      [,1] [,2] [,3]
 [1,]    1    5    1
 [2,]    4    6    1
 [3,]    5    7    1
 [4,]    6   10    1
 [5,]    7   11    1
 [6,]    8    6    1
 [7,]    9    6    1
 [8,]   10    6    1
 [9,]   11    7    1
[10,]    2   10    1
[11,]    3    9    1
attr(,"n")
[1] 11
attr(,"vnames")
 [1] "z1"  "z10" "z11" "z2"  "z3"  "z4"  "z5"  "z6"  "z7"  "z8"  "z9" 
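The first two columns of the edgelist are positions in the "vnames" attribute, not node numbers. As a small sketch in base R, using values copied from the output above (the object names vnames, idx and decoded are only illustrative), the positions can be translated back to labels like this:

```r
# Vertex names in the order reported by attr(edgelist, "vnames").
vnames <- c("z1", "z10", "z11", "z2", "z3", "z4", "z5", "z6", "z7", "z8", "z9")

# Columns 1 and 2 of the edgelist printed above, one row per edge.
idx <- matrix(c(1, 5,   4, 6,   5, 7,   6, 10,  7, 11,  8, 6,
                9, 6,   10, 6,  11, 7,  2, 10,  3, 9),
              ncol = 2, byrow = TRUE)

# Look each position up in vnames to recover the edge labels.
decoded <- data.frame(from = vnames[idx[, 1]],
                      to   = vnames[idx[, 2]],
                      stringsAsFactors = FALSE)
print(decoded)
```

Decoded this way, the eleven rows match the from/to pairs in the original data, so the positions only make sense together with the vnames ordering.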

In the first row of the adjacency matrix we can see that z1 and z3 are (correctly) connected, but the row and column labels have been sorted as character strings (z1, z10, z11, z2, ...), ignoring the embedded numbers, which we would expect to be sorted numerically.

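The sna::as.edgelist.sna function works with row and column positions rather than labels, so the first edge is reported as [1,] 1 5 1, where position 5 stands for z3 in the character ordering. In the compute_network output above, however, the same edge appears as z1 to z5, which suggests that the positions are later matched against labels in a different (numeric) order. That is where the result goes wrong.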
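The ordering effect is easy to reproduce in base R. The following is only an illustration of the sorting behaviour, not geomnet's actual code; the variable names are made up for the example, and method = "radix" is used so that the result does not depend on the locale:

```r
# Node labels in the order they appear in the data (numeric order).
labels <- paste0("z", 1:11)

# Character sorting puts z10 and z11 right after z1.
sorted <- sort(labels, method = "radix")
print(sorted)

# Position of z3 among the character-sorted labels.
pos <- match("z3", sorted)

# The label found at that same position when labels are in numeric order.
misread <- labels[pos]
print(c(position = pos, misread_as = misread))
```

Here z3 sits at position 5 of the sorted labels, while the fifth label in numeric order is z5, which is consistent with the spurious z1 to z5 edge in the output shown earlier.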

These considerations suggest a possible solution: avoid vertex labels with embedded numbers and use, for example, only letters of the alphabet:

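# Strip the "z" prefix and map the node number to a letter (1 -> A, 2 -> B, ...).
# LETTERS has 26 entries, so this only works for up to 26 nodes.
fromto$from_id <- LETTERS[as.numeric(gsub("z", "", as.character(fromto$from_id)))]
fromto$to_id <- LETTERS[as.numeric(gsub("z", "", as.character(fromto$to_id)))]
fromto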

#    from_id to_id  x  y
# 1        A     C 40 30
# 2        B     D 20 30
# 3        C     E 15 30
# 4        D     H 60 30
# 5        E     I 10 50
# 6        F     D 30 60
# 7        G     D 50 50
# 8        H     D 65 50
# 9        I     E 20 80
# 10       J     H 40 90
# 11       K     G 60 70

library(geomnet)
ggplot(data = fromto, aes(from_id = from_id, to_id = to_id)) +
  geom_net(aes(x = x, y = y), layout.alg = NULL) +
  geom_text(aes(x = x, y = y, label = from_id), hjust = -1)
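If the original node names should stay recognisable, a variation on the same idea is to zero-pad the numbers so that character order and numeric order agree (z01, z02, ..., z11). This is an untested assumption as far as geomnet itself is concerned: it only follows from the sorting explanation above. The helper pad_label is hypothetical, and the data frame is rebuilt from the question's table with the from_id/to_id column names used in this answer:

```r
# Edge data rebuilt from the table in the question (x and y stored as numbers).
fromto <- data.frame(
  from_id = paste0("z", 1:11),
  to_id   = paste0("z", c(3, 4, 5, 8, 9, 4, 4, 4, 5, 8, 7)),
  x       = c(40, 20, 15, 60, 10, 30, 50, 65, 20, 40, 60),
  y       = c(30, 30, 30, 30, 50, 60, 50, 50, 80, 90, 70),
  stringsAsFactors = FALSE
)

# Hypothetical helper: "z3" -> "z03", "z10" -> "z10".
pad_label <- function(lbl, width = 2) {
  num <- as.integer(sub("^z", "", lbl))
  paste0("z", formatC(num, width = width, flag = "0"))
}

fromto$from_id <- pad_label(fromto$from_id)
fromto$to_id   <- pad_label(fromto$to_id)

# Character sorting now reproduces the numeric node order.
print(sort(unique(c(fromto$from_id, fromto$to_id)), method = "radix"))
```

The ggplot call above can then be used unchanged on this data frame. Unlike LETTERS, padding is not limited to 26 nodes, as long as width covers the largest node number.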



Tags: r ggplot2