I'm trying to create a network plot with user-defined coordinates for 11 nodes using the geomnet package. After some quick research I discovered that this can be done by setting layout.alg to NULL and passing x and y coordinates to the plot, but it's not specified whether the coordinates should belong to the 'from' node or the 'to' node. I've tried both approaches, but I still get an error:
Error in scale_apply(layer_data, x_vars, "train", SCALE_X, x_scales) :
Maybe I'm doing something obviously wrong, but to be honest I have no idea what. I would be very grateful for any help.
My data:
# A tibble: 11 x 4
from to x y
<chr> <chr> <chr> <chr>
1 z1 z3 40 30
2 z2 z4 20 30
3 z3 z5 15 30
4 z4 z8 60 30
5 z5 z9 10 50
6 z6 z4 30 60
7 z7 z4 50 50
8 z8 z4 65 50
9 z9 z5 20 80
10 z10 z8 40 90
11 z11 z7 60 70
My code:
library(geomnet)  # also attaches ggplot2
p <- ggplot(data = fromto, aes(from_id = from, to_id = to, x = x, y = y))
p + geom_net(directed = TRUE,
             layout.alg = NULL,
             size = 6, arrowsize = 0.5,
             curvature = 0.05,
             arrowgap = 0.02,
             linewidth = 0.5)
When I use a smaller dataset, it seems to work fine.
from to x y
1 z1 z3 10 25
2 z2 z1 20 30
3 z3 z2 40 30
p <- ggplot(data = fromto, aes(from_id = from, to_id = to, x = x, y = y))
p + geom_net(directed = TRUE,
             layout.alg = NULL,
             size = 6, arrowsize = 0.5,
             curvature = 0.05,
             arrowgap = 0.02,
             linewidth = 0.5)
Outcome - simple plot
After a careful analysis of the geom_net code, I found that understanding how geomnet:::StatNet$compute_network works is fundamental to solving your problem.
Typing
geomnet:::StatNet$compute_network(data=fromto, layout.alg =NULL)
gives the (wrong) output:
# A tibble: 21 x 7
# Groups: from, to [21]
from to x y xend yend weight
<fctr> <fctr> <int> <int> <int> <int> <int>
1 z1 z3 NA NA NA NA 1
2 z1 z5 40 30 10 50 1
3 z10 z6 40 90 30 60 1
4 z10 z8 NA NA NA NA 1
5 z11 z7 60 70 50 50 1
6 z2 z10 20 30 40 90 1
7 z2 z4 NA NA NA NA 1
8 z3 z5 NA NA NA NA 1
9 z3 z9 15 30 20 80 1
10 z4 z6 60 30 30 60 1
# ... with 11 more rows
Inside geomnet:::StatNet$compute_network, an important step is the construction of the edgelist matrix:
net <- network::as.network(na.omit(fromto[, 1:2]), matrix.type = "edgelist")
summary(net)
( edgelist <- sna::as.edgelist.sna(net) )
The output is:
### summary(net)
Network adjacency matrix:
z1 z10 z11 z2 z3 z4 z5 z6 z7 z8 z9
z1 0 0 0 0 1 0 0 0 0 0 0
z10 0 0 0 0 0 0 0 0 0 1 0
z11 0 0 0 0 0 0 0 0 1 0 0
z2 0 0 0 0 0 1 0 0 0 0 0
z3 0 0 0 0 0 0 1 0 0 0 0
z4 0 0 0 0 0 0 0 0 0 1 0
z5 0 0 0 0 0 0 0 0 0 0 1
z6 0 0 0 0 0 1 0 0 0 0 0
z7 0 0 0 0 0 1 0 0 0 0 0
z8 0 0 0 0 0 1 0 0 0 0 0
z9 0 0 0 0 0 0 1 0 0 0 0
### edgelist
[,1] [,2] [,3]
[1,] 1 5 1
[2,] 4 6 1
[3,] 5 7 1
[4,] 6 10 1
[5,] 7 11 1
[6,] 8 6 1
[7,] 9 6 1
[8,] 10 6 1
[9,] 11 7 1
[10,] 2 10 1
[11,] 3 9 1
attr(,"n")
[1] 11
attr(,"vnames")
[1] "z1" "z10" "z11" "z2" "z3" "z4" "z5" "z6" "z7" "z8" "z9"
In the first row of the adjacency matrix we can see that z1 and z3 are (correctly) connected, but the row and column labels have been sorted as character strings, ignoring the fact that the labels contain embedded numbers which (to our mind) should be sorted numerically.
The sna::as.edgelist.sna function relies on row and column positions rather than on labels, hence the first row of its output is [1,] 1 5 1. Position 5 is z3 in the sorted vnames, but it appears to be matched against the fifth label in the original data order (z5), which is why compute_network reports an edge from z1 to z5. This is obviously wrong.
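The ordering effect described above can be reproduced with base R alone. The following is a minimal sketch, not taken from the geomnet source: it only illustrates how character sorting changes the position of each label, and the variable names are made up for the illustration.

```r
# Vertex labels as they appear in the question (z1 ... z11)
labels <- paste0("z", 1:11)

# Character sorting; method = "radix" makes the result locale-independent
sorted_labels <- sort(labels, method = "radix")
print(sorted_labels)

# Index 5 refers to different vertices depending on which ordering is used
by_sorted_order   <- sorted_labels[5]  # label at position 5 after character sorting
by_original_order <- labels[5]         # label at position 5 in numeric order
print(c(sorted = by_sorted_order, original = by_original_order))
```

The two printed labels differ, which is the kind of mismatch that turns the z1 to z3 edge into a z1 to z5 edge.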
These considerations suggest a possible solution: avoid vertex labels with embedded numbers and use, for example, only letters of the alphabet (the ID columns are named from_id and to_id here):
fromto$from_id <- LETTERS[as.numeric(gsub("z","",as.character(fromto$from_id)))]
fromto$to_id <- LETTERS[as.numeric(gsub("z","",as.character(fromto$to_id)))]
fromto
# from_id to_id x y
# 1 A C 40 30
# 2 B D 20 30
# 3 C E 15 30
# 4 D H 60 30
# 5 E I 10 50
# 6 F D 30 60
# 7 G D 50 50
# 8 H D 65 50
# 9 I E 20 80
# 10 J H 40 90
# 11 K G 60 70
library(geomnet)
ggplot(data = fromto, aes(from_id = from_id, to_id = to_id)) +
  geom_net(aes(x = x, y = y), layout.alg = NULL) +
  geom_text(aes(x = x, y = y, label = from_id), hjust = -1)
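The relabelling step can be checked in isolation before plotting. The sketch below uses two hypothetical helpers, relabel_letters and relabel_padded, which are not part of geomnet. The first mirrors the LETTERS mapping above and is limited to 26 nodes; the second zero-pads the numbers instead. Whether geom_net handles zero-padded labels correctly is an assumption based only on the sorting argument, so it should be verified on the real data.

```r
# Hypothetical helper: map "z<number>" labels to letters (at most 26 nodes)
relabel_letters <- function(v) {
  LETTERS[as.integer(gsub("z", "", as.character(v)))]
}

# Hypothetical helper: zero-pad the number so character order equals numeric order
# (width = 2 covers up to 99 nodes)
relabel_padded <- function(v, width = 2) {
  n <- as.integer(gsub("z", "", as.character(v)))
  sprintf(paste0("z%0", width, "d"), n)
}

from <- paste0("z", 1:11)

letters_from <- relabel_letters(from)
padded_from  <- relabel_padded(from)

# After relabelling, character sorting leaves the order unchanged
letters_stable <- identical(sort(letters_from, method = "radix"), letters_from)
padded_stable  <- identical(sort(padded_from,  method = "radix"), padded_from)
print(c(letters = letters_stable, padded = padded_stable))
```

Both checks should report that sorting no longer reorders the labels, which is the property the letter-based workaround relies on.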