In my case, there are 100 unique (X, Y) points, each having an ID and belonging to a Type. Of these 100 points, 20 have values for three other Types (CT, D, OP).
Here is the data generation process:
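library(dplyr)    # for %>%, filter(), count(), arrange()
library(ggplot2)  # for ggplot(), geom_point(), geom_text()

df <- data.frame(X=rnorm(100,0,1), Y=rnorm(100,0,1),
                 ID=paste(rep("ID", 100), 1:100, sep="_"),
                 Type=rep("ID",100),
                 Val=c(rep(c('Type1','Type2'),30),
                       rep(c('Type3','Type4'),20)))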
For each extra Type, 20 randomly selected points (sample(1:100,20)) get values that add extra information to those points. Every point that appears under one of these extra Types also has a row with Type=="ID".
dat1 <- data.frame(Type=rep('CT',20),
                   Val=paste(rep("CT", 20),
                             sample(1:6,20,replace=TRUE), sep="_"))
dat1 <- cbind(df[sample(1:100,20),1:3],dat1)
dat2 <- data.frame(Type=rep('D',20),
                   Val=paste(rep("D", 20),
                             sample(1:6,20,replace=TRUE), sep="_"))
dat2 <- cbind(df[sample(1:100,20),1:3],dat2)
dat3 <- data.frame(Type=rep('OP',20),
                   Val=paste(rep("OP", 20),
                             sample(1:6,20,replace=TRUE), sep="_"))
dat3 <- cbind(df[sample(1:100,20),1:3],dat3)
df <- rbind(df, dat1, dat2, dat3)
Now I plot the points whose Val is D_1 or D_4 (both belong to Type=="D"):
df %>% filter(Val %in% c('D_1','D_4')) %>%
ggplot(aes(X,Y,col=Val)) + geom_point() + geom_text(aes(label=ID))
Note: I added the ID labels with geom_text(aes(label=ID)) for illustration purposes only.
To this existing plot, I have to add the remaining 92 points, which either do not have these two values or have no values at all. I tried the approach for adding additional points to an existing plot mentioned by Hadley here:
p <- df %>% filter(Val %in% c('D_1','D_4')) %>% ggplot(aes(X,Y,col=Val)) + geom_point()
p + geom_point(data=df[(!df$ID %in% df$ID[df$Val %in% c('D_1','D_4')]) & df$Type=="ID",],
colour="grey")
Questions:
How can I plot the selected points and the additional points in a single command, or in a more elegant way?
Is there a dplyr approach that can be used in the above command?
Update: the condition df$Type=="ID" is very important, as it ensures that each remaining point is plotted only once. Without it, points that also have values under CT, D, or OP would be plotted more than once.
df %>% count(X,Y) %>% arrange(desc(n))
# # A tibble: 100 x 3
# X Y n
# <dbl> <dbl> <int>
# 1 -0.86266147 2.0368626 4
# 2 -0.61770678 0.4428537 4
# 3 1.32441957 -0.9388095 4
# 4 -1.65650319 -0.1551399 3
# 5 -0.99946809 1.1791395 3
# 6 -0.52881072 0.1742483 3
# 7 -0.25892382 0.1380577 3
# 8 -0.19239410 0.5269329 3
# 9 -0.09709764 -0.4855484 3
# 10 -0.05977874 0.1771422 3
# # ... with 90 more rows
It looks like the first three points, which each appear four times with the same X, Y values, have rows for all four Types (ID, CT, D, OP). But these points need to be plotted only once.
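To show the duplication on a smaller scale, here is a minimal sketch using a hypothetical 10-point data set. The names toy, base_pts, extra, and hl_ids are made up for this illustration and are not part of the data above; the seed is arbitrary. It only compares the row counts of the "remaining points" subset with and without the Type=="ID" condition, written with dplyr verbs.

```r
library(dplyr)

set.seed(1)  # arbitrary seed, only so the sketch is repeatable

# Hypothetical miniature of the data: 10 base points with Type "ID"
base_pts <- data.frame(X = rnorm(10), Y = rnorm(10),
                       ID = paste("ID", 1:10, sep = "_"),
                       Type = "ID", Val = "Type1")

# 4 of those points get an additional row under Type "D"
extra <- base_pts[sample(1:10, 4), ]
extra$Type <- "D"
extra$Val <- c("D_1", "D_4", "D_2", "D_1")
toy <- rbind(base_pts, extra)

# IDs of the highlighted points (Val is D_1 or D_4)
hl_ids <- toy %>% filter(Val %in% c("D_1", "D_4")) %>% pull(ID)

# Remaining points without and with the Type == "ID" condition
no_filter   <- toy %>% filter(!ID %in% hl_ids)
with_filter <- toy %>% filter(Type == "ID", !ID %in% hl_ids)

nrow(no_filter)                # 8: the point carrying D_2 appears twice
nrow(with_filter)              # 7: one row per remaining point
anyDuplicated(with_filter$ID)  # 0: no point is repeated
```

In this toy case the subset without the condition contains one extra row, because the point with Val D_2 is present both as an "ID" row and as a "D" row.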