Replacing points of color by a uniform colored sur

Posted 2019-07-24 16:51

Question:

Here is my data and my current plot

require(ggplot2)
a = rep(c(2,5,10,15,20,30,40,50,75,100), each=7)
b = rep(c(0.001,0.005,0.01,0.05,0.5,5,50), 10)
c = c(TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, FALSE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, FALSE, FALSE, FALSE, TRUE, TRUE, TRUE, TRUE, FALSE, FALSE, FALSE, TRUE, TRUE, TRUE, TRUE, FALSE, FALSE, FALSE, TRUE, TRUE, TRUE, TRUE, FALSE, FALSE, FALSE, FALSE, TRUE, TRUE, TRUE, FALSE, FALSE, FALSE, FALSE, TRUE, TRUE, TRUE, FALSE, FALSE, FALSE, FALSE, TRUE, TRUE, TRUE, FALSE, FALSE, FALSE, FALSE, FALSE, TRUE, TRUE, FALSE, FALSE, FALSE, FALSE, FALSE, TRUE, TRUE)
dt = data.frame(a=a,b=b,c=c)
ggplot(dt, aes(x=a, y=b, color=c)) + geom_point() + scale_y_log10()

Instead of the blue and orange points above, I would like the background itself to be colored blue and orange. The boundary can be straight lines, a LOESS curve, or whatever is easiest to implement (a smooth line would look nicer, I think). This seems like a difficult problem to me, so I welcome variants of the solution I asked for as long as they look good!

Can you help me with that? Thank you.

Answer 1:

You could try this. The idea is to find, for each value of a, the two points on either side of the separation between the two regions, take the middle of these two points, and fit a LOESS line through the middles to use as the boundary:

library(dplyr)
# make column c numeric (0/1) and order the data frame by a, then c
dt$c <- dt$c * 1
dt <- dt[order(dt$a, dt$c), ]

# get the points where the change of "region" happens, i.e. where c switches
# from 0 to 1; since dt is ordered by a and c, this is the first row with
# c == 1 together with the row just before it
# (if a group has no rows with c == 0, idx is 1 and only that row is returned)

get_group_change <- function(x){
  idx <- min(which(x$c == 1))
  x[c(idx-1, idx), ]
}

boundary_points<-dt %>% group_by(a) %>% do(get_group_change(.))

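# get the point in the middle of the boundary points
# (geometric mean, because b is plotted on a log scale)
get_middle <- function(x){exp(mean(log(x)))}

# a is the grouping column, so only b needs to be summarised
middle_points <- boundary_points %>% group_by(a) %>% summarise(b = get_middle(b))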
middle_points$c<-2

#make a boundary data frame with a LOESS prediction for b
boundary<-data.frame(a=2:100,b=exp(predict(loess(log(b)~a,middle_points),2:100)),c=2)


#plot the regions, the middle_points are also plotted 
ggplot(rbind(dt,middle_points), aes(x=a, y=b, color=as.factor(c))) + geom_point() + scale_y_log10()+
  geom_ribbon(data=boundary,aes(ymin=min(dt$b),ymax=b),alpha=0.1,fill="red",colour=NA)+
  geom_ribbon(data=boundary,aes(ymin=b,ymax=max(dt$b)),alpha=0.1,fill="green",colour=NA)

I get something like this:

Or with straight lines for the boundary:

ggplot(rbind(dt,middle_points), aes(x=a, y=b, color=as.factor(c))) + geom_point() + scale_y_log10()+
  geom_ribbon(data=middle_points,aes(ymin=min(dt$b),ymax=b),alpha=0.1,fill="red",colour=NA)+
  geom_ribbon(data=middle_points,aes(ymin=b,ymax=max(dt$b)),alpha=0.1,fill="green",colour=NA)
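
A note on why get_middle uses exp(mean(log(x))) rather than a plain mean: the y axis is on a log10 scale, so the visual midpoint between two values of b is their geometric mean. The small check below is only a sketch; it uses the two boundary values for a = 10 from the question's data (0.01 and 0.05), and the variable names lower, upper, mid and arith are made up for illustration.

```r
# Geometric mean, as defined in the answer's get_middle() helper
get_middle <- function(x) exp(mean(log(x)))

# Boundary points for a = 10 in the question's data:
# the largest b with c == 0 and the smallest b with c == 1
lower <- 0.01
upper <- 0.05
mid <- get_middle(c(lower, upper))

# On a log10 axis the geometric midpoint is equally far from both neighbours
gap_below <- log10(mid) - log10(lower)
gap_above <- log10(upper) - log10(mid)
print(all.equal(gap_below, gap_above))

# The arithmetic mean, by contrast, sits closer to the upper point on that axis
arith <- mean(c(lower, upper))
print(c(geometric = mid, arithmetic = arith))
```

With a plain mean the boundary would be drawn visibly nearer to the upper point of each pair, which is why the helper averages in log space.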

Note that this approach relies on b taking discrete values; it would not be possible otherwise.
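
If b were not discrete, one possible variant (not the method above, and only a rough sketch) would be to color a fine background grid by the class of the nearest data point, i.e. a 1-nearest-neighbour rule computed in the plot's own coordinates (a linear, b on log10). The helper rescale01, the 100 x 100 grid size, and the compact n_false reconstruction of the question's c vector are assumptions made for this illustration.

```r
library(ggplot2)

# Data from the question; c is rebuilt from the number of FALSE values
# per value of a (the FALSE rows are always the smallest b values)
a <- rep(c(2, 5, 10, 15, 20, 30, 40, 50, 75, 100), each = 7)
b <- rep(c(0.001, 0.005, 0.01, 0.05, 0.5, 5, 50), 10)
n_false <- c(0, 1, 3, 3, 3, 4, 4, 4, 5, 5)
c <- unlist(lapply(n_false, function(k) rep(c(FALSE, TRUE), c(k, 7 - k))))
dt <- data.frame(a = a, b = b, c = c)

# Rescale both plot coordinates to [0, 1] so neither axis dominates the distance
rescale01 <- function(v, rng) (v - rng[1]) / (rng[2] - rng[1])
ra <- range(dt$a)
rb <- range(log10(dt$b))
train <- cbind(rescale01(dt$a, ra), rescale01(log10(dt$b), rb))

# Background grid, evenly spaced in a and in log10(b)
grid <- expand.grid(a = seq(ra[1], ra[2], length.out = 100),
                    b = 10^seq(rb[1], rb[2], length.out = 100))
query <- cbind(rescale01(grid$a, ra), rescale01(log10(grid$b), rb))

# For every grid cell, take the class of the closest data point
nearest <- apply(query, 1, function(q) {
  which.min((train[, 1] - q[1])^2 + (train[, 2] - q[2])^2)
})
grid$c <- dt$c[nearest]

# Filled background plus the original points on top
p <- ggplot(grid, aes(x = a, y = b)) +
  geom_tile(aes(fill = c), alpha = 0.3) +
  geom_point(data = dt, aes(color = c)) +
  scale_y_log10()
print(p)
```

The resulting boundary is blocky rather than smooth, so this trades the look of the LOESS line for not needing the points to sit on a regular grid of b values.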