How to plot diverging bar plots in R

Posted 2019-06-13 07:00

Hi, I am relatively new to R / ggplot2 and I would like to ask for some advice on how to create a plot that looks like this:

[Figure: the target diverging bar plot]

Explanation: A diverging bar plot showing biological functions, with genes that have increased expression (yellow) pointing to the right and genes with reduced expression (purple) pointing to the left. The length of each bar represents the number of differentially expressed genes, and the color intensity varies according to the p-value.

Note that the x-axis must be 'positive' in both directions. (In published literature on gene expression studies, bars that point to the left represent genes with reduced expression, and bars that point to the right represent genes with increased expression. The purpose of the graph is not to show the "magnitude" of change, which would give rise to positive and negative values. Instead, we are trying to plot the NUMBER of genes whose expression changes, which therefore cannot be negative.)

I have tried ggplot2 but failed completely to reproduce the graph shown. Here is the data I am trying to plot:

> dput(sample)
structure(list(Name = structure(c(15L, 19L, 5L, 11L, 8L, 6L, 
16L, 13L, 17L, 1L, 3L, 2L, 14L, 18L, 7L, 12L, 10L, 9L, 4L, 20L
), .Label = c("Actin synthesis", "Adaptive immunity", "Antigen presentation", 
"Autophagy", "Cell cycle", "Cell division", "Cell polarity", 
"DNA repair", "Eye development", "Lipid metabolism", "Phosphorylation", 
"Protein metabolism", "Protein translation", "Proteolysis", "Replication", 
"Signaling", "Sumoylation", "Trafficking", "Transcription", "Translational initiation"
), class = "factor"), Trend_in_AE = structure(c(2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L
), .Label = c("Down", "Up"), class = "factor"), Count = c(171L, 
201L, 38L, 63L, 63L, 47L, 22L, 33L, 20L, 16L, 16L, 7L, 10L, 4L, 
13L, 15L, 5L, 7L, 9L, 7L), PValue = c(1.38e-08, 1.22e-06, 1.79e-06, 
2.89e-06, 0.000122, 0.000123, 0.00036, 0.000682, 0.001030253, 
0.001623939, 7.76e-05, 0.000149, 0.000734, 0.001307039, 0.00292414, 
0.003347556, 0.00360096, 0.004006781, 0.007330264, 0.010083734
)), .Names = c("Name", "Trend_in_AE", "Count", "PValue"), class = "data.frame", row.names = c(NA, 
-20L))

Thank you very much for your help and suggestions; this really helps with my learning.

My own humble attempt was this:

table <- read.delim("file.txt", header = T, sep = "\t")
library(ggplot2)
ggplot(aes(x=Number, y=Names)) + 
  geom_bar(stat="identity",position="identity") + 
  xlab("number of genes") + 
  ylab("Name"))

The result was an error message regarding the aes.

Tags: r ggplot2
2 Answers
我欲成王,谁敢阻挡
Reply #2 · 2019-06-13 07:36

@dww and @Ashish are right - there's a lot in this question. It's also not clear how ggplot "failed" to reproduce the figure, and knowing that would help in understanding what you're struggling with.
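
For what it's worth, the attempt in the question most likely errors for reasons visible in the code itself: `ggplot()` is never given the data frame, the `aes()` mapping refers to `Number` and `Names` while the posted data has `Count` and `Name`, and there is an unmatched closing parenthesis at the end. Here is a minimal corrected sketch, assuming the columns shown in the `dput()` output. The data frame name `genes` is made up, and only a few rows are retyped for illustration:

```r
library(ggplot2)

# A few rows retyped from the question's dput() output, for illustration only
genes <- data.frame(
  Name        = c("Replication", "Transcription", "Antigen presentation", "Proteolysis"),
  Trend_in_AE = c("Up", "Up", "Down", "Down"),
  Count       = c(171L, 201L, 16L, 10L)
)

# The data frame is the first argument, and aes() must use
# column names that actually exist in it
p <- ggplot(genes, aes(x = Name, y = Count)) +
  geom_bar(stat = "identity") +
  coord_flip() +
  xlab("Name") +
  ylab("Number of genes")
print(p)
```

This only gets a plain horizontal bar chart drawing without errors; the diverging part is what the rest of this answer covers.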

The key to ggplot is that pretty much everything that you want to include in the plotting should be included in the data. Adding a few variables to your table to help with putting bars in the right direction will get you a long way toward what you want. Make the variables that are actually negative ("down" values) negative, and they'll plot that way:

# r_sample is assumed to hold the data frame from the question's dput() output
r_sample$Count2 <- ifelse(r_sample$Trend_in_AE=="Down",r_sample$Count*-1,r_sample$Count)
r_sample$PValue2 <- ifelse(r_sample$Trend_in_AE=="Down",r_sample$PValue*-1,r_sample$PValue)

Then reorder your "Name" so that it plots according to the new PValue2 variable:

r_sample$Name <- factor(r_sample$Name, levels=r_sample$Name[order(r_sample$PValue2)], ordered=TRUE)
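
The reason this works is that ggplot2 draws a discrete axis in the order of the factor's levels, not in the order of the rows. A small base-R sketch with made-up values (the data frame `toy` and its columns are hypothetical, only for illustration):

```r
# Made-up values, only to show how the level order follows another column
toy <- data.frame(
  Name  = c("B", "A", "C"),
  score = c(0.5, -0.2, 0.1)
)

# Levels are taken from Name sorted by score, lowest first
toy$Name <- factor(toy$Name, levels = toy$Name[order(toy$score)])
levels(toy$Name)
```

The levels come out as A, C, B, which is the order the bars would be drawn in.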

Lastly, you'll want to left-justify some labels and right-justify others, so make that a variable now:

r_sample$just <- ifelse(r_sample$Trend_in_AE=="Down",0,1)

Then some fairly minimal plot code gets you quite close to what you want:

library(ggplot2)

ggplot(r_sample, aes(x=Name, y=Count2, fill=PValue2)) +
  geom_bar(stat="identity") +
  # explicit breaks so the four labels always line up with four tick marks
  scale_y_continuous("Number of Differently Regulated Genes", position="top", limits=c(-100,225), breaks=c(-100,0,100,200), labels=c(100,0,100,200)) +
  scale_x_discrete("", labels=NULL) +
  scale_fill_gradient2(low="blue", mid="light grey", high="yellow", midpoint=0) +
  coord_flip() +
  theme_minimal() +
  # names are drawn at zero; hjust comes from the "just" column built above
  geom_text(aes(x=Name, y=0, label=Name), hjust=r_sample$just)

You can explore the theme commands on the ggplot2 help page to figure out the rest of the formatting.
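
As one possible starting point, the axis labels can be generated with `abs` so that both directions read as positive counts without typing the labels by hand, and `theme()` can trim the grid lines and move the legend. This is only a sketch on a few rows retyped from the question. The data frame name `genes`, the colours, and the theme settings are my own guesses at the target figure, not taken from it:

```r
library(ggplot2)

# A few rows retyped from the question's data, for illustration only
genes <- data.frame(
  Name        = c("Replication", "Cell cycle", "Antigen presentation", "Proteolysis"),
  Trend_in_AE = c("Up", "Up", "Down", "Down"),
  Count       = c(171L, 38L, 16L, 10L)
)
# Down-regulated counts are made negative so their bars point left
genes$Count2 <- ifelse(genes$Trend_in_AE == "Down", -genes$Count, genes$Count)

p <- ggplot(genes, aes(x = Name, y = Count2, fill = Trend_in_AE)) +
  geom_bar(stat = "identity") +
  # labels = abs prints the absolute value of each break
  scale_y_continuous("Number of genes", limits = c(-100, 225),
                     breaks = seq(-100, 200, 100), labels = abs) +
  scale_fill_manual(values = c(Down = "purple", Up = "gold")) +
  coord_flip() +
  theme_minimal() +
  theme(panel.grid.minor = element_blank(), legend.position = "bottom")
print(p)
```

Passing a function to `labels` keeps the tick labels in step with the breaks if the limits change later.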

[Figure: the resulting diverging bar plot]
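
One variation worth considering: filling by the signed p-value puts the smallest (most significant) p-values closest to the grey midpoint. If a stronger colour should instead mean a smaller p-value, the fill could be mapped to a signed -log10(p). This is a sketch under that assumption, again on a few retyped rows. The names `genes`, `direction`, and `signed_logp` are hypothetical:

```r
library(ggplot2)

# A few rows retyped from the question's data, for illustration only
genes <- data.frame(
  Name        = c("Replication", "Cell cycle", "Antigen presentation", "Proteolysis"),
  Trend_in_AE = c("Up", "Up", "Down", "Down"),
  Count       = c(171L, 38L, 16L, 10L),
  PValue      = c(1.38e-08, 1.79e-06, 7.76e-05, 0.000734)
)

# -1 for down-regulated rows, +1 for up-regulated rows
direction <- ifelse(genes$Trend_in_AE == "Down", -1, 1)
genes$Count2 <- direction * genes$Count
# Signed -log10(p): a larger magnitude means a smaller p-value
genes$signed_logp <- direction * -log10(genes$PValue)

p <- ggplot(genes, aes(x = reorder(Name, Count2), y = Count2, fill = signed_logp)) +
  geom_bar(stat = "identity") +
  scale_fill_gradient2(low = "purple", mid = "grey90", high = "gold", midpoint = 0) +
  coord_flip() +
  labs(x = NULL, y = "Number of genes", fill = "signed -log10(p)")
print(p)
```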

女痞
Reply #3 · 2019-06-13 07:44

Although this is not exactly what you are looking for, the following should get you started. @Genoa, as the expression goes, "there are no free lunches". So in this spirit, as @dww has rightly pointed out, show "some effort"!

# create dummy data
df <- data.frame(x = letters,y = runif(26))
# compute the normalized occurrence (z-score) for each letter
df$normalize_occurence <- round((df$y - mean(df$y))/sd(df$y), 2)  
# categorise the occurrence as above or below the mean
df$category<- ifelse(df$normalize_occurence >0, "high","low")
# check summary statistic
summary(df)
       x            y           normalize_occurence 
a      : 1   Min.   :0.00394   Min.   :-1.8000000  
b      : 1   1st Qu.:0.31010   1st Qu.:-0.6900000  
c      : 1   Median :0.47881   Median :-0.0800000  
d      : 1   Mean   :0.50126   Mean   : 0.0007692  
e      : 1   3rd Qu.:0.70286   3rd Qu.: 0.7325000  
f      : 1   Max.   :0.93091   Max.   : 1.5600000  
(Other):20                                         
category        
Length:26         
Class :character  
Mode  :character 

library(ggplot2)
ggplot(df,aes(x = x,y = normalize_occurence)) + 
      geom_bar(aes(fill = category),stat = "identity") +
      labs(title= "Diverging Bars")+
      coord_flip()

[Figure: Diverging bars]
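
A side note on the normalisation step used above: subtracting the mean and dividing by the standard deviation is a z-score, which base R's `scale()` also computes. A short check, with `set.seed()` added only so the random numbers are repeatable (the seed value is arbitrary, and the variable names are made up):

```r
set.seed(1)  # arbitrary seed, only for repeatability
y <- runif(26)

# z-score by hand, as in the code above (without the rounding)
z_manual <- (y - mean(y)) / sd(y)
# scale() centres and scales a column; as.numeric() drops the matrix attributes
z_scale <- as.numeric(scale(y))

isTRUE(all.equal(z_manual, z_scale))
```

Because the values are centred on their mean, some bars are necessarily negative and some positive, which is what makes the plot diverge.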
