Working with ggplot2

Posted 2019-08-07 10:42

My question is related to ggplot2 in R (3.2.3). I have 4 datasets, each containing 2 columns and about 80,000-100,000 rows. I used the code below to create a plot for one of the datasets:

dataset1 <- read.table("file1.txt", header=T)
ggplot(data=dataset1, aes(dataset1$length))+ geom_histogram (binwidth =500)+
scale_x_continuous(breaks=seq(300,1000,by=200),seq(1001,15000,by=1000))

Here, length is the second column of my dataset. I would like the x-axis to show lengths between 300 and 1200 at intervals of 200 (300,500,700,900,1200) and lengths between 1201 and 1500 at intervals of 1000. So I used the above code for scale_x_continuous, but it didn't produce what I want. Could you please help me with the correct code for this plot?

Here is a short sample of data:

case length
C1099757    300
C1099759    300
C1099761    300
C1099763    300
C1100993    301
C1100995    301
C1100997    301
C1100999    301
C1101377    302
C1101379    302
C1101919    303
C1101921    303
C1102979    304
C1102981    304
C1102983    304
C1103475    305
C1103477    305
C1104267    306
C1104269    306

In the original data file, it continues up to 12000 in the same way as the sample above. In your opinion, is ggplot2 suitable for this plot? If not, please suggest a better tool.

Also, I'm looking for a way to show the length distribution of all 4 datasets in one graph so that they are easy to compare. I would highly appreciate it if you could let me know how to do that.

Many thanks in advance.

Tags: r plot ggplot2
1 answer
淡お忘
#2 · 2019-08-07 11:00

I think you should replace

scale_x_continuous(breaks=seq(300,1000,by=200),seq(1001,15000,by=1000))

by

scale_x_continuous(breaks=c(seq(300,1000,by=200),seq(1001,15000,by=1000)))

or

scale_x_continuous(breaks=seq(300,1000,by=200))+
scale_y_continuous(breaks=seq(1001,15000,by=1000))

(wasn't entirely sure what you meant)
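
The first suggestion comes down to how the arguments are grouped. A minimal base-R sketch (the variable names `fine_breaks`, `coarse_breaks` and `x_breaks` are only illustrative) shows that `c()` joins the two sequences into the single vector that `breaks` expects:

```r
# Build one break vector from two sequences with different step sizes.
fine_breaks   <- seq(300, 1000, by = 200)     # 300, 500, 700, 900
coarse_breaks <- seq(1001, 15000, by = 1000)  # 1001, 2001, ..., 14001
x_breaks <- c(fine_breaks, coarse_breaks)

print(x_breaks)
length(x_breaks)  # 18 break positions in total
```

Without `c()`, only the first `seq()` is bound to `breaks`; the second one is passed as a separate positional argument, so the axis does not get the intended breaks.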

Based on your sample data, I generated some artificial data that contains only the length column:

library(ggplot2)
# artificial data: 300 uniformly distributed lengths per dataset
df1 = data.frame(length=runif(300,300,1200))
df2 = data.frame(length=runif(300,300,12000))
df3 = data.frame(length=runif(300,900,1200))
df4 = data.frame(length=runif(300,300,12000))

# plot a single dataset
ggplot(data=df4, aes(length))+ geom_histogram(binwidth=500)+
     scale_x_continuous(breaks=c(seq(300,1000,by=200),seq(1001,15000,by=1000)))
# combine the data frames (requires equal row counts) and melt them to long format
df = data.frame(df1$length,df2$length,df3$length,df4$length)
library(reshape)
melted <- melt(df)
# stacked histogram, one fill colour per dataset
ggplot(data=melted, aes(value))+aes(fill=variable)+ geom_histogram(binwidth=500)+
     scale_x_continuous(breaks=c(seq(300,1000,by=200),seq(1001,15000,by=1000)))

or

 ggplot(data=melted, aes(value))+aes(fill=variable)+ geom_histogram (binwidth =500,position="dodge")+
    scale_x_continuous(breaks=c(seq(300,1000,by=200),seq(1001,15000,by=1000)))

To get slightly nicer x-axis labelling, I redistributed the labels a bit and rotated them by 45 degrees:

ggplot(data=melted, aes(value))+aes(fill=variable)+ geom_histogram (binwidth =500,position="dodge")+
     scale_x_continuous(breaks=c(seq(300,1000,by=200),seq(1100,15000,by=1000)))+theme(axis.text.x = element_text(angle = 45, hjust = 1))

And here is the chart which adjusts the bins to match the axis breaks. I actually like the different bar sizes.

ggplot(data=melted, aes(value))+
        aes(fill=variable)+
        geom_histogram(breaks=c(seq(300,1000,by=200),seq(1100,15000,by=1000)),position="dodge")+
        scale_x_continuous(breaks=c(seq(300,1000,by=200),seq(1100,15000,by=1000)))+
        theme(axis.text.x = element_text(angle = 45, hjust = 1))
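
One caveat about the combining step: `data.frame(df1$length, df2$length, ...)` only works when every dataset has the same number of rows, which may not hold for the real files (80,000-100,000 rows each). A possible alternative, sketched here with made-up data and illustrative names (`datasets`, `combined`, `dataset`), stacks the data frames in long form with a label column, so no reshaping package is needed:

```r
# Hypothetical stand-ins for the four real datasets (different row counts on purpose).
set.seed(1)
datasets <- list(
  d1 = data.frame(length = runif(300, 300, 1200)),
  d2 = data.frame(length = runif(250, 300, 12000)),
  d3 = data.frame(length = runif(400, 900, 1200)),
  d4 = data.frame(length = runif(350, 300, 12000))
)

# Stack into long form: one row per observation, plus a column naming its dataset.
combined <- do.call(rbind, lapply(names(datasets), function(nm) {
  data.frame(dataset = nm, length = datasets[[nm]]$length)
}))

table(combined$dataset)  # row counts per dataset

# Draw the comparison histogram only if ggplot2 is installed.
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(combined, aes(x = length, fill = dataset)) +
    geom_histogram(binwidth = 500, position = "dodge")
  print(p)
}
```

The same `scale_x_continuous()` and `theme()` calls from the examples above can be added to `p` unchanged.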

