My question is related to ggplot2 in R (3.2.3). I have 4 datasets, each containing 2 columns and about 80,000-100,000 rows. I used the code below to create a plot for one of the datasets:
dataset1 <- read.table("file1.txt", header=T)
ggplot(data=dataset1, aes(dataset1$length))+ geom_histogram (binwidth =500)+
scale_x_continuous(breaks=seq(300,1000,by=200),seq(1001,15000,by=1000))
Here, length is the second column of my dataset. I would like the x-axis to show lengths between 300 and 1200 at intervals of 200 (300, 500, 700, 900, 1200) and lengths between 1201 and 15000 at intervals of 1000. I used the scale_x_continuous call above for this, but it didn't produce what I want. Could you please help me with the correct code for this plot?
Here is a short sample of data:
case length
C1099757 300
C1099759 300
C1099761 300
C1099763 300
C1100993 301
C1100995 301
C1100997 301
C1100999 301
C1101377 302
C1101379 302
C1101919 303
C1101921 303
C1102979 304
C1102981 304
C1102983 304
C1103475 305
C1103477 305
C1104267 306
C1104269 306
In the original data file, the lengths continue up to 12000 in the same way as in the sample above. In your opinion, is ggplot2 suitable for this plot? If not, please suggest a better tool.
I'm also looking for a way to show the length distribution of all 4 datasets in one graph so that they can be compared easily. I would highly appreciate it if you could let me know how to do that.
Many thanks in advance.
I think you should replace
scale_x_continuous(breaks=seq(300,1000,by=200),seq(1001,15000,by=1000))
by
scale_x_continuous(breaks=c(seq(300,1000,by=200),seq(1001,15000,by=1000)))
or
scale_x_continuous(breaks=seq(300,1000,by=200))+
scale_y_continuous(breaks=seq(1001,15000,by=1000))
(wasn't entirely sure what you meant)
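To make the first suggestion concrete: in the original call the second seq() sits outside breaks=, so it is handed to the scale as a separate unnamed argument instead of being part of the break positions. Wrapping both sequences in c() gives one numeric vector. A minimal base R sketch (the variable names are only illustrative):

```r
# Build the two tick sequences separately, then join them with c()
fine_breaks   <- seq(300, 1000, by = 200)     # closely spaced ticks at the low end
coarse_breaks <- seq(1001, 15000, by = 1000)  # wider spacing for the long tail
all_breaks    <- c(fine_breaks, coarse_breaks)

# Inspect the vector that would be passed as breaks=
print(all_breaks)
```

Note that seq(300,1000,by=200) stops at 900, so if you also want a tick at 1200 you would have to add it to the vector yourself.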
Based on your sample data I generated some artificial data which contains only length:
# four artificial datasets (df2 was assigned twice; only the later assignment took effect)
df1 = data.frame(length=runif(300,300,1200))
df2 = data.frame(length=runif(300,300,12000))
df3 = data.frame(length=runif(300,900,1200))
df4 = data.frame(length=runif(300,300,12000))
# plotting a single dataset
library(ggplot2)
ggplot(data=df4, aes(length))+ geom_histogram(binwidth=500)+
scale_x_continuous(breaks=c(seq(300,1000,by=200),seq(1001,15000,by=1000)))
# combine the data frames (data.frame() needs the same number of rows in each)
df = data.frame(df1$length,df2$length,df3$length,df4$length)
library(reshape)
melted <- melt(df)
ggplot(data=melted, aes(value))+aes(fill=variable)+ geom_histogram (binwidth =500)+
scale_x_continuous(breaks=c(seq(300,1000,by=200),seq(1001,15000,by=1000)))
or
ggplot(data=melted, aes(value))+aes(fill=variable)+ geom_histogram (binwidth =500,position="dodge")+
scale_x_continuous(breaks=c(seq(300,1000,by=200),seq(1001,15000,by=1000)))
To get slightly nicer x-axis labelling, I redistributed the labels a bit and rotated them by 45 degrees:
ggplot(data=melted, aes(value))+aes(fill=variable)+ geom_histogram (binwidth =500,position="dodge")+
scale_x_continuous(breaks=c(seq(300,1000,by=200),seq(1100,15000,by=1000)))+theme(axis.text.x = element_text(angle = 45, hjust = 1))
And here is the chart which adjusts the bins to match the axis breaks. I actually like the different bar widths.
ggplot(data=melted, aes(value))+
aes(fill=variable)+
geom_histogram(breaks=c(seq(300,1000,by=200),seq(1100,15000,by=1000)),position="dodge")+
scale_x_continuous(breaks=c(seq(300,1000,by=200),seq(1100,15000,by=1000)))+
theme(axis.text.x = element_text(angle = 45, hjust = 1))
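One caveat for the real data: the combine step with data.frame() above only works because my artificial datasets all have 300 rows, whereas yours have between 80,000 and 100,000 rows each. A possible alternative is to stack the datasets in long format with a label column. The sketch below assumes each dataset has a length column; the list names and row counts are made up for illustration, and geom_freqpoly is my own swap for the dodged bars because overlaid lines can be easier to compare.

```r
library(ggplot2)

# Artificial stand-ins with deliberately unequal row counts
set.seed(1)
datasets <- list(
  dataset1 = data.frame(length = runif(300, 300, 1200)),
  dataset2 = data.frame(length = runif(250, 300, 12000)),
  dataset3 = data.frame(length = runif(400, 900, 1200)),
  dataset4 = data.frame(length = runif(350, 300, 12000))
)

# Stack into one long data frame, tagging every row with its source
stacked <- do.call(rbind, lapply(names(datasets), function(nm) {
  data.frame(dataset = nm, length = datasets[[nm]]$length)
}))

axis_breaks <- c(seq(300, 1000, by = 200), seq(1100, 15000, by = 1000))

# One frequency polygon per dataset on shared axes
p <- ggplot(stacked, aes(x = length, colour = dataset)) +
  geom_freqpoly(binwidth = 500) +
  scale_x_continuous(breaks = axis_breaks) +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
print(p)
```

With your files, each list element would instead come from read.table(..., header=TRUE), and the rest should work unchanged as long as the column is called length.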