I need to build a barplot of my data, showing bacterial relative abundance in different samples (each column should sum to 1 in the complete dataset).
A subset of my data:
> mydata
Taxon CD6 CD1 CD12
Actinomycetaceae;g__Actinomyces 0.031960309 0.066683743 0.045638509
Coriobacteriaceae;g__Atopobium 0.018691589 0.003244536 0.00447774
Corynebacteriaceae;g__Corynebacterium 0.001846083 0.006403689 0.000516662
Micrococcaceae;g__Rothia 0.001730703 0.000426913 0.001894429
Porphyromonadaceae;g__Porphyromonas 0.073497173 0.065915301 0.175406872
What I'd like to have is one bar for each sample (CD6, CD1, CD12), where the y values are the relative abundances of the bacterial taxa (the Taxon column).
I think (but I'm not sure) my data are not in the right format for this plot, since I don't have a variable to group by like in the examples I found, for instance:
ggplot(data) + geom_bar(aes(x=revision, y=added), stat="identity", fill="white", colour="black")
Is there a way to rearrange my data so that they work as input to this code?
Or how should I modify the code?
Thanks!
Do you want something like this?
# sample data
df <- read.table(header=TRUE, sep=" ", text="
Taxon CD6 CD1 CD12
Actinomycetaceae;g__Actinomyces 0.031960309 0.066683743 0.045638509
Coriobacteriaceae;g__Atopobium 0.018691589 0.003244536 0.00447774
Corynebacteriaceae;g__Corynebacterium 0.001846083 0.006403689 0.000516662
Micrococcaceae;g__Rothia 0.001730703 0.000426913 0.001894429
Porphyromonadaceae;g__Porphyromonas 0.073497173 0.065915301 0.175406872")
# convert wide data format to long format
library(reshape2)
df.long <- melt(df, id.vars="Taxon",
                measure.vars=grep("CD\\d+", names(df), value=TRUE),
                variable.name="sample",
                value.name="value")
# rescale within each sample so the bars sum to 1
# (needed here because this is only a subset of the full table)
library(plyr)
df.long <- ddply(df.long, .(sample), transform, value=value/sum(value))
# order samples by id
df.long$sample <- reorder(df.long$sample, as.numeric(sub("CD", "", df.long$sample)))
# plot using ggplot
library(ggplot2)
ggplot(df.long, aes(x=sample, y=value, fill=Taxon)) +
  geom_bar(stat="identity") +
  # add manual colors, one per taxon; unique() works whether Taxon
  # was read as character (R >= 4.0 default) or as factor
  scale_fill_manual(values=scales::hue_pal(h = c(0, 360) + 15,
                                           c = 100,
                                           l = 65,
                                           h.start = 0,
                                           direction = 1)(length(unique(df$Taxon))))
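If you would rather not depend on reshape2 and plyr, the same wide-to-long reshaping and per-sample rescaling can be sketched in base R. This is only a minimal sketch using the five rows from the question; the object names `long`, `samples`, `ids` and `totals` are my own and not part of any package.

```r
# the subset from the question, typed in as a data frame
df <- data.frame(
  Taxon = c("Actinomycetaceae;g__Actinomyces",
            "Coriobacteriaceae;g__Atopobium",
            "Corynebacteriaceae;g__Corynebacterium",
            "Micrococcaceae;g__Rothia",
            "Porphyromonadaceae;g__Porphyromonas"),
  CD6  = c(0.031960309, 0.018691589, 0.001846083, 0.001730703, 0.073497173),
  CD1  = c(0.066683743, 0.003244536, 0.006403689, 0.000426913, 0.065915301),
  CD12 = c(0.045638509, 0.00447774, 0.000516662, 0.001894429, 0.175406872)
)

samples <- names(df)[-1]  # every column except Taxon is a sample

# wide -> long: one row per (Taxon, sample) pair
long <- data.frame(
  Taxon  = rep(df$Taxon, times = length(samples)),
  sample = rep(samples, each = nrow(df)),
  value  = unlist(df[samples], use.names = FALSE)
)

# rescale within each sample so that every bar sums to 1
long$value <- long$value / ave(long$value, long$sample, FUN = sum)

# order the samples by their numeric id (CD1, CD6, CD12)
ids <- as.numeric(sub("CD", "", samples))
long$sample <- factor(long$sample, levels = samples[order(ids)])

# sanity check: the total per sample should now be 1
totals <- tapply(long$value, long$sample, sum)
```

The resulting `long` has the same three columns as `df.long`, so it can be passed to the ggplot call above in its place.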