I do not have experience with loops, but it looks like I will need some to analyze my data properly. Could you show how to create a simple loop based on the code I already have? Let's use looping to get some plots:
pdf(file = "complex I analysis.pdf", paper='A4r')
ggplot(df_tbl_data1_comp1, aes(Size_Range, Abundance, group=factor(Gene_Name))) +
theme(legend.title=element_blank()) +
geom_line(aes(color=factor(Gene_Name))) +
ggtitle("Data1 - complex I")+
theme(axis.text.x = element_text(angle = 90, hjust = 1))
ggplot(df_tbl_data2_comp1, aes(Size_Range, Abundance, group=factor(Gene_Name))) +
theme(legend.title=element_blank()) +
geom_line(aes(color=factor(Gene_Name))) +
ggtitle("Data2 - complex I")+
theme(axis.text.x = element_text(angle = 90, hjust = 1))
ggplot(df_tbl_data3_comp1, aes(Size_Range, Abundance, group=factor(Gene_Name))) +
theme(legend.title=element_blank()) +
geom_line(aes(color=factor(Gene_Name))) +
ggtitle("Data3 - complex I")+
theme(axis.text.x = element_text(angle = 90, hjust = 1))
dev.off()
The question now is what I would like to achieve. I have about 10 complexes to analyze, which means 10 PDF files should be created; the example shows plots from three different data sets for complex one. To do this properly, the number in `comp1` (from `df_tbl_dataX_comp1`) has to change from 1 to 10, depending on which complex we want to plot. The name of the PDF file and the title of each graph also have to change inside the loop. Is it possible to write such a loop?
Data:
structure(list(Size_Range = structure(c(1L, 1L, 1L, 2L, 2L, 2L,
3L, 3L, 3L, 4L, 4L, 4L, 5L, 5L, 5L, 6L, 6L, 6L, 7L, 7L, 7L, 8L,
8L, 8L, 9L, 9L, 9L, 10L, 10L, 10L, 11L, 11L, 11L, 12L, 12L, 12L,
13L, 13L, 13L, 14L, 14L, 14L, 15L, 15L, 15L, 16L, 16L, 16L, 17L,
17L, 17L, 18L, 18L, 18L, 19L, 19L, 19L, 20L, 20L, 20L), .Label = c("10",
"34", "59", "84", "110", "134", "165", "199", "234", "257", "362",
"433", "506", "581", "652", "733", "818", "896", "972", "1039"
), class = "factor"), Abundance = c(0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 142733.475, 108263.525, 98261.11, 649286.165,
3320759.803, 3708515.148, 6691260.945, 30946562.92, 180974.3725,
4530005.805, 21499827.89, 0, 15032198.54, 4058060.583, 0, 3842964.97,
2544030.857, 0, 1640476.977, 286249.1775, 0, 217388.5675, 1252965.433,
0, 1314666.05, 167467.8825, 0, 253798.15, 107244.9925, 0, 207341.1925,
15755.485, 0, 71015.85, 14828.5075, 0, 25966.2325, 0, 0, 0, 0,
0, 0), Gene_Name = c("AT1G01080", "AT1G01090", "AT1G01320", "AT1G01420",
"AT1G01470", "AT1G01560", "AT1G01800", "AT1G02150", "AT1G02500",
"AT1G02560", "AT1G02780", "AT1G02880", "AT1G02920", "AT1G02930",
"AT1G03030", "AT1G03090", "AT1G03110", "AT1G03130", "AT1G03220",
"AT1G03230", "AT1G03330", "AT1G03475", "AT1G03630", "AT1G03680",
"AT1G03870", "ATCG00420", "ATCG00470", "ATCG00480", "ATCG00490",
"ATCG00500", "ATCG00650", "ATCG00660", "ATCG00670", "ATCG00740",
"ATCG00750", "ATCG00842", "ATCG01100", "ATCG01030", "ATCG01114",
"ATCG01665", "ATCG00770", "ATCG00780", "ATCG00800", "ATCG00810",
"ATCG00820", "ATCG00722", "ATCG00744", "ATCG00855", "ATCG00853",
"ATCG00888", "ATCG00733", "ATCG00766", "ATCG00812", "ATCG00821",
"ATCG00856", "ATCG00830", "ATCG00900", "ATCG01060", "ATCG01110",
"ATCG01120")), .Names = c("Size_Range", "Abundance", "Gene_Name"
), row.names = c(NA, -60L), class = "data.frame")
After writing my answer, I realized it doesn't address the actual question about loops. However, I hope it shows you a different way of approaching your root problem (in other words, I didn't want the work to go to waste).
I couldn't get your plot to work with the data you posted. There are 60 unique gene names in a 60-row data frame. When you try to make a `geom_line` and group by gene (`aes(group=Gene_Name)`), you only have one point for each line. You need two points to make a line. I made up some data and did an analysis.
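The one-point-per-line problem can be checked before plotting by counting rows per gene. The snippet below is only a sketch on made-up data: the gene names (`geneA`, `geneB`, `geneC`), the abundance values, and the object name `toy` are invented for illustration and are not from the question.

```r
# Made-up example data: 3 hypothetical genes, each measured at 4 size ranges.
set.seed(1)
toy <- data.frame(
  Size_Range = factor(rep(c("10", "34", "59", "84"), times = 3),
                      levels = c("10", "34", "59", "84")),
  Abundance  = runif(12, min = 0, max = 1e6),
  Gene_Name  = rep(c("geneA", "geneB", "geneC"), each = 4)
)

# Count rows per gene; geom_line() needs at least two points per group.
points_per_gene <- table(toy$Gene_Name)
print(points_per_gene)

# TRUE only if every gene has enough points to draw a line.
enough_points <- all(points_per_gene >= 2)
print(enough_points)
```

Running the same `table()` call on the `Gene_Name` column of the posted data frame would give a count of 1 for every gene, which is why no lines appear.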
So you're studying protein complexes in Arabidopsis? In case someone is familiar with your domain, a sentence of background might help them answer your question. Alternatively, a picture of the desired output could help. More complete example data and/or screenshots might also generate more interest in your future posts.
Have a look at this approach. It depends on a `data.frame` (`dat`) that contains the names of your datasets, the graph titles, and the file names. First I create a function that creates the plot and saves it, then I call the function in a `for`-loop and also in an `apply`-loop (prefer apply where possible; it is usually more concise). The code looks like this:
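A minimal sketch of this approach follows. It assumes ggplot2 is installed, and several names are hypothetical and chosen here for illustration: the helper `make_toy()`, the function `plot_and_save()`, the column names of `dat`, the output file names, and the made-up data standing in for the real data frames. It saves one plot per file with `ggsave()` and uses `mapply()` as the apply-family call.

```r
library(ggplot2)

# Hypothetical stand-in for the real data: 3 genes x 4 size ranges.
make_toy <- function() {
  data.frame(
    Size_Range = factor(rep(c("10", "34", "59", "84"), times = 3),
                        levels = c("10", "34", "59", "84")),
    Abundance  = runif(12, min = 0, max = 1e6),
    Gene_Name  = rep(c("geneA", "geneB", "geneC"), each = 4)
  )
}
df_tbl_data1_comp1 <- make_toy()
df_tbl_data2_comp1 <- make_toy()

out_dir <- tempdir()  # assumption: write to a temporary directory

# One row per plot: name of the data object, plot title, output file.
dat <- data.frame(
  dataset = c("df_tbl_data1_comp1", "df_tbl_data2_comp1"),
  title   = c("Data1 - complex I", "Data2 - complex I"),
  file    = file.path(out_dir, c("data1_comp1.pdf", "data2_comp1.pdf")),
  stringsAsFactors = FALSE
)

# Build the plot for one dataset and save it to one PDF file.
plot_and_save <- function(dataset, title, file) {
  df <- get(dataset)  # look the data frame up by its name
  p <- ggplot(df, aes(Size_Range, Abundance, group = factor(Gene_Name))) +
    geom_line(aes(color = factor(Gene_Name))) +
    ggtitle(title) +
    theme(legend.title = element_blank(),
          axis.text.x = element_text(angle = 90, hjust = 1))
  ggsave(file, plot = p, width = 11, height = 8)
  invisible(file)
}

# Variant 1: a for-loop over the rows of dat.
for (i in seq_len(nrow(dat))) {
  plot_and_save(dat$dataset[i], dat$title[i], dat$file[i])
}

# Variant 2: the same calls through mapply(), one call per row.
saved <- mapply(plot_and_save, dat$dataset, dat$title, dat$file)
```

Adding a complex or a dataset then only means adding a row to `dat`; the plotting function itself stays unchanged.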
This might do the trick: set up two nested loops, one iterating over the complexes and a second over the datasets. Then use `paste0()` or `paste()` to generate the correct file names and headings. PS: I didn't test the code, since I don't have your data, but it should give you an idea.
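The nested-loop idea can be sketched as below, assuming ggplot2 is installed and that the data frames follow the `df_tbl_dataX_compY` naming from the question. The data here is made up, the output file names are invented, and only 2 complexes are used to keep the example short (the question has 10). Note the explicit `print(p)`: inside a loop, a ggplot object is not drawn unless it is printed.

```r
library(ggplot2)

n_complexes <- 2   # the question has 10; kept small for this sketch
n_datasets  <- 3
out_dir <- tempdir()  # assumption: write to a temporary directory

# Create made-up data frames named df_tbl_data<d>_comp<comp>.
for (comp in seq_len(n_complexes)) {
  for (d in seq_len(n_datasets)) {
    toy <- data.frame(
      Size_Range = factor(rep(c("10", "34", "59", "84"), times = 3),
                          levels = c("10", "34", "59", "84")),
      Abundance  = runif(12, min = 0, max = 1e6),
      Gene_Name  = rep(c("geneA", "geneB", "geneC"), each = 4)
    )
    assign(paste0("df_tbl_data", d, "_comp", comp), toy)
  }
}

pdf_files <- character(n_complexes)

# Outer loop: one PDF per complex. Inner loop: one page per dataset.
for (comp in seq_len(n_complexes)) {
  roman <- as.character(as.roman(comp))  # 1 -> "I", 2 -> "II"
  pdf_files[comp] <- file.path(out_dir,
                               paste0("complex_", roman, "_analysis.pdf"))
  pdf(file = pdf_files[comp], paper = "a4r")
  for (d in seq_len(n_datasets)) {
    df <- get(paste0("df_tbl_data", d, "_comp", comp))
    p <- ggplot(df, aes(Size_Range, Abundance, group = factor(Gene_Name))) +
      geom_line(aes(color = factor(Gene_Name))) +
      ggtitle(paste0("Data", d, " - complex ", roman)) +
      theme(legend.title = element_blank(),
            axis.text.x = element_text(angle = 90, hjust = 1))
    print(p)  # ggplot objects must be printed explicitly inside a loop
  }
  dev.off()
}
```

Setting `n_complexes <- 10` and dropping the block that creates the made-up data would apply the same loops to the real data frames, provided they exist under those names.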