I have a .csv file with data for different chromosomes. The chromosome names are stored in the first column (column name: Chr). My aim is to separate the data for each chromosome (Chr1, Chr2, etc.) and write a separate csv file for each. I cannot work out how to do this in a few steps. Thanks
Answer 1:
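A one-liner using plyr, illustrated with the built-in iris dataset:
library(plyr)
# x$Species[1] gives a single file name per group; the whole column would give one name per row
d_ply(iris, .(Species), function(x) write.csv(x,
file = paste(x$Species[1], ".csv", sep = "")))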
Answer 2:
Read Data
fn <- dir(pattern="csv")
data_in <- do.call(rbind,lapply(fn,read.csv))
Split by chromosome
data_out <- split(data_in,data_in[[1]])
Write by chromosome
chn <- unlist(lapply(data_out,"[",1,1))
for(i in seq_along(chn))
  write.csv(data_out[[i]],file=paste(chn[i],"csv",sep="."))
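For the single input file described in the question, the same split-and-write idea can be sketched as a small function. This is only an illustration: the function name split_csv_by_column, the toy data, and the temporary directory are assumptions made for the example and are not part of the answer above.

```r
# Hypothetical helper: split one CSV into one file per value of a column.
split_csv_by_column <- function(infile, outdir, column = "Chr") {
  data_in <- read.csv(infile, stringsAsFactors = FALSE)
  # split() returns a named list; the names are the distinct column values
  pieces <- split(data_in, data_in[[column]])
  for (key in names(pieces)) {
    write.csv(pieces[[key]],
              file = file.path(outdir, paste0(key, ".csv")),
              row.names = FALSE)
  }
  invisible(names(pieces))
}

# Toy data standing in for the real chromosome file (assumed layout)
demo_dir <- tempfile("chr_split_")
dir.create(demo_dir)
demo_in <- file.path(demo_dir, "input.csv")
write.csv(data.frame(Chr = c("Chr1", "Chr2", "Chr1"),
                     Pos = c(10, 20, 30)),
          demo_in, row.names = FALSE)
written <- split_csv_by_column(demo_in, demo_dir)
```

Using names(pieces) for the file names avoids looking up the first cell of each piece, and row.names = FALSE keeps the output columns identical to the input.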
Answer 3:
One way is to read the input file one line at a time and append the line to the correct outfile based on the first x characters of the line:
con <- file('yourInputFile', 'r')
# read one line per iteration until the input is exhausted
while (length(input <- readLines(con, n=1)) > 0){
  outputfile <- paste(substr(input, 1, 5), ".csv", sep="" )
  ### assuming first 5 characters are new file name
  outfile <- file(outputfile, 'a')
  writeLines(input, con=outfile)
  close(outfile)
}
close(con)
The advantage of this approach is that it works even if yourInputFile is too big to read into memory. The downside is that this approach is slow as it does a lot of opening/closing of files.
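One possible way to reduce the open/close overhead is to read the input in chunks and keep one output connection open per chromosome. The sketch below is an assumption-laden illustration, not the answerer's code: the name split_csv_streaming and the chunk_lines parameter are hypothetical, and it assumes a header row, a first field without embedded commas, and no quoted fields spanning several lines. R also limits the number of simultaneously open connections, so this suits a small number of distinct keys such as chromosomes.

```r
# Hypothetical helper: stream a large CSV and write one file per key,
# opening each output file only once.
split_csv_streaming <- function(infile, outdir, chunk_lines = 10000) {
  con <- file(infile, "r")
  on.exit(close(con), add = TRUE)
  header <- readLines(con, n = 1)   # assumed: first line is the header
  outputs <- list()                 # open connections, keyed by first field
  on.exit(for (o in outputs) close(o), add = TRUE)
  while (length(lines <- readLines(con, n = chunk_lines)) > 0) {
    # Key = text before the first comma, with any double quotes removed
    keys <- gsub('"', "", sub(",.*$", "", lines), fixed = TRUE)
    for (key in unique(keys)) {
      if (is.null(outputs[[key]])) {
        # First time this key is seen: create its file and copy the header
        outputs[[key]] <- file(file.path(outdir, paste0(key, ".csv")), "w")
        writeLines(header, outputs[[key]])
      }
      writeLines(lines[keys == key], outputs[[key]])
    }
  }
  invisible(names(outputs))
}

# Toy input standing in for a large file (assumed layout)
demo_dir <- tempfile("chr_stream_")
dir.create(demo_dir)
demo_in <- file.path(demo_dir, "input.csv")
writeLines(c("Chr,Pos", "Chr1,10", "Chr2,20", "Chr1,30"), demo_in)
keys_seen <- split_csv_streaming(demo_in, demo_dir, chunk_lines = 2)
```

Only chunk_lines lines are held in memory at a time, so the memory advantage of the line-by-line version is kept while each output file is opened once.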