I have a scripte here and it works fine to get a number in a certain column. Now i want to collect not only on the 1st sheet of every files in the directory, but each sheets of each files.
now the .csv file R has written shows 2 column, column A is filename, and B is the number R grabbed.
What modification should I add to the below script to have a csv output that shows 3 column, A is filename, B is sheetnames, C is the number?
require(xlsx)
#setwd
setwd("D:\\Transferred Files\\")
files <- (Sys.glob("*.xls"))
f<-length(files)
DF <- data.frame(txt=rep("", f),num=rep(NA, f),stringsAsFactors=FALSE)
# files loop
for(i in 1:f)
{
A<-read.xlsx(file=files[i],1,startColumn=1, endColumn=20, startRow=1, endRow=60)
#Find price
B<-as.data.frame.matrix(A)
P<-B[which(apply(B, 1, function(x) any(grepl("P", x)))),which(apply(B, 2, function(x) any(grepl("P",
x))))+6]
#fill price DF
DF[i, ] <-c(files[i],P)
}
write.csv(DF, "prices.csv", row.names=FALSE)
I tried XLconnet, but can't really make it work into this.
You have a good start, but you are asking how to add into the loop the worksheets within a file. If you read
?read.xlsx
, you'll see two arguments that you are glossing over (well, using one, ignoring the other) in your code:You should only need to provide one of the two.
You might ask "how do I know how many sheets there are in a worksheet?" (for
sheetIndex
) or even "what are the sheet names?" (forsheetName
).?getSheets
to the rescue:You'll need to use
loadWorkbook(file)
instead ofread.xlsx
in order to get the sheet names, but a little reading of the manuals will provide you the information you need to switch over. (You can use something likegetSheets(loadWorkbook(file))
, but in my experience I try to avoid opening the same file multiple times in the same script, regardless of auto-closing.)As an alternative, Hadley's
readxl
package shows promise in its simplicity, speed, and stability. It hasexcel_sheets()
andread_excel()
that should fill your needs. (In fact, that's all it has ... simplicity is "A Good Thing (tm)".)Edit:
Edit #2:
As a more detailed iterative example:
I'm not really sure what you are doing with your code (since you never said what was contained within any of the spreadsheets), but an alternative approach (that I would use in place of the previous double-
for
example) is to enclose everything within lists:If you don't care about differentiating from which workbook a particular sheet came from -- and subsequently simplify processing the data -- then you can "flatten" the nested lists into a single list:
Now, processing your data is perhaps simpler. Your code for looking for "P"s is a bit flawed in that you are assigning a data.frame to a cell within another data.frame, typically frowned upon.
This will likely turn in to another question for you. For that, I strongly urge you to provide a better detailed question, including what a sample worksheet looks like and what you expect your output to look like.