I have an excel file with several sheets, each one with several columns, so I would like to not to specify the type of column separately, but automatedly. I want to read them as stringsAsFactors= FALSE
would do, because it interprets the type of column, correctly. In my current method, a column width "0.492 ± 0.6" is interpreted as number, returning NA, "because" the stringsAsFactors
option is not available in read_excel
. So here, I write a workaround, that works more or less well, but that I cannot use in real life, because I am not allowed to create a new file. Note: I need other columns as numbers or integers, also others that have only text as characters, as stringsAsFactors
does in my read.csv
example.
library(readxl)
file= "myfile.xlsx"
firstread<-read_excel(file, sheet = "mysheet", col_names = TRUE, na = "", skip = 0)
#firstread has the problem of the a column with "0.492 ± 0.6",
#being interpreted as number (returns NA)
colna<-colnames(firstread)
# read every column as character
colnumt<-ncol(firstread)
textcol<-rep("text", colnumt)
secondreadchar<-read_excel(file, sheet = "mysheet", col_names = TRUE,
col_types = textcol, na = "", skip = 0)
# another column, with the number 0.532, is now 0.5319999999999999
# and several other similar cases.
# read again with stringsAsFactors
# critical step, in real life, I "cannot" write a csv file.
write.csv(secondreadchar, "allcharac.txt", row.names = FALSE)
stringsasfactor<-read.csv("allcharac.txt", stringsAsFactors = FALSE)
colnames(stringsasfactor)<-colna
# column with "0.492 ± 0.6" now is character, as desired, others numeric as desired as well
Here is a script that imports all the data in your excel file. It puts each sheet's data in a
list
calleddfs
:The process goes as follows:
First, you get all the sheets in the file with
excel_sheets
and then loop through the sheet names to create dataframes. For each of these dataframes, you initially import the data astext
by setting thecol_types
parameter totext
. Once you have gotten the dataframe's columns as text, you can convert the structure from atibble
to adata.frame
. After that, you then find columns that are actually numeric columns and convert them into numeric values.Edit:
As of late April, a new version of
readxl
got released, and theread_excel
function got two enhancements pertinent to this question. The first is that you can have the function guess the column types for you with the argument "guess" provided to thecol_types
parameter. The second enhancement (corollary to the first) is thatguess_max
parameter got added to theread_excel
function. This new parameter allows you to set the number of rows used for guessing the column types. Essentially, what I wrote above could be shortened with the following:I would recommend that you update
readxl
to the latest version to shorten your script and as a result avoid possible annoyances.I hope this helps.