I am reading a data set of dimension 3131875 x 5 into R using read.big.matrix. My data has both character and numeric columns, including a date variable. The command I am using is:
as1 <- read.big.matrix("C:/Documents and Settings/Arundhati.Mukherjee/My Documents/Arundhati/big data/MB07_Arundhati/sample2.txt",
header=TRUE,
backingfile="session.bin",
descriptorfile="session.desc",
type = NA)
But type = NA is not accepted by R in this case, and I get an error:
Error in filebacked.big.matrix(nrow = nrow, ncol = ncol, type = type, :
Problem creating filebacked matrix.
In addition: Warning messages:
1: In na.omit(as.integer(firstLineVals)) : NAs introduced by coercion
2: In na.omit(as.double(firstLineVals)) : NAs introduced by coercion
3: In read.big.matrix("C:/Documents and Settings/Arundhati.Mukherjee/My Documents/Arundhati/big data/MB07_Arundhati/sample2.txt", :
Because type was not specified, we chose double based on the first line of data.
I need to know what the type should be here. I tried options such as double, but that throws the same error.
Please help me.
From ?read.big.matrix: a file must contain only one atomic type (all integer, for example), because a big.matrix, like an ordinary R matrix, stores a single type. Therefore, you won't be able to read in data with combinations of character, numeric, integer, date, etc. You could do some work on the file first, for instance using a different program to convert the character variables to integer representations (like converting to a factor in R).
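As a rough illustration of that conversion, here is a minimal base R sketch. It assumes the data (or a chunk of it) fits in memory as a data frame for preprocessing; the column names region, date, and amount, the date format, and the toy values are all hypothetical stand-ins for the real file.

```r
# Toy stand-in for the real file; column names and values are hypothetical.
raw <- data.frame(
  id     = c(101, 102, 103, 104),
  region = c("north", "south", "north", "east"),
  date   = c("2011-01-05", "2011-01-06", "2011-02-01", "2011-02-01"),
  amount = c(12.5, 7.25, 3, 40),
  stringsAsFactors = FALSE
)

# Character column -> integer codes. Keep the lookup vector so the
# codes can be translated back to labels later.
region_levels <- sort(unique(raw$region))
raw$region <- as.integer(factor(raw$region, levels = region_levels))

# Date column -> number of days since 1970-01-01 (assumed format: YYYY-MM-DD).
raw$date <- as.numeric(as.Date(raw$date, format = "%Y-%m-%d"))

# Every column must be numeric before the file can become a big.matrix.
stopifnot(all(vapply(raw, is.numeric, logical(1))))

# Write an all-numeric, comma-separated copy of the data.
numeric_file <- tempfile(fileext = ".txt")
write.table(raw, numeric_file, sep = ",", row.names = FALSE, quote = FALSE)
```

The all-numeric file written at the end should then be readable with read.big.matrix using header = TRUE and type = "double", and region_levels[code] recovers the original label for a given integer code.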
EDIT:
On the bigmemory website there's an example of preprocessing data with a Python script that changes character information to integer codes. The script is written for a specific dataset, but perhaps you could use it as a guideline for your data.