Reading big data in R by read.big.matrix

Posted 2019-05-24 08:07

I am reading a dataset of dimension 3131875 x 5 into R using read.big.matrix. The data has both character and numeric columns, including a date variable. The command I am using is:

as1 <- read.big.matrix("C:/Documents and Settings/Arundhati.Mukherjee/My Documents/Arundhati/big data/MB07_Arundhati/sample2.txt",
                       header=TRUE, 
                       backingfile="session.bin",
                       descriptorfile="session.desc",
                       type = NA)

However, type = NA is not accepted in this case, and I get an error:

Error in filebacked.big.matrix(nrow = nrow, ncol = ncol, type = type,  : 
  Problem creating filebacked matrix.
In addition: Warning messages:
1: In na.omit(as.integer(firstLineVals)) : NAs introduced by coercion
2: In na.omit(as.double(firstLineVals)) : NAs introduced by coercion
3: In read.big.matrix("C:/Documents and Settings/Arundhati.Mukherjee/My Documents/Arundhati/big data/MB07_Arundhati/sample2.txt",  :
  Because type was not specified, we chose double based on the first line of data.

I need to know what the type should be here. I also tried options such as double, but that throws the same error.

Please help me.

Tags: r r-bigmemory

1 answer

看我几分像从前

#2 · 2019-05-24 09:12

From ?read.big.matrix:

Files must contain only one atomic type (all integer, for example).

Therefore, you won't be able to read in data with combinations of character, numeric, integer, date, etc. You could do some work on the file, for instance using a different program to convert the character variables to integer representations (like converting to a factor in R).
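A minimal sketch of that conversion in base R, assuming the data (or a manageable piece of it) fits in memory for the preprocessing step. The toy data frame and its column names (id, region, day, amount) are hypothetical stand-ins, since the question does not show the actual columns; the date format is also an assumption.

```r
# Toy stand-in for the mixed-type file; column names and values are hypothetical.
df <- data.frame(
  id     = c(1, 2, 3, 4),
  region = c("north", "south", "north", "east"),
  day    = c("2019-05-01", "2019-05-02", "2019-05-02", "2019-05-03"),
  amount = c(10.5, 20.0, 7.25, 3.0),
  stringsAsFactors = FALSE
)

# Character column -> integer codes. Keep the levels so codes can be mapped back.
region_levels <- sort(unique(df$region))
df$region <- as.integer(factor(df$region, levels = region_levels))

# Date column -> number of days since 1970-01-01 (assumes ISO yyyy-mm-dd dates).
df$day <- as.numeric(as.Date(df$day, format = "%Y-%m-%d"))

# Write an all-numeric, comma-separated file.
numeric_file <- tempfile(fileext = ".csv")
write.table(df, numeric_file, sep = ",",
            row.names = FALSE, col.names = TRUE, quote = FALSE)

# Every column now has one atomic type, so an explicit type can be given.
if (requireNamespace("bigmemory", quietly = TRUE)) {
  big <- bigmemory::read.big.matrix(numeric_file, sep = ",",
                                    header = TRUE, type = "double")
}
```

The original labels can be recovered later with region_levels[code], and the dates with as.Date(days, origin = "1970-01-01").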

EDIT:

On the bigmemory website there's an example of preprocessing data using a Python script to change character information to integer. The script is written for a specific dataset, but perhaps you could use it as a guideline for your data.
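For a file too large to load at once, the same kind of preprocessing can be done in chunks. The following is a rough sketch in base R, not the script from the bigmemory website: recode_in_chunks is a hypothetical helper, the file is assumed to be comma-separated with a header row, and the sample data is made up.

```r
# Hypothetical helper: stream a comma-separated file in chunks and replace the
# character column `col` with integer codes that stay consistent across chunks.
recode_in_chunks <- function(infile, outfile, col, chunk_rows = 100000L) {
  con <- file(infile, open = "r")
  on.exit(close(con))
  header <- strsplit(readLines(con, n = 1), ",", fixed = TRUE)[[1]]
  writeLines(paste(header, collapse = ","), outfile)
  levels_seen <- character(0)
  repeat {
    lines <- readLines(con, n = chunk_rows)
    if (length(lines) == 0) break
    chunk <- read.csv(text = lines, header = FALSE, col.names = header,
                      stringsAsFactors = FALSE)
    # Extend the running list of labels, then map each label to its position.
    levels_seen <- union(levels_seen, chunk[[col]])
    chunk[[col]] <- match(chunk[[col]], levels_seen)
    write.table(chunk, outfile, sep = ",", append = TRUE,
                row.names = FALSE, col.names = FALSE, quote = FALSE)
  }
  levels_seen  # label for code i is levels_seen[i]
}

# Usage on a small made-up file, with a tiny chunk size to exercise the loop.
infile  <- tempfile(fileext = ".csv")
outfile <- tempfile(fileext = ".csv")
writeLines(c("id,region,amount",
             "1,north,10.5",
             "2,south,20",
             "3,north,7.25"), infile)
region_levels <- recode_in_chunks(infile, outfile, "region", chunk_rows = 2L)
recoded <- read.csv(outfile)
```

Codes are assigned in order of first appearance, so the returned vector of labels has to be saved alongside the output file to decode it later.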
