R: How to quickly read large .dta files without RA

2019-02-14 23:29发布

I have a 10 GB .dta Stata file and I am trying to read it into 64-bit R 3.3.1. I am working on a virtual machine with about 130 GB of RAM (4 TB HD) and the .dta file is about 3 million rows and somewhere between 400 and 800 variables.

I know data.table() is the fastest way to read in .txt and .csv files, but does anyone have a recommendation for reading largeish .dta files into R? Reading the file into Stata as a .dta file requires about 20-30 seconds, although I need to set my working memory max prior to opening the file (I set the max at 100 GB).

I have not tried importing to .csv in Stata, but I hope to avoid touching the file with Stata. A solution is found via Using memisc to import stata .dta file into R but this assumes RAM is scarce. In my case, I should have sufficient RAM to work with the file.

标签： r memory stata large-files

2条回答

傲

2楼-- · 2019-02-15 00:06

I recommend the haven R package. Unlike foreign, It can read the latest Stata formats:

library(haven)
data <- read_dta('myfile.dta')

Not sure how fast it is compared to other options, but your choices for reading Stata files in R are rather limited. My understanding is that haven wraps a C library, so it's probably your fastest option.

0人赞添加讨论(0) 举报

【Aperson】

3楼-- · 2019-02-15 00:09

The fastest way to load a large Stata dataset in R is using the readstata13 package. I have compared the performance of foreign, readstata13, and haven packages on a large dataset in this post and the results repeatedly showed that readstata13 is the fastest available package for reading Stata dataset in R.

0人赞添加讨论(0) 举报

R: How to quickly read large .dta files without RA

采纳回答

编辑标签

举报内容

检举类型

检举原因

检举说明(必填)

打开微信“扫一扫”，打开网页后点击屏幕右上角分享按钮

付费偷看金额在0.1-10元之间