R running very slowly after loading large datasets

Posted 2019-08-20 01:31

Question:

I have been unable to work in R because of how slowly it runs once my datasets are loaded. The datasets total around 8GB. I am on a machine with 8GB of RAM and have raised memory.limit() beyond my physical RAM, but nothing seems to help. I used fread from the data.table package to read the files, simply because read.table would not complete.

After seeing a similar post on the forum addressing the same issue, I have attempted to run gctorture(), but to no avail.

R runs so slowly that I cannot even check the length of the list of datasets I have loaded, nor View them or perform any basic operation once they are in memory.

I have tried loading the datasets in pieces (one third of the files at a time, in three passes), which made the import itself run more smoothly but has not changed how slowly R runs afterwards.

Is there any way to get around this issue? Any help would be much appreciated.

Thank you all for your time.

Answer 1:

The problem arises because R loads the full dataset into RAM, which can bring the system to a halt when you then try to View your data.
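Rather than calling View() on a multi-gigabyte table, you can inspect small slices that are cheap regardless of the object's size. A minimal base-R sketch (the data frame here is just a stand-in for your own data):

```r
# 'df' is an illustrative stand-in for one of your large datasets.
df <- data.frame(x = rnorm(1e6), y = sample(letters, 1e6, replace = TRUE))

str(df)          # compact structure summary, cheap even for big objects
head(df, 10)     # first 10 rows only, instead of rendering everything
object.size(df)  # how much RAM the object actually occupies
```

str() and head() only touch a tiny part of the object, so they stay responsive where View() stalls.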

If it is a really huge dataset, first make sure the data contains only the columns and rows you actually need. Useful columns can be identified from your domain knowledge of the problem; you can also drop rows with missing values.
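With data.table (which the asker is already using), you can avoid ever loading the unneeded columns by passing select= to fread, then drop incomplete rows. A small sketch; the file and column names are made up for illustration:

```r
library(data.table)

# Write a tiny illustrative file ('junk' stands for columns you don't need;
# one 'keep' value is missing so na.omit has something to do).
fwrite(data.table(id = 1:5,
                  keep = c(1.5, 2.5, NA, 4.5, 5.5),
                  junk = runif(5)),
       "sample.csv")

# Read only the columns you care about -- the skipped columns never
# occupy any RAM -- then drop rows containing NAs.
dt <- fread("sample.csv", select = c("id", "keep"))
dt <- na.omit(dt)
```

On an 8GB file, restricting the import to a handful of columns can easily cut memory use by more than half.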

Once this is done, depending on the size of your data, you can try different approaches. One is to use packages such as bigmemory and ff. bigmemory, for example, creates a pointer object through which you can read data from disk without loading it all into memory.
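A minimal bigmemory sketch of that idea: a file-backed big.matrix lives in files on disk, while the R session holds only a small pointer object. The file names below are illustrative:

```r
library(bigmemory)

# Create a disk-backed matrix; R keeps only a lightweight descriptor,
# not the full 1e6 x 3 matrix of doubles.
x <- filebacked.big.matrix(nrow = 1e6, ncol = 3,
                           backingfile = "big.bin",
                           descriptorfile = "big.desc")

x[1, ] <- c(1, 2, 3)  # reads and writes go through the memory-mapped file
x[1, 2]               # fetched from disk, not from an in-RAM copy
```

In a later session you can reconnect to the same data with attach.big.matrix("big.desc") without re-importing anything.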

Another approach is parallelism (implicit or explicit); MapReduce-style tools for R are also very useful for handling big datasets.
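Explicit parallelism is available out of the box via the parallel package that ships with R. A small split-apply-combine sketch (the computation itself is a toy placeholder):

```r
library(parallel)  # part of base R, no installation needed

cl <- makeCluster(2)                     # start 2 worker processes
chunks <- split(1:1e6, cut(1:1e6, 4))    # split the work into 4 chunks

# "Map": each worker computes a partial sum over its chunks.
partial <- parLapply(cl, chunks, function(ix) sum(sqrt(ix)))

# "Reduce": combine the partial results on the master.
total <- Reduce(`+`, partial)

stopCluster(cl)                          # always release the workers
```

This is the same map/reduce shape the MapReduce-style tools use, just at the scale of one machine's cores.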

For more information on these, check out this blog post on RPubs and this old-but-gold post from SO.



Tags: r memory import