The CSV file to be processed does not fit into memory. How can one read ~20K random lines of it to do basic statistics on the resulting data frame?
This should work: sample random lines directly from the file and parse only those, so the full file never has to be loaded into memory.
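A minimal sketch of that idea, assuming the LaF package is installed and the file has a single header row (the path "bigfile.csv" is a placeholder):

```r
# Sketch: LaF::sample_lines() draws n random lines from a text file
# without reading the whole file into memory.
library(LaF)

path   <- "bigfile.csv"                    # placeholder path
header <- readLines(path, n = 1)           # keep the header row for parsing
lines  <- sample_lines(path, n = 20000)    # ~20K random raw lines
df     <- read.csv(text = c(header, lines))
# Caveat: the header line itself can occasionally land in the sample;
# drop such rows if your data cannot legitimately contain them.
```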
Try this, based on examples 6e and 6f on the sqldf GitHub home page:
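A sketch along those lines, assuming the sqldf package ("bigfile.csv" is a placeholder path):

```r
library(sqldf)

# read.csv.sql stages the file in a temporary SQLite database and runs the
# query there; only the 20,000 sampled rows are returned to R. The file is
# referred to as "file" inside the query.
df <- read.csv.sql("bigfile.csv",
                   sql = "select * from file order by random() limit 20000")
```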
See ?read.csv.sql for other arguments you may need, depending on the particulars of your file.

The following can be used in case you have an ID or something similar in your data: take a sample of IDs, then take the subset of the data using the sampled IDs, as sketched below.
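A base-R sketch of that two-pass idea, assuming the file has a header row and a unique ID column (here called "id"; the path "bigfile.csv" and all names are placeholders). Pass 1 reads only the ID column, pass 2 re-reads the file in chunks and keeps the sampled rows:

```r
sample_by_id <- function(path, n = 20000, id_col = "id", chunk = 100000) {
  # Pass 1: read only the ID column. colClasses "NULL" drops a column
  # entirely; NA reads the ID column with the default conversion.
  hdr <- names(read.csv(path, nrows = 1))
  cc  <- rep("NULL", length(hdr))
  cc[match(id_col, hdr)] <- NA
  ids  <- read.csv(path, colClasses = cc)[[id_col]]
  keep <- sample(ids, n)                  # assumes IDs are unique

  # Pass 2: re-read in chunks, keeping only the sampled rows.
  con <- file(path, open = "r")
  on.exit(close(con))
  readLines(con, n = 1)                   # skip the header line
  out <- list()
  repeat {
    block <- tryCatch(
      read.csv(con, header = FALSE, nrows = chunk, col.names = hdr),
      error = function(e) NULL)           # read.csv errors at end of file
    if (is.null(block)) break
    out[[length(out) + 1]] <- block[block[[id_col]] %in% keep, ]
  }
  do.call(rbind, out)
}

df <- sample_by_id("bigfile.csv")
```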
You can also just do it in the terminal with Perl:

```sh
perl -ne 'print if (rand() < .01)' biglist.txt > subset.txt
```
This won't necessarily get you exactly 20,000 lines: as written it keeps each line with probability .01, i.e. roughly 1% of the file. To target ~20K lines, set the probability to 20000 divided by the file's total line count (which `wc -l` will tell you). It will, however, be really fast, and you'll have copies of both the full file and the subset in your directory. You can then load the smaller file into R however you want.
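One caveat: a header row is kept only 1% of the time, like every other line. A variation that always keeps the first line (assuming line 1 is the header; `$.` is Perl's current input line number):

```sh
perl -ne 'print if $. == 1 || rand() < .01' biglist.txt > subset.txt
```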