I am currently working on clustering some big data, about 30k rows, the dissimilarity matrix just too big for R to handle, I think this is not purely memory size problem. Maybe there are some smart way to do this?
相关问题
- R - Quantstart: Testing Strategy on Multiple Equit
- Using predict with svyglm
- Reshape matrix by rows
- Extract P-Values from Dunnett Test into a Table by
- split data frame into two by column value [duplica
相关文章
- How to convert summary output to a data frame?
- How to plot smoother curves in R
- Paste all possible diagonals of an n*n matrix or d
- ess-rdired: I get this error “no ESS process is as
- How to use doMC under Windows or alternative paral
- dyLimit for limited time in Dygraphs
- Saving state of Shiny app to be restored later
- How to insert pictures into each individual bar in
If your data is so large that base R can't easily cope, then you have several options:
Here is an example using
RevoScaleR
the commercial package by Revolution. I use the datasetdiamonds
, part ofggplot2
since this contains 53K rows, i.e. a bit larger than your data. The example doesn't make much analytic sense, since I naively convert factors into numerics, but it illustrates the computation on a laptop:This results in: