I am getting strange results when transferring large collections from MongoDB to R as data frames with the rmongodb and plyr packages. I picked up this code from various GitHub repositories and forum posts on the subject and adapted it for my purposes:
## load both packages
library(rmongodb)
library(plyr)
## connect to MongoDB
mongo <- mongo.create(host="localhost")
# [1] TRUE
## get the list of the databases
mongo.get.databases(mongo)
# list of databases (with mydatabase)
## get the list of the collections of mydatabase
mongo.get.collections(mongo, db = "mydatabase")
# list of all the collections of my database
## Verify the size of mycollection
DBNS = "mycollection"
mongo.count(mongo, ns = DBNS)
# [1] 845923 documents inside "mycollection"
## transform mycollection (in BSON MongoDB format) to a data frame (adapted for R)
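export = data.frame(stringsAsFactors = FALSE)
cursor = mongo.find(mongo, DBNS)
i = 1
while (mongo.cursor.next(cursor))
{
  # convert the current BSON document to an R list
  tmp = mongo.bson.to.list(mongo.cursor.value(cursor))
  # flatten the list into a one-row data frame
  tmp.df = as.data.frame(t(unlist(tmp)), stringsAsFactors = FALSE)
  # append the row, filling missing columns with NA
  export = rbind.fill(export, tmp.df)
  i = i + 1
}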
## show the size of the data frame "export"
dim(export)
# [1] 20585 23
## check more information on the data frame "export"
str(export)
# 'data.frame': 20585 obs. of 23 variables
# etc…
The transfer does not work correctly: there is a huge difference between the 845923 documents in "mycollection" in MongoDB and the 20585 observations in R.
I am not sure the code above is right. I doubt that the i = 1 and i = i + 1 lines are useful here (they may come from rmongodb code that runs queries), since I never use the counter for anything. I also find t(unlist(tmp)) strange: where does the t come from?
The problem is that, for large collections (more than several thousand documents), the collection size in MongoDB and the size of the data frame in R differ greatly. My PC has plenty of RAM and R seems to work fine during the process (no freeze, no crash; it takes time, but that is expected given the large conversion from BSON to list to data frame).
I have successfully transferred a MongoDB collection of 36100 documents to R for data analysis with no problem.
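To make the t(unlist(tmp)) step concrete, here is a minimal sketch in base R. The list below is a made-up stand-in for what mongo.bson.to.list() might return for one document (the field names are hypothetical, not from my collection). As far as I understand it, unlist() flattens the nested list into a named vector, and t() transposes that vector into a one-row matrix so that each field becomes a column:

```r
# Hypothetical document, standing in for the result of mongo.bson.to.list()
tmp <- list(name = "a", size = 3, tags = list(x = "p", y = "q"))

# unlist() flattens nested fields into one named character vector
flat <- unlist(tmp)

# t() turns the vector into a 1-row matrix; the names become column names
one_row <- t(flat)

# one row per document, one column per (flattened) field
tmp.df <- as.data.frame(one_row, stringsAsFactors = FALSE)

dim(tmp.df)
names(tmp.df)
```

Without the t(), as.data.frame(flat) would give one column with one row per field instead, which is not what rbind.fill needs. Note that unlist() also coerces every value to a common type (character in this sketch).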
So I'm not sure where the problem is coming from.
Thanks in advance for any help on this subject.