I am trying to query a mongo database from R using RMongo and return the values of a couple nested documents.
Looking through the documentation for RMongo, I understand the following query:
output <- dbGetQueryForKeys(mongo, 'test_data', '{"foo": "bar"}', '{"foo":1}')
Where the arguments are...
db = mongo
collection = 'test_data'
query = '{"foo": "bar"}'
keys = 'Specify a set of keys to return.'
What is the 1 in '{"foo":1}'? What is the structure of this key set? Checking against this blog post, I found a format like:
result <- dbGetQueryForKeys(mongo, "items", "{'publish_date' : { '$gte' : '2011-04-01', '$lt' : '2011-05-01'}}", "{'publish_date' : 1, 'rank' : 1}")
So, apparently, the keys need the value 1?
How would I get keys for nested documents? If I wanted something like...
output <- dbGetQueryForKeys(mongo, 'test_data', '{"foo": "bar"}', '{"foo1.foo2.foo3.foo4":1,"foo1.foo2.foo3.bar4":1}')
For nested keys, I'm currently returning something more like...
X_id
1 50fabd42a29d6013864fb9d7
foo1
1 { "foo2" : { "foo3" : { "foo4" : "090909" , "bar4" : "1"}}}
...where output[,2] is a looooong string, rather than two separate variables for the values associated with the keys foo4 and bar4 ("090909", "1"), as I would have expected.
What is the 1 in '{"foo":1}'? What is the structure of this key set?
These keys are the query projection: the set of fields MongoDB returns for a read operation. A value of 1 includes a field and a value of 0 excludes it. If no projection is given, all fields are returned.
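As a small illustration of that key format, here is a minimal sketch that builds the projection string from a vector of field names. The helper make_keys() is hypothetical (it is not part of RMongo), and it assumes the jsonlite package is installed; dotted names such as "foo1.foo2.foo3.foo4" can be passed the same way.

```r
library(jsonlite)

# Hypothetical helper (not part of RMongo): map every requested field to 1,
# which tells MongoDB to include that field in the result.
make_keys <- function(fields) {
  spec <- as.list(setNames(rep(1L, length(fields)), fields))
  as.character(toJSON(spec, auto_unbox = TRUE))
}

keys <- make_keys(c("publish_date", "rank"))
print(keys)
```

The resulting string can then be passed as the fourth argument of dbGetQueryForKeys().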
How would I get keys for nested documents?
For nested keys, I'm currently returning something more like...
1 { "foo2" : { "foo3" : { "foo4" : "090909" , "bar4" : "1"}}}
...where output[,2] is a looooong string, rather than as two
separate variables for the values associated with the keys foo4
and bar4, ("090909", "1") as I would have expected.
The RMongo driver returns the data with its embedding hierarchy intact.
You can reshape and flatten the result using the RMongo dbAggregate() command with the $project operator, which is part of the Aggregation Framework in MongoDB 2.2+.
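A rough sketch of what such a pipeline might look like for the documents in the question. The output field names foo4 and bar4 are my choice, and I am assuming dbAggregate() accepts a character vector of pipeline stages, as in the RMongo README. The call itself is left as a comment because it needs a live connection; the runnable part only checks that the stages are well-formed JSON (using jsonlite, which is an extra dependency here).

```r
library(jsonlite)

# One JSON string per pipeline stage. $match repeats the query from the
# question; $project lifts the two nested values to top-level fields.
pipeline <- c(
  '{ "$match" : { "foo" : "bar" } }',
  '{ "$project" : { "_id" : 0, "foo4" : "$foo1.foo2.foo3.foo4", "bar4" : "$foo1.foo2.foo3.bar4" } }'
)

# Catch quoting mistakes locally before sending anything to the server.
stopifnot(all(vapply(pipeline, validate, logical(1))))

# With an open connection `mongo` (as in the question), the call would be:
# output <- dbAggregate(mongo, 'test_data', pipeline)
```

Check the shape of the returned value on your RMongo version before relying on it; it may be JSON strings rather than a data.frame.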
If your end goal is to extract the values from the nested object for downstream processing in R, there is a simpler route that avoids building an aggregation pipeline. Instead of trying to reach into the nested structure and access bar4 directly, project the top level of the object, which gives you the long string you've referenced.
output <- dbGetQueryForKeys(mongo, 'test_data', '{"foo": "bar"}', '{"foo1":1}')
Since the output is a data.frame, you can use the 'jsonlite' library to get to your data:
library(jsonlite)
foo1 <- fromJSON(output$foo1)
bar4 <- foo1$foo2$foo3$bar4
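If the query returns more than one row, each row's string has to be parsed separately. A minimal sketch of that, using a hand-built data.frame as a stand-in for the real query result (the values are copied from the sample output in the question, so no database connection is needed to try it):

```r
library(jsonlite)

# Stand-in for the data.frame returned by dbGetQueryForKeys().
output <- data.frame(
  X_id = "50fabd42a29d6013864fb9d7",
  foo1 = '{ "foo2" : { "foo3" : { "foo4" : "090909" , "bar4" : "1"}}}',
  stringsAsFactors = FALSE
)

# Parse each row's JSON string on its own, then pull the two nested values
# out into ordinary columns.
parsed <- lapply(output$foo1, fromJSON)
output$foo4 <- vapply(parsed, function(d) d$foo2$foo3$foo4, character(1))
output$bar4 <- vapply(parsed, function(d) d$foo2$foo3$bar4, character(1))

print(output[, c("X_id", "foo4", "bar4")])
```

This assumes every row has both keys and that the values are strings; rows with missing keys would need an explicit check before the vapply() calls.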