I'm in progress with estimation of MongoDB for our customers. Per requirements we need associate with some entity ent
variable set of name-value pairs.
db.ent.insert({'a':5775, 'b':'b1'})
db.ent.insert({'c':'its a c', 'b':'b2'})
db.ent.insert({'a':7557, 'c':'its a c'})
After this I need intensively query ent
for presence of fields:
db.ent.find({'a':{$exists:true}})
db.ent.find({'c':{$exists:false}})
Per MongoDB docs:
$exists is not very efficient even with an index, and esp. with {$exists:true} since it will effectively have to scan all indexed values.
Can experts there provide more efficient way (even with shift the paradigm) to deal fast with vary name-value pairs
You can redesign your schema like this:
Then you indexing your key:
After this you will able to search by exact match:
In case you go with Sparse index and your current schema, proposed by @WesFreeman, you will need to create an index on each key you want to search. It can affect write performance or will be not acceptable if your keys are not static.
I think a sparse index is the answer to this, although you'll need an index for each field. http://www.mongodb.org/display/DOCS/Indexes#Indexes-SparseIndexes
Sparse indexes should help with $exists:true queries.
Even still, if your field is not really sparse (meaning it's mostly set), it's not going to help you that much.
Update I guess I'm wrong. Looks like there's an open issue ( https://jira.mongodb.org/browse/SERVER-4187 ) still that $exists doesn't use sparse indexes. However, you can do something like this with find and sort, which looks like it properly uses the sparse index:
Here's a full demonstration of the difference, using your example values:
Note that
find({}).sort({a:1})
uses the index (BtreeCursor):And
find({a:{$exists:true}})
does a full scan:Looks like you can also use .hint({a:1}) to force it to use the index.
How about setting the non-exists field to
null
? Then you can query them with{field: {$ne: null}}
.Here's the output:
Simply redesign your schema such that it's an indexable query. Your use case is infact analogous to the first example application given in MongoDB The Definitive Guide.
If you want/need the convenience of
result.a
just store the keys somewhere indexable.instead of the existing:
do
That's then an indexable query:
or if you're ever likely to query also by value:
That's also an indexable query: