I'm trying to write the most optimal query to find all of the documents that do not have a specific field. Is there any better way to do this than the examples I have listed below?
// Get the ids of all documents missing "location"
r.db("mydb").table("mytable").filter({location: null},{default: true}).pluck("id")
// Get a count of all documents missing "location"
r.db("mydb").table("mytable").filter({location: null},{default: true}).count()
Right now, these queries take about 300-400ms on a table with ~40k documents, which seems rather slow. Furthermore, in this specific case, the "location" attribute contains latitude/longitude and has a geospatial index.
Is there any way to accomplish this? Thanks!
A naive suggestion
You could use the hasFields
method along with the not
method on to filter out unwanted documents:
r.db("mydb").table("mytable")
.filter(function (row) {
return row.hasFields({ location: true }).not()
})
This might or might not be faster, but it's worth trying.
Using a secondary index
Ideally, you'd want a way to make location
a secondary index and then use getAll
or between
since queries using indexes are always faster. A way you could work around that is making all rows in your table have a value false
value for their location, if they don't have a location. Then, you would create a secondary index for location. Finally, you can then query the table using getAll
as much as you want!
- Adding a location property to all fields without a location
For that, you'd need to first insert location: false
into all rows without a location. You could do this as follows:
r.db("mydb").table("mytable")
.filter(function (row) {
return row.hasFields({ location: true }).not()
})
.update({
location: false
})
After this, you would need to find a way to insert location: false
every time you add a document without a location.
- Create secondary index for the table
Now that all documents have a location
field, we can create a secondary index for location
.
r.db("mydb").table("mytable")
.indexCreate('location')
Keep in mind that you only have to add the { location: false }
and create the index only once.
- Use
getAll
Now we can just use getAll
to query documents using the location
index.
r.db("mydb").table("mytable")
.getAll(false, { index: 'location' })
This will probably be faster than the query above.
Using a secondary index (function)
You can also create a secondary index as a function. Basically, you create a function and then query the results of that function using getAll
. This is probably easier and more straight-forward than what I proposed before.
- Create the index
Here it is:
r.db("mydb").table("mytable")
.indexCreate('has_location',
function(x) { return x.hasFields('location');
})
- Use
getAll
.
Here it is:
r.db("mydb").table("mytable")
.getAll(false, { index: 'has_location' })