I'm trying to write the most optimal query to find all of the documents that do not have a specific field. Is there any better way to do this than the examples I have listed below?
// Get the ids of all documents missing "location"
r.db("mydb").table("mytable").filter({location: null},{default: true}).pluck("id")
// Get a count of all documents missing "location"
r.db("mydb").table("mytable").filter({location: null},{default: true}).count()
Right now, these queries take about 300-400ms on a table with ~40k documents, which seems rather slow. Furthermore, in this specific case, the "location" attribute contains latitude/longitude and has a geospatial index.
Is there any way to accomplish this? Thanks!
A naive suggestion
You could use the
hasFields
method along with thenot
method on to filter out unwanted documents:This might or might not be faster, but it's worth trying.
Using a secondary index
Ideally, you'd want a way to make
location
a secondary index and then usegetAll
orbetween
since queries using indexes are always faster. A way you could work around that is making all rows in your table have a valuefalse
value for their location, if they don't have a location. Then, you would create a secondary index for location. Finally, you can then query the table usinggetAll
as much as you want!For that, you'd need to first insert
location: false
into all rows without a location. You could do this as follows:After this, you would need to find a way to insert
location: false
every time you add a document without a location.Now that all documents have a
location
field, we can create a secondary index forlocation
.Keep in mind that you only have to add the
{ location: false }
and create the index only once.getAll
Now we can just use
getAll
to query documents using thelocation
index.This will probably be faster than the query above.
Using a secondary index (function)
You can also create a secondary index as a function. Basically, you create a function and then query the results of that function using
getAll
. This is probably easier and more straight-forward than what I proposed before.Here it is:
getAll
.Here it is: