I am using Rethinkdb 1.10.1 with the official python driver. I have a table of tagged things which are associated to one user:
{
"id": "PK",
"user_id": "USER_PK",
"tags": ["list", "of", "strings"],
// Other fields...
}
I want to query by user_id
and tag
(say, to find all the things by user "tawmas" with tag "tag"). Starting with Rethinkdb 1.10 I can create a multi-index like this:
r.table('things').index_create('tags', multi=True).run(conn)
My query would then be:
res = (r.table('things')
.get_all('TAG', index='tags')
.filter(r.row['user_id'] == 'USER_PK').run(conn))
However, this query still needs to scan all the documents with the given tag, so I would like to create a compound index based on the user_id and tags fields. Such an index would allow me to query with:
res = r.table('things').get_all(['USER_PK', 'TAG'], index='user_tags').run(conn)
There is nothing in the documentation about compound multi-indexes. However, I
tried to use a custom index function combining the requirements for compound
indexes and multi-indexes by returning a list of ["USER_PK", "tag"]
pairs.
My first attempt was in python:
r.table('things').index_create(
'user_tags',
lambda each: [[each['user_id'], tag] for tag in each['tags']],
multi=True).run(conn)
This makes the python driver choke with a MemoryError
trying to parse the index function (I guess list comprehensions aren't really supported by the driver).
So, I turned to my (admittedly, rusty) javascript and came up with this:
r.table('things').index_create(
'user_tags',
r.js(
"""(function (each) {
var result = [];
var user_id = each["user_id"];
var tags = each["tags"];
for (var i = 0; i < tags.length; i++) {
result.push([user_id, tags[i]]);
}
return result;
})
"""),
multi=True).run(conn)
This is rejected by the server with a curious exception: rethinkdb.errors.RqlRuntimeError: Could not prove function deterministic. Index functions must be deterministic.
So, what is the correct way to define a compound multi-index? Or is it something which is not supported at this time?
Short answer:
List comprehensions don't work in ReQL functions. You need to use
map
instead like so:Long answer
This is actually a somewhat subtle aspect of how RethinkDB drivers work. So the reason this doesn't work is that your python code doesn't actually see real copies of the each document. So in the expression:
each
isn't ever bound to an actual document from your database, it's bound to a special python variable which represents the document. I'd actually try running the following just to demonstrate it:And it will print out something like:
the driver only knows that this is a variable from the function, in particular it has no idea if
each["tags"]
is an array or what (it's actually just another very similar abstract object). So python doesn't know how to iterate over that field. Basically exactly the same problem exists in javascript.