NOTE
I let this become several questions instead of the simple one I asked, so I am breaking the follow-ups off into their own question here.
ORIGINAL QUESTION
I'm receiving a list of IDs. I first test whether any of them are in my graph, and if they /are/, I process those nodes further.
So, for example...
fids = get_fids(record) # [100001, 100002, 100003, ... etc]
ids_in_my_graph = filter(id_is_in_graph, fids) # [100002]

def id_is_in_graph(id):
    val = False
    query = """MATCH (user:User {{id_str:"{}"}})
    RETURN user
    """.format(id)
    n = neo4j.CypherQuery(graph_db, query).execute_one()
    if n:
        val = True
    return val
As you can imagine, using filter to test each ID against the graph one at a time is really, really slow, and is clearly not using neo4j properly.
How would I rephrase my query so that I could pass in a list, like (User {id_str: [mylist]}), and return only the IDs that are in my graph?
You may want to use WHERE ... IN, which takes advantage of Cypher's collection functionality. Here's the relevant reference
So your query might look like this:
MATCH (user:User)
WHERE user.id_str IN ["100001", "100002", "100003"]
RETURN user;
Now, I don't know how large a collection can be. I doubt this would work if your collection had 1,000 items in it. But at least this is a way of batching them up into chunks. This should improve performance.
Also have a look at the Collections section of the Cypher 2.0 refcard
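For what it's worth, here is a rough Python sketch of the batching idea. It splits the IDs into chunks and runs one WHERE ... IN query per chunk. Note the assumptions: `chunked`, `ids_in_graph` and `run_query` are names I made up, `run_query` stands in for whatever your driver uses to execute a query with parameters, the batch size of 500 is arbitrary, and I pass each chunk as an `{ids}` parameter (Neo4j 2.x syntax) instead of inlining the list as in the query above.

```python
def chunked(items, size):
    """Yield successive lists of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


# One query per batch; {ids} is a Cypher parameter holding a list of strings.
QUERY = """MATCH (user:User)
WHERE user.id_str IN {ids}
RETURN user.id_str"""


def ids_in_graph(fids, run_query, batch_size=500):
    """Return the IDs from `fids` that exist in the graph.

    `run_query` is a hypothetical callable: (query, params) -> iterable of
    id_str values. Swap in your driver's own execute call.
    """
    found = []
    id_strings = [str(f) for f in fids]  # id_str is stored as a string
    for batch in chunked(id_strings, batch_size):
        found.extend(run_query(QUERY, {"ids": batch}))
    return found
```

This replaces one round trip per ID with one round trip per batch, which is where most of the time was going.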
You should use Cypher with parameters, like {id}, and then pass "id" -> record.id to the execution:
MATCH (user:User {id_str:{user_id}}),(friend:User {id_str:{friend_id}})
CREATE UNIQUE (user)-[:FRIENDS]->(friend)
{ "user_id" : record.id, "friend_id" : i}
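A minimal sketch of what that could look like on the Python side, assuming the user and friend IDs are stored as strings in id_str. The statement text stays constant and only the parameter map changes per friend, so nothing is spliced into the query with .format(). The helper name `friend_params` is hypothetical, not part of any driver.

```python
# The statement never changes; Neo4j 2.x fills in {user_id} and {friend_id}
# from the parameter map sent alongside it.
STATEMENT = """MATCH (user:User {id_str:{user_id}}),(friend:User {id_str:{friend_id}})
CREATE UNIQUE (user)-[:FRIENDS]->(friend)"""


def friend_params(user_id, friend_ids):
    """Build one parameter map per friend (hypothetical helper)."""
    return [
        {"user_id": str(user_id), "friend_id": str(fid)}
        for fid in friend_ids
    ]
```

Each map is then handed to your driver together with STATEMENT, however your driver accepts parameters.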
Make sure to add a unique constraint on the property you match on:
CREATE CONSTRAINT ON (u:User) ASSERT u.id_str IS UNIQUE;
You can also send multiple statements at once to the transactional HTTP endpoint for Cypher:
http://docs.neo4j.org/chunked/milestone/rest-api-transactional.html
Your driver probably supports this already.
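If your driver doesn't, here is a rough sketch of building the request body yourself. The endpoint takes a JSON object with a "statements" list, where each entry has a "statement" and its "parameters". The function name `build_payload` is my own, and the sketch only builds the JSON. Posting it (in Neo4j 2.x, usually to /db/data/transaction/commit on your server) is left to whatever HTTP client you use.

```python
import json

# Same parameterized statement for every friend; only the parameters differ.
STATEMENT = (
    "MATCH (user:User {id_str:{user_id}}),(friend:User {id_str:{friend_id}}) "
    "CREATE UNIQUE (user)-[:FRIENDS]->(friend)"
)


def build_payload(user_id, friend_ids):
    """Return the JSON body for one multi-statement request (hypothetical helper)."""
    statements = [
        {
            "statement": STATEMENT,
            "parameters": {"user_id": str(user_id), "friend_id": str(fid)},
        }
        for fid in friend_ids
    ]
    return json.dumps({"statements": statements})
```

That way all the friendships for one record go over the wire in a single request instead of one request per friend.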