I have a simple graph model. In this graph, each node has an attribute {NodeId}. Each edge links two nodes and has no other attributes. It is a directed graph with about 10^6 nodes.
Here is my situation:
First I created an index on the {NodeId} attribute. Then I created 10^6 nodes, so at this point I have a graph with 10^6 nodes and no edges.
When I then add edges between random nodes, it is very slow: I can only add about 40 edges per second.
Did I miss some configuration? This does not seem like a reasonable speed.
The code for adding an edge:
// Creates one :Edge relationship; each call is a separate HTTP request and transaction
public static void addAnEdge(GraphClient client, Node a, Node b)
{
    client.Cypher
        .Match("(node1:Node)", "(node2:Node)")
        .Where((Node node1) => node1.Id == a.Id)
        .AndWhere((Node node2) => node2.Id == b.Id)
        .Create("(node1)-[:Edge]->(node2)")
        .ExecuteWithoutResults();
}
Should I add an index on edges? If so, how do I do it in Neo4jClient?
Thanks for your help.
Batching all my queries into one transaction is a good idea. I executed the following statement in the browser (http://localhost:7474):
MATCH (user1:Node), (user2:Node)
WHERE user1.Id >= 5000000 and user1.Id <= 5000100 and user2.Id >= 5000000 and user2.Id <= 5000100
CREATE (user1)-[:Edge]->(user2)
This statement creates about 10,000 edges in a single transaction, so HTTP overhead should no longer be significant. The result is:
Created 10201 relationships, statement executed in 322969 ms.
That is still only about 30 edges per second.
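As a quick check on those numbers: both Id bounds in the WHERE clause are inclusive, so each side matches 101 nodes, and every user1 is paired with every user2, including 101 pairs where user1 and user2 are the same node (self-loops). The sketch below only redoes that arithmetic from the figures reported above; it does not talk to Neo4j.

```python
LOW, HIGH = 5_000_000, 5_000_100   # inclusive Id range from the query above
ids = range(LOW, HIGH + 1)          # 101 ids, because both bounds are inclusive

# Every (user1, user2) combination the MATCH produces, self-pairs included
pairs = [(u1, u2) for u1 in ids for u2 in ids]
self_loops = sum(1 for u1, u2 in pairs if u1 == u2)

elapsed_s = 322969 / 1000          # "statement executed in 322969 ms"
rate = len(pairs) / elapsed_s       # relationships created per second

print(len(pairs), self_loops, round(rate, 1))  # 10201 101 31.6
```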
The ideal solution is to pass the pairs of nodes to be related in a single parameters map, then use UNWIND to iterate over those pairs and create each relationship. This is very fast as long as you have an index on the Id property of nodes labeled Node.
I don't know how to do this with Neo4jClient, but here is the Cypher statement:
UNWIND {pairs} as pair
MATCH (a:Node), (b:Node)
WHERE a.Id = pair.start AND b.Id = pair.end
CREATE (a)-[:EDGE]->(b)
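To make the row-by-row behaviour of that statement concrete, here is a small in-memory model (not Neo4j code): UNWIND turns the list into one row per pair, each MATCH becomes a lookup by Id, a pair whose Id is missing simply produces no row, and CREATE appends an edge. The dictionary `index_on_id` is a hypothetical stand-in for the index on the Id property; the point it illustrates is that each lookup is a direct access rather than a scan over all 10^6 nodes.

```python
from typing import Dict, List, Tuple

Node = Dict[str, int]


def unwind_create_edges(pairs: List[Dict[str, int]],
                        index_on_id: Dict[int, Node],
                        edges: List[Tuple[Node, Node]]) -> int:
    """Model UNWIND {pairs} AS pair / MATCH / CREATE on an in-memory graph."""
    created = 0
    for pair in pairs:                        # UNWIND: one row per pair
        a = index_on_id.get(pair["start"])    # MATCH (a:Node) WHERE a.Id = pair.start
        b = index_on_id.get(pair["end"])      # MATCH (b:Node) WHERE b.Id = pair.end
        if a is None or b is None:            # no match: this row creates nothing
            continue
        edges.append((a, b))                  # CREATE (a)-[:EDGE]->(b)
        created += 1
    return created


# Usage: four nodes, three requested pairs, one of which refers to a missing Id.
nodes = {i: {"Id": i} for i in range(1, 5)}
edges: List[Tuple[Node, Node]] = []
count = unwind_create_edges(
    [{"start": 1, "end": 2}, {"start": 3, "end": 4}, {"start": 3, "end": 99}],
    nodes, edges)
print(count)  # 2
```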
The parameters to be sent along with the query should have this form:
{
"parameters": {
"pairs": [
{
"start": 1,
"end": 2
},
{
"start": 3,
"end": 4
}
]
}
}
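If the pairs come from application code, the map can be built programmatically. A minimal sketch, assuming the Id property is stored as a number (the range comparisons in the question suggest it is): the values are emitted as JSON numbers, because Cypher does not treat the string "1" as equal to the integer 1, so string Ids would match nothing. The helper name `build_parameters` is hypothetical, and the exact HTTP envelope around this object depends on the endpoint you post to.

```python
import json
from typing import Iterable, Tuple


def build_parameters(pairs: Iterable[Tuple[int, int]]) -> dict:
    """Wrap (start, end) Id pairs in the {"parameters": {"pairs": [...]}} shape shown above."""
    return {"parameters": {"pairs": [{"start": s, "end": e} for s, e in pairs]}}


payload = build_parameters([(1, 2), (3, 4)])
print(json.dumps(payload, indent=2))
```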
UPDATE
The Neo4jClient author kindly provided the equivalent Neo4jClient code:
var parameters = new[] {
    new { start = 1, end = 2 },
    new { start = 3, end = 4 }
};

client.Cypher
    .Unwind(parameters, "pair")
    .Match("(a:Node),(b:Node)")
    .Where("a.Id = pair.start AND b.Id = pair.end")
    .Create("(a)-[:EDGE]->(b)")
    .ExecuteWithoutResults();
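For the full job (on the order of a million random edges), sending every pair in one request would produce one very large transaction. A common compromise is to split the pairs into fixed-size batches and run the UNWIND query once per batch. The sketch below only shows the splitting; the helper `batched_pairs` is hypothetical, and the batch size of 1000 is an arbitrary assumption worth tuning against your own server. Each yielded list would play the role of `parameters` in the Unwind call above.

```python
from typing import Dict, Iterator, List, Sequence, Tuple


def batched_pairs(pairs: Sequence[Tuple[int, int]],
                  batch_size: int) -> Iterator[Dict[str, List[Dict[str, int]]]]:
    """Yield one {"pairs": [...]} parameter map per batch of at most batch_size pairs."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for i in range(0, len(pairs), batch_size):
        chunk = pairs[i:i + batch_size]
        yield {"pairs": [{"start": s, "end": e} for s, e in chunk]}


all_pairs = [(i, i + 1) for i in range(2500)]
batches = list(batched_pairs(all_pairs, 1000))
print([len(b["pairs"]) for b in batches])  # [1000, 1000, 500]
```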
In your updated Cypher query, the MATCH builds a cartesian product of all your nodes, which is very slow. Have a look at the EXPLAIN output for your query.
See also this question for an explanation of how to deal with cartesian products: Why does neo4j warn: "This query builds a cartesian product between disconnected patterns"?
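To get a feel for why that matters, compare the number of combinations in two scenarios. This is back-of-the-envelope arithmetic, assuming roughly 10^6 nodes as in the question: in the worst case the WHERE filter is applied only after the product of all nodes is built; in the good case each side is first narrowed to the 101 nodes in the Id range (for example by an index seek). Which case you actually get is what the EXPLAIN plan shows.

```python
TOTAL_NODES = 10**6                 # approximate graph size from the question
LOW, HIGH = 5_000_000, 5_000_100    # inclusive Id range used in the query

# Assumed worst case: every (user1, user2) combination is built before filtering
unfiltered_rows = TOTAL_NODES * TOTAL_NODES

# Each side narrowed to the Id range first, then combined
in_range = HIGH - LOW + 1
filtered_rows = in_range * in_range

print(f"{unfiltered_rows:,} vs {filtered_rows:,}")  # 1,000,000,000,000 vs 10,201
```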
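Do you have an index on the Id property? Ideally, you should use a uniqueness constraint, which automatically creates a very fast index. Note that the question describes an index on NodeId while the queries filter on Id; an index only helps if it is on the property the query actually filters on.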
In your query, try to first MATCH the first set of nodes, use WITH to collect them into a list, and then MATCH the second set of nodes:
MATCH (user1:Node)
WHERE user1.Id >= 5000000 and user1.Id <= 5000100
WITH collect(user1) as list1
MATCH (user2:Node)
WHERE user2.Id >= 5000000 and user2.Id <= 5000100
UNWIND list1 as user1
CREATE (user1)-[:EDGE]->(user2)
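The rewritten query still connects every user1 in the range to every user2 in the range; what changes is the shape of the intermediate result. After WITH collect(user1), the first set of nodes is a single row holding a list, so the second MATCH starts from one row instead of many. The in-memory model below (plain Python, not Neo4j) checks that this collect-then-UNWIND pattern yields exactly the same (user1, user2) combinations as the direct product, using the question's Id range. Whether the planner actually produces a faster plan depends on the Neo4j version, so it is worth confirming with EXPLAIN or PROFILE.

```python
from itertools import product


def collect_then_unwind(first_ids, second_ids):
    """Model MATCH ... WITH collect(user1) AS list1 MATCH ... UNWIND list1 AS user1."""
    list1 = list(first_ids)              # WITH collect(user1): all first nodes in ONE row
    rows = []
    for user2 in second_ids:             # second MATCH, driven by that single row
        for user1 in list1:              # UNWIND list1 AS user1
            rows.append((user1, user2))  # one CREATE (user1)-[:EDGE]->(user2) per row
    return rows


ids = range(5_000_000, 5_000_101)
rows = collect_then_unwind(ids, ids)
print(len(rows), set(rows) == set(product(ids, ids)))  # 10201 True
```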