How to add edges (relationships) in a very big Neo4j graph

Posted 2019-08-08 11:58

I have a simple graph model. Each node has an attribute {NodeId}, and each edge links two nodes and has no attributes of its own. It is a directed graph with about 10^6 nodes.

Here is my situation: I first created an index on the attribute {NodeId}, then created 10^6 nodes. At this point I have a graph with 10^6 nodes and no edges. When I try to add edges at random, it is very slow: I can only add about 40 edges per second.

Did I miss some configuration? I don't think this is a reasonable speed.

The code for adding edges:

// Looks up both endpoints by Id, then creates one directed Edge between them.
// Each call sends its own query to the server.
public static void addAnEdge(GraphClient client, Node a, Node b)
{
    client.Cypher
        .Match("(node1:Node)", "(node2:Node)")
        .Where((Node node1) => node1.Id == a.Id)
        .AndWhere((Node node2) => node2.Id == b.Id)
        .Create("(node1)-[:Edge]->(node2)")
        .ExecuteWithoutResults();
}

Should I add an index on edges? If so, how do I do it in Neo4jClient? Thanks for your help.


Batching all my queries into one transaction is a good idea. I executed the following statement in the browser (http://localhost:7474):

MATCH (user1:Node), (user2:Node)
WHERE user1.Id >= 5000000 and user1.Id <= 5000100 and user2.Id >= 5000000 and user2.Id <= 5000100
CREATE user1-[:Edge]->user2

This statement creates about 10,000 edges (101 × 101 = 10,201) in a single transaction, so I think HTTP overhead is no longer the main cost. The result is:

Created 10201 relationships, statement executed in 322969 ms.

That means I am adding only about 30 edges per second (10,201 edges in roughly 323 seconds).

2 answers
我命由我不由天
#2 · 2019-08-08 12:04

The ideal solution is to pass the pairs of nodes to be connected in a single parameter map, then use UNWIND to iterate over the pairs and create each relationship. This performs very well as long as you have an index on the Id property of the Node nodes.

I don't know how to do it with Neo4jClient, but here is the Cypher statement:

UNWIND {pairs} as pair
MATCH (a:Node), (b:Node)
WHERE a.Id = pair.start AND b.Id = pair.end
CREATE (a)-[:EDGE]->(b)

The parameters sent along with the query should have this form:

{
  "parameters": {
    "pairs": [
      {
        "start": 1,
        "end": 2
      },
      {
        "start": 3,
        "end": 4
      }
    ]
  }
}
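
The statement above uses the curly-brace parameter form {pairs}, which was removed in Neo4j 4.0; newer servers expect a leading $ instead. Below is a minimal sketch of the same statement in that syntax. It assumes the parameter is still named pairs and that Id is stored as a number. The two endpoints are matched in separate MATCH clauses so that each can be resolved by a lookup on the Id index.

```cypher
// Same batching idea as the UNWIND statement above, using $param syntax.
// Assumes a parameter named "pairs": a list of maps with numeric "start" and "end".
UNWIND $pairs AS pair
MATCH (a:Node {Id: pair.start})
MATCH (b:Node {Id: pair.end})
CREATE (a)-[:EDGE]->(b)
```

How many pairs to send per request is a tuning choice that depends on the server's memory settings, so it is worth measuring a few batch sizes.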

UPDATE

The Neo4jClient author kindly gave me the equivalent code in Neo4jClient:

var parameters = new[] {
    new { start = 1, end = 2 },
    new { start = 3, end = 4 }
};

client.Cypher
    .Unwind(parameters, "pair")
    .Match("(a:Node),(b:Node)")
    .Where("a.Id = pair.start AND b.Id = pair.end")
    .Create("(a)-[:EDGE]->(b)")
    .ExecuteWithoutResults();
(This account has been banned)
#3 · 2019-08-08 12:24

In your updated Cypher query, you MATCH a Cartesian product of all your nodes. That is very slow. Have a look at the EXPLAIN output for your query.

See also this question for an explanation of how to deal with Cartesian products: Why does neo4j warn: "This query builds a cartesian product between disconnected patterns"?
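
As a rough illustration of the EXPLAIN suggestion: prefixing a statement with EXPLAIN returns the execution plan without running the query, so nothing is created. The sketch below reuses the query from the question, with the relationship pattern written in parenthesized form. If the index on :Node(Id) is being used, the plan would be expected to contain index seek operators; a NodeByLabelScan feeding a CartesianProduct suggests that every Node is being scanned.

```cypher
// EXPLAIN only plans the statement; no relationships are created.
EXPLAIN
MATCH (user1:Node), (user2:Node)
WHERE user1.Id >= 5000000 AND user1.Id <= 5000100
  AND user2.Id >= 5000000 AND user2.Id <= 5000100
CREATE (user1)-[:Edge]->(user2)
```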

Do you have an index on the Id property? Ideally, you should use a uniqueness constraint. This automatically adds a very fast index.
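
A minimal sketch of such a constraint, assuming the label Node and the property Id used in the question, and assuming the Id values really are unique (creation fails if duplicates already exist). The first form is the syntax of the older Neo4j versions this thread appears to be using; the commented line shows the newer form for comparison. On older versions, an existing plain index on the same label and property may need to be dropped before the constraint can be created.

```cypher
// Older syntax (Neo4j 2.x to 4.x):
CREATE CONSTRAINT ON (n:Node) ASSERT n.Id IS UNIQUE

// Newer syntax (Neo4j 4.4 and later), shown for comparison:
// CREATE CONSTRAINT FOR (n:Node) REQUIRE n.Id IS UNIQUE
```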

In your query, try to MATCH the first batch of nodes, use WITH to collect them into a list, and then MATCH the second batch of nodes:

MATCH (user1:Node)
WHERE user1.Id >= 50000 and user1.Id <= 50100
WITH collect(user1) as list1
MATCH (user2:Node)
WHERE user2.Id >= 50000 and user2.Id <= 50100
UNWIND list1 as user1
CREATE (user1)-[:EDGE]->(user2)