Cypher query to list all disconnected graphs Neo4j

Posted 2019-04-10 10:31

The Neo4j graph database holds roughly 50,000 nodes and more than 50,000 relationships. There is a main graph that contains most nodes, but there are several smaller graphs that are not (yet) connected to the main graph.

In order to connect the various graphs into one big main graph, I intend to use a Cypher query to list paths or collections of connected nodes, ordered by their size (biggest disconnected graph first).

There are several posts on stackoverflow like:

Here is a small example graph that represents the problem: Neo4j Console example graph

The following Cypher query does not solve the problem but is a starting point. It lists all nodes that are not connected to the main graph, but it does not combine those nodes into collections of connected nodes. It works on a small graph. On a large graph it only returns "undefined" after running for more than 10 minutes.

START s=node(3), n=node(*) 
MATCH s-[*1..10]-m 
WITH collect(m) as members, n 
WHERE NOT n in members 
RETURN DISTINCT id(n), n.name? 
ORDER BY id(n) 
LIMIT 10;

How to use Cypher to list all disconnected (sub-) graphs?

Environment:
- Neo4j - Graph Database Kernel 1.9.M05
- Java - SE Runtime Environment (build 1.7.0_17-b02)

Tags: neo4j cypher
1 answer
乱世女痞
Reply #2 · 2019-04-10 10:45

This is not a complete answer, but I think you should (if you can) fall back to the Traversal Framework for this use case.

Cypher is about matching specific portions of a graph, no matter how the match is carried out. The Traversal Framework is really about HOW you want to traverse a graph.

In your case, the traversal is more important than the graph to match. Here is what I'd suggest: use the Traversal Framework to

  1. label groups of nodes in the way you want
  2. aggregate the results in a map (or something more evolved) while you're at it
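The two steps above can be sketched roughly as follows. This is only an illustration, not the Neo4j Traversal Framework API: a plain in-memory adjacency map stands in for the database, and the class and method names (`DisconnectedSubgraphs`, `addEdge`, `labelComponents`, `componentsBySize`) are hypothetical. Against a real database, the breadth-first loop would be replaced by a traversal started from each not-yet-labelled node. It assumes Java 8 or newer, which is later than the Java 7 runtime mentioned in the question.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class DisconnectedSubgraphs {

    // Adds an undirected edge a-b to the adjacency map (stand-in for a relationship).
    static void addEdge(Map<Long, Set<Long>> adjacency, long a, long b) {
        adjacency.computeIfAbsent(a, k -> new HashSet<>()).add(b);
        adjacency.computeIfAbsent(b, k -> new HashSet<>()).add(a);
    }

    // Step 1: label every node with a component id using breadth-first traversal.
    static Map<Long, Integer> labelComponents(Map<Long, Set<Long>> adjacency) {
        Map<Long, Integer> label = new HashMap<>();
        int nextId = 0;
        for (Long start : adjacency.keySet()) {
            if (label.containsKey(start)) {
                continue; // already reached from an earlier start node
            }
            Deque<Long> queue = new ArrayDeque<>();
            queue.add(start);
            label.put(start, nextId);
            while (!queue.isEmpty()) {
                Long node = queue.poll();
                Set<Long> neighbors =
                        adjacency.getOrDefault(node, Collections.<Long>emptySet());
                for (Long neighbor : neighbors) {
                    if (!label.containsKey(neighbor)) {
                        label.put(neighbor, nextId);
                        queue.add(neighbor);
                    }
                }
            }
            nextId++;
        }
        return label;
    }

    // Step 2: aggregate the labels into groups and order them biggest first.
    static List<List<Long>> componentsBySize(Map<Long, Set<Long>> adjacency) {
        Map<Integer, List<Long>> groups = new HashMap<>();
        for (Map.Entry<Long, Integer> e : labelComponents(adjacency).entrySet()) {
            groups.computeIfAbsent(e.getValue(), k -> new ArrayList<>()).add(e.getKey());
        }
        List<List<Long>> result = new ArrayList<>(groups.values());
        for (List<Long> members : result) {
            Collections.sort(members); // stable, readable output
        }
        result.sort((x, y) -> Integer.compare(y.size(), x.size()));
        return result;
    }

    public static void main(String[] args) {
        Map<Long, Set<Long>> graph = new HashMap<>();
        addEdge(graph, 1, 2); // "main" graph: 1-2-3-4
        addEdge(graph, 2, 3);
        addEdge(graph, 3, 4);
        addEdge(graph, 5, 6); // a small disconnected graph
        graph.computeIfAbsent(7L, k -> new HashSet<>()); // an isolated node

        for (List<Long> component : componentsBySize(graph)) {
            System.out.println(component.size() + " nodes: " + component);
        }
    }
}
```

Each node is visited once and each relationship twice, so the work grows linearly with the size of the graph. That is the main reason to prefer an explicit traversal here over a variable-length pattern match, which enumerates paths rather than nodes.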