The Neo4j graph database holds roughly 50,000 nodes and more than 50,000 relationships. A main graph contains most of the nodes, but several subgraphs are not (yet) connected to it.
In order to connect the various graphs into one big main graph, I intend to use a Cypher query to list paths or collections of connected nodes, ordered by size (biggest disconnected graph first).
There are several related posts on Stack Overflow, such as:
- Finding all disconnected subgraphs in a graph (but it's not obvious how to solve it with Cypher).
- How do I find disconnected nodes on neo4j with Cypher?
Here is a small example graph that represents the problem: Neo4j Console example graph
The following Cypher query does not solve the problem but is a starting point. It lists all nodes that are not connected to the main graph, but it does not combine those nodes into collections of connected nodes. It works on a small graph. On a large graph it only returns "undefined" after running for more than 10 minutes.
START s=node(3), n=node(*)
MATCH s-[*1..10]-m
WITH collect(m) as members, n
WHERE NOT n in members
RETURN DISTINCT id(n), n.name?
ORDER BY id(n)
LIMIT 10;
How can I use Cypher to list all disconnected (sub-)graphs?
Environment:
- Neo4j - Graph Database Kernel 1.9.M05
- Java - SE Runtime Environment (build 1.7.0_17-b02)
This is not a complete answer, but I think you should (if you can) fall back to the Traversal Framework for this use case.
Cypher is about matching specific portions of a graph, no matter how you wanna do it. The Traversal framework is really about HOW you wanna traverse a graph.
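To make the "HOW" part concrete, here is a rough sketch of the kind of traversal I mean: a breadth-first walk that groups nodes into connected components and sorts them biggest first. Note that this is plain Java over an in-memory adjacency map, not the Neo4j Traversal Framework API. The class `ComponentFinder` and its methods are hypothetical names made up for illustration, and you would have to feed it node ids and relationships from your own database.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper, not part of Neo4j: finds connected components
// in an undirected graph held in memory.
public class ComponentFinder {

    // node id -> ids of directly connected nodes (direction ignored)
    private final Map<Long, Set<Long>> adjacency = new HashMap<Long, Set<Long>>();

    public void addNode(long id) {
        if (!adjacency.containsKey(id)) {
            adjacency.put(id, new HashSet<Long>());
        }
    }

    public void addRelationship(long a, long b) {
        addNode(a);
        addNode(b);
        adjacency.get(a).add(b);
        adjacency.get(b).add(a);
    }

    // Returns every connected component as a set of node ids, largest first.
    public List<Set<Long>> components() {
        Set<Long> visited = new HashSet<Long>();
        List<Set<Long>> result = new ArrayList<Set<Long>>();

        for (Long start : adjacency.keySet()) {
            if (!visited.add(start)) {
                continue; // already assigned to an earlier component
            }
            Set<Long> component = new HashSet<Long>();
            Deque<Long> queue = new ArrayDeque<Long>();
            queue.add(start);

            // Breadth-first walk: each node is visited exactly once.
            while (!queue.isEmpty()) {
                Long current = queue.poll();
                component.add(current);
                for (Long neighbour : adjacency.get(current)) {
                    if (visited.add(neighbour)) {
                        queue.add(neighbour);
                    }
                }
            }
            result.add(component);
        }

        // Biggest component first.
        Collections.sort(result, new Comparator<Set<Long>>() {
            @Override
            public int compare(Set<Long> x, Set<Long> y) {
                return Integer.compare(y.size(), x.size());
            }
        });
        return result;
    }
}
```

Because every node and relationship is touched only once, the work grows with the size of the graph rather than with the number of possible paths, which is what makes a variable-length pattern like `[*1..10]` so expensive.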
In your case, the traversal is more important than the graph to match. Here is what I'd suggest, use the Traversal Framework to