neo4j find all nodes with matching properties

Posted 2019-02-04 11:37

I have a relatively large set of nodes, and I want to find all pairs of nodes that have matching property values, but I don't know or care in advance what the property value is. This is basically an attempt to find duplicate nodes, but I can limit the definition of a duplicate to two or more nodes that have the same property value.

Any ideas on how to proceed? I haven't found any starting points in the Neo4j docs. I'm on 1.8.2 Community Edition.

EDIT: Sorry for not being clear in the initial question. I'm asking about doing this through Cypher.

Tags: neo4j cypher
7 answers
孤傲高冷的网名
Post #2 · 2019-02-04 12:06

You can try this query, which I think does what you want:

START n=node(*), m=node(*)
WHERE
  HAS(n.name) AND HAS(m.name) AND
  n.name = m.name AND
  ID(n) < ID(m)
RETURN n, m

http://console.neo4j.org/?id=xe6wmt

Both nodes must have a name property, and the two names must be equal. The ID comparison ensures that each pair is returned only once, rather than as both (a, b) and (b, a). I'm not sure about performance, since the query compares every node with every other node, so please test it on your data.
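
To make the performance concern concrete, here is a rough plain-Java analogue of the query above. It is a sketch under assumptions: nodes are modeled by a hypothetical `SimpleNode` class holding an id and a property map, not Neo4j's `Node` API. The nested loop mirrors the `node(*), node(*)` Cartesian product, and the `n.id < m.id` check plays the role of `ID(n) < ID(m)`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Objects;

public class PairwiseDuplicates {
    // Hypothetical stand-in for a graph node: an id plus its properties.
    static final class SimpleNode {
        final long id;
        final Map<String, Object> props;

        SimpleNode(long id, Map<String, Object> props) {
            this.id = id;
            this.props = props;
        }
    }

    // Returns every pair {idA, idB} with idA < idB whose values for `key` are equal.
    // Like START n=node(*), m=node(*), this performs O(n^2) comparisons.
    static List<long[]> matchingPairs(List<SimpleNode> nodes, String key) {
        List<long[]> pairs = new ArrayList<>();
        for (SimpleNode n : nodes) {
            for (SimpleNode m : nodes) {
                // Both nodes must have the property (like HAS(n.name) AND HAS(m.name)).
                if (!n.props.containsKey(key) || !m.props.containsKey(key)) {
                    continue;
                }
                // Keep only one ordering of each pair (like ID(n) < ID(m)).
                if (n.id >= m.id) {
                    continue;
                }
                if (Objects.equals(n.props.get(key), m.props.get(key))) {
                    pairs.add(new long[] {n.id, m.id});
                }
            }
        }
        return pairs;
    }

    public static void main(String[] args) {
        List<SimpleNode> nodes = List.of(
            new SimpleNode(1, Map.of("name", "Alice")),
            new SimpleNode(2, Map.of("name", "Bob")),
            new SimpleNode(3, Map.of("name", "Alice")),
            new SimpleNode(4, Map.of()));
        for (long[] pair : matchingPairs(nodes, "name")) {
            System.out.println(pair[0] + " - " + pair[1]);
        }
    }
}
```

Because of the nested loop, doubling the number of nodes roughly quadruples the work, which is why grouping by value scales better on large graphs.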

Bombasti
Post #3 · 2019-02-04 12:07

Cypher that counts the values of a property and also returns the collection of nodes for each value:

start n=node(*)
where has(n.prop)
with n.prop as prop, collect(n) as nodelist, count(*) as count
where count > 1
return prop, nodelist, count;

Example on console: http://console.neo4j.org/r/k2s7aa

You can also scan the node auto index on the property instead, which avoids looking at nodes that don't have this property (node auto-indexing must be enabled for prop):
start n=node:node_auto_index('prop:*') ...

Cypher 2.0 version, using a label Label:

match (n:Label)
where has(n.prop)
with n.prop as prop, collect(n) as nodelist, count(*) as count
where count > 1
return prop, nodelist, count;
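
The same grouping idea can be sketched in plain Java to show what `collect(n)` plus `count > 1` computes. This is an illustration, not Neo4j code: the graph is assumed to be a hypothetical `Map<Long, Map<String, Object>>` from node id to properties. Nodes without the property are skipped, like `where has(n.prop)`, and only values shared by at least two nodes are kept.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class GroupDuplicates {
    // Groups node ids by their value for `key`, keeping only values shared by 2+ nodes.
    // Roughly: WITH n.prop AS prop, collect(n) AS nodelist, count(*) AS count WHERE count > 1
    static Map<Object, List<Long>> duplicateGroups(Map<Long, Map<String, Object>> nodes, String key) {
        Map<Object, List<Long>> groups = new LinkedHashMap<>();
        for (Map.Entry<Long, Map<String, Object>> e : nodes.entrySet()) {
            Object value = e.getValue().get(key);
            // Like WHERE has(n.prop): skip nodes that lack the property.
            if (value == null) {
                continue;
            }
            groups.computeIfAbsent(value, v -> new ArrayList<>()).add(e.getKey());
        }
        // Like WHERE count > 1: drop values held by a single node.
        groups.values().removeIf(ids -> ids.size() < 2);
        return groups;
    }

    public static void main(String[] args) {
        Map<Long, Map<String, Object>> nodes = new LinkedHashMap<>();
        nodes.put(1L, Map.of("prop", "x"));
        nodes.put(2L, Map.of("prop", "y"));
        nodes.put(3L, Map.of("prop", "x"));
        nodes.put(4L, Map.of());
        duplicateGroups(nodes, "prop").forEach((prop, ids) ->
            System.out.println(prop + " " + ids + " " + ids.size()));
    }
}
```

Unlike the pairwise query in post #2, this needs only one pass over the nodes plus hash-map lookups.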
Deceive 欺骗
Post #4 · 2019-02-04 12:11

The easiest option is probably to build a local Map in Java, for example:

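import java.util.HashMap;
import java.util.Map;
import org.neo4j.graphdb.Node;
import org.neo4j.tooling.GlobalGraphOperations;

// db is an existing GraphDatabaseService instance
GlobalGraphOperations ggo = GlobalGraphOperations.at(db);
// Maps each property value to the first node seen with that value
Map<Object, Node> duplicateMap = new HashMap<Object, Node>();

for (Node node : ggo.getAllNodes()) {
    // getProperty(key, default) returns null instead of throwing when the property is missing
    Object propertyValue = node.getProperty("property", null);
    if (propertyValue == null) {
        continue;
    }
    Node existingNode = duplicateMap.get(propertyValue);
    if (existingNode == null) {
        duplicateMap.put(propertyValue, node);
    } else {
        System.out.println("Duplicate node. First node: " + existingNode + ", second node: " + node);
    }
}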

This prints a line for each duplicate it finds. If you need to do more, such as deleting the duplicate nodes, you can do that in the else branch.

Do you know the property name? Will this involve multiple properties, or just duplicates of a single name/value pair? If you are checking multiple properties, create one map per property.
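
For the multiple-property case, "one map per property" can be organized as a map of maps: property key, then value, then the ids of the nodes that carry that value. The sketch below assumes a hypothetical id-to-properties map rather than the Neo4j API; with real nodes you would fill it from `getPropertyKeys()` and `getProperty(key)`.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class PerPropertyDuplicates {
    // For every property key: value -> ids of the nodes carrying that value.
    // Nodes are a hypothetical id -> properties map, not Neo4j Node objects.
    static Map<String, Map<Object, List<Long>>> indexByProperty(Map<Long, Map<String, Object>> nodes) {
        Map<String, Map<Object, List<Long>>> perProperty = new HashMap<>();
        for (Map.Entry<Long, Map<String, Object>> node : nodes.entrySet()) {
            for (Map.Entry<String, Object> prop : node.getValue().entrySet()) {
                perProperty
                    .computeIfAbsent(prop.getKey(), k -> new HashMap<>())
                    .computeIfAbsent(prop.getValue(), v -> new ArrayList<>())
                    .add(node.getKey());
            }
        }
        return perProperty;
    }

    // Prints every (property, value) combination shared by two or more nodes.
    public static void main(String[] args) {
        Map<Long, Map<String, Object>> nodes = new HashMap<>();
        nodes.put(1L, Map.of("name", "Alice", "city", "Oslo"));
        nodes.put(2L, Map.of("name", "Bob", "city", "Oslo"));
        nodes.put(3L, Map.of("name", "Alice"));
        indexByProperty(nodes).forEach((key, byValue) ->
            byValue.forEach((value, ids) -> {
                if (ids.size() > 1) {
                    System.out.println(key + "=" + value + " -> " + ids);
                }
            }));
    }
}
```

Any inner list with more than one id marks a duplicate for that particular property.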

Emotional °昔
Post #5 · 2019-02-04 12:20

What about the following approach:

  • Use getAllNodes to get an Iterable over all nodes.
  • Using getPropertyKeys and getProperty(key), build a java.util.Map containing all properties of each node, and calculate that map's hashCode().
  • Build a global Map with the hashCode as the key and a set of node.getId() values as the value.

This gives you candidate duplicates. Be aware of hashCode() semantics: nodes with different properties can map to the same hash code, so compare the full property maps within each bucket to confirm real duplicates.
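
A sketch of that approach with the confirmation step included: nodes are bucketed by the `hashCode()` of their full property map, and each bucket is then split using `equals()` so that hash collisions are not reported as duplicates. Nodes are assumed to be a hypothetical id-to-properties map; with Neo4j you would build each map from `getPropertyKeys()` and `getProperty(key)`. One caveat: array-valued properties (such as `String[]`) use identity-based `hashCode()`/`equals()`, so they would need converting, for example to lists, before this works.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class HashBucketDuplicates {
    // Buckets node ids by the hashCode() of their full property map, then confirms
    // candidates with equals(), because different maps can share a hash code.
    static List<List<Long>> duplicateSets(Map<Long, Map<String, Object>> nodes) {
        Map<Integer, List<Long>> buckets = new HashMap<>();
        for (Map.Entry<Long, Map<String, Object>> e : nodes.entrySet()) {
            buckets.computeIfAbsent(e.getValue().hashCode(), h -> new ArrayList<>()).add(e.getKey());
        }
        List<List<Long>> result = new ArrayList<>();
        for (List<Long> candidates : buckets.values()) {
            // Split the bucket into groups whose property maps are actually equal.
            List<List<Long>> groups = new ArrayList<>();
            for (Long id : candidates) {
                List<Long> match = null;
                for (List<Long> group : groups) {
                    if (nodes.get(group.get(0)).equals(nodes.get(id))) {
                        match = group;
                        break;
                    }
                }
                if (match == null) {
                    match = new ArrayList<>();
                    groups.add(match);
                }
                match.add(id);
            }
            // Only groups with two or more nodes are real duplicates.
            for (List<Long> group : groups) {
                if (group.size() > 1) {
                    result.add(group);
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<Long, Map<String, Object>> nodes = new HashMap<>();
        nodes.put(1L, Map.of("name", "Alice", "age", 30));
        nodes.put(2L, Map.of("name", "Bob"));
        nodes.put(3L, Map.of("age", 30, "name", "Alice"));
        System.out.println(duplicateSets(nodes));
    }
}
```

Map.hashCode() is defined as the sum of its entries' hash codes, so it does not depend on the order in which properties were read.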

唯我独甜
Post #6 · 2019-02-04 12:23

You can also put an index on that property and then, for a given value, retrieve all matching nodes. The advantage is that you can also query for approximate values, not just exact ones.
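
A plain-Java stand-in for such an index, to illustrate the idea: a sorted map from property value to node ids supports both exact lookups and a simple kind of approximate lookup (prefix matching). This is a sketch under assumptions, not Neo4j's index API; the `PropertyIndex` class and its methods are hypothetical. Neo4j's legacy indexes are backed by Lucene, which offers richer approximate queries than prefix matching.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;
import java.util.SortedMap;
import java.util.TreeMap;

public class PropertyIndex {
    // Hypothetical in-memory index: property value -> ids of nodes with that value.
    // A sorted map makes range (prefix) lookups cheap.
    private final TreeMap<String, Set<Long>> index = new TreeMap<>();

    void add(long nodeId, String value) {
        index.computeIfAbsent(value, v -> new HashSet<>()).add(nodeId);
    }

    // Exact lookup: all nodes whose value equals `value`.
    Set<Long> get(String value) {
        return Collections.unmodifiableSet(index.getOrDefault(value, Collections.emptySet()));
    }

    // Prefix lookup: all nodes whose value starts with `prefix`
    // (ignores the edge case of values containing U+FFFF right after the prefix).
    Set<Long> startingWith(String prefix) {
        Set<Long> ids = new HashSet<>();
        SortedMap<String, Set<Long>> range = index.subMap(prefix, prefix + Character.MAX_VALUE);
        for (Set<Long> s : range.values()) {
            ids.addAll(s);
        }
        return ids;
    }

    public static void main(String[] args) {
        PropertyIndex idx = new PropertyIndex();
        idx.add(1, "Alice");
        idx.add(2, "Alicia");
        idx.add(3, "Bob");
        idx.add(4, "Alice");
        System.out.println("exact: " + idx.get("Alice"));
        System.out.println("prefix: " + idx.startingWith("Ali"));
    }
}
```

Any value whose id set has more than one entry is, by definition, a duplicate.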

骚年 ilove
Post #7 · 2019-02-04 12:32

With Neo4j 3.3.4 you can simply do the following:

MATCH (n) WHERE exists(n.propertyName) RETURN n

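Replace propertyName with the property you are looking for. Note that this returns every node that has the property, not just duplicates; to find nodes that share the same value, you still need to group by that value, as in the counting query in post #3.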
