I have a relatively large set of nodes, and I want to find all pairs of nodes that have matching property values, but I don't know or care in advance what the property value is. This is basically an attempt to find duplicate nodes, but I can limit the definition of a duplicate to two or more nodes that have the same property value.
Any ideas how to proceed? I'm not finding any starting points in the Neo4j docs. I'm on 1.8.2 Community Edition.
EDIT
Sorry for not being clear in the initial question, but I'm talking about doing this through Cypher.
You can try this one, which I think does what you want.
http://console.neo4j.org/?id=xe6wmt
Both nodes should have a name property, name should be equal for both nodes, and we only want one of the two possible orderings of each pair, which we get via the id comparison. Not sure about performance; please test.
Cypher to count values on a property, returning a collection of nodes as well:
Example on console: http://console.neo4j.org/r/k2s7aa
You can also do an index scan with the property like so (to avoid looking at nodes that don't have this property):
start n=node:node_auto_index('prop:*') ...
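To make the "count values, collect nodes" idea concrete, here is a minimal in-memory model in plain Java of what that query computes. It is not Cypher and not the Neo4j API: SimpleNode is a hypothetical stand-in for a node (an id plus a property map). Nodes without the property are skipped, which mirrors the effect of the 'prop:*' index scan, and only values shared by two or more nodes are kept.

```java
import java.util.*;

public class DuplicateGroups {
    // Hypothetical stand-in for a graph node; NOT the Neo4j API.
    record SimpleNode(long id, Map<String, Object> props) {}

    // Group node ids by the value of one property and keep groups with count > 1.
    static Map<Object, List<Long>> duplicateGroups(List<SimpleNode> nodes, String key) {
        Map<Object, List<Long>> byValue = new LinkedHashMap<>();
        for (SimpleNode n : nodes) {
            Object value = n.props().get(key);
            if (value == null) continue; // node lacks the property, like the 'prop:*' scan
            byValue.computeIfAbsent(value, v -> new ArrayList<>()).add(n.id());
        }
        // Drop values that only one node has: those are not duplicates.
        byValue.values().removeIf(ids -> ids.size() < 2);
        return byValue;
    }

    public static void main(String[] args) {
        List<SimpleNode> nodes = List.of(
            new SimpleNode(1, Map.of("name", "Alice")),
            new SimpleNode(2, Map.of("name", "Bob")),
            new SimpleNode(3, Map.of("name", "Alice")),
            new SimpleNode(4, Map.of()));
        // For each duplicated value: the value, its count, and the collected node ids.
        duplicateGroups(nodes, "name")
            .forEach((v, ids) -> System.out.println(v + " " + ids.size() + " " + ids));
    }
}
```

With the sample data this prints one line for "Alice", with a count of 2 and node ids 1 and 3.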
2.0 Cypher with a label Label:
The best/easiest option is to use something like a local Map, with code like this:
This would print out a list. If you needed to do more, like removing these nodes, you could do it in the else branch.
Do you know the property name? Will this be multiple properties, or just duplicates of a single name/value pair? If you are doing multiple properties, just create a map for each property you have.
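A minimal sketch of the local-map idea, assuming the nodes have already been loaded into memory. SimpleNode is a hypothetical stand-in (id plus properties), not Neo4j's Node class. There is one map per property name, each mapping a value to the id of the first node seen with it; the else branch is where a duplicate is reported, and where you could delete or merge nodes instead.

```java
import java.util.*;

public class LocalMapDuplicates {
    // Hypothetical node model (id + properties); NOT the Neo4j API.
    record SimpleNode(long id, Map<String, Object> props) {}

    // For each property name, keep a map: property value -> id of the first node seen with it.
    static List<String> findDuplicates(List<SimpleNode> nodes, List<String> keys) {
        List<String> report = new ArrayList<>();
        Map<String, Map<Object, Long>> seenPerKey = new HashMap<>();
        for (String key : keys) seenPerKey.put(key, new HashMap<>());

        for (SimpleNode n : nodes) {
            for (String key : keys) {
                Object value = n.props().get(key);
                if (value == null) continue; // this node does not have the property
                Map<Object, Long> seen = seenPerKey.get(key);
                Long firstId = seen.get(value);
                if (firstId == null) {
                    seen.put(value, n.id()); // first node with this value
                } else {
                    // Duplicate found: report it (this is also where you could remove the node).
                    report.add(key + "=" + value + ": node " + n.id() + " duplicates node " + firstId);
                }
            }
        }
        return report;
    }

    public static void main(String[] args) {
        List<SimpleNode> nodes = List.of(
            new SimpleNode(1, Map.of("name", "Alice", "email", "a@x.org")),
            new SimpleNode(2, Map.of("name", "Bob", "email", "a@x.org")),
            new SimpleNode(3, Map.of("name", "Alice")));
        findDuplicates(nodes, List.of("name", "email")).forEach(System.out::println);
    }
}
```

This keeps one entry per distinct value in memory, so for a very large graph the memory cost grows with the number of distinct values of the properties you check.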
What about the following approach:
1. For each node, create a java.util.Map containing all of its properties and calculate the map's hashCode().
2. Build a global Map using the hashCode as key and a set of node.getId() as values.
This should give you the candidates for being duplicates. Be aware of the hashCode() semantics: nodes with different properties might map to the same hashCode.
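Here is a minimal sketch of the hashCode bucketing, with an extra equals() check inside each bucket to filter out hash collisions. It assumes an in-memory list of hypothetical SimpleNode objects (not the Neo4j API). Buckets hold the nodes themselves rather than just ids so their properties can be compared; with a real database you would store ids and reload the nodes. The strings "Aa" and "BB" have the same String.hashCode(), so the sample includes a genuine collision that must not be reported.

```java
import java.util.*;

public class HashBucketDuplicates {
    // Hypothetical node model (id + full property map); NOT the Neo4j API.
    record SimpleNode(long id, Map<String, Object> props) {}

    static List<List<Long>> duplicateSets(List<SimpleNode> nodes) {
        // Bucket nodes by the hashCode() of their complete property map.
        Map<Integer, List<SimpleNode>> buckets = new HashMap<>();
        for (SimpleNode n : nodes) {
            buckets.computeIfAbsent(n.props().hashCode(), h -> new ArrayList<>()).add(n);
        }
        List<List<Long>> result = new ArrayList<>();
        for (List<SimpleNode> bucket : buckets.values()) {
            if (bucket.size() < 2) continue; // a single node cannot be a duplicate
            // Equal hashCodes are only candidates: split each bucket by real equality.
            List<List<SimpleNode>> groups = new ArrayList<>();
            for (SimpleNode n : bucket) {
                List<SimpleNode> match = null;
                for (List<SimpleNode> g : groups) {
                    if (g.get(0).props().equals(n.props())) { match = g; break; }
                }
                if (match == null) { match = new ArrayList<>(); groups.add(match); }
                match.add(n);
            }
            for (List<SimpleNode> g : groups) {
                if (g.size() < 2) continue;
                List<Long> ids = new ArrayList<>();
                for (SimpleNode n : g) ids.add(n.id());
                result.add(ids);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<SimpleNode> nodes = List.of(
            new SimpleNode(1, Map.of("name", "Alice", "age", 30)),
            new SimpleNode(2, Map.of("name", "Bob")),
            new SimpleNode(3, Map.of("age", 30, "name", "Alice")),
            new SimpleNode(4, Map.of("k", "Aa")),   // same hashCode as node 5...
            new SimpleNode(5, Map.of("k", "BB")));  // ...but different properties
        System.out.println(duplicateSets(nodes));
    }
}
```

Only nodes 1 and 3 are reported; nodes 4 and 5 share a bucket but fail the equals() check.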
You can also use an index on that property and then, for a given value, retrieve all the nodes. The advantage is that you can also query for approximations of the value.
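As a rough illustration of why an index helps, here is a sketch that uses a sorted in-memory map as a hypothetical stand-in for a property index; it is not Neo4j's index API. An exact lookup returns every node id for one value (more than one id means duplicates), and a prefix lookup is one simple example of querying for approximations of a value.

```java
import java.util.*;

public class PropertyIndexSketch {
    // Hypothetical in-memory stand-in for a property index: value -> node ids.
    private final NavigableMap<String, Set<Long>> index = new TreeMap<>();

    void add(long nodeId, String value) {
        index.computeIfAbsent(value, v -> new TreeSet<>()).add(nodeId);
    }

    // Exact lookup: all nodes having this value.
    Set<Long> exact(String value) {
        return index.getOrDefault(value, Set.of());
    }

    // Approximate lookup: all values starting with the prefix (contiguous in sorted order).
    Map<String, Set<Long>> prefix(String p) {
        Map<String, Set<Long>> out = new TreeMap<>();
        for (Map.Entry<String, Set<Long>> e : index.tailMap(p, true).entrySet()) {
            if (!e.getKey().startsWith(p)) break; // past the prefix range
            out.put(e.getKey(), e.getValue());
        }
        return out;
    }

    // Every value that is indexed by more than one node.
    Map<String, Set<Long>> duplicates() {
        Map<String, Set<Long>> out = new TreeMap<>();
        index.forEach((v, ids) -> { if (ids.size() > 1) out.put(v, ids); });
        return out;
    }

    public static void main(String[] args) {
        PropertyIndexSketch idx = new PropertyIndexSketch();
        idx.add(1, "Alice");
        idx.add(2, "Alicia");
        idx.add(3, "Alice");
        idx.add(4, "Bob");
        System.out.println(idx.exact("Alice"));   // [1, 3]
        System.out.println(idx.prefix("Ali"));    // {Alice=[1, 3], Alicia=[2]}
        System.out.println(idx.duplicates());     // {Alice=[1, 3]}
    }
}
```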
With Neo4j 3.3.4 you can simply do the following:
MATCH (n) where EXISTS(n.propertyName) return n
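Simply change propertyName to whatever property you are looking for. Note that this returns every node that has the property; it does not by itself group nodes that share the same value.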