可以将文章内容翻译成中文,广告屏蔽插件可能会导致该功能失效(如失效,请关闭广告屏蔽插件后再试):
问题:
I have a relatively large set of nodes, and I want to find all pairs of nodes that have matching property values, but I don't know or care in advance what the property value is. This is basically an attempt to find duplicate nodes, but I can limit the definition of a duplicate to two or more nodes that have the same property value.
Any ideas how to proceed? Not finding any starting points in the neo4j docs. I'm on 1.8.2 community edition.
EDIT
Sorry for not being clear in the initial question, but I'm talking about doing this through Cypher.
回答1:
Cypher to count values on a property, returning a collection of nodes as well:
start n=node(*)
where has(n.prop)
with n.prop as prop, collect(n) as nodelist, count(*) as count
where count > 1
return prop, nodelist, count;
Example on console: http://console.neo4j.org/r/k2s7aa
You can also do an index scan with the property like so (to avoid looking at nodes that don't have this property):
start n=node:node_auto_index('prop:*') ...
2.0 Cypher with a label Label:
match (n:Label)
with n.prop as prop, collect(n) as nodelist, count(*) as count
where count > 1
return prop, nodelist, count;
回答2:
What about the following approach:
- use getAllNodes to get an Iterable over all nodes.
- using getPropertyKeys and getProperty(key) build up a
java.util.Map
containing all properties for a node. Calculate the map's hashCode()
- build up a global
Map
using the hashCode as key and a set of node.getId()
as values
This should give you the candidates for being duplicate. Be aware of the hashCode() semantics, there might be nodes with different properties mapping to the same hashCode.
回答3:
You can try this one who does which I think does whatever you want.
START n=node(*), m=node(*)
WHERE
HAS(n.name) AND HAS (m.name) AND
n.name=m.name AND
ID(n) <ID(m)
RETURN n, m
http://console.neo4j.org/?id=xe6wmt
Both nodes should have a name
property. name
should be equal for both nodes and we only want one pair of the two possibilites which we get via the id comparison. Not sure about performance - please test.
回答4:
Neo4j 3.1.1
HAS is no longer supported in Cypher, please use EXISTS instead.
If you want to find nodes with specific property, the Cyper is as follows:
MATCH (n:NodeLabel) where has(n.NodeProperty) return n
回答5:
With Neo4j 3.3.4 you can simply do the following:
MATCH (n) where EXISTS(n.propertyName) return n
Simply change propertyName
to whatever property you are looking to find.
回答6:
The best/easiest option is to do something like a local Map
. If you did something like this, you could create code like this:
GlobalGraphOperations ggo = GlobalGraphOperations.at(db);
Map<Object, Node> duplicateMap = new HashMap<Object, Node>();
for (Node node : ggo.getAllNodes()) {
Object propertyValue = node.getProperty("property");
Node existingNode = duplicateMap.get(propertyValue);
if (existingNode == null) {
duplicateMap.put(propertyValue, node);
} else {
System.out.println("Duplicate Node. First Node: " + existingNode + ", Second Node: " + node);
}
}
This would print out a list. If you needed to do more, like remove these nodes, you could do something in the else.
Do you know the property name? Will this be multiple properties, or just duplicates of a single name/value pair? If you are doing multiple properties, just create a map for each property you have.
回答7:
You can also use an index on that property. Then for a given value retrieve all the nodes. The advantage is that you can also query for approximations of the value.