Traverse graph database from random seed nodes

Posted 2019-08-31 00:01

I am tasked with writing a query for a front-end application that visualizes a Neptune graph database. Suppose one vertex type is ITEM and another is USER. A user can create an item. There are also ITEM-to-ITEM relationships that show one item derived from another, as when media clips are cut out of an original media clip. The first items created are stored in a vertex such as a SERVER, and the UI groups items by that vertex.

The following is the requirement:

    Find (Y) seed nodes that are not connected by any ITEM-ITEM relationships in the graph (relationships via USERs, etc., are fine).
    Populate the graph with all relationships from these (Y) seed nodes, with no limit on which relationships are followed (relationships through USERs, for example, are fine).
    Stop populating the graph once the number of nodes (not the number of records) reaches the limit specified by (X).
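One possible reading of the first requirement is that no two seeds may be linked by a chain of ITEM-ITEM edges, while links through USER or SERVER vertices are allowed. Under that assumption, seed selection amounts to grouping vertices into ITEM-ITEM connected components and sampling at most one vertex per component. The Python sketch below models this in memory on the sample graph given in this question; it is not a Gremlin or Neptune query, and the helper names `item_components` and `pick_seeds` are hypothetical.

```python
import random

# Sample graph from the question: vertex id -> label, and (out, label, in) edges.
LABELS = {
    "server1": "SERVER", "server2": "SERVER", "user1": "USER",
    "item1": "ITEM", "item2": "ITEM", "item3": "ITEM",
    "item4": "ITEM", "item5": "ITEM",
}
EDGES = [
    ("item1", "STORED IN", "server1"),
    ("item2", "STORED IN", "server2"),
    ("item2", "RELATED TO", "item1"),
    ("item3", "DERIVED FROM", "item2"),
    ("item3", "CREATED BY", "user1"),
    ("user1", "CREATED", "item4"),
    ("item4", "RELATED TO", "item5"),
]


def item_components(labels, edges):
    """Group vertices into components joined only by ITEM-ITEM edges (union-find).

    Non-ITEM vertices (USER, SERVER) end up as single-vertex components.
    """
    parent = {v: v for v in labels}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    for src, _, dst in edges:
        if labels[src] == "ITEM" and labels[dst] == "ITEM":
            parent[find(src)] = find(dst)

    groups = {}
    for v in labels:
        groups.setdefault(find(v), []).append(v)
    return [sorted(g) for g in groups.values()]


def pick_seeds(labels, edges, y, rng=random):
    """Randomly pick up to y seeds, at most one per ITEM-ITEM component."""
    comps = item_components(labels, edges)
    chosen = rng.sample(comps, min(y, len(comps)))
    return [rng.choice(c) for c in chosen]


print(sorted(map(tuple, item_components(LABELS, EDGES))))
# [('item1', 'item2', 'item3'), ('item4', 'item5'), ('server1',), ('server2',), ('user1',)]
print(pick_seeds(LABELS, EDGES, 2, random.Random(0)))  # two seeds from different components
```

If the intended meaning is instead "seeds that touch no ITEM-ITEM edge at all", only the filter in `pick_seeds` would change; the component grouping is the part that depends on the interpretation.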

Here is a visual representation of the graph.

https://drive.google.com/file/d/1YNzh4wbzcdC0JeloMgD2C0oS6MYvfI4q/view?usp=sharing

A sample script to reproduce this graph is below. The real graph can be deeper; this is just a simple example. Please see the diagram:

g.addV('SERVER').property(id, 'server1').iterate()
g.addV('SERVER').property(id, 'server2').iterate()
g.addV('ITEM').property(id, 'item1').iterate()
g.addV('ITEM').property(id, 'item2').iterate()
g.addV('ITEM').property(id, 'item3').iterate()
g.addV('ITEM').property(id, 'item4').iterate()
g.addV('ITEM').property(id, 'item5').iterate()
g.addV('USER').property(id, 'user1').iterate()


g.V('item1').addE('STORED IN').to(__.V('server1')).iterate()
g.V('item2').addE('STORED IN').to(__.V('server2')).iterate()
g.V('item2').addE('RELATED TO').to(__.V('item1')).iterate()
g.V('item3').addE('DERIVED FROM').to(__.V('item2')).iterate()
g.V('item3').addE('CREATED BY').to(__.V('user1')).iterate()
g.V('user1').addE('CREATED').to(__.V('item4')).iterate()
g.V('item4').addE('RELATED TO').to(__.V('item5')).iterate()

The result should be in the form below if possible:

[
 [
   {
     "V1": {},
     "E": {},
     "V2": {}
   }
 ]
]
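The requested shape is a list of {V1, E, V2} triples, and the question also asks that vertices with no edges still be returned. A minimal Python sketch of that post-processing step, assuming the traversal has already produced the visited vertices and edges; the function name `to_records` and the choice of a `{"V1": ...}`-only record for an isolated vertex (mirroring `project('V1')` in the query) are assumptions:

```python
import json


def to_records(vertices, edges):
    """Turn visited vertices/edges into V1/E/V2 records.

    vertices: dict id -> label; edges: list of (out_id, label, in_id).
    Vertices that appear in no edge become {"V1": ...}-only records so the
    UI can still draw them as unconnected nodes.
    """
    records = []
    touched = set()
    for out_id, label, in_id in edges:
        records.append({
            "V1": {"id": out_id, "label": vertices[out_id]},
            "E": {"label": label},
            "V2": {"id": in_id, "label": vertices[in_id]},
        })
        touched.update((out_id, in_id))
    for vid, vlabel in vertices.items():
        if vid not in touched:
            records.append({"V1": {"id": vid, "label": vlabel}})
    return [records]  # outer list mirrors the [[ ... ]] shape in the question


verts = {"item4": "ITEM", "item5": "ITEM", "server2": "SERVER"}
print(json.dumps(to_records(verts, [("item4", "RELATED TO", "item5")]), indent=1))
```

Keeping isolated vertices as one-key records lets the front end use a single code path: draw V1 always, and draw E and V2 only when present.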

We have an API endpoint that accepts open-ended Gremlin queries, and our client app calls it to fetch the data that is rendered visually. I have written a query, but I do not think it is quite right. I would also like to know how to limit the number of nodes traversed so that the traversal stops at X nodes.

g.V().hasLabel('USER','SERVER').sample(5).aggregate('v1').
  repeat(__.as('V1').bothE().dedup().as('E').otherV().hasLabel('USER','SERVER').as('V2').
      aggregate('x').by(select('V1', 'E', 'V2'))).
    until(out().count().is(0)).
  as('V1').bothE().dedup().as('E').otherV().hasLabel(without('ITEM')).as('V2').
  aggregate('x').by(select('V1', 'E', 'V2')).
  cap('v1','x','v1').
  coalesce(select('x').unfold(), select('v1').unfold().project('V1'))

I would appreciate a single query that fetches this dataset, if that is possible. If vertices in the result are not connected to anything, I still want to retrieve them and render them as unconnected nodes in the UI.

1 answer
欢心
Post #2 · 2019-08-31 00:35

I looked at this again and came up with this query:

g.V().hasLabel(without('ITEM')).sample(2).aggregate('v1').
  repeat(__.as('V1').bothE().dedup().as('E').otherV().as('V2').
      aggregate('x').by(select('V1', 'E', 'V2'))).
    until(out().count().is(0)).
  as('V1').bothE().dedup().as('E').otherV().as('V2').
  aggregate('x').
    by(select('V1', 'E', 'V2')).
  cap('v1','x','v1').
  coalesce(select('x').unfold(),select('v1').unfold().project('V1')).limit(5)

To meet the criterion on node count rather than record count, I can pass limit() half the node count the user asks for (each record holds two vertices, V1 and V2) and then exclude the edge E and vertex V2 of the last record from what is rendered in the UI.
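Halving the limit is an approximation: records often share vertices (item2 appears in three edges of the sample graph), so limit(X/2) can return fewer than X distinct nodes. An alternative is to count distinct vertices while walking outward from the seeds and stop adding new ones once the count reaches X. The Python sketch below models that rule in memory, following edges in both directions with no label filter as the second requirement asks; `capped_bfs` is a hypothetical name, not a Gremlin step, and translating the rule into a single Neptune traversal is not shown here.

```python
from collections import deque

EDGES = [  # sample graph from the question: (out, label, in)
    ("item1", "STORED IN", "server1"),
    ("item2", "STORED IN", "server2"),
    ("item2", "RELATED TO", "item1"),
    ("item3", "DERIVED FROM", "item2"),
    ("item3", "CREATED BY", "user1"),
    ("user1", "CREATED", "item4"),
    ("item4", "RELATED TO", "item5"),
]


def capped_bfs(edges, seeds, max_nodes):
    """Breadth-first walk over both edge directions from the seeds.

    Returns (vertices, edges_used). A new vertex is only accepted while the
    distinct-vertex count is below max_nodes; edges between vertices that
    are already kept are still collected after the cap is reached.
    """
    adj = {}
    for i, (a, _, b) in enumerate(edges):
        adj.setdefault(a, []).append((i, b))
        adj.setdefault(b, []).append((i, a))

    seen = []
    for s in seeds:
        if s not in seen and len(seen) < max_nodes:
            seen.append(s)
    seen_set = set(seen)
    used, used_set = [], set()
    queue = deque(seen)
    while queue:
        v = queue.popleft()
        for i, other in adj.get(v, []):
            if i in used_set:
                continue
            if other not in seen_set:
                if len(seen) >= max_nodes:
                    continue  # this edge would add a vertex beyond the cap
                seen.append(other)
                seen_set.add(other)
                queue.append(other)
            used_set.add(i)
            used.append(edges[i])
    return seen, used


verts, used = capped_bfs(EDGES, ["server2", "user1"], 4)
print(verts)  # ['server2', 'user1', 'item2', 'item3']
```

Because the cap applies to distinct vertices rather than records, the result never exceeds X nodes, and no trailing E/V2 has to be trimmed afterwards.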

I would appreciate any suggestions for a better approach.
