Traverse graph database from random seed nodes

I am tasked with writing a query for a front-end application that visualizes a Neptune Graph database. Let us say that the first vertex are items while the second vertex user. A user can create an item. There are item to item relationships to show items derived from another item like in the case of media clips cut out of an original media clip. The first set of items created should be created in a vertex such as a SERVER which they are grouped by in the UI.

The following is the requirement:

    Find (Y) seed nodes that are not connected by any ITEM-ITEM relationships on the graph (relationships via USERs etc... are fine)
    Populate the graph with all relationships from these (Y) seed nodes with no limits on the relationships that are followed (relationships through USERs for example is fine).
    Stop populating the graph once the number of nodes (not records limit) hits the limit specified by (X)

Here is a visual representation of the graph.

https://drive.google.com/file/d/1YNzh4wbzcdC0JeloMgD2C0oS6MYvfI4q/view?usp=sharing

A sample code to reproduce this graph is below. This graph could even get deeper. This is a just a simple example. Kindly see diagram:

g.addV('SERVER').property(id, 'server1')
g.addV('SERVER').property(id, 'server2')
g.addV('ITEM').property(id, 'item1')
g.addV('ITEM').property(id, 'item2')
g.addV('ITEM').property(id, 'item3')
g.addV('ITEM').property(id, 'item4')
g.addV('ITEM').property(id, 'item5')
g.addV('USER').property(id, 'user1')


g.V('item1').addE('STORED IN').to(g.V('server1'))
g.V('item2').addE('STORED IN').to(g.V('server2'))
g.V('item2').addE('RELATED TO').to(g.V('item1'))
g.V('item3').addE('DERIVED FROM').to(g.V('item2') )
g.V('item3').addE('CREATED BY').to(g.V('user1'))
g.V('user1').addE('CREATED').to(g.V('item4'))
g.V('item4').addE('RELATED TO').to(g.V('item5'))

The result should be in the form below if possible:

[
 [
   {
     "V1": {},
     "E": {},
     "V2": {}
   }
 ]
]

We have an API with an endpoint that allows for open-ended gremlin queries. We call this endpoint in our client app to fetch the data that is rendered visually. I have written a query that I do not think is quite right. Moreover, I would like to know how to filter the number of nodes traversed and stop at X nodes.

g.V().hasLabel('USER','SERVER').sample(5).aggregate('v1').repeat(__.as('V1').bothE().dedup().as('E').otherV().hasLabel('USER','SERVER').as('V2').aggregate('x').by(select('V1', 'E', 'V2'))).until(out().count().is(0)).as('V1').bothE().dedup().as('E').otherV().hasLabel(without('ITEM')).as('V2').aggregate('x').by(select('V1', 'E', 'V2')).cap('v1','x','v1').coalesce(select('x').unfold(),select('v1').unfold().project('V1'))

I would appreciate if I can get a single query that will fetch this dataset if it is possible. If vertices in the result are not connected to anything, I would want to retrieve them and render them like that on the UI.

标签： graph-databases gremlin amazon-neptune

1条回答

欢心

2楼-- · 2019-08-31 00:35

I have looked at this again and came up with this query

g.V().hasLabel(without('ITEM')).sample(2).aggregate('v1').
  repeat(__.as('V1').bothE().dedup().as('E').otherV().as('V2').
      aggregate('x').by(select('V1', 'E', 'V2'))).
    until(out().count().is(0)).
  as('V1').bothE().dedup().as('E').otherV().as('V2').
  aggregate('x').
    by(select('V1', 'E', 'V2')).
  cap('v1','x','v1').
  coalesce(select('x').unfold(),select('v1').unfold().project('V1')).limit(5)

To meet the criteria for the node count rather than records count (or limit), I can pass to limit half the number passed in by the user as an input for nodes count and then exclude the edge E and vertice V2 of the last record from what will be rendered on the UI.

I will approach any suggestions on a better way.

0人赞添加讨论(0) 举报

Traverse graph database from random seed nodes

采纳回答

编辑标签

举报内容

检举类型

检举原因

检举说明(必填)

打开微信“扫一扫”，打开网页后点击屏幕右上角分享按钮

付费偷看金额在0.1-10元之间