How to get a subgraph consisting of all vertices,

Posted 2019-04-13 22:44

The Document and the Revision are two objects that reside in our domain logic specific layer.

The Document represents an abstraction around any material piece of paper that you could think of. That is - every contract, invoice or drawing could be called a Document.

On the other hand, the material representation of the Document is the Revision: the set of papers that a construction engineer receives on site represents a Revision of the Document that the designer has created. If something in a drawing has to be changed, due to an error or changed requirements, then a new revision will show up on site: Revision #2 of the same Document.

A Revision can contain links to other Documents; thus we can describe the relations between a car, its doors, engine, wheels and so on, and let every element evolve independently while staying attached to the other elements.

A typical DAG is displayed:

Car elements - Documents and Revisions

I managed to insert all vertices and edges into CosmosDB using the C# Graph API. I managed to traverse the graph and execute simple queries in order to find how many revisions the car has, or whether the engine had a turbocharger when it was initially created. However, I'm struggling to compose a complex query that returns only the most recent revision of every part of the car, or a query that returns the state of the car up to 2016-08-10.

The state of the car up to 2017-01-03: Finished car

The state of the car up to 2016-08-10: Car's engine has no turbocharger yet

When the traversal visits the descendants of a vertex (its "out()"), I couldn't find a way to take only the most recently created one and continue traversing from it without digging into the others. I would be grateful if you could suggest an expression that returns only the colored vertices from the pictures.

1 answer
Ridiculous
Reply #2 · 2019-04-13 23:34

While pictures are helpful, when asking questions about Gremlin it is best to always provide a Gremlin script that can generate a sample of your graph. For example, for your question:

graph = TinkerGraph.open()
g = graph.traversal()
g.addV('car').property('name','car').as('car').
  addV('rev').property('name','car revision 1').property('date', 1470787200L).as('carV1').
  addV('rev').property('name','car revision 2').property('date', 1472688000L).as('carV2').
  addV('frontLeftDoor').property('name','front left door').as('frontLeftDoor').
  addV('frontRightDoor').property('name','front right door').as('frontRightDoor').
  addV('engine').property('name','engine').as('engine').
  addV('turbocharger').property('name','turbocharger').as('turbocharger').
  addV('rev').property('name','front left door revision 1').property('date',1470787200L).as('frontLeftDoorV1').
  addV('rev').property('name','front left door revision 2').property('date',1472688000L).as('frontLeftDoorV2').
  addV('rev').property('name','front right door revision 1').property('date',1470787200L).as('frontRightDoorV1').
  addV('rev').property('name','engine revision 1').property('date',1470787200L).as('engineV1').
  addV('rev').property('name','engine revision 2').property('date',1472688000L).as('engineV2'). 
  addV('rev').property('name','engine revision 3').property('date',1483401600L).as('engineV3').
  addV('rev').property('name','turbocharger revision 1').property('date',1470787200L).as('turbochargerV1'). 
  addV('rev').property('name','turbocharger revision 2').property('date',1472688000L).as('turbochargerV2'). 
  addE('relates').from('car').to('carV1').
  addE('relates').from('car').to('carV2').
  addE('relates').from('carV1').to('frontLeftDoor').
  addE('relates').from('carV1').to('frontRightDoor').
  addE('relates').from('carV1').to('engine').
  addE('relates').from('carV2').to('frontLeftDoor').
  addE('relates').from('carV2').to('frontRightDoor').
  addE('relates').from('carV2').to('engine').
  addE('relates').from('frontLeftDoor').to('frontLeftDoorV1').
  addE('relates').from('frontLeftDoor').to('frontLeftDoorV2').
  addE('relates').from('frontRightDoor').to('frontRightDoorV1').
  addE('relates').from('engine').to('engineV1').
  addE('relates').from('engine').to('engineV2').
  addE('relates').from('engine').to('engineV3').
  addE('relates').from('engineV2').to('turbocharger').
  addE('relates').from('engineV3').to('turbocharger').
  addE('relates').from('turbocharger').to('turbochargerV1').
  addE('relates').from('turbocharger').to('turbochargerV2').iterate()

It often takes the person answering the question more time to create a sample graph for the question than it does to develop the Gremlin that provides the answer.

Anyway, here is one way to do this using "8/10/2016" (that is, 2016-08-10) as the "start date":

gremlin> g.V().has('name','car').
......1>   repeat(local(out().has('date',lte(1470787200L)).
......2>                order().
......3>                  by('date',decr).limit(1)).
......4>          out()).
......5>     emit().
......6>   local(out().has('date',lte(1470787200L)).
......7>         order().
......8>           by('date',decr).limit(1)).
......9>   tree().by('name')
==>[car:[car revision 1:[front right door:[front right door revision 1:[]],engine:[engine revision 1:[]],front left door:[front left door revision 1:[]]]]]

Here's the same traversal with a different date - "1/1/2017":

gremlin> g.V().has('name','car').
......1>   repeat(local(out().has('date',lte(1483228800L)).
......2>                order().
......3>                  by('date',decr).limit(1)).
......4>          out()).
......5>     emit().
......6>   local(out().has('date',lte(1483228800L)).
......7>         order().
......8>           by('date',decr).limit(1)).
......9>   tree().by('name')
==>[car:[car revision 2:[front right door:[front right door revision 1:[]],engine:[engine revision 2:[turbocharger:[turbocharger revision 2:[]]]],front left door:[front left door revision 2:[]]]]]

In this case, see that "engine revision 3" is excluded as it's the only vertex after "1/1/2017" - the rest of the tree is present.
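
To make the selection rule easier to follow outside of Gremlin, here is a minimal plain-Python sketch of the same idea. It is not Gremlin and not a CosmosDB API: the dictionaries `REVISIONS` and `LINKS` and the function `state_as_of` are hypothetical names that simply mirror the sample graph above. For each document it keeps the latest revision dated on or before the cutoff and then descends only through that revision. Omitting a document that has no qualifying revision is a choice made for this sketch.

```python
# Plain-Python model of the sample graph above (illustrative only).
# Document name -> list of (revision name, date as epoch seconds).
REVISIONS = {
    "car": [("car revision 1", 1470787200), ("car revision 2", 1472688000)],
    "front left door": [("front left door revision 1", 1470787200),
                        ("front left door revision 2", 1472688000)],
    "front right door": [("front right door revision 1", 1470787200)],
    "engine": [("engine revision 1", 1470787200),
               ("engine revision 2", 1472688000),
               ("engine revision 3", 1483401600)],
    "turbocharger": [("turbocharger revision 1", 1470787200),
                     ("turbocharger revision 2", 1472688000)],
}

# Revision name -> documents that revision links to.
LINKS = {
    "car revision 1": ["front left door", "front right door", "engine"],
    "car revision 2": ["front left door", "front right door", "engine"],
    "engine revision 2": ["turbocharger"],
    "engine revision 3": ["turbocharger"],
}


def state_as_of(document, as_of):
    """Return a nested dict {document: {latest revision: {children...}}}."""
    # Keep only revisions dated on or before the cutoff.
    candidates = [(date, name)
                  for name, date in REVISIONS.get(document, [])
                  if date <= as_of]
    if not candidates:
        # Sketch-level choice: a document with no qualifying revision is omitted.
        return {}
    # Most recent qualifying revision (ties broken by name, arbitrarily).
    _, latest = max(candidates)
    children = {}
    # Descend only through the chosen revision, never through the others.
    for child in LINKS.get(latest, []):
        children.update(state_as_of(child, as_of))
    return {document: {latest: children}}


if __name__ == "__main__":
    print(state_as_of("car", 1470787200))  # cutoff 2016-08-10
    print(state_as_of("car", 1483228800))  # cutoff 2017-01-01
```

With the two cutoffs used above, the nested dictionaries have the same shape as the two tree() results, including the missing "engine revision 3" in the second one.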

A few notes:

  1. I converted your dates to longs for easier comparison. I'm not sure if CosmosDB has nice handling for dates with respect to the lte predicate for has() but if it does you would probably prefer going that route.
  2. The repeat() step allows for arbitrary depth traversal in the tree, but note the duplicated logic it contains just outside of that after emit() - this grabs the final "leaves of the tree" as inside the repeat() the loop ends because there are no more outE() to traverse.
  3. The logic within the repeat() looks a bit complex, but it's basically just saying for the current "document" traverse to all the "revisions", sort on the date in descending order and grab the first one. Once it has the most recent revision as controlled by the date you care about, traverse out to any other documents it is connected to.
  4. I used the tree() step in this case as CosmosDB seems to support that. It does not look as though they support subgraph() yet. That step technically isn't even supported in the Apache TinkerPop C# Gremlin Language Variant: there are some challenges there that leave it a Java-only feature, unfortunately. Luckily, the shape of your data is tree-like, so the tree() step seems sufficient.
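
On the first note, the long values in the sample script correspond to midnight UTC on each date, expressed as seconds since the Unix epoch. A small sketch of that conversion follows. It is written in Python purely for illustration; the helper name `to_epoch_seconds` is hypothetical and the UTC midnight convention is an assumption that happens to match the numbers used here.

```python
from datetime import datetime, timezone


def to_epoch_seconds(year, month, day):
    """Midnight UTC on the given date, as whole seconds since 1970-01-01."""
    return int(datetime(year, month, day, tzinfo=timezone.utc).timestamp())


if __name__ == "__main__":
    print(to_epoch_seconds(2016, 8, 10))  # 1470787200, the "8/10/2016" cutoff
    print(to_epoch_seconds(2017, 1, 1))   # 1483228800, the "1/1/2017" cutoff
    print(to_epoch_seconds(2017, 1, 3))   # 1483401600, date of "engine revision 3"
```

Whatever convention you pick, use the same one when writing the 'date' property and when building the lte() cutoff, otherwise revisions created on the cutoff day may be included or excluded unexpectedly.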

In Groovy, you could supply the repeated logic by way of a closure to make things a bit more re-usable:

gremlin> traverseAndFilter = { out().has('date',lte(1470787200L)).
......1>                       order().
......2>                         by('date',decr).limit(1) }
==>groovysh_evaluate$_run_closure1@1d12e953
gremlin> g.V().has('name','car').
......1>   repeat(local(traverseAndFilter()).out()).
......2>     emit().
......3>   local(local(traverseAndFilter())).
......4>   tree().by('name')
==>[car:[car revision 1:[front right door:[front right door revision 1:[]],engine:[engine revision 1:[]],front left door:[front left door revision 1:[]]]]]

or store the "traverseAndFilter" traversal itself and clone() it:

gremlin> traverseAndFilter = out().has('date',lte(1470787200L)).
......1>                       order().
......2>                         by('date',decr).limit(1);[] 
gremlin> g.V().has('name','car').
......1>   repeat(local(traverseAndFilter.clone()).out()).
......2>     emit().
......3>   local(local(traverseAndFilter.clone())).
......4>   tree().by('name')
==>[car:[car revision 1:[front right door:[front right door revision 1:[]],engine:[engine revision 1:[]],front left door:[front left door revision 1:[]]]]]