Iterating a GraphTraversal with GraphFrame causes

Posted 2019-07-29 06:45

Question:

The following

    GraphTraversal<Row, Edge> traversal = gf().E().hasLabel("foo").limit(5);
    while (traversal.hasNext()) {}

causes the following Exception:

    java.lang.UnsupportedOperationException: Row to Vertex conversion is not supported: Use .df().collect() instead of the iterator
    at com.datastax.bdp.graph.spark.graphframe.DseGraphTraversal.iterator$lzycompute(DseGraphTraversal.scala:92)
    at com.datastax.bdp.graph.spark.graphframe.DseGraphTraversal.iterator(DseGraphTraversal.scala:78)
    at com.datastax.bdp.graph.spark.graphframe.DseGraphTraversal.hasNext(DseGraphTraversal.scala:129)

The exception says to use .df().collect(), but gf().E().hasLabel("foo") does not let me call .df() afterwards: the object returned by hasLabel() has no df() method.

I'm using the Java API via dse-graph-frames:5.1.4 along with dse-byos_2.11:5.1.4.

Answer 1:

The short answer: you need to cast the GraphTraversal to DseGraphTraversal, which has the df() method. Then use one of the Spark Dataset methods to collect the Rows:

    List<Row> rows =
        ((DseGraphTraversal) graph.E().hasLabel("foo"))
            .df().limit(5).collectAsList();
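
To see why the cast is needed and why it is safe, here is a minimal self-contained sketch of the same situation. It does not use the DSE classes: Traversal and DseLikeTraversal are hypothetical stand-ins for GraphTraversal and DseGraphTraversal, and a List<String> stands in for Dataset<Row>. The point is only that hasLabel() is declared to return the interface type, so df() is hidden at compile time even though the runtime object (as the stack trace in the question shows) is the subclass.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class CastSketch {

    // Hypothetical stand-in for GraphTraversal: the declared return type of every step.
    interface Traversal extends Iterator<String> {
        Traversal hasLabel(String label);
    }

    // Hypothetical stand-in for DseGraphTraversal: the actual runtime class, which adds df().
    static class DseLikeTraversal implements Traversal {
        private final List<String> rows;

        DseLikeTraversal(List<String> rows) {
            this.rows = rows;
        }

        @Override
        public Traversal hasLabel(String label) {
            // Declared as Traversal, but the returned object is still a DseLikeTraversal.
            // (No real filtering happens in this sketch.)
            return new DseLikeTraversal(rows);
        }

        @Override
        public boolean hasNext() {
            // Mirrors the behaviour reported in the question: iteration is unsupported.
            throw new UnsupportedOperationException("Use df() instead of the iterator");
        }

        @Override
        public String next() {
            throw new UnsupportedOperationException("Use df() instead of the iterator");
        }

        // Only reachable when the static type is DseLikeTraversal.
        List<String> df() {
            return rows;
        }
    }

    static List<String> collectRows() {
        Traversal t = new DseLikeTraversal(Arrays.asList("row1", "row2")).hasLabel("foo");
        // t.df() would not compile here: the Traversal interface declares no df().
        return ((DseLikeTraversal) t).df();
    }

    public static void main(String[] args) {
        System.out.println(collectRows());
    }
}
```

With the real API the cast plays the same role: it only changes the static type so that df() becomes visible to the compiler.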

DseGraphFrame does not yet support the full TinkerPop specification, so you cannot receive TinkerPop Vertex or Edge objects. (The limit() method is also not implemented in DSE 5.1.x.) It is recommended to switch to the Spark Dataset API with a df() call, get a Dataset<Row>, and use Dataset-based filtering and collecting.

If you need only the Edge/Vertex properties, you can still use TinkerPop valueMap() or values():

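    GraphTraversal<Row, Map<String, Object>> traversal = graph.E().hasLabel("foo").valueMap();
    while (traversal.hasNext()) {
        // next() advances the traversal; without it the loop would never terminate
        Map<String, Object> properties = traversal.next();
    }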