I am new to GraphX and have a Spark dataframe with four columns like below:
src_ip   dst_ip   flow_count  sum_bytes
8.8.8.8  1.2.3.4  435         1137
...      ...      ...         ...
Basically, I want to map both src_ip and dst_ip to vertices and assign flow_count and sum_bytes as edge attributes. As far as I know, we cannot add edge attributes in GraphX, as only vertex attributes are permitted. Hence, I am thinking about adding flow_count as the edge weight:
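import org.apache.spark.graphx.Edge
import scala.util.hashing.MurmurHash3
// create edges: hash each IP string to a vertex id and use flow_count as the edge attribute
val trafficEdges = trafficsFromTo.map(x => Edge(MurmurHash3.stringHash(x(0).toString), MurmurHash3.stringHash(x(1).toString), x(2)))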
However, can I add sum_bytes as an edge weight as well?
It is possible to add both variables to the edge. The simplest solution would be to use a tuple, for example:
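A minimal sketch of the tuple approach, assuming both columns can be read as Long values. The sample `rows` and the `vertexId` helper are hypothetical stand-ins for the DataFrame in the question and are not part of GraphX:

```scala
import org.apache.spark.graphx.Edge
import scala.util.hashing.MurmurHash3

// Hypothetical sample data standing in for the DataFrame rows:
// (src_ip, dst_ip, flow_count, sum_bytes)
val rows = Seq(("8.8.8.8", "1.2.3.4", 435L, 1137L))

// Hypothetical helper: hash an IP string to a GraphX vertex id (a Long)
def vertexId(ip: String): Long = MurmurHash3.stringHash(ip).toLong

// The edge attribute is a (flow_count, sum_bytes) tuple
val trafficEdges: Seq[Edge[(Long, Long)]] = rows.map {
  case (src, dst, flows, bytes) =>
    Edge(vertexId(src), vertexId(dst), (flows, bytes))
}
```

With the RDD from the question, the same `Edge(...)` expression would go inside `trafficsFromTo.map`, with the tuple built from `x(2)` and `x(3)` after converting them to the desired numeric type.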
Alternatively, you can make use of a case class:
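A sketch of the case class variant under the same assumptions. The class name `Traffic`, its field names, the sample `rows`, and the `vertexId` helper are all made up for illustration:

```scala
import org.apache.spark.graphx.Edge
import scala.util.hashing.MurmurHash3

// Hypothetical attribute type; the fields mirror the DataFrame columns
case class Traffic(flowCount: Long, sumBytes: Long)

// Hypothetical sample data standing in for the DataFrame rows:
// (src_ip, dst_ip, flow_count, sum_bytes)
val rows = Seq(("8.8.8.8", "1.2.3.4", 435L, 1137L))

// Hypothetical helper: hash an IP string to a GraphX vertex id (a Long)
def vertexId(ip: String): Long = MurmurHash3.stringHash(ip).toLong

// Each edge carries a Traffic value as its attribute
val trafficEdges: Seq[Edge[Traffic]] = rows.map {
  case (src, dst, flows, bytes) =>
    Edge(vertexId(src), vertexId(dst), Traffic(flows, bytes))
}

// Attributes are read by name rather than by tuple position
val totalBytes: Long = trafficEdges.map(_.attr.sumBytes).sum
```

Once the edges are in an RDD, `Graph.fromEdges(edges, defaultValue)` can build the graph, filling in every vertex with the given default attribute.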
A case class is more convenient to use and maintain when more attributes need to be added, since each field is accessed by name rather than by tuple position.
I believe that in this specific case, it is most elegantly solved by: