Hi, I am new to the graph world. I have been assigned to work on graph processing, and since I already know Apache Spark, I thought of using its GraphX library to process a large graph. Then I came across Gephi, which provides a nice GUI for manipulating graphs. Does GraphX have such tools, or is it mainly a parallel graph processing library? Can I import JSON graph data exported from Gephi into GraphX? Please guide me. I know it's a basic question, but I think it's a valid one. Thanks in advance.
Answer 1:
Adding to that, you could also try GraphLab: https://dato.com/products/create/open_source.html
It directly supports Spark RDDs: https://dato.com/learn/userguide/data_formats_and_sources/spark_integration.html
Not much work is required after that:
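from pyspark import SparkContext
import graphlab as gl

# Connect to a Spark cluster running on YARN (client mode)
sc = SparkContext('yarn-client')
# Load a large text file from HDFS as an RDD of lines
t = sc.textFile("hdfs://some/large/file")
# Convert the Spark RDD into a GraphLab SFrame
sf = gl.SFrame.from_rdd(t)
# do stuff...
# Convert the SFrame back into a Spark RDD
out_rdd = sf.to_rdd(sc)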
Answer 2:
If you are new to the graph world, you can use Apache Zeppelin with Spark, but note that Apache Zeppelin is still an incubator project.
Answer 3:
No, Apache Spark GraphX has no visualization tools; it is just a processing engine. However, you can import data from Gephi into GraphX using Gephi's API.
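As an alternative to going through Gephi's API, here is a minimal Python sketch that simply parses a JSON file exported from Gephi. It assumes the export has `nodes` and `edges` arrays with `id`, `label`, `source` and `target` fields; field names differ between Gephi's JSON exporter plugins, so check your own file. GraphX identifies vertices by 64-bit integer ids, so the sketch maps Gephi's node ids to consecutive integers and emits one `srcId dstId` line per edge, the whitespace-separated format that GraphX's `GraphLoader.edgeListFile` reads. The function name `gephi_json_to_edge_list` is hypothetical.

```python
import json


def gephi_json_to_edge_list(text):
    """Convert a Gephi-style JSON export into GraphX-friendly text lines.

    Assumes (hypothetically) a document shaped like
    {"nodes": [{"id": ..., "label": ...}], "edges": [{"source": ..., "target": ...}]}.
    Returns (id_map, vertex_lines, edge_lines); id_map maps the original node
    ids to integers, because GraphX vertex ids must be Longs.
    """
    data = json.loads(text)
    id_map = {}
    vertex_lines = []
    for node in data["nodes"]:
        # Assign consecutive integer ids in the order the nodes appear
        vid = id_map.setdefault(str(node["id"]), len(id_map))
        vertex_lines.append(f"{vid}\t{node.get('label', '')}")
    edge_lines = []
    for edge in data["edges"]:
        # Raises KeyError if an edge refers to a node missing from "nodes"
        src = id_map[str(edge["source"])]
        dst = id_map[str(edge["target"])]
        # One "srcId dstId" pair per line, as GraphLoader.edgeListFile expects
        edge_lines.append(f"{src} {dst}")
    return id_map, vertex_lines, edge_lines


if __name__ == "__main__":
    sample = ('{"nodes": [{"id": "a", "label": "Alice"}, {"id": "b", "label": "Bob"}],'
              ' "edges": [{"source": "a", "target": "b"}]}')
    ids, vertices, edges = gephi_json_to_edge_list(sample)
    print(edges)  # ['0 1']
```

Writing `edge_lines` to a text file (one per line) gives you something you could load on the Spark side with `GraphLoader.edgeListFile(sc, path)`. The vertex labels would then be joined back onto the graph separately, for example from the `vertex_lines` file.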