I am trying to integrate Spark 2.1 job's metrics to Ganglia.
My spark-default.conf looks like
*.sink.ganglia.class org.apache.spark.metrics.sink.GangliaSink
*.sink.ganglia.name Name
*.sink.ganglia.host $MASTERIP
*.sink.ganglia.port $PORT
*.sink.ganglia.mode unicast
*.sink.ganglia.period 10
*.sink.ganglia.unit seconds
When i submit my job i can see the warn
Warning: Ignoring non-spark config property: *.sink.ganglia.host=host
Warning: Ignoring non-spark config property: *.sink.ganglia.name=Name
Warning: Ignoring non-spark config property: *.sink.ganglia.mode=unicast
Warning: Ignoring non-spark config property: *.sink.ganglia.class=org.apache.spark.metrics.sink.GangliaSink
Warning: Ignoring non-spark config property: *.sink.ganglia.period=10
Warning: Ignoring non-spark config property: *.sink.ganglia.port=8649
Warning: Ignoring non-spark config property: *.sink.ganglia.unit=seconds
My environment details are
Hadoop : Amazon 2.7.3 - emr-5.7.0
Spark : Spark 2.1.1,
Ganglia: 3.7.2
If you have any inputs or any other alternative of Ganglia please reply.
For EMR specifically, you'll need to put these settings in
/etc/spark/conf/metrics.properties
on the master node.Spark on EMR does include the Ganglia library:
In addition, your example is missing the equals sign (
=
) between the config names and values - unsure if that's an issue. Below is an example config that worked successfully for me.according to the spark docs
so instead of having these confs in
spark-default.conf
, move them to$SPARK_HOME/conf/metrics.properties
From this page: https://spark.apache.org/docs/latest/monitoring.html