How can I change HDFS replication factor for my Spark program

Published 2020-07-23 06:12

I need to change the HDFS replication factor from 3 to 1 for my Spark program. While searching, I came across the "spark.hadoop.dfs.replication" property, but looking at https://spark.apache.org/docs/latest/configuration.html, it doesn't seem to exist anymore. So, how can I change the HDFS replication factor from my Spark program or via spark-submit?

2 Answers
闹够了就滚
#2 · 2020-07-23 06:37

You should use spark.hadoop.dfs.replication to set the HDFS replication factor in your Spark application. So why can't you find it at https://spark.apache.org/docs/latest/configuration.html? Because that page ONLY lists Spark-specific configuration. In fact, any property you set that starts with spark.hadoop. is automatically translated into a Hadoop property by stripping the leading "spark.hadoop." prefix. You can see how this is implemented at https://github.com/apache/spark/blob/d7b1fcf8f0a267322af0592b2cb31f1c8970fb16/core/src/main/scala/org/apache/spark/deploy/SparkHadoopUtil.scala

The method to look for is appendSparkHadoopConfigs.
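
For example, a minimal sketch of setting the property at application startup (the application name is just a placeholder); Spark strips the "spark.hadoop." prefix and hands "dfs.replication" to the Hadoop configuration:

import org.apache.spark.sql.SparkSession

// Set the property when building the SparkSession; Spark translates
// "spark.hadoop.dfs.replication" into the Hadoop property "dfs.replication".
val spark = SparkSession.builder()
  .appName("replication-example")               // placeholder name
  .config("spark.hadoop.dfs.replication", "1")
  .getOrCreate()

The same property can also be passed on the command line: spark-submit --conf spark.hadoop.dfs.replication=1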

冷血范
#3 · 2020-07-23 06:50

HDFS configuration is not specific to Spark in any way. You should be able to modify it using the standard Hadoop configuration files, in particular hdfs-site.xml:

<property>
  <name>dfs.replication</name>
  <value>3</value>
</property>

It is also possible to access the Hadoop configuration through the SparkContext instance:

// Grab the Hadoop configuration used by this Spark application
val hconf: org.apache.hadoop.conf.Configuration = spark.sparkContext.hadoopConfiguration
hconf.setInt("dfs.replication", 3)
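
As a rough sketch of how this is typically used (the generated data and the HDFS output path are purely illustrative), files written through this configuration afterwards should pick up the new factor:

import org.apache.spark.sql.SparkSession

// Minimal sketch: lower the client-side replication factor to 1 and write some data.
val spark = SparkSession.builder().appName("replication-demo").getOrCreate()
val hconf = spark.sparkContext.hadoopConfiguration
hconf.setInt("dfs.replication", 1)

// Illustrative write; the output path is hypothetical.
spark.range(100).write.parquet("hdfs:///tmp/replication_demo")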