I need to change the HDFS replication factor from 3 to 1 for my Spark program. While searching, I came across the "spark.hadoop.dfs.replication" property, but looking at https://spark.apache.org/docs/latest/configuration.html, it doesn't seem to exist anymore. So, how can I change the HDFS replication factor from my Spark program or using spark-submit?
You should use spark.hadoop.dfs.replication to set the HDFS replication factor in your Spark application. But why can't you find it at https://spark.apache.org/docs/latest/configuration.html? Because that page ONLY lists Spark-specific configuration. In fact, any property you set that starts with spark.hadoop.* is automatically translated to a Hadoop property, with the leading "spark.hadoop." stripped off. You can see how this is implemented at https://github.com/apache/spark/blob/d7b1fcf8f0a267322af0592b2cb31f1c8970fb16/core/src/main/scala/org/apache/spark/deploy/SparkHadoopUtil.scala; the method to look for is appendSparkHadoopConfigs.
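
For example, a minimal sketch of setting it when building the session (the app name and value here are placeholders; the command-line equivalent is `spark-submit --conf spark.hadoop.dfs.replication=1 ...`):

```scala
import org.apache.spark.sql.SparkSession

// Equivalent on the command line:
//   spark-submit --conf spark.hadoop.dfs.replication=1 ...
val spark = SparkSession.builder()
  .appName("replication-example")                    // placeholder app name
  .config("spark.hadoop.dfs.replication", "1")       // forwarded to Hadoop as dfs.replication=1
  .getOrCreate()

// Files this application writes to HDFS should now use replication factor 1.
```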
HDFS configuration is not specific to Spark in any way. You should be able to modify it using the standard Hadoop configuration files, in particular hdfs-site.xml. It is also possible to access the Hadoop configuration through the SparkContext instance:
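
A minimal sketch of that approach, assuming an existing SparkContext named sc (as provided in spark-shell, or obtained via spark.sparkContext in an application):

```scala
// sc: org.apache.spark.SparkContext
val hadoopConf = sc.hadoopConfiguration   // org.apache.hadoop.conf.Configuration
hadoopConf.setInt("dfs.replication", 1)   // replication factor for files written through this configuration
```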