Read csv file in Apache Spark from remote location

Posted 2019-03-06 12:26

Question:

I have a file on an Ubuntu machine that I want to read in Apache Spark.

I found this example:

import org.apache.spark.{SparkConf, SparkContext}

object BasicTextFromFTP {
    def main(args: Array[String]): Unit = {
      val conf = new SparkConf()
      conf.setMaster(args(0))
      val sc = new SparkContext(conf)
      // Read a remote file over FTP; the credentials are embedded in the URL
      val file = sc.textFile("ftp://anonymous:pandamagic@ftp.ubuntu.com/ubuntu/ls-LR.gz")
      println(file.collect().mkString("\n"))
    }
}

at this link:

https://github.com/databricks/learning-spark/blob/master/src/main/scala/com/oreilly/learningsparkexamples/scala/BasicLoadTextFromFTP.scala

I don’t understand how the URL is created. Please help me with this.

Answer 1:

The basic structure of the URL is a scheme (here ftp) followed by

//<user>:<password>@<host>:<port>/<url-path>

where every part except the host can be omitted.
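To make the URL structure concrete for the CSV case, here is a minimal sketch that assembles such a URL from its parts and reads a CSV file with Spark's DataFrame API. The host, credentials, and path below are placeholders, not real values; any Hadoop-supported filesystem URL (ftp://, hdfs://, s3a://, file://, etc.) can be passed to the reader in the same way.

```scala
import org.apache.spark.sql.SparkSession

object RemoteCsvExample {
  def main(args: Array[String]): Unit = {
    // Hypothetical credentials and host -- replace with your own
    val user = "anonymous"
    val password = "secret"
    val host = "example.com"
    val port = 21
    val path = "/data/sample.csv"

    // Assemble the URL: <scheme>://<user>:<password>@<host>:<port>/<url-path>
    val url = s"ftp://$user:$password@$host:$port$path"

    val spark = SparkSession.builder().appName("RemoteCsv").getOrCreate()

    // Spark resolves the scheme to the matching Hadoop FileSystem implementation
    val df = spark.read
      .option("header", "true")
      .csv(url)

    df.show()
  }
}
```

Omitting `:<password>` or `:<port>` falls back to an anonymous login and the scheme's default port, which is why the Ubuntu FTP example above works with just `anonymous:pandamagic` and no explicit port.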