Write/store dataframe in text file

I am trying to write a DataFrame to a text file. If the DataFrame has a single column, I can write it to a text file. If it has multiple columns, I get the following error:

Text data source supports only a single column, and you have 2 columns.

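import org.apache.log4j.{Level, Logger}
import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.types.{LongType, StructField, StructType}

object replace {

  def main(args:Array[String]): Unit = {

    Logger.getLogger("org").setLevel(Level.ERROR)

    val spark = SparkSession.builder.master("local[1]").appName("Decimal Field Validation").getOrCreate()

    // The text source reads each line into one string column named "value"
    // (the header option is ignored by this source)
    var sourcefile = spark.read.option("header","true").text("C:/Users/phadpa01/Desktop/inputfiles/decimalvalues.txt")

    // Prepend a 1-based row number (PRGREFNBR) to every row
    val rowRDD = sourcefile.rdd.zipWithIndex().map(indexedRow => Row.fromSeq((indexedRow._2 + 1) +: indexedRow._1.toSeq))

    // Add the PRGREFNBR column to the schema
    val newstructure = StructType(Array(StructField("PRGREFNBR", LongType)) ++ sourcefile.schema.fields)

    // Create a new DataFrame containing PRGREFNBR
    sourcefile = spark.createDataFrame(rowRDD, newstructure)

    // Fails here: the DataFrame now has 2 columns (PRGREFNBR, value),
    // but the text data source accepts only a single string column
    sourcefile.write.mode("overwrite").format("text").save("C:/Users/phadpa01/Desktop/op")

  }

}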
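The error comes from the text source itself: it writes exactly one string column per row. A minimal sketch of one workaround, assuming Spark 2.x or later and comma-separated output: collapse all columns into a single string column with concat_ws before calling write.text. The object name SingleColumnText, the helper toSingleLine, the sample data and the "," separator are illustrative choices, not part of the original code.

```scala
import org.apache.spark.sql.{DataFrame, SaveMode, SparkSession}
import org.apache.spark.sql.functions.{col, concat_ws}

object SingleColumnText {
  // Collapse every column into one string column named "value".
  // Each column is cast to string first so numeric columns (e.g. PRGREFNBR) are accepted.
  // Note: concat_ws skips null values instead of writing an empty field.
  def toSingleLine(df: DataFrame, sep: String = ","): DataFrame =
    df.select(concat_ws(sep, df.columns.toSeq.map(c => col(c).cast("string")): _*).as("value"))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.master("local[1]").appName("SingleColumnText").getOrCreate()
    import spark.implicits._

    // Hypothetical sample data shaped like the question's DataFrame
    val df = Seq((1L, "12.50"), (2L, "3.75")).toDF("PRGREFNBR", "amount")

    // The text source now sees exactly one string column, so the write succeeds
    toSingleLine(df).write.mode(SaveMode.Overwrite).text(args.headOption.getOrElse("output/text"))
    spark.stop()
  }
}
```

In the question's code this would replace the last write with SingleColumnText.toSingleLine(sourcefile).write.mode("overwrite").text("C:/Users/phadpa01/Desktop/op").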

Tags: scala apache-spark

5 answers

劫难

Post #2 · 2020-02-11 13:37

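I think using substring is the more appropriate approach in all scenarios: Row.toString wraps each row in exactly one leading [ and one trailing ], so stripping the first and last characters removes them without touching brackets that are part of the data.

See the code below.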

sourcefile.rdd
.map(r =>  { val x = r.toString; x.substring(1, x.length-1)})
.saveAsTextFile("C:/Users/phadpa01/Desktop/op")
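A minimal sketch comparing the substring approach with an alternative swapped in here, Row.mkString, which joins the fields with a separator and never adds brackets, so nothing needs stripping. The object name RowToLine, its method names, the sample row and the "," separator are illustrative assumptions.

```scala
import org.apache.spark.sql.Row

object RowToLine {
  // Strip the single leading "[" and trailing "]" that Row.toString adds
  def viaSubstring(r: Row): String = {
    val s = r.toString
    s.substring(1, s.length - 1)
  }

  // Join the fields directly; no brackets are produced in the first place
  def viaMkString(r: Row, sep: String = ","): String = r.mkString(sep)

  def main(args: Array[String]): Unit = {
    val row = Row(1L, "a[0]", 3.5)
    println(viaSubstring(row))
    println(viaMkString(row))
  }
}
```

In the answer's pipeline the mkString variant would read sourcefile.rdd.map(_.mkString(",")).saveAsTextFile("C:/Users/phadpa01/Desktop/op"). With either variant, null fields appear as the literal text null.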

混吃等死

Post #3 · 2020-02-11 13:38

You can convert the DataFrame to an RDD, convert each Row to a string, and write the last line as:

 val op= sourcefile.rdd.map(_.toString()).saveAsTextFile("C:/Users/phadpa01/Desktop/op")

Edit:

As @philantrovert and @Pravinkumar pointed out, the code above wraps every line of the output file in [ and ]. The fix is to replace those characters with an empty string:

val op= sourcefile.rdd.map(_.toString().replace("[","").replace("]", "")).saveAsTextFile("C:/Users/phadpa01/Desktop/op")

You could also use a regex.
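A minimal sketch of the regex variant, assuming each line comes from Row.toString. Anchoring the pattern with ^ and $ removes only the outer brackets, whereas the chained replace calls above also remove brackets that belong to the data. The object and method names are hypothetical.

```scala
object StripBrackets {
  // Remove only a leading "[" and a trailing "]" (anchored regex)
  def viaRegex(line: String): String = line.replaceAll("^\\[|\\]$", "")

  // The chained-replace approach from the answer: removes every bracket
  def viaReplace(line: String): String = line.replace("[", "").replace("]", "")

  def main(args: Array[String]): Unit = {
    val line = "[1,a[0],x]" // a Row.toString-style line whose data contains brackets
    println(viaRegex(line))   // keeps the inner a[0]
    println(viaReplace(line)) // turns a[0] into a0
  }
}
```

In Spark this would be used as sourcefile.rdd.map(r => StripBrackets.viaRegex(r.toString)).saveAsTextFile(...) with the output path of your choice.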

贼婆χ

Post #4 · 2020-02-11 13:38

I would recommend using CSV or another delimited format. The following is, in my view, the most concise way to write a .tsv file in Spark 2+:

import org.apache.spark.sql.SaveMode

val tsvWithHeaderOptions: Map[String, String] = Map(
  ("delimiter", "\t"), // Uses "\t" delimiter instead of default ","
  ("header", "true"))  // Writes a header record with column names

df.coalesce(1)         // One partition, so a single part file inside output/path
  .write
  .mode(SaveMode.Overwrite)
  .options(tsvWithHeaderOptions)
  .csv("output/path")
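A self-contained sketch of the same write, assuming a local SparkSession and a throwaway temp directory; the object name TsvExample, the readPartFile helper and the two-column sample data are illustrative. Because of coalesce(1), the output directory should contain a single part-* file whose first line is the header.

```scala
import java.io.File
import java.nio.file.Files
import scala.io.Source
import org.apache.spark.sql.{DataFrame, SaveMode, SparkSession}

object TsvExample {
  val tsvWithHeaderOptions: Map[String, String] = Map(
    ("delimiter", "\t"), // tab instead of the default ","
    ("header", "true"))  // first line holds the column names

  def writeTsv(df: DataFrame, path: String): Unit =
    df.coalesce(1)                  // one partition, so one part file
      .write
      .mode(SaveMode.Overwrite)
      .options(tsvWithHeaderOptions)
      .csv(path)

  // Read back the lines of the single part file Spark wrote into `path`
  // (hidden .crc files start with "." and are skipped by the filter)
  def readPartFile(path: String): List[String] = {
    val part = new File(path).listFiles().filter(_.getName.startsWith("part-")).head
    val src = Source.fromFile(part, "UTF-8")
    try src.getLines().toList finally src.close()
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.master("local[1]").appName("TsvExample").getOrCreate()
    import spark.implicits._
    val out = Files.createTempDirectory("tsv").resolve("out").toString
    writeTsv(Seq((1L, "12.50"), (2L, "3.75")).toDF("PRGREFNBR", "amount"), out)
    readPartFile(out).foreach(println)
    spark.stop()
  }
}
```

Keep in mind that coalesce(1) funnels all data through one task, which is convenient for small outputs but can become a bottleneck for large ones.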

家丑人穷心不美

Post #5 · 2020-02-11 13:44

You can save it as a CSV text file (.format("csv")).

The output is a directory of plain-text CSV part files, with the columns separated by commas.

val op = sourcefile.write.mode("overwrite").format("csv").save("C:/Users/phadpa01/Desktop/op")

More information can be found in the Spark programming guide.

smile是对你的礼貌

Post #6 · 2020-02-11 13:59

I use the Databricks spark-csv package to save my DataFrame output to a text file.

myDF.write.format("com.databricks.spark.csv").option("header", "true").save("output.csv")
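Since Spark 2.0 the CSV source is built in, and the format name com.databricks.spark.csv is mapped to it, so the same write can be expressed with DataFrameWriter.csv and no extra package. A minimal sketch, assuming Spark 2.x or later; the object name BuiltInCsv and the sample data are illustrative. Note that output.csv is created as a directory of part files, not a single file.

```scala
import org.apache.spark.sql.{DataFrame, SaveMode, SparkSession}

object BuiltInCsv {
  // Built-in equivalent of format("com.databricks.spark.csv")
  def writeCsv(df: DataFrame, path: String): Unit =
    df.write
      .mode(SaveMode.Overwrite)  // replace the output directory if it already exists
      .option("header", "true")  // each part file starts with the column names
      .csv(path)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.master("local[1]").appName("BuiltInCsv").getOrCreate()
    import spark.implicits._
    val myDF = Seq((1L, "12.50"), (2L, "3.75")).toDF("PRGREFNBR", "amount")
    writeCsv(myDF, "output.csv")
    // Read it back with the same header option to check the round trip
    spark.read.option("header", "true").csv("output.csv").show()
    spark.stop()
  }
}
```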
