Read File spark, set field having specific value t

Posted 2019-02-21 04:33

Question:

I'm reading a text file delimited with |. Some fields have the value \N. When reading the file row by row into a DataFrame, is there any way to turn fields with the value \N into null or ""? The code is given below.

val inputDf = sqlContext.read.format("csv")
      .option("header", "true")
      .option("inferSchema", "false")
      .schema(myschema)
      .option("delimiter", "|")
      .option("nullValue", "")
      .load("My Input file Path")

Answer 1:

Once you have loaded the DataFrame, apply a when condition to all of the columns in a generic way:

import org.apache.spark.sql.functions.{col, when}
inputDf.select(inputDf.columns.map(c => when(col(c) === "\\N", "").otherwise(col(c)).alias(c)): _*).show
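
The question also mentions null as an acceptable result. The following is a minimal sketch of the same column-wise when pattern that produces null instead of an empty string. The object name NullMarkerSketch, the helper nullifyMarker, and the two-row sample data are hypothetical and only stand in for the real file; it assumes Spark is on the classpath and that the affected columns are strings.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, lit, when}

object NullMarkerSketch {
  // Hypothetical helper: turn every cell equal to `marker` into a real null.
  // Assumes the columns of interest are string-typed.
  def nullifyMarker(df: DataFrame, marker: String = "\\N"): DataFrame =
    df.select(df.columns.map { c =>
      when(col(c) === marker, lit(null)).otherwise(col(c)).alias(c)
    }: _*)

  def main(args: Array[String]): Unit = {
    // Local session for illustration only.
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("null-marker-sketch")
      .getOrCreate()
    import spark.implicits._

    // Made-up rows standing in for the pipe-delimited input.
    val inputDf = Seq(("1", "\\N"), ("\\N", "b")).toDF("id", "name")

    nullifyMarker(inputDf).show()
    spark.stop()
  }
}
```

Keeping the marker as a parameter makes it easy to reuse the helper if a different placeholder shows up in another file.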


Answer 2:

DataFrameNaFunctions can be used to replace the value "\N" with "" in all columns:

df.na.replace(df.columns.toSeq, Map("\\N" -> ""))
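
For anyone who wants to try the na.replace call in isolation, here is a small sketch wrapped in a runnable program. The names NaReplaceSketch and blankOutMarker and the sample rows are hypothetical. It assumes a local Spark session and string columns, since a string replacement map is only expected to act on string-typed columns.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object NaReplaceSketch {
  // Hypothetical helper: replace `marker` with "" across all columns
  // using DataFrameNaFunctions.replace.
  def blankOutMarker(df: DataFrame, marker: String = "\\N"): DataFrame =
    df.na.replace(df.columns.toSeq, Map(marker -> ""))

  def main(args: Array[String]): Unit = {
    // Local session for illustration only.
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("na-replace-sketch")
      .getOrCreate()
    import spark.implicits._

    // Made-up rows standing in for the pipe-delimited input.
    val df = Seq(("1", "\\N"), ("\\N", "b")).toDF("id", "name")

    blankOutMarker(df).show()
    spark.stop()
  }
}
```

Passing a shorter list of column names instead of df.columns limits the replacement to just those columns.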