Possible to mark records that aren't the same length as the header as bad records?

Posted 2019-08-27 03:01

I am reading a file into a dataframe like this

val df = spark.read
   .option("sep", props.inputSeperator)
   .option("header", "true")
   .option("badRecordsPath", "/mnt/adls/udf_databricks/error")
   .csv(inputLoc)

The file is setup like this

col_a|col_b|col_c|col_d
1|first|last|
2|this|is|data
3|ok
4|more||stuff
5|||

Now, Spark will read all of this as acceptable data. However, I want `3|ok` to be marked as a bad record because its size does not match the header size. Is this possible?

2 Answers
Bombasti
#2 · 2019-08-27 03:37

The option below is supported by the Databricks implementation of Spark. I don't see a schema mapping in your code; could you add one and try?

.option("badRecordsPath", "/mnt/adls/udf_databricks/error")

Change your code like below,

import org.apache.spark.sql.types._

val customSchema = StructType(Array(
    StructField("col_a", StringType, true),
    StructField("col_b", StringType, true),
    StructField("col_c", StringType, true),
    StructField("col_d", StringType, true)))

val df = spark.read
   .option("sep", props.inputSeperator)
   .option("header", "true")
   .option("badRecordsPath", "/mnt/adls/udf_databricks/error")
   .schema(customSchema)
   .csv(inputLoc)

For more details, you can refer to the Databricks documentation on `badRecordsPath`.
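Note that `badRecordsPath` is specific to the Databricks runtime. On open-source Spark, a similar effect can be sketched with `PERMISSIVE` mode and a corrupt-record column. This is a sketch, not part of the original answer; the `_corrupt_record` column name is just the conventional default, configurable via `columnNameOfCorruptRecord`:

```scala
import org.apache.spark.sql.types._
import org.apache.spark.sql.functions.col

// Schema with one extra string column that captures the raw text of malformed rows.
val schemaWithCorrupt = StructType(Array(
  StructField("col_a", StringType, true),
  StructField("col_b", StringType, true),
  StructField("col_c", StringType, true),
  StructField("col_d", StringType, true),
  StructField("_corrupt_record", StringType, true)))

val df = spark.read
  .option("sep", "|")
  .option("header", "true")
  .option("mode", "PERMISSIVE")                           // keep malformed rows instead of dropping/failing
  .option("columnNameOfCorruptRecord", "_corrupt_record") // raw malformed row lands here
  .schema(schemaWithCorrupt)
  .csv(inputLoc)

// Rows like 3|ok have fewer tokens than the schema, so _corrupt_record is non-null.
val bad  = df.filter(col("_corrupt_record").isNotNull)
val good = df.filter(col("_corrupt_record").isNull).drop("_corrupt_record")
```

The `bad` DataFrame can then be written wherever you keep rejected records, playing the role `badRecordsPath` plays on Databricks.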

Thanks, Karthick

[Banned account]
#3 · 2019-08-27 03:41
// Read the raw file, take the header's field count, and write out any line
// whose field count differs. Splitting with limit -1 keeps trailing empty fields,
// so a row like 1|first|last| still counts as four fields.
val a = spark.sparkContext.textFile(pathOfYourFile)
val size = a.first.split("\\|", -1).length
a.filter(i => i.split("\\|", -1).size != size)
 .saveAsTextFile("/mnt/adls/udf_databricks/error")
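The length check in this answer can also be tested without a Spark cluster. Here is a minimal plain-Scala sketch of the same logic (the `RecordCheck` name is made up for illustration), partitioning lines into good and bad by comparing each row's field count against the header's:

```scala
// Hypothetical helper mirroring the answer's logic: split each line on "|"
// (limit -1 keeps trailing empty fields) and compare against the header width.
object RecordCheck {
  def partition(lines: Seq[String]): (Seq[String], Seq[String]) = {
    val width = lines.head.split("\\|", -1).length
    // partition returns (matching, non-matching) = (good, bad)
    lines.tail.partition(_.split("\\|", -1).length == width)
  }
}

val lines = Seq("col_a|col_b|col_c|col_d", "1|first|last|", "3|ok")
val (good, bad) = RecordCheck.partition(lines)
// good == Seq("1|first|last|"), bad == Seq("3|ok")
```

Note that `1|first|last|` passes because the trailing empty field is preserved by the `-1` limit; `3|ok` fails with only two fields against a four-field header.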