Spark Dataframe column nullable property change

Published 2019-02-16 02:05

Question:

I want to change the nullable property of a particular column in a Spark DataFrame.

If I print the schema of the DataFrame, it currently looks like this:

col1: string (nullable = false)
col2: string (nullable = true)
col3: string (nullable = false)
col4: float (nullable = true)

I just want the nullable property of col3 to be updated:

col1: string (nullable = false)
col2: string (nullable = true)
col3: string (nullable = true)
col4: float (nullable = true)

I checked online and found some links, but they seem to change the property for all columns rather than a specific one: Change nullable property of column in spark dataframe. Can anyone please help me in this regard?

Answer 1:

There is no "clean" way to do this. You can use a trick like the one here.

Relevant code from that answer:

import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.types.{StructField, StructType}

def setNullableStateOfColumn(df: DataFrame, cn: String, nullable: Boolean): DataFrame = {

  // get the current schema
  val schema = df.schema
  // rebuild the StructField with name `cn`, overriding its nullable flag
  val newSchema = StructType(schema.map {
    case StructField(c, t, _, m) if c.equals(cn) => StructField(c, t, nullable = nullable, m)
    case y: StructField => y
  })
  // apply the new schema by recreating the DataFrame from the underlying RDD
  df.sqlContext.createDataFrame(df.rdd, newSchema)
}

It copies the DataFrame and its schema, but with the nullable flag specified programmatically.
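
For the schema from the question, a minimal usage sketch (assuming the DataFrame is named df) might look like this:

val df2 = setNullableStateOfColumn(df, "col3", nullable = true)
df2.printSchema()
// root
//  |-- col1: string (nullable = false)
//  |-- col2: string (nullable = true)
//  |-- col3: string (nullable = true)
//  |-- col4: float (nullable = true)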

Version for many columns:

def setNullableStateOfColumn(df: DataFrame, nullValues: Map[String, Boolean]): DataFrame = {

  // get the current schema
  val schema = df.schema
  // rebuild every StructField whose name appears in `nullValues`, overriding its nullable flag
  val newSchema = StructType(schema.map {
    case StructField(c, t, _, m) if nullValues.contains(c) => StructField(c, t, nullable = nullValues(c), m)
    case y: StructField => y
  })
  // apply the new schema by recreating the DataFrame from the underlying RDD
  df.sqlContext.createDataFrame(df.rdd, newSchema)
}

Usage: setNullableStateOfColumn(df1, Map("col1" -> true, "col2" -> true, "col7" -> false))
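
On Spark 2.x and later, where SparkSession replaces SQLContext as the entry point, the last line of either function can equivalently use the session attached to the DataFrame. A minimal sketch of that one-line variant (the rest of the function stays the same):

  // recreate the DataFrame from its RDD with the adjusted schema via the SparkSession
  df.sparkSession.createDataFrame(df.rdd, newSchema)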