I have a mixed-type dataframe that I am reading from a Hive table using a command like
spark.sql('select a, b, c from table')
Some columns are int, bigint, or double, and others are string. There are 32 columns in total.
Is there any way in PySpark to convert all columns in the dataframe to string type?
Just:
from pyspark.sql.functions import col
table = spark.sql("select a, b, c from table")
table.select([col(c).cast("string") for c in table.columns])
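To check the result, here is a minimal follow-up sketch (the table name and columns are just the placeholders from the question):

from pyspark.sql.functions import col

table = spark.sql("select a, b, c from table")
# the cast happens in a new DataFrame; assign it to keep the result
strings_df = table.select([col(c).cast("string") for c in table.columns])
strings_df.printSchema()  # every field should now be string

Because everything is done in one select, this adds a single projection over all 32 columns to the plan.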
Here's a one-line solution in Scala:
df.select(df.columns.map(c => col(c).cast(StringType)) : _*)
Let's see an example:
import org.apache.spark.sql._
import org.apache.spark.sql.types._
import org.apache.spark.sql.functions._
val data = Seq(
  Row(1, "a"),
  Row(5, "z")
)
val schema = StructType(
  List(
    StructField("num", IntegerType, true),
    StructField("letter", StringType, true)
  )
)
val df = spark.createDataFrame(
  spark.sparkContext.parallelize(data),
  schema
)
df.printSchema
//root
//|-- num: integer (nullable = true)
//|-- letter: string (nullable = true)
val newDf = df.select(df.columns.map(c => col(c).cast(StringType)) : _*)
newDf.printSchema
//root
//|-- num: string (nullable = true)
//|-- letter: string (nullable = true)
I hope this helps.
For Scala, Spark version 2.0+:
case class Row(id: Int, value: Double)
import spark.implicits._
import org.apache.spark.sql.functions._
val r1 = Seq(Row(1, 1.0), Row(2, 2.0), Row(3, 3.0)).toDF()
r1.show
+---+-----+
| id|value|
+---+-----+
| 1| 1.0|
| 2| 2.0|
| 3| 3.0|
+---+-----+
// fold over the column names, casting one column at a time
val castedDF = r1.columns.foldLeft(r1) { (current, c) =>
  current.withColumn(c, col(c).cast("String"))
}
castedDF.printSchema
root
|-- id: string (nullable = false)
|-- value: string (nullable = false)
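For the pyspark side of the question, the same fold can be sketched with functools.reduce standing in for foldLeft (the table and columns below are again placeholders from the question):

from functools import reduce
from pyspark.sql.functions import col

df = spark.sql("select a, b, c from table")

# fold over the column names, casting one column per step (mirrors the Scala foldLeft)
casted_df = reduce(
    lambda current, c: current.withColumn(c, col(c).cast("string")),
    df.columns,
    df,
)
casted_df.printSchema()  # all fields should now be string

Each withColumn adds another projection to the query plan, so with many columns the single-select approach from the earlier answers is usually cheaper.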