Suppose I'm doing something like:
```scala
val df = sqlContext.load("com.databricks.spark.csv", Map("path" -> "cars.csv", "header" -> "true"))
df.printSchema()
```

```
root
 |-- year: string (nullable = true)
 |-- make: string (nullable = true)
 |-- model: string (nullable = true)
 |-- comment: string (nullable = true)
 |-- blank: string (nullable = true)
```

```scala
df.show()
```

```
year make  model comment              blank
2012 Tesla S     No comment
1997 Ford  E350  Go get one now th...
```
but I really wanted the year as an Int (and perhaps transform some other columns).
The best I could come up with is:

```scala
df.withColumn("year2", 'year.cast("Int")).select('year2 as 'year, 'make, 'model, 'comment, 'blank)
```

```
org.apache.spark.sql.DataFrame = [year: int, make: string, model: string, comment: string, blank: string]
```
which is a bit convoluted.
I'm coming from R, and I'm used to being able to write, e.g.:

```r
df2 <- df %>%
  mutate(year = year %>% as.integer,
         make = make %>% toupper)
```
I'm likely missing something, since there should be a better way to do this in Spark/Scala...
Since Spark version 1.4 you can apply the cast method with a DataType on the column:
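A minimal sketch of that approach, assuming the question's `df`; `Column.cast` accepts a `DataType` such as `IntegerType`:

```scala
import org.apache.spark.sql.types.IntegerType

// Replace the string "year" column with an Int-typed version of itself.
val df2 = df.withColumn("year", df("year").cast(IntegerType))
```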
If you are using SQL expressions you can also do:
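A minimal sketch, assuming the `expr` helper from `org.apache.spark.sql.functions`:

```scala
import org.apache.spark.sql.functions.expr

// Same cast, written as a SQL expression instead of a Column method call.
val df2 = df.withColumn("year", expr("CAST(year AS INT)"))
```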
For more info check the docs: http://spark.apache.org/docs/1.6.0/api/scala/#org.apache.spark.sql.DataFrame
Java code for modifying the datatype of the DataFrame column from String to Integer. It simply casts the existing String column to Integer.
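To keep one language in this post, here is a Scala sketch of that cast; the Java DataFrame API exposes the equivalent `col`, `cast`, and `withColumn` methods:

```scala
// Overwrite the string "year" column with an int version.
// In Java this would be: df.withColumn("year", df.col("year").cast("int"));
val casted = df.withColumn("year", df("year").cast("int"))
```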
To convert the year from string to int, you can add the following option to the CSV reader: "inferSchema" -> "true"; see the Databricks spark-csv documentation.
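Applied to the load call from the question:

```scala
// Same reader as above, with schema inference enabled so "year" comes back as int.
val df = sqlContext.load("com.databricks.spark.csv",
  Map("path" -> "cars.csv", "header" -> "true", "inferSchema" -> "true"))
```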
Another way:
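One plausible reading of "another way" (an assumption, not confirmed by this thread) is converting through a UDF:

```scala
import org.apache.spark.sql.functions.udf

// Hypothetical sketch: parse the string column with a UDF and overwrite it.
val toInt = udf((s: String) => s.toInt)
val df2 = df.withColumn("year", toInt(df("year")))
```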