I have a DataFrame with rows that look like this:
[WrappedArray(1, 5DC7F285-052B-4739-8DC3-62827014A4CD, 1, 1425450997, 714909, 1425450997, 714909, {}, 2013, GAVIN, ST LAWRENCE, M, 9)]
[WrappedArray(2, 17C0D0ED-0B12-477B-8A23-1ED2C49AB8AF, 2, 1425450997, 714909, 1425450997, 714909, {}, 2013, LEVI, ST LAWRENCE, M, 9)]
[WrappedArray(3, 53E20DA8-8384-4EC1-A9C4-071EC2ADA701, 3, 1425450997, 714909, 1425450997, 714909, {}, 2013, LOGAN, NEW YORK, M, 44)]
...
Everything before the year (2013 in this example) is nonsense that should be dropped. I would like to map the data to a Name class that I have created and put it into a new DataFrame.
How do I get to the data and do that mapping?
Here is my Name class:
case class Name(year: Int, first_name: String, county: String, sex: String, count: Int)
Basically, I would like to fill my DataFrame with rows and columns according to the schema of the Name class. I know how to do this part, but I just don't know how to get to the data in the DataFrame.
Assuming the data is an array of strings like this:
// Needed for toDF, the $"..." column syntax and .as[Name]; assumes a SparkSession named spark
import spark.implicits._

val df = Seq(
  Seq("1", "5DC7F285-052B-4739-8DC3-62827014A4CD", "1", "1425450997", "714909", "1425450997", "714909", "{}", "2013", "GAVIN", "STLAWRENCE", "M", "9"),
  Seq("2", "17C0D0ED-0B12-477B-8A23-1ED2C49AB8AF", "2", "1425450997", "714909", "1425450997", "714909", "{}", "2013", "LEVI", "ST LAWRENCE", "M", "9"),
  Seq("3", "53E20DA8-8384-4EC1-A9C4-071EC2ADA701", "3", "1425450997", "714909", "1425450997", "714909", "{}", "2013", "LOGAN", "NEW YORK", "M", "44")
).toDF("array")
You could either use a UDF that returns a case class, or call withColumn multiple times. The latter should be more efficient, since Spark can optimize built-in column expressions but treats a UDF as a black box. It can be done like this:
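import org.apache.spark.sql.types.IntegerType

// Array positions 0-7 are the unwanted fields; 8-12 hold the Name fields.
val df2 = df
  .withColumn("year", $"array"(8).cast(IntegerType))
  .withColumn("first_name", $"array"(9))
  .withColumn("county", $"array"(10))
  .withColumn("sex", $"array"(11))
  .withColumn("count", $"array"(12).cast(IntegerType))
  .drop($"array")  // remove the original array column
  .as[Name]        // convert the DataFrame to a typed Dataset[Name]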
This will give you a Dataset[Name]:
+----+----------+-----------+---+-----+
|year|first_name|county     |sex|count|
+----+----------+-----------+---+-----+
|2013|GAVIN     |STLAWRENCE |M  |9    |
|2013|LEVI      |ST LAWRENCE|M  |9    |
|2013|LOGAN     |NEW YORK   |M  |44   |
+----+----------+-----------+---+-----+
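For comparison, the UDF route mentioned earlier could look roughly like the sketch below. It assumes Spark 2.x or later, a local SparkSession, and that every array has at least 13 elements with numeric strings at positions 8 and 12. The names UdfExample, toName and toNameUdf are made up for this illustration.

```scala
import org.apache.spark.sql.{Dataset, SparkSession}
import org.apache.spark.sql.functions.udf

// Same case class as in the question; kept top-level so Spark can derive an encoder.
case class Name(year: Int, first_name: String, county: String, sex: String, count: Int)

object UdfExample {
  // Hypothetical helper: builds a Name from positions 8..12 of the raw array.
  // Assumes at least 13 elements and that year/count parse as integers;
  // it throws on malformed rows rather than skipping them.
  def toName(a: Seq[String]): Name =
    Name(a(8).toInt, a(9), a(10), a(11), a(12).toInt)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("udf-example")
      .getOrCreate()
    import spark.implicits._

    val df = Seq(
      Seq("1", "5DC7F285-052B-4739-8DC3-62827014A4CD", "1", "1425450997", "714909",
          "1425450997", "714909", "{}", "2013", "GAVIN", "ST LAWRENCE", "M", "9")
    ).toDF("array")

    // A UDF returning a case class produces a struct column.
    val toNameUdf = udf(toName _)

    val names: Dataset[Name] = df
      .select(toNameUdf($"array").as("name")) // single struct column "name"
      .select("name.*")                       // expand the struct into top-level columns
      .as[Name]

    names.show(false)
    spark.stop()
  }
}
```

The withColumn version is probably still the better default. The UDF version mainly helps when the per-row logic is too awkward to express with column expressions.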
Hope it helped!