Spark/Scala flatten and flatMap is not working on DataFrame

Posted 2019-09-01 09:01

Question:

I have a collection holding three DataFrames of the same type (same Parquet schema). They differ only in the content/values they contain.

I want to flatten the structure, so that the three DataFrames are merged into one single DataFrame containing all of the content/values.

I tried it with flatten and with flatMap, but I always get this error:

Error: No implicit view available from org.apache.spark.sql.DataFrame => Traversable[U].
    parquetFiles.flatten
Error: not enough arguments for method flatten: (implicit asTrav: org.apache.spark.sql.DataFrame => Traversable[U], implicit m: scala.reflect.ClassTag[U]). Unspecified value parameters asTrav, m.
    parquetFiles.flatten

I also converted it to a List and then tried to flatten that, which produces the same error. Do you have any idea what the problem is here or how to solve it? Thanks, Alex
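Here is a minimal sketch of the setup (parquetFiles is an Array[DataFrame], matching the name in the compiler error; df1, df2 and df3 stand in for my actual DataFrames):

    import org.apache.spark.sql.DataFrame

    // Three DataFrames with the same schema, held in a plain Scala collection
    val parquetFiles: Array[DataFrame] = Array(df1, df2, df3)

    // Does not compile: flatten needs an implicit view
    // DataFrame => Traversable[U], and no such conversion exists
    parquetFiles.flatten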

Answer 1:

So it seems like you want to union these three DataFrames together, and for that the unionAll function works really well. You could do parquetFiles.reduce((x, y) => x.unionAll(y)) (note this will throw on an empty list, so if that can happen, look at one of the folds instead of reduce).
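As a sketch (assuming parquetFiles is a Seq[DataFrame] as in the question; reduceOption is one concrete way to cover the empty case, in the spirit of the fold suggestion):

    // Merge all DataFrames into one; reduce throws on an empty collection
    val merged = parquetFiles.reduce((x, y) => x.unionAll(y))

    // Safe alternative when the collection might be empty:
    // reduceOption returns None instead of throwing
    val maybeMerged: Option[DataFrame] = parquetFiles.reduceOption(_ unionAll _)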



Answer 2:

The Scala compiler is looking for a way to convert the DataFrames to a Traversable so it can apply flatten. But a DataFrame is not Traversable, so this fails. There is also no ClassTag available, because DataFrames are not statically typed.
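For contrast, flatten does work on nested standard collections, because their elements can be viewed as Traversable:

    // Compiles: Seq is Traversable, so the inner sequences can be unwrapped
    Seq(Seq(1, 2), Seq(3)).flatten   // Seq(1, 2, 3)

    // Does not compile: a DataFrame is not a Traversable of its rows
    // at the type level, so flatten has nothing to unwrap
    // Seq(df1, df2).flatten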

The code you're looking for is

parquetFiles.reduce(_ unionAll _)

which can be optimized by the DataFrame execution engine.
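Put together, a sketch of the whole flow (the paths and the sqlContext value are assumptions, since the question does not show how the DataFrames are created; this targets Spark 1.x, where unionAll is the current API):

    import org.apache.spark.sql.DataFrame

    // Hypothetical Parquet locations; substitute the real paths
    val paths = Seq("data/part1.parquet", "data/part2.parquet", "data/part3.parquet")

    // Read each file into its own DataFrame; all three share the same schema
    val parquetFiles: Seq[DataFrame] = paths.map(p => sqlContext.read.parquet(p))

    // unionAll builds one logical plan over all three inputs, which the
    // Catalyst optimizer can then optimize as a whole
    val merged: DataFrame = parquetFiles.reduce(_ unionAll _)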