I have the following schema:
[name: StringType,
 grades: ArrayType(
   StructType(
     StructField(subject_grades, ArrayType(
       StructType(
         StructField(subject,StringType,false),
         StructField(grade,LongType,false))))))]
I want to group by the subject field inside the subject_grades array, which is itself inside the grades array.
I tried
sql.sql("select ... from grades_table group by grades.subject_grades.subject")
but I get
org.apache.spark.sql.AnalysisException: cannot resolve 'grades.subject_grades[subject]' due to data type mismatch: argument 2 requires integral type, however, 'subject' is of string type.;
I understand why I get this error, but I was hoping to avoid exploding the entire structure just to group by the inner field.
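
To make the intended grouping concrete, here is a minimal plain-Python sketch of the result I am after. It is not Spark code and not a solution: the sample rows and the function name group_grades_by_subject are made up for illustration, and since the select list in my query is elided, the sketch simply collects the grades under each subject rather than assuming a particular aggregate.

```python
from collections import defaultdict

# Made-up rows shaped like the schema above (illustrative data only).
rows = [
    {"name": "alice", "grades": [
        {"subject_grades": [
            {"subject": "math", "grade": 90},
            {"subject": "art", "grade": 75},
        ]},
        {"subject_grades": [
            {"subject": "math", "grade": 80},
        ]},
    ]},
    {"name": "bob", "grades": [
        {"subject_grades": [
            {"subject": "art", "grade": 60},
        ]},
    ]},
]


def group_grades_by_subject(rows):
    """Collect every grade under its subject, walking both nested arrays."""
    grouped = defaultdict(list)
    for row in rows:
        for grade_entry in row["grades"]:                 # outer array: grades
            for sg in grade_entry["subject_grades"]:      # inner array: subject_grades
                grouped[sg["subject"]].append(sg["grade"])
    return dict(grouped)


result = group_grades_by_subject(rows)
```

The two nested loops are exactly the flattening I would like Spark to do implicitly, so that the grouping key is the innermost subject value rather than an index into the array.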