In PySpark, how to parse an embedded JSON

I am new to PySpark.

I have a JSON file with the schema below:

df = spark.read.json(input_file)

df.printSchema()

 |-- UrlsInfo: array (nullable = true)
 |    |-- element: struct (containsNull = true)
 |    |    |-- displayUrl: string (nullable = true)
 |    |    |-- type: string (nullable = true)
 |    |    |-- url: string (nullable = true)
 |-- type: long (nullable = true)

I want a new result DataFrame that has only two columns: type and UrlsInfo.element.DisplayUrl.

This is the code I tried, but it does not give the expected output:

  df.createOrReplaceTempView("the_table")  
  resultDF = spark.sql("SELECT type, UrlsInfo.element.DisplayUrl FROM the_table")
  resultDF.show()

I want resultDF to look like this:

Type | DisplayUrl
----- ------------
2    | http://example.com

There is a related question about parsing JSON files in PySpark, but it does not answer my question.

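As you can see in your schema, UrlsInfo is an array type, not a struct. The "element" entry in the schema therefore does not refer to a named property (which is what you are trying to access with .element) but to an element of the array, which is addressed by an index such as [0].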

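I reproduced the schema by hand (note that the field is spelled displayUri here, not displayUrl as in your schema):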

from pyspark.sql import Row
df = spark.createDataFrame([Row(UrlsInfo=[Row(displayUri="http://example.com", type="narf", url="poit")], Type=2)])
df.printSchema()

root
 |-- Type: long (nullable = true)
 |-- UrlsInfo: array (nullable = true)
 |    |-- element: struct (containsNull = true)
 |    |    |-- displayUri: string (nullable = true)
 |    |    |-- type: string (nullable = true)
 |    |    |-- url: string (nullable = true)

With that, I was able to produce a table like the one you seem to be looking for by using an index:

df.createOrReplaceTempView("temp")
resultDF = spark.sql("SELECT type, UrlsInfo[0].DisplayUri FROM temp")
resultDF.show()

+----+----------------------+
|type|UrlsInfo[0].DisplayUri|
+----+----------------------+
|   2|    http://example.com|
+----+----------------------+

However, this only puts the first element of UrlsInfo (if there is one) in the second column.

Edit: I had forgotten about the EXPLODE function, which you can use to treat the elements of UrlsInfo as a set of rows:

from pyspark.sql import Row
df = spark.createDataFrame([Row(UrlsInfo=[Row(displayUri="http://example.com", type="narf", url="poit"), Row(displayUri="http://another-example.com", type="narf", url="poit")], Type=2)])
df.createOrReplaceTempView("temp")
resultDF = spark.sql("SELECT type, EXPLODE(UrlsInfo.displayUri) AS displayUri FROM temp")
resultDF.show()

+----+--------------------+
|type|          displayUri|
+----+--------------------+
|   2|  http://example.com|
|   2|http://another-ex...|
+----+--------------------+