
Handling nested JSON with Spark

The JSON file contains the following records:
{"avg_orders_count": [{"count": 1.0, "days": 3}, {"count": 0.6, "days": 5}, {"count": 0.3, "days": 10}, {"count": 0.2, "days": 15}, {"count": 0.1, "days": 30}, {"count": 0.066, "days": 45}, {"count": 0.066, "days": 60}, {"count": 0.053, "days": 75}, {"count": 0.044, "days": 90}], "m_hotel_id": "92500636"}
{"avg_orders_count": [{"count": 0.666, "days": 3}, {"count": 0.4, "days": 5}, {"count": 0.4, "days": 10}, {"count": 0.266, "days": 15}, {"count": 0.33, "days": 30}, {"count": 0.466, "days": 45}, {"count": 0.583, "days": 60}, {"count": 0.68, "days": 75}, {"count": 0.6111, "days": 90}], "m_hotel_id": "92409831"}
Reading the JSON file with Spark

from pyspark.sql import SparkSession
from pyspark.sql.functions import explode  # needed for the explode step below

session = SparkSession.builder.appName("sort").getOrCreate()
data = session.read.json('test1')
data.head()
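Spark infers the schema from the file: `avg_orders_count` becomes an array of structs and `m_hotel_id` a string. As a quick sanity check that needs no Spark cluster, the same record shape can be inspected with plain Python (a sketch; the record below copies a few values from the example data):

```python
import json

# One record in the same shape as the example file (inlined, not read from disk)
line = ('{"avg_orders_count": [{"count": 1.0, "days": 3}, '
        '{"count": 0.6, "days": 5}], "m_hotel_id": "92500636"}')
record = json.loads(line)

# Top-level fields: an array of objects plus a plain string id.
# Spark infers this as array<struct<count:double,days:bigint>> and string.
print(sorted(record.keys()))             # ['avg_orders_count', 'm_hotel_id']
print(type(record["avg_orders_count"]))  # <class 'list'>
print(record["avg_orders_count"][0])     # {'count': 1.0, 'days': 3}
```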


mydata = data.select(explode(data.avg_orders_count), data.m_hotel_id).toDF('my_count', 'id')
mydata.head()


mydata = mydata.select(mydata.id, 'my_count.days', 'my_count.count')
mydata.show()

The nested JSON is now flattened into one row per array element, so you can work with it like any ordinary DataFrame.
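What the explode-then-select pipeline does can be mimicked in plain Python, which makes the transformation easy to see without a running cluster (an illustrative sketch; the records reuse a few values from the example data):

```python
records = [
    {"avg_orders_count": [{"count": 1.0, "days": 3}, {"count": 0.6, "days": 5}],
     "m_hotel_id": "92500636"},
    {"avg_orders_count": [{"count": 0.666, "days": 3}],
     "m_hotel_id": "92409831"},
]

# explode(avg_orders_count): emit one output row per array element,
# repeating the sibling column (m_hotel_id) next to each element.
flattened = [
    {"id": rec["m_hotel_id"], "days": item["days"], "count": item["count"]}
    for rec in records
    for item in rec["avg_orders_count"]
]

for row in flattened:
    print(row)
# {'id': '92500636', 'days': 3, 'count': 1.0}
# {'id': '92500636', 'days': 5, 'count': 0.6}
# {'id': '92409831', 'days': 3, 'count': 0.666}
```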

Source: https://blog.csdn.net/u013215956/article/details/86232425?utm_medium=distribute.pc_aggpage_search_result.none-task-blog-2~all~first_rank_v2~rank_v25-1-86232425.nonecase&utm_term=spark%20%E8%AF%BB%E5%8F%96%E5%B5%8C%E5%A5%97%E7%9A%84json&spm=1000.2123.3001.4430

