Hello all. I am trying to learn PySpark from this website:
Good info, but I am stuck. I borrowed the simple JSON code that looks like this:
{ "RecordNumber": 2, "Zipcode": 704 },
{ "RecordNumber": 10, "Zipcode": 709 }
]
And I can read that in a data frame. But unfortunately, my data has an array name at the top, like this:
{ "data": [
{ "RecordNumber": 2, "Zipcode": 704 },
{ "RecordNumber": 10, "Zipcode": 709 }
I read these two documents into two data frames, then run a select on each in PySpark:
from pyspark.sql.functions import col
dfJSON1 = df1.select(col("RecordNumber"), col("Zipcode"))
dfJSON2 = df2.select(col("data.RecordNumber"), col("data.Zipcode"))
dfJSON1.show()
dfJSON2.show()
The two results differ: the first data frame shows two records, as expected, but the second shows only a single row, with each column holding an array of values.
What am I missing to get the second data frame to show two records, like the first? This can't be that hard.
Thanks in advance.
Hi @ToddChitt,
Wouldn't it be possible to use a couple of SQL functions like explode and col for this?
I found that suggested approach in this blog:
https://medium.com/towards-data-engineering/transforming-json-to-lakehouse-tables-with-microsoft-fab...
Below is an example, based on your JSON code, from one of my test notebooks.
# Apply transformation to the dataframe
from pyspark.sql.functions import col, explode

# Flatten the "data" array: one row per array element
exploded_df = df.select(explode(col("data")).alias("data"))

# Promote the struct fields to top-level columns
tf_df = exploded_df.select(
    col("data.RecordNumber").alias("RecordNumber"),
    col("data.Zipcode").alias("Zipcode")
)
display(tf_df)

dfJSON1 = tf_df.select(col("RecordNumber"), col("Zipcode"))
dfJSON1.show()
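To see why the explode is needed: your second file is a single JSON object whose "data" field holds the whole array, so selecting data.RecordNumber returns one row of arrays instead of one row per element. In plain-Python terms (just an illustration with the standard json module, not Spark code), the difference looks roughly like this:

```python
import json

# The same shape as the second file: one object wrapping an array
doc = json.loads(
    '{ "data": [ { "RecordNumber": 2, "Zipcode": 704 }, '
    '{ "RecordNumber": 10, "Zipcode": 709 } ] }'
)

# Without exploding: a single "row" whose columns are whole arrays
single_row = {
    "RecordNumber": [r["RecordNumber"] for r in doc["data"]],
    "Zipcode": [r["Zipcode"] for r in doc["data"]],
}

# Exploding the array (what explode(col("data")) does): one row per element
rows = [(r["RecordNumber"], r["Zipcode"]) for r in doc["data"]]

print(single_row)  # one row of arrays
print(rows)        # two separate rows
```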
@Expiscornovus Thanks for the quick response.
Your sample code worked great. Now it's up to me to figure out how to shred the multi-level nested arrays in my actual JSON documents.
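My working theory is that the same pattern repeats level by level: explode the outer array, then explode the inner array out of the resulting column. As a plain-Python sketch of that shredding (the "orders"/"items" field names are made up, not my real schema):

```python
import json

# Hypothetical two-level nesting: each order wraps an array of items
doc = json.loads(
    '{ "orders": [ '
    '{ "OrderId": 1, "items": [ { "Sku": "A" }, { "Sku": "B" } ] }, '
    '{ "OrderId": 2, "items": [ { "Sku": "C" } ] } ] }'
)

# Level 1: one record per order (explode(col("orders")) in PySpark)
orders = doc["orders"]

# Level 2: one record per item, carrying the parent OrderId along
# (a second explode on the inner "items" column in PySpark)
rows = [(o["OrderId"], i["Sku"]) for o in orders for i in o["items"]]

print(rows)
```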
I will check out that blog and try to learn a little more about PySpark.
Thanks