Solved: Tuning shuffle partitions - Databricks


Adaptive Query Execution (AQE), enabled by default from Databricks Runtime 7.3 LTS onwards, adjusts the number of shuffle partitions automatically at each stage of the query, based on the size of the map-side shuffle output. As the data size grows or shrinks across stages, the task size stays roughly the same, neither too big nor too small.
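To make the idea of size-based adjustment concrete, here is a toy sketch in plain Python. It is not Spark's actual implementation: the function name, the greedy strategy, and the sizes are all assumptions chosen for illustration. It merges adjacent shuffle partitions until each group is close to a target size, which is roughly the effect described above.

```python
def coalesce_partitions(sizes, target_bytes):
    """Greedily group adjacent partitions so each group stays near target_bytes.

    `sizes` is a list of per-partition map output sizes (hypothetical numbers).
    Returns a list of groups, each a list of original partition indices.
    """
    groups, current, current_size = [], [], 0
    for index, size in enumerate(sizes):
        # Close the current group if adding this partition would exceed the target.
        if current and current_size + size > target_bytes:
            groups.append(current)
            current, current_size = [], 0
        current.append(index)
        current_size += size
    if current:
        groups.append(current)
    return groups


# Six small and uneven partitions collapse into three similarly sized tasks.
print(coalesce_partitions([10, 20, 30, 70, 5, 5], target_bytes=64))
```

A partition that is already larger than the target (index 3 here) is left on its own in this sketch. In open-source Spark the target size is governed by spark.sql.adaptive.advisoryPartitionSizeInBytes; whether your runtime uses the same default is worth checking rather than assuming.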

However, AQE does not currently set the map-side partition number automatically, so it is recommended to set the initial shuffle partition number through the SQL config spark.sql.shuffle.partitions. Databricks now also has an “Auto-Optimized Shuffle” feature (spark.databricks.adaptive.autoOptimizeShuffle.enabled) that removes the need to set this manually. For the vast majority of use cases, enabling this auto mode is sufficient. If you want to hand-tune, you can still set spark.sql.shuffle.partitions manually.
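The two options above could be sketched as follows. The two config keys come from this answer; everything else is an assumption for illustration, including the helper names and the 128 MiB target partition size, which is a common rule of thumb and not a Databricks recommendation.

```python
import math

# Assumed target size per shuffle partition (rule of thumb, not an official value).
TARGET_PARTITION_BYTES = 128 * 1024 * 1024


def initial_shuffle_partitions(shuffle_bytes, target_bytes=TARGET_PARTITION_BYTES):
    """Estimate an initial partition count from an expected shuffle size."""
    return max(1, math.ceil(shuffle_bytes / target_bytes))


def shuffle_settings(shuffle_bytes=None):
    """Return the Spark SQL config to apply.

    With no size estimate, rely on Auto-Optimized Shuffle; otherwise hand-tune
    spark.sql.shuffle.partitions from the estimate.
    """
    if shuffle_bytes is None:
        return {"spark.databricks.adaptive.autoOptimizeShuffle.enabled": "true"}
    return {"spark.sql.shuffle.partitions": str(initial_shuffle_partitions(shuffle_bytes))}


# Hand-tuned example: roughly 50 GiB of shuffle data.
print(shuffle_settings(50 * 1024**3))

# In a Databricks notebook, where `spark` is the active SparkSession:
# for key, value in shuffle_settings().items():
#     spark.conf.set(key, value)
```

Starting from the auto mode and only switching to a hand-tuned value when a specific job misbehaves keeps the configuration small, which matches the advice that the auto mode is sufficient for most workloads.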
