AQE (
enabled by default from 7.3 LTS + onwards) adjusts the
shuffle partition number
automatically at each stage of the query, based on the size of the map-side shuffle output. So as data size grows or shrinks over different stages, the task size will remain roughly the same, neither too big nor too small.
However it does not set the map-side partition number automatically today. Hence it is recommended to set initial
shuffle partition number
through the SQL config spark.sql.shuffle.partitions. Now Databricks has a feature to “Auto-Optimized Shuffle” ( spark.databricks.adaptive.autoOptimizeShuffle.enabled) which automates the need for setting this manually. For the vast majority of use cases, enabling this auto mode would be sufficient . However, if you want to hand tune you could set spark.sql.shuffle.partitions manually.
AQE (
enabled by default from 7.3 LTS + onwards) adjusts the
shuffle partition number
automatically at each stage of the query, based on the size of the map-side shuffle output. So as data size grows or shrinks over different stages, the task size will remain roughly the same, neither too big nor too small.
However it does not set the map-side partition number automatically today. Hence it is recommended to set initial
shuffle partition number
through the SQL config spark.sql.shuffle.partitions. Now Databricks has a feature to “Auto-Optimized Shuffle” ( spark.databricks.adaptive.autoOptimizeShuffle.enabled) which automates the need for setting this manually. For the vast majority of use cases, enabling this auto mode would be sufficient . However, if you want to hand tune you could set spark.sql.shuffle.partitions manually.
Welcome to Databricks Community: Lets learn, network and celebrate together
Join our fast-growing data practitioner and expert community of 80K+ members, ready to discover, help and collaborate together while making meaningful connections.