spark.yarn.executor.memoryOverhead... - Cloudera Community

link管理

链接快照平台

输入网页链接，自动生成快照
标签化管理网页链接

相关文章推荐

瘦瘦的手电筒 · torch.fft.rfft — ...· 4 月前 ·

鼻子大的脆皮肠 · 簡單的 CUDA ...· 3 月前 ·

活泼的芹菜 · Using SystemTap ...· 3 月前 ·

豪情万千的牛肉面 · Allocate memory in ...· 3 天前 ·

重感情的热水瓶 · 关于促进房地产市场平稳健康发展有关税收政策的 ...· 11 月前 ·

魁梧的汉堡包 · 习近平总书记河南考察纪实_时政要闻_南阳市司法局· 11 月前 ·

坏坏的茴香 · 婉儿别闹助眠小剧场_哔哩哔哩_bilibili· 2 年前 ·

怕老婆的生姜 · 西藏公共资源交易平台· 2 年前 ·

睿智的椰子 · 亳州市1区3县建成区排名，谯城区最大，利辛县 ...· 2 年前 ·

Does anyone know exactly what spark.yarn.executor.memoryOverhead is used for and why it may be using up so much space? If I could, I would love to have a peek inside this stack. Spark's description is as follows:

The amount of off-heap memory (in megabytes) to be allocated per executor. This is memory that accounts for things like VM overheads, interned strings, other native overheads, etc. This tends to grow with the executor size (typically 6-10%).

The problem I'm having is when running spark queries on large datasets ( > 5TB), I am required to set the executor memoryOverhead to 8GB otherwise it would throw an exception and die. What is being stored in this container that it needs 8GB per container?

I've also noticed that this error doesn't occur on standalone mode, because it doesn't use YARN.

Note. My configurations for this job are:

executor memory = 15G executor cores = 5 yarn.executor.memoryOverhead = 8GB max executors = 60 offHeap.enabled = false

@Henry : I think that equation uses the executor memory (in your case, 15G) and outputs the overhead value.

// Below calculation uses executorMemory, not memoryOverhead
math.max((MEMORY_OVERHEAD_FACTOR * executorMemory).toInt, MEMORY_OVERHEAD_MIN))

The goal is to calculate OVERHEAD as a percentage of real executor memory, as used by RDDs and DataFrames. I will add that when using Spark on Yarn, the Yarn configuration settings have to be adjusted and tweaked to match up carefully with the Spark properties (as the referenced blog suggests). You might also want to look at Tiered Storage to offload RDDs into MEM_AND_DISK, etc.

Per recent Spark docs, you can't actually set the heap size that way. You need to use `spark.executor.memory` to do so. https://spark.apache.org/docs/2.1.1/configuration.html#runtime-environment

Just an FYI, Spark 2.1.1 doesn't allow setting the heap space in `extraJavaOptions`:

java.lang.Exception: spark.executor.extraJavaOptions is not allowed to specify max heap memory settings (was ''-Xmx20g''). Use spark.executor.memory instead.

推荐文章

瘦瘦的手电筒 · torch.fft.rfft — PyTorch 2.10 documentation

4 月前

鼻子大的脆皮肠 · 簡單的 CUDA 程式：DeviceInfo – Heresy's Space

3 月前

活泼的芹菜 · Using SystemTap userspace static probes – Will's Blog

3 月前

豪情万千的牛肉面 · Allocate memory in Docker Windows (with WSL2) - Docker Desktop - Docker Community Forums

3 天前

重感情的热水瓶 · 关于促进房地产市场平稳健康发展有关税收政策的公告广州市财政局网站

11 月前

魁梧的汉堡包 · 习近平总书记河南考察纪实_时政要闻_南阳市司法局

11 月前

坏坏的茴香 · 婉儿别闹助眠小剧场_哔哩哔哩_bilibili

2 年前

怕老婆的生姜 · 西藏公共资源交易平台

2 年前

睿智的椰子 · 亳州市1区3县建成区排名，谯城区最大，利辛县最小，来了解一下？|涡阳县|蒙城县|安徽省_网易订阅

2 年前