The error occurred when Spark tried to establish an internal Kafka consumer to read messages from the topic.
Looking into the details
The error happens in class PoolConfig, where the method setMinEvictableIdleTime doesn't exist. This class is part of the Apache Commons Pool library (commons-pool2).
From Maven Central, org.apache.spark:spark-sql-kafka-0-10_2.12:3.3.0 depends on commons-pool2 version 2.11.1.
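For reference, if you manage the dependency in your own build instead, the coordinates would look roughly like this in an sbt build file (my own sketch; the version numbers are the ones discussed above):

// build.sbt (sketch): pin commons-pool2 explicitly so the connector's
// expected version wins over any older copy pulled in transitively.
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-sql-kafka-0-10" % "3.3.0",
  "org.apache.commons" % "commons-pool2" % "2.11.1"
)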
Class PoolConfig inherits from BaseObjectPoolConfig. In the base class, the method setMinEvictableIdleTime was only added in version 2.10.0; before that version, the method setMinEvictableIdleTimeMillis was used instead.
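To illustrate the API difference, here is a minimal sketch (not from the original article) of the two setter variants; with commons-pool2 2.9.x or earlier on the classpath, the first call fails at runtime with NoSuchMethodError:

import java.time.Duration
import org.apache.commons.pool2.impl.GenericObjectPoolConfig

val conf = new GenericObjectPoolConfig[AnyRef]()

// Duration-based setter, added to the base class in commons-pool2 2.10.0.
conf.setMinEvictableIdleTime(Duration.ofMinutes(5))

// Millisecond-based setter, the pre-2.10.0 API (deprecated since 2.10.0).
conf.setMinEvictableIdleTimeMillis(5 * 60 * 1000L)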
Thus I suspected that an older version of commons-pool2 was being used. However, from the Spark job logs, I could tell that version 2.11.1 was loaded:
2022-08-26T23:38:09,085 INFO [Thread-6] org.apache.spark.executor.Executor - Fetching spark://localhost:39883/jars/org.apache.commons_commons-pool2-2.11.1.jar with timestamp 1661521085729
2022-08-26T23:38:09,086 INFO [Thread-6] org.apache.spark.util.Utils - Fetching spark://localhost:39883/jars/org.apache.commons_commons-pool2-2.11.1.jar to /tmp/spark-547fe757-e24b-4675-843d-0122d27b6daf/userFiles-223b6753-1c52-4816-baf2-bf324f94e01f/fetchFileTemp5942499760833953456.tmp
2022-08-26T23:38:09,089 INFO [Thread-6] org.apache.spark.util.Utils - /tmp/spark-547fe757-e24b-4675-843d-0122d27b6daf/userFiles-223b6753-1c52-4816-baf2-bf324f94e01f/fetchFileTemp5942499760833953456.tmp has been previously copied to /tmp/spark-547fe757-e24b-4675-843d-0122d27b6daf/userFiles-223b6753-1c52-4816-baf2-bf324f94e01f/org.apache.commons_commons-pool2-2.11.1.jar
2022-08-26T23:38:09,094 INFO [Thread-6] org.apache.spark.executor.Executor - Adding file:/tmp/spark-547fe757-e24b-4675-843d-0122d27b6daf/userFiles-223b6753-1c52-4816-baf2-bf324f94e01f/org.apache.commons_commons-pool2-2.11.1.jar to class loader
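The fetch messages alone don't prove which copy the classloader actually resolves, so a quick reflection check like the following (my own sketch, run from inside the application) can confirm both where the class came from and whether the newer setter exists on the loaded version:

import org.apache.commons.pool2.impl.BaseObjectPoolConfig

// Which jar was the class actually loaded from?
val location = classOf[BaseObjectPoolConfig[_]]
  .getProtectionDomain.getCodeSource.getLocation
println(s"BaseObjectPoolConfig loaded from: $location")

// Does the loaded version have the Duration-based setter (2.10.0+)?
val hasNewSetter = classOf[BaseObjectPoolConfig[_]].getMethods
  .exists(_.getName == "setMinEvictableIdleTime")
println(s"setMinEvictableIdleTime present: $hasNewSetter")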
Then I looked into the Spark (3.3.0) jars folder and found version 1.5.4 of the older commons-pool library: commons-pool-1.5.4.jar.
Resolution
I then manually downloaded commons-pool2 version 2.11.1 into the Spark jars folder:
spark-3.3.0/jars$ wget https://repo1.maven.org/maven2/org/apache/commons/commons-pool2/2.11.1/commons-pool2-2.11.1.jar
spark-3.3.0/jars$ ls | grep commons-pool
commons-pool-1.5.4.jar
commons-pool2-2.11.1.jar
After rerunning my Spark Structured Streaming application, the issue was resolved.
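For verification, a minimal Structured Streaming read from Kafka like the sketch below should now start without the error (the broker address and topic name are placeholders for your own environment):

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("kafka-pool-check")
  .getOrCreate()

// Minimal streaming read; replace the broker and topic with your own.
val df = spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "localhost:9092")
  .option("subscribe", "test-topic")
  .load()

df.selectExpr("CAST(value AS STRING)")
  .writeStream
  .format("console")
  .start()
  .awaitTermination()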
Warning: I am not 100% sure whether replacing this library will cause other issues for Spark. At the moment, I have not hit any, so please be cautious when adopting this method.
When using commercial distributions, people typically configure a shared set of jars across the big data stack, so the versions are generally aligned.
Matthias (3 months ago):
This is a great solution... I could solve my problem with this. I wonder why the issue is not more common?