
Context

When using Spark Structured Streaming to sink Kafka messages into HDFS, I am hitting this error:

java.lang.NoSuchMethodError: org.apache.spark.sql.kafka010.consumer.InternalKafkaConsumerPool$PoolConfig.setMinEvictableIdleTime(Ljava/time/Duration;)V

The environment I am using:

  • Spark: 3.3.0 (Scala 2.12)
  • Kafka: kafka_2.13-3.2.0 (Kafka 3.2.0, Scala 2.13)
  • org.apache.spark:spark-sql-kafka-0-10_2.12:3.3.0 (for Spark 3.3.0, Kafka broker 0.10.0+, Scala 2.12)

The error occurred when Spark tried to establish an internal Kafka consumer to read messages from the topic.
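For context, the job in question is of the following shape. This is only a minimal sketch of that kind of pipeline, not the original application: the app name, servers, topic, and paths are placeholders I made up, and running it requires a Spark installation with the spark-sql-kafka-0-10 package on the classpath.

```python
def kafka_options(bootstrap_servers: str, topic: str) -> dict:
    """Source options for the Kafka reader, kept separate so they are easy to inspect."""
    return {
        "kafka.bootstrap.servers": bootstrap_servers,
        "subscribe": topic,
        "startingOffsets": "earliest",
    }


def run_kafka_to_hdfs(bootstrap_servers: str, topic: str,
                      hdfs_path: str, checkpoint_path: str) -> None:
    """Read a Kafka topic with Structured Streaming and sink it to HDFS as parquet."""
    from pyspark.sql import SparkSession  # imported lazily; needs a Spark runtime

    spark = SparkSession.builder.appName("kafka-to-hdfs").getOrCreate()
    df = (spark.readStream
          .format("kafka")
          .options(**kafka_options(bootstrap_servers, topic))
          .load())
    # It is while serving this source that Spark builds its internal
    # Kafka consumer pool and hits the NoSuchMethodError above.
    (df.selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)")
       .writeStream
       .format("parquet")
       .option("path", hdfs_path)
       .option("checkpointLocation", checkpoint_path)
       .start()
       .awaitTermination())
```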

Looking into the details

The error points to class PoolConfig, where method setMinEvictableIdleTime doesn't exist. This class is part of the Apache Commons Pool library (commons-pool2).

From Maven Central, org.apache.spark:spark-sql-kafka-0-10_2.12:3.3.0 depends on commons-pool2 version 2.11.1.

The exception is raised from line 186 of that file: InternalKafkaConsumerPool.scala#L186.

Class PoolConfig inherits from BaseObjectPoolConfig. In the base class, method setMinEvictableIdleTime was added in version 2.10.0; before that version, method setMinEvictableIdleTimeMillis was used instead.
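That rename boils down to a simple version check. The helper below is my own illustration (not part of any library): it maps a commons-pool2 version string to the idle-time setter that release actually exposes.

```python
def idle_time_setter(pool2_version: str) -> str:
    # BaseObjectPoolConfig.setMinEvictableIdleTime(Duration) exists from
    # commons-pool2 2.10.0 onwards; earlier releases only expose
    # setMinEvictableIdleTimeMillis(long). A pre-2.10.0 jar winning on the
    # classpath is exactly what would produce the NoSuchMethodError above.
    major, minor = (int(p) for p in pool2_version.split(".")[:2])
    return ("setMinEvictableIdleTime"
            if (major, minor) >= (2, 10)
            else "setMinEvictableIdleTimeMillis")
```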

Thus I suspected that an older version of commons-pool2 was being used. However, from the Spark job logs, I can tell that version 2.11.1 was loaded:

2022-08-26T23:38:09,085 INFO [Thread-6] org.apache.spark.executor.Executor - Fetching spark://localhost:39883/jars/org.apache.commons_commons-pool2-2.11.1.jar with timestamp 1661521085729
2022-08-26T23:38:09,086 INFO [Thread-6] org.apache.spark.util.Utils - Fetching spark://localhost:39883/jars/org.apache.commons_commons-pool2-2.11.1.jar to /tmp/spark-547fe757-e24b-4675-843d-0122d27b6daf/userFiles-223b6753-1c52-4816-baf2-bf324f94e01f/fetchFileTemp5942499760833953456.tmp
2022-08-26T23:38:09,089 INFO [Thread-6] org.apache.spark.util.Utils - /tmp/spark-547fe757-e24b-4675-843d-0122d27b6daf/userFiles-223b6753-1c52-4816-baf2-bf324f94e01f/fetchFileTemp5942499760833953456.tmp has been previously copied to /tmp/spark-547fe757-e24b-4675-843d-0122d27b6daf/userFiles-223b6753-1c52-4816-baf2-bf324f94e01f/org.apache.commons_commons-pool2-2.11.1.jar
2022-08-26T23:38:09,094 INFO [Thread-6] org.apache.spark.executor.Executor - Adding file:/tmp/spark-547fe757-e24b-4675-843d-0122d27b6daf/userFiles-223b6753-1c52-4816-baf2-bf324f94e01f/org.apache.commons_commons-pool2-2.11.1.jar to class loader
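To double-check which commons-pool2 versions the executors fetched, log lines like the above can also be scanned programmatically. This is a small sketch of mine, with a regex tailored to the jar-naming pattern in these logs:

```python
import re


def fetched_pool2_versions(log_lines):
    """Extract commons-pool2 versions from Spark 'Fetching ... .jar' log lines."""
    pat = re.compile(r"commons-pool2-(\d+(?:\.\d+)*)\.jar")
    return sorted({m.group(1) for line in log_lines for m in pat.finditer(line)})
```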

Then I looked into the Spark (3.3.0) jars folder, where the only commons-pool artifact I could find was version 1.5.4 of the legacy commons-pool library: commons-pool-1.5.4.jar.
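Since a jar is just a zip archive, one way to confirm which jar actually bundles a given class (for example, which archive could be supplying BaseObjectPoolConfig to the JVM) is to inspect the archives directly. A sketch, using only the standard library:

```python
import zipfile


def jar_contains_class(jar_path: str, class_name: str) -> bool:
    """Return True if the jar bundles the given class (dotted name)."""
    # e.g. org.apache.commons.pool2.impl.BaseObjectPoolConfig
    #   -> org/apache/commons/pool2/impl/BaseObjectPoolConfig.class
    entry = class_name.replace(".", "/") + ".class"
    with zipfile.ZipFile(jar_path) as jar:
        return entry in jar.namelist()
```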

Resolution

I then manually downloaded commons-pool2 version 2.11.1 into the Spark jars folder:

spark-3.3.0/jars$ wget https://repo1.maven.org/maven2/org/apache/commons/commons-pool2/2.11.1/commons-pool2-2.11.1.jar
spark-3.3.0/jars$ ls | grep commons-pool
commons-pool-1.5.4.jar
commons-pool2-2.11.1.jar

After rerunning my Spark Structured Streaming application, the issue was resolved.
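Before and after dropping the jar in, it is worth checking which commons-pool artifacts an installation actually ships. This small helper is my own equivalent of the ls | grep step above (the jars directory path is whatever your install uses):

```python
import os
import re


def pool_jars(jars_dir: str) -> list:
    """List commons-pool / commons-pool2 jars in a Spark jars directory."""
    pat = re.compile(r"^commons-pool2?-[\d.]+\.jar$")
    return sorted(f for f in os.listdir(jars_dir) if pat.match(f))
```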

Warning: I am not 100% sure whether adding this library will cause other issues for Spark. So far I have not hit any, but please be cautious when adopting this method.
Last modified by Raymond 3 years ago.

When using commercial solutions, people usually configure a shared group of jars across the big data stack, and the versions are generally aligned.


Matthias, 3 months ago:

This is a great solution... I could solve my problem with it. I wonder why this issue is not more common?
