Hi James,

> Bouncing the clients resolved the issue

Could you please tell us which version you upgraded to in order to resolve this issue? That should also help other users who encounter the same issue.

The code snippet you listed has existed since 2018, so I don't think the problem is there. Maybe there were bugs elsewhere that got fixed indirectly.

Thank you.

On Tue, Nov 23, 2021 at 10:27 AM James Olsen <ja...@inaseq.com> wrote:

We had a 2.5.1 Broker/Client system running for some time with regular rolling OS upgrades to the Brokers without any problems. A while ago we upgraded both Broker and Clients to 2.7.1, and now on the first rolling OS upgrade to the 2.7.1 Brokers we encountered some Consumer issues. We have a 3 Broker setup with min-ISRs configured to avoid any outage.

So maybe we just got lucky 6 times in a row with 2.5.1, or maybe there is an issue with 2.7.1.

The observable symptom is a continuous stream of "The coordinator is not available" messages when trying to commit offsets.

It starts with the usual messages you might expect during a rolling upgrade...

    2021-11-22 04:41:25,269 WARN [org.apache.kafka.clients.consumer.internals.ConsumerCoordinator] 'pool-7-thread-132' [Consumer clientId=consumer-MyService-group-58, groupId=MyService-group] Offset commit failed on partition MyTopic-0 at offset 866799313: The coordinator is loading and hence can't process requests.

... then 5 minutes of all OK, then ...

    2021-11-22 04:46:33,258 WARN [org.apache.kafka.clients.consumer.internals.ConsumerCoordinator] 'pool-7-thread-132' [Consumer clientId=consumer-MyService-group-58, groupId=MyService-group] Offset commit failed on partition MyTopic-0 at offset 866803953: This is not the correct coordinator.
    2021-11-22 04:46:33,258 INFO [org.apache.kafka.clients.consumer.internals.AbstractCoordinator] 'pool-7-thread-132' [Consumer clientId=consumer-MyService-group-58, groupId=MyService-group] Group coordinator b-2.xxx.com:9094 (id: 2147483645 rack: null) is unavailable or invalid due to cause: error response NOT_COORDINATOR.isDisconnected: false. Rediscovery will be attempted.
    2021-11-22 04:46:33,258 WARN [xxx.KafkaConsumerRunner] 'pool-7-thread-132' Offset commit with offsets {MyTopic-0=OffsetAndMetadata{offset=866803953, leaderEpoch=null, metadata=''}} failed: org.apache.kafka.clients.consumer.RetriableCommitFailedException: Offset commit failed with a retriable exception. You should retry committing the latest consumed offsets.
    Caused by: org.apache.kafka.common.errors.NotCoordinatorException: This is not the correct coordinator.

... then the following message for every subsequent attempt to commit offsets

    2021-11-22 04:46:33,284 WARN [xxx.KafkaConsumerRunner] 'pool-7-thread-132' Offset commit with offsets {MyTopic-0=OffsetAndMetadata{offset=866803954, leaderEpoch=82, metadata=''}, MyOtherTopic-0=OffsetAndMetadata{offset=12654756, leaderEpoch=79, metadata=''}} failed: org.apache.kafka.clients.consumer.RetriableCommitFailedException: Offset commit failed with a retriable exception. You should retry committing the latest consumed offsets.
    Caused by: org.apache.kafka.common.errors.CoordinatorNotAvailableException: The coordinator is not available.

In the above example we are doing manual async-commits, but we also had offset commit failures for a different consumer group (observed through lag monitoring) that uses auto-commit; it just didn't log the ongoing failures. In both cases messages were still being processed; it was just the commits that were not working. These are our two busiest consumer groups and both have static Topic assignments. Other consumer groups continued OK.

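For the manual async-commit case described above, one way to make an unbroken run of failed commits stand out from the occasional retriable failure during a rolling upgrade is to count consecutive failures in the commit callback. The following is only a minimal sketch under stated assumptions: `CommitFailureTracker`, its `record` method, and the threshold are hypothetical names invented for illustration and are not part of kafka-clients. It relies only on the documented behaviour that `OffsetCommitCallback.onComplete(offsets, exception)` receives a `null` exception when the commit succeeded.

```java
import java.util.concurrent.atomic.AtomicInteger;

/**
 * Hypothetical helper (not part of kafka-clients): counts consecutive
 * offset-commit failures so that a consumer whose commits never recover
 * can be detected instead of logging the same warning indefinitely.
 */
final class CommitFailureTracker {
    private final int threshold;
    private final AtomicInteger consecutiveFailures = new AtomicInteger();

    CommitFailureTracker(int threshold) {
        if (threshold < 1) {
            throw new IllegalArgumentException("threshold must be >= 1");
        }
        this.threshold = threshold;
    }

    /**
     * Intended to be called from the commit callback, e.g. inside
     * OffsetCommitCallback.onComplete(offsets, exception).
     *
     * @param commitError null if the commit succeeded, otherwise the failure
     * @return true once the number of consecutive failures reaches the threshold
     */
    boolean record(Exception commitError) {
        if (commitError == null) {
            // Any successful commit resets the streak.
            consecutiveFailures.set(0);
            return false;
        }
        return consecutiveFailures.incrementAndGet() >= threshold;
    }

    int consecutiveFailures() {
        return consecutiveFailures.get();
    }
}
```

When `record` returns `true`, the application could log at a higher level, raise an alert, or close and recreate the consumer, which would be an automated form of the client bounce. This does not help the auto-commit group, which has no callback, so lag monitoring remains the signal there.
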
I've spent some time examining the (Java) client code and started to wonder whether there is a bug or race condition that means the coordinator never gets reassigned after being invalidated and we simply keep hitting the following short-circuit:

org.apache.kafka.clients.consumer.internals.ConsumerCoordinator

    RequestFuture<Void> sendOffsetCommitRequest(final Map<TopicPartition, OffsetAndMetadata> offsets) {
        if (offsets.isEmpty())
            return RequestFuture.voidSuccess();

        Node coordinator = checkAndGetCoordinator();
        if (coordinator == null)
            return RequestFuture.coordinatorNotAvailable();

I'm not sure what the exact pathway is to getting the coordinator set, but I note that org.apache.kafka.clients.consumer.internals.AbstractCoordinator.ensureCoordinatorReady(Timer) and other methods that look like they may be related tend to only log at debug when they encounter RetriableException, so that could explain why I don't have more detail to provide.

I'm not familiar enough with the code to be able to trace this through any further, but if you've had the patience to keep reading this far then maybe you [...]

Bouncing the clients resolved the issue, but I'd be interested if any experts out there can identify if there is any weakness in the 2.7.1 version.

Regards,
James.
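Since the message notes that the coordinator-discovery path tends to log retriable errors only at debug level, raising the level for the two coordinator classes named in the log output is one way to capture more detail if the problem recurs. The fragment below is a sketch that assumes a log4j 1.x-style properties file; the logging backend used in the original setup is not stated, so the syntax may need adapting.

```properties
# Assumption: log4j 1.x properties syntax; adapt to the logging backend in use.
# Logger names are the class names that appear in the log lines above.
log4j.logger.org.apache.kafka.clients.consumer.internals.AbstractCoordinator=DEBUG
log4j.logger.org.apache.kafka.clients.consumer.internals.ConsumerCoordinator=DEBUG
```

Debug output from these classes can be verbose on busy consumer groups, so it may be preferable to enable it only on the affected clients.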
  • Consumer failure after rolling Broker upgrade James Olsen
  • Re: Consumer failure after rolling Broker upgrade Luke Chen
  • Re: Consumer failure after rolling Broker upgrade James Olsen
  • Re: Consumer failure after rolling Broker upgrade James Olsen
  • Re: Consumer failure after rolling Broker upg... Luke Chen