While testing a k8s workload charm with Juju 2.8-rc3, I executed the following changes in quick succession:
juju upgrade-charm [...] # caused pod spec changes, pods recycled
juju config [...] # also pod spec changes, pods recycled again
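For context, a rough sketch of the sequence (the charm path and config key below are placeholders standing in for the elided arguments, not the values actually used):
juju upgrade-charm mattermost --path ./mattermost-charm  # hypothetical local charm; pod spec changed, pods recycled
juju config mattermost some-key=new-value                # hypothetical key; pod spec changed again before the first rollout settled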
Final state:
[agnew(~)] juju status mattermost/6
Model       Controller  Cloud/Region   Version  SLA          Timestamp
mattermost  mm-rc3      k8s/localhost  2.8-rc3  unsupported  09:43:42+12:00

SAAS        Status  Store   URL
postgresql  active  mm-rc3  admin/database.postgresql

App         Version  Status  Scale  Charm       Store  Rev  OS          Address         Notes
mattermost           active    0/2  mattermost  local    4  kubernetes  10.152.183.202

Unit          Workload    Agent   Address    Ports     Message
mattermost/6  terminated  failed  10.1.1.93  8065/TCP  unit stopped by the cloud
[agnew(~)] juju debug-log --replay | grep mattermost/6
application-mattermost: 09:34:31 DEBUG juju.worker.uniter starting uniter for "mattermost/6"
application-mattermost: 09:34:31 DEBUG juju.worker.caasoperator start "mattermost/6"
application-mattermost: 09:34:31 INFO juju.worker.caasoperator start "mattermost/6"
application-mattermost: 09:34:31 DEBUG juju.worker.caasoperator "mattermost/6" started
application-mattermost: 09:34:31 DEBUG juju.worker.leadership mattermost/6 making initial claim for mattermost leadership
application-mattermost: 09:34:31 INFO juju.worker.leadership mattermost leadership for mattermost/6 denied
application-mattermost: 09:34:31 DEBUG juju.worker.leadership mattermost/6 waiting for mattermost leadership release
application-mattermost: 09:34:31 INFO juju.worker.uniter unit "mattermost/6" started
application-mattermost: 09:34:32 DEBUG juju.worker.leadership mattermost/6 is not mattermost leader
application-mattermost: 09:34:32 DEBUG juju.worker.caasoperator trigger running status for caas unit mattermost/6
application-mattermost: 09:34:35 DEBUG unit.mattermost/6.juju-log Legacy hooks/install does not exist.
application-mattermost: 09:34:35 DEBUG unit.mattermost/6.juju-log Emitting Juju event install.
application-mattermost: 09:34:35 DEBUG juju.worker.leadership mattermost/6 is not mattermost leader
application-mattermost: 09:34:39 DEBUG unit.mattermost/6.juju-log db:0: Legacy hooks/db-relation-created does not exist.
application-mattermost: 09:34:39 DEBUG unit.mattermost/6.juju-log db:0: Emitting Juju event db_relation_created.
application-mattermost: 09:34:42 DEBUG unit.mattermost/6.juju-log Legacy hooks/leader-settings-changed does not exist.
application-mattermost: 09:34:42 DEBUG unit.mattermost/6.juju-log Emitting Juju event leader_settings_changed.
application-mattermost: 09:34:42 DEBUG unit.mattermost/6.juju-log mirroring app relation data for relation 0
application-mattermost: 09:34:42 DEBUG juju.worker.leadership mattermost/6 is not mattermost leader
application-mattermost: 09:34:51 INFO juju.worker.uniter unit "mattermost/6" shutting down: executing operation "remote init": attempt count exceeded: container not running not found
application-mattermost: 09:34:51 DEBUG juju.worker.uniter.remotestate got leadership change for mattermost/6: leader
application-mattermost: 09:34:51 INFO juju.worker.caasoperator stopped "mattermost/6", err: executing operation "remote init": attempt count exceeded: container not running not found
application-mattermost: 09:34:51 DEBUG juju.worker.caasoperator "mattermost/6" done: executing operation "remote init": attempt count exceeded: container not running not found
application-mattermost: 09:34:51 ERROR juju.worker.caasoperator exited "mattermost/6": executing operation "remote init": attempt count exceeded: container not running not found
application-mattermost: 09:34:51 INFO juju.worker.caasoperator restarting "mattermost/6" in 3s
application-mattermost: 09:34:54 INFO juju.worker.caasoperator start "mattermost/6"
application-mattermost: 09:34:54 DEBUG juju.worker.caasoperator "mattermost/6" started
application-mattermost: 09:35:13 INFO juju.worker.caasoperator stopped "mattermost/6", err: executing operation "remote init": Internal error occurred: error executing command in container: failed to exec in container: failed to create exec "65308676472d86e7dbbd91c44e2e52ab28b5ed85ae135ce8a57d454d887aefbb": cannot exec in a stopped state: unknown
application-mattermost: 09:35:13 DEBUG juju.worker.caasoperator "mattermost/6" done: executing operation "remote init": Internal error occurred: error executing command in container: failed to exec in container: failed to create exec "65308676472d86e7dbbd91c44e2e52ab28b5ed85ae135ce8a57d454d887aefbb": cannot exec in a stopped state: unknown
application-mattermost: 09:35:13 ERROR juju.worker.caasoperator exited "mattermost/6": executing operation "remote init": Internal error occurred: error executing command in container: failed to exec in container: failed to create exec "65308676472d86e7dbbd91c44e2e52ab28b5ed85ae135ce8a57d454d887aefbb": cannot exec in a stopped state: unknown
application-mattermost: 09:35:13 INFO juju.worker.caasoperator restarting "mattermost/6" in 3s
application-mattermost: 09:35:16 INFO juju.worker.caasoperator start "mattermost/6"
application-mattermost: 09:35:16 DEBUG juju.worker.caasoperator "mattermost/6" started
[agnew(~)] _
With Kubernetes deployments, pods get new identities when they are restarted, so Juju marks the original unit as stopped until the pod is removed from the cluster; a new unit is created when the replacement pod spins up.
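One way to watch that correspondence (a sketch, assuming a model named mattermost; Juju puts the workload pods in a namespace matching the model name, and the pod names you see will differ):
juju status mattermost             # the terminated unit lingers until its pod is gone from the cluster
kubectl get pods -n mattermost -w  # watch the old pod terminate and the replacement appear under a new name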
It seems you've found a race in how this is handled when pod updates happen in quick succession. Restarting the controller causes Juju to re-evaluate the state of the cluster and resync the Juju model. It could also be that the pods were terminating and transitioned to removed during the controller restart.
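If you want to trigger that resync deliberately, a rough sketch (assuming the controller mm-rc3 is hosted on the same Kubernetes cloud; the controller-mm-rc3 namespace and controller-0 pod name follow Juju's usual convention, so verify them before deleting anything):
kubectl get pods -n controller-mm-rc3                  # confirm the controller pod name
kubectl delete pod controller-0 -n controller-mm-rc3   # the controller StatefulSet recreates it, and Juju resyncs against the cluster on restart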