I'm using Prometheus to watch the state of running pods on my k8s cluster, via the kube_pod_container_status_waiting_reason metric exposed by kube-state-metrics.
For testing purposes, I created a deployment with a non-existent image to force an error:
$ kubectl run foo --image foo
Then, in my Prometheus UI, I run this query:
kube_pod_container_status_waiting_reason{reason=~"ContainerCreating|CrashLoopBackOff|ErrImagePull|ImagePullBackOff"} > 0
During the first minute, I get this result:
kube_pod_container_status_waiting_reason{app="prometheus",chart="prometheus-6.3.0",component="kube-state-metrics",container="foo",heritage="Tiller",instance="100.97.57.7:8080",job="kubernetes-service-endpoints",kubernetes_name="my-release-prometheus-kube-state-metrics",kubernetes_namespace="prometheus",namespace="jung",pod="foo-6db855bd79-wb2rs",reason="ContainerCreating",release="my-release"}
So, kube-state-metrics reports that my pod is in the "ContainerCreating" state.
Then, for about a minute, I get this result:
kube_pod_container_status_waiting_reason{app="prometheus",chart="prometheus-6.3.0",component="kube-state-metrics",container="foo",heritage="Tiller",instance="100.97.57.7:8080",job="kubernetes-service-endpoints",kubernetes_name="my-release-prometheus-kube-state-metrics",kubernetes_namespace="prometheus",namespace="jung",pod="foo-6db855bd79-wb2rs",reason="ErrImagePull",release="my-release"}
kube-state-metrics reports that my pod is now in the "ErrImagePull" state (as expected).
My problem is that this status does not persist for more than 1 or 2 minutes: if I refresh my query, I get a "no data" response, even though my deployment is still in the ImagePullBackOff state.
Is this normal behaviour?
Thank you for your help
Yes, this is the correct behavior: kube-state-metrics always reflects the state of the Kubernetes API, so when the Kubernetes API changes the state of the Pod, the metric for that Pod changes as well.
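For illustration, the metric mirrors the waiting reason stored in the Pod's status, which you can read straight from the API (pod name and namespace taken from your output above):

$ kubectl get pod foo-6db855bd79-wb2rs -n jung -o jsonpath='{.status.containerStatuses[0].state.waiting.reason}'

While the kubelet retries the pull, this field flips back and forth between ErrImagePull and ImagePullBackOff, and the metric follows whatever the field currently says.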
You can solve this by using max_over_time in combination with a for clause in an alerting rule. That way you can express things like: "if this pod has had the ErrImagePull condition, looking back 5 minutes, for 20 minutes".
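A minimal sketch of such a rule (the group and alert names, window, and duration are illustrative values, not something this chart ships):

groups:
- name: pod-waiting-reasons
  rules:
  - alert: PodStuckInErrImagePull
    # max_over_time keeps the value at 1 as long as the reason was seen
    # at least once in the last 5 minutes, even if the series briefly
    # disappears; "for" requires that to hold for 20 minutes straight
    expr: max_over_time(kube_pod_container_status_waiting_reason{reason="ErrImagePull"}[5m]) > 0
    for: 20m
    labels:
      severity: warning
    annotations:
      summary: Pod {{ $labels.pod }} has been failing to pull its image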
OK, but when you say that kube-state-metrics always reflects the state of the K8s API, I don't really understand why my pod's state is not stuck in "ImagePullBackOff".
Because when I run kubectl get pod on my cluster, I always get:
foo-6db855bd79-lznmk 0/1 ImagePullBackOff 0 14m
So why doesn't kube-state-metrics always report this status too?
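(One way to check which waiting reasons a given kube-state-metrics build actually exports is to hit its metrics endpoint directly; the service name, namespace, and port below are the ones from your labels, and port-forwarding to a Service needs a reasonably recent kubectl:

$ kubectl -n prometheus port-forward svc/my-release-prometheus-kube-state-metrics 8080:8080
$ curl -s localhost:8080/metrics | grep kube_pod_container_status_waiting_reason

On a version that doesn't know about ImagePullBackOff, nothing shows up for the pod while it's in that state, which matches the "no data" you're seeing.)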
Ah OK... I'm using k8s.gcr.io/kube-state-metrics:v1.2.0. I need to update ;)
Edit: That solved the problem!
Thank you