A Pod's status is represented by the PodStatus object, which contains the fields below (official docs: Pod Lifecycle).
Roughly it looks like the following; the documentation covers phase, conditions, and container status.
status:
  conditions:
  - <array of conditions>
  containerStatuses:
  - <array of container statuses>
  hostIP: <hostIp>
  phase: <phase>
  podIP: <podIp>
  qosClass: <qosClass>
  startTime: "2019-12-17T11:51:17Z"
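You can dump this yourself; the block above is just the status portion of the full pod object (pod name and namespace are placeholders):

kubectl get pod <pod_name> -n <namespace> -o yaml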
The important points:
A pod's phase and status (the status field inside each condition) are different things.
A container has state and ready fields that describe its condition.
1. Phase
Phase takes only one of the following five values:
Pending
Running
Succeeded
Failed
Unknown
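To read just the phase, a jsonpath query works (same placeholders as above):

kubectl get pod <pod_name> -n <namespace> -o jsonpath='{.status.phase}'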
2. Conditions
Conditions is an array; each entry consists of the following fields:
type
status
lastProbeTime
lastTransitionTime
reason
message
status:
  conditions:
  - lastProbeTime: null
    lastTransitionTime: "2019-12-17T11:51:17Z"
    status: "True"
    type: Initialized
  - lastProbeTime: null
    lastTransitionTime: "2020-01-24T09:58:21Z"
    message: 'containers with unready status: [web-app]'
    reason: ContainersNotReady
    status: "False"
    type: ContainersReady
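To print each condition's type and status on its own line, a jsonpath range expression like this should do:

kubectl get pod <pod_name> -n <namespace> \
  -o jsonpath='{range .status.conditions[*]}{.type}={.status}{"\n"}{end}'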
3. ContainerStatuses
A container's status is also an array (one entry per container).
Its state takes only one of the following three values:
Waiting
Running
Terminated
Inside state there are message and reason fields.
containerStatuses:
- containerID: docker://<hash>
  image: /path/to/image/web-app:tag
  imageID: docker-pullable:///path/to/image/web-app:tag@sha256:<hash>
  lastState:
    terminated:
      containerID: docker://<hash>
      exitCode: 0
      finishedAt: "2020-01-24T09:58:20Z"
      reason: Completed
      startedAt: "2020-01-24T09:58:19Z"
  name: web-app
  ready: false
  restartCount: 1148
  state:
    waiting:
      message: Back-off 5m0s restarting failed container=web-app pod=web-app-<hash>_<namespace>(<hash>)
      reason: CrashLoopBackOff
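Likewise, each container's ready flag and restart count can be listed per entry:

kubectl get pod <pod_name> -n <namespace> \
  -o jsonpath='{range .status.containerStatuses[*]}{.name}: ready={.ready} restarts={.restartCount}{"\n"}{end}'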
4. What about the STATUS shown by get pod?
The STATUS you see with the everyday kubectl get pod does not simply correspond to any one of the fields above! (For details, the article Kubernetes: kubectl 上の Pod のステータス表記について is very thorough and well worth reading.)
kubectl get pod <pod_name> -n <namespace>
NAME         READY   STATUS             RESTARTS   AGE
<pod_name>   0/1     CrashLoopBackOff   1152       37d
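One way to see the divergence is to print phase next to the first container's waiting reason (the column names here are arbitrary):

kubectl get pod -n <namespace> \
  -o custom-columns='NAME:.metadata.name,PHASE:.status.phase,WAITING:.status.containerStatuses[0].state.waiting.reason'

For the CrashLoopBackOff pod above, PHASE typically stays Running while WAITING shows CrashLoopBackOff.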
Check: a multi-container Pod where one container's ReadinessProbe and LivenessProbe never pass
I observed the state with kubectl get pod.
The Pod alternated between Status:Running Ready:1/2 and Status:CrashLoopBackOff Ready:1/2. The container that never became Ready was restarted over and over, while the other container (nginx in this case) kept running without problems.
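The alternation is easy to observe live with the standard watch flag:

kubectl get pod -n <namespace> -w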
apiVersion: apps/v1 # for versions before 1.9.0 use apps/v1beta2
kind: Deployment
metadata:
  name: nginx-deployment
spec:
  strategy:
    rollingUpdate:
      maxUnavailable: 0
  selector:
    matchLabels:
      app: nginx
  replicas: 2
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:1.7.9
        resources:
          limits:
            memory: "20Mi"
          requests:
            memory: "20Mi"
        ports:
        - containerPort: 80
      - name: memory-demo-ctr
        readinessProbe:
          tcpSocket:
            port: 8080
          initialDelaySeconds: 30
          periodSeconds: 10
        livenessProbe:
          tcpSocket:
            port: 8080
          initialDelaySeconds: 15
          periodSeconds: 20
        image: polinux/stress
        resources:
          limits:
            memory: "15Mi"
          requests:
            memory: "10Mi"
        command: ["stress"]
        args: ["--vm", "1", "--vm-bytes", "10M", "--vm-hang", "1"]
Describing one of the Pods shows the following (output trimmed):
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  default-token-dsdk2:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-dsdk2
    Optional:    false
QoS Class:       Burstable
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason     Age                   From                                                        Message
  ----     ------     ----                  ----                                                        -------
  Normal   Scheduled  39m                   default-scheduler                                           Successfully assigned naka/nginx-deployment-669b897679-msndc to ip-192-168-4-159.ap-northeast-1.compute.internal
  Normal   Pulled     39m                   kubelet, ip-192-168-4-159.ap-northeast-1.compute.internal   Container image "nginx:1.7.9" already present on machine
  Normal   Created    39m                   kubelet, ip-192-168-4-159.ap-northeast-1.compute.internal   Created container nginx
  Normal   Started    39m                   kubelet, ip-192-168-4-159.ap-northeast-1.compute.internal   Started container nginx
  Normal   Killing    38m                   kubelet, ip-192-168-4-159.ap-northeast-1.compute.internal   Container memory-demo-ctr failed liveness probe, will be restarted
  Normal   Pulling    38m (x2 over 39m)     kubelet, ip-192-168-4-159.ap-northeast-1.compute.internal   Pulling image "polinux/stress"
  Normal   Pulled     38m (x2 over 39m)     kubelet, ip-192-168-4-159.ap-northeast-1.compute.internal   Successfully pulled image "polinux/stress"
  Normal   Created    38m (x2 over 39m)     kubelet, ip-192-168-4-159.ap-northeast-1.compute.internal   Created container memory-demo-ctr
  Normal   Started    38m (x2 over 39m)     kubelet, ip-192-168-4-159.ap-northeast-1.compute.internal   Started container memory-demo-ctr
  Warning  Unhealthy  37m (x8 over 39m)     kubelet, ip-192-168-4-159.ap-northeast-1.compute.internal   Readiness probe failed: dial tcp 192.168.4.201:8080: connect: connection refused
  Warning  Unhealthy  9m56s (x32 over 39m)  kubelet, ip-192-168-4-159.ap-northeast-1.compute.internal   Liveness probe failed: dial tcp 192.168.4.201:8080: connect: connection refused
  Warning  BackOff    4m52s (x79 over 28m)  kubelet, ip-192-168-4-159.ap-northeast-1.compute.internal   Back-off restarting failed container
NAME                                READY   STATUS             RESTARTS   AGE
nginx-deployment-669b897679-kq84s   1/2     CrashLoopBackOff   13         42
nginx-deployment-669b897679-msndc   1/2     CrashLoopBackOff   13         42
OOMKilled
Next, OOMKilled means that the container was killed because it tried to use more memory than the value set in its resource limits.
Configure Out of Resource Handling
if a Pod container is OOM killed, it may be restarted by the kubelet based on its RestartPolicy.
Pod is running and has one Container. Container runs out of memory.
Container terminates in failure.
Log OOM event.
If restartPolicy is:
Always: Restart Container; Pod phase stays Running.
OnFailure: Restart Container; Pod phase stays Running.
Never: Log failure event; Pod phase becomes Failed.
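As a minimal sketch of where restartPolicy sits (a hypothetical single-container pod; note it is a pod-level field and Always is the default):

apiVersion: v1
kind: Pod
metadata:
  name: oom-demo            # hypothetical name
spec:
  restartPolicy: Always     # Always | OnFailure | Never
  containers:
  - name: memory-hog
    image: polinux/stress
    resources:
      limits:
        memory: "15Mi"      # stress below asks for more than this, so the container gets OOMKilled
    command: ["stress"]
    args: ["--vm", "1", "--vm-bytes", "50M", "--vm-hang", "1"]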
Check: a multi-container Pod where one container is always OOMKilled
The STATUS cycled through ContainerCreating -> OOMKilled -> CrashLoopBackOff -> OOMKilled ... (with exponential back-off), while the other container kept running without problems.
apiVersion: apps/v1 # for versions before 1.9.0 use apps/v1beta2
kind: Deployment
metadata:
  name: nginx-deployment
spec:
  strategy:
    rollingUpdate:
      maxUnavailable: 0
  selector:
    matchLabels:
      app: nginx
  replicas: 2
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:1.7.9
        resources:
          limits:
            memory: "20Mi"
          requests:
            memory: "20Mi"
        ports:
        - containerPort: 80
      - name: memory-demo-ctr
        readinessProbe:
          tcpSocket:
            port: 8080
          initialDelaySeconds: 30
          periodSeconds: 10
        livenessProbe:
          tcpSocket:
            port: 8080
          initialDelaySeconds: 15
          periodSeconds: 20
        image: polinux/stress
        resources:
          limits:
            memory: "15Mi"
          requests:
            memory: "10Mi"
        command: ["stress"]
        args: ["--vm", "1", "--vm-bytes", "15M", "--vm-hang", "1"]
kubectl get pod -n <namespace>
NAME                                READY   STATUS             RESTARTS   AGE
nginx-deployment-6bd9c88968-dssp4   1/2     CrashLoopBackOff   11         35
nginx-deployment-6bd9c88968-xfmx6   1/2     CrashLoopBackOff   11         35
Describe one of the Pods:
kubectl describe pod <pod-name> -n <namespace>
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  default-token-dsdk2:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-dsdk2
    Optional:    false
QoS Class:       Burstable
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason     Age                    From                                                        Message
  ----     ------     ----                   ----                                                        -------
  Normal   Scheduled  33m                    default-scheduler                                           Successfully assigned naka/nginx-deployment-6bd9c88968-xfmx6 to ip-192-168-5-113.ap-northeast-1.compute.internal
  Normal   Pulled     33m                    kubelet, ip-192-168-5-113.ap-northeast-1.compute.internal   Container image "nginx:1.7.9" already present on machine
  Normal   Created    33m                    kubelet, ip-192-168-5-113.ap-northeast-1.compute.internal   Created container nginx
  Normal   Started    33m                    kubelet, ip-192-168-5-113.ap-northeast-1.compute.internal   Started container nginx
  Normal   Pulling    32m (x4 over 33m)      kubelet, ip-192-168-5-113.ap-northeast-1.compute.internal   Pulling image "polinux/stress"
  Normal   Pulled     32m (x4 over 33m)      kubelet, ip-192-168-5-113.ap-northeast-1.compute.internal   Successfully pulled image "polinux/stress"
  Normal   Created    32m (x4 over 33m)      kubelet, ip-192-168-5-113.ap-northeast-1.compute.internal   Created container memory-demo-ctr
  Normal   Started    32m (x4 over 33m)      kubelet, ip-192-168-5-113.ap-northeast-1.compute.internal   Started container memory-demo-ctr
  Warning  BackOff    3m52s (x145 over 33m)  kubelet, ip-192-168-5-113.ap-northeast-1.compute.internal   Back-off restarting failed container
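Note that OOMKilled never appears in these events; it is recorded in the container's lastState instead. A jsonpath filter can pull it out (container name as in the manifest above):

kubectl get pod <pod-name> -n <namespace> \
  -o jsonpath='{.status.containerStatuses[?(@.name=="memory-demo-ctr")].lastState.terminated.reason}'

This should print OOMKilled for the failing container.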
Service
Since a ReadinessProbe can be configured per container, I assumed that in a multi-container Pod the Service would keep serving traffic as long as one container was healthy. That turns out to be wrong.
With the introduction of new Pod conditions, a Pod is evaluated to be ready only when both the following statements are true:
All containers in the Pod are ready.
All conditions specified in ReadinessGates are “True”.
When a Service is in front, every container in the pod must be ready. In the example above, even though the nginx container itself kept working, for access through the service, the moment the co-located container dies the pod stops being ready and is removed from the service's endpoints.
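For reference, the ReadinessGates mentioned in the quote are extra pod-level conditions; a minimal sketch (the conditionType is a made-up example that some external controller would have to set to True):

apiVersion: v1
kind: Pod
metadata:
  name: readiness-gate-demo                      # hypothetical name
spec:
  readinessGates:
  - conditionType: "example.com/lb-registered"   # pod is Ready only when this condition is also True
  containers:
  - name: nginx
    image: nginx:1.7.9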
Prepare two Deployment/Service pairs as follows.
tree practice/resource/02
practice/resource/02
├── README.md
├── mulit-container-oom.yaml
├── mulit-container.yaml
├── service-oom.yaml
└── service.yaml
A deployment co-locating nginx with a container that gets OOMKilled, plus a service for nginx
A deployment co-locating nginx with a container that becomes Ready, plus a service for nginx
practice/resource/02/mulit-container-oom.yaml
apiVersion: apps/v1 # for versions before 1.9.0 use apps/v1beta2
kind: Deployment
metadata:
  name: nginx-oom
  namespace: naka
spec:
  strategy:
    rollingUpdate:
      maxUnavailable: 0
  selector:
    matchLabels:
      app: nginx-oom
  replicas: 2
  template:
    metadata:
      labels:
        app: nginx-oom
    spec:
      containers:
      - name: nginx
        image: nginx:1.7.9
        resources:
          limits:
            memory: "20Mi"
          requests:
            memory: "20Mi"
        ports:
        - containerPort: 80
      - name: memory-demo-ctr
        readinessProbe:
          tcpSocket:
            port: 8080
          initialDelaySeconds: 30
          periodSeconds: 10
        livenessProbe:
          tcpSocket:
            port: 8080
          initialDelaySeconds: 15
          periodSeconds: 20
        image: polinux/stress
        resources:
          limits:
            memory: "15Mi"
          requests:
            memory: "10Mi"
        command: ["stress"]
        args: ["--vm", "1", "--vm-bytes", "15M", "--vm-hang", "1"]
practice/resource/02/mulit-container.yaml
apiVersion: apps/v1 # for versions before 1.9.0 use apps/v1beta2
kind: Deployment
metadata:
  name: nginx
  namespace: naka
spec:
  strategy:
    rollingUpdate:
      maxUnavailable: 0
  selector:
    matchLabels:
      app: nginx
  replicas: 2
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:1.7.9
        resources:
          limits:
            memory: "20Mi"
          requests:
            memory: "20Mi"
        ports:
        - containerPort: 80
      - name: memory-demo-ctr
        image: polinux/stress
        resources:
          limits:
            memory: "15Mi"
          requests:
            memory: "10Mi"
        command: ["stress"]
        args: ["--vm", "1", "--vm-bytes", "1M", "--vm-hang", "1"]
kubectl create namespace naka
kubectl apply -n naka -f practice/resource/02/
deployment.apps/nginx-oom created
deployment.apps/nginx created
service/nginx-oom created
service/nginx created
Check the Pods:
kubectl get pod -n naka
NAME                         READY   STATUS             RESTARTS   AGE
nginx-7fb468f99f-b252k       2/2     Running            0          13m
nginx-7fb468f99f-rf7x2       2/2     Running            0          13m
nginx-oom-6655d46664-khbr6   1/2     CrashLoopBackOff   7          14m
nginx-oom-6655d46664-v5d6r   1/2     CrashLoopBackOff   7          14m
Check the Services:
kubectl get svc -n naka
NAME        TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)   AGE
nginx       ClusterIP   10.100.104.244   <none>        80/TCP    15m
nginx-oom   ClusterIP   10.100.133.104   <none>        80/TCP    15m
Check via port-forward
Port-forward the healthy service:
kubectl port-forward svc/nginx -n naka 8080:80
Forwarding from 127.0.0.1:8080 -> 80
Forwarding from [::1]:8080 -> 80
Hit it with curl from another window:
curl localhost:8080
<!DOCTYPE html>
<html>
<head>
<title>Welcome to nginx!</title>
<style>
    body {
        width: 35em;
        margin: 0 auto;
        font-family: Tahoma, Verdana, Arial, sans-serif;
    }
</style>
</head>
<body>
<h1>Welcome to nginx!</h1>
<p>If you see this page, the nginx web server is successfully installed and
working. Further configuration is required.</p>

<p>For online documentation and support please refer to
<a href="http://nginx.org/">nginx.org</a>.<br/>
Commercial support is available at
<a href="http://nginx.com/">nginx.com</a>.</p>

<p><em>Thank you for using nginx.</em></p>
</body>
</html>
Working fine.
Next, the service whose pods co-locate the container that gets OOMKilled:
kubectl port-forward svc/nginx-oom -n naka 8080:80
Forwarding from 127.0.0.1:8080 -> 80
Forwarding from [::1]:8080 -> 80
curl localhost:8080
<!DOCTYPE html>
<html>
<head>
<title>Welcome to nginx!</title>
<style>
    body {
        width: 35em;
        margin: 0 auto;
        font-family: Tahoma, Verdana, Arial, sans-serif;
    }
</style>
</head>
<body>
<h1>Welcome to nginx!</h1>
<p>If you see this page, the nginx web server is successfully installed and
working. Further configuration is required.</p>

<p>For online documentation and support please refer to
<a href="http://nginx.org/">nginx.org</a>.<br/>
Commercial support is available at
<a href="http://nginx.com/">nginx.com</a>.</p>

<p><em>Thank you for using nginx.</em></p>
</body>
</html>
nginx is serving here too.
Check the Endpoints:
kubectl get endpoints -n naka
NAME        ENDPOINTS                         AGE
nginx       192.168.5.22:80,192.168.6.60:80   18m
nginx-oom                                     18m
As the documentation says, for nginx-oom, where the two containers never both become Ready, you can see that no Pod IPs are assigned to the Endpoints!
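Looking at the raw Endpoints object should show the pod IPs parked under notReadyAddresses rather than addresses:

kubectl get endpoints nginx-oom -n naka -o yaml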
Access through the Service with curl from another Pod
Spin up another Pod:
kubectl run curl --image=radial/busyboxplus:curl -i --tty
kubectl run --generator=deployment/apps.v1 is DEPRECATED and will be removed in a future version. Use kubectl run --generator=run-pod/v1 or kubectl create instead.
If you don't see a command prompt, try pressing enter.
[ root@curl-66bdcf564-g7cl8:/ ]$
Check the healthy one first
The service name resolves through DNS and the service is reachable:
[ root@curl-66bdcf564-g7cl8:/ ]$ nslookup nginx
Server: 10.100.0.10
Address 1: 10.100.0.10 kube-dns.kube-system.svc.cluster.local
Name: nginx
Address 1: 10.100.104.244 nginx.naka.svc.cluster.local
[ root@curl-66bdcf564-g7cl8:/ ]$ curl nginx.naka.svc.cluster.local:80
<!DOCTYPE html>
<html>
<head>
<title>Welcome to nginx!</title>
<style>
    body {
        width: 35em;
        margin: 0 auto;
        font-family: Tahoma, Verdana, Arial, sans-serif;
    }
</style>
</head>
<body>
<h1>Welcome to nginx!</h1>
<p>If you see this page, the nginx web server is successfully installed and
working. Further configuration is required.</p>

<p>For online documentation and support please refer to
<a href="http://nginx.org/">nginx.org</a>.<br/>
Commercial support is available at
<a href="http://nginx.com/">nginx.com</a>.</p>

<p><em>Thank you for using nginx.</em></p>
</body>
</html>
Now check the OOM one
Since its Endpoints were removed from the Service, access through the service fails and times out, as confirmed below:
nslookup nginx-oom
Server: 10.100.0.10
Address 1: 10.100.0.10 kube-dns.kube-system.svc.cluster.local
Name: nginx-oom
Address 1: 10.100.133.104 nginx-oom.naka.svc.cluster.local
curl --connect-timeout 10 nginx-oom.naka.svc.cluster.local:80
curl: (28) Connection timed out after 10001 milliseconds