In Kubernetes, even if one container in a multi-container Pod keeps being restarted, the other containers are not affected (but the Service is)

First, an overview of Pod and Container status

A Pod's status is represented by the PodStatus object, which contains the following fields (official docs: Pod Lifecycle):

  • conditions
  • containerStatuses
  • hostIP
  • phase
  • podIP
  • qosClass
  • startTime
Roughly, the status looks like the following. The official documentation describes phase, conditions, and the container statuses.
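
The status excerpt below is what you get when you dump the Pod's manifest; in this sketch the pod name and namespace are placeholders:

    kubectl get pod <pod_name> -n <namespace> -o yaml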

    status:
      conditions:
      - <array of conditions>
      containerStatuses:
      - <status of each container>
      hostIP: <hostIp>
      phase: <phase>
      podIP: <podIp>
      qosClass: <qosClass>
      startTime: "2019-12-17T11:51:17Z"
    

The important points are:

  • a Pod's phase and the status field inside each condition are different things
  • each container has state and ready fields that describe its condition (see the jsonpath sketch right after this list)
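
A quick way to look at these on a live Pod is a jsonpath query; this is a sketch where the pod name and namespace are placeholders:

    # the Pod's phase
    kubectl get pod <pod_name> -n <namespace> -o jsonpath='{.status.phase}{"\n"}'
    # type/status of each condition
    kubectl get pod <pod_name> -n <namespace> \
      -o jsonpath='{range .status.conditions[*]}{.type}={.status}{"\n"}{end}'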

1. Phase

phase can only take the following five values:

  • Pending
  • Running
  • Succeeded
  • Failed
  • Unknown

2. Conditions

conditions is an array; each entry consists of the following fields:

  • lastProbeTime
  • lastTransitionTime
  • status
  • message
  • reason
  • type
    status:
      conditions:
      - lastProbeTime: null
        lastTransitionTime: "2019-12-17T11:51:17Z"
        status: "True"
        type: Initialized
      - lastProbeTime: null
        lastTransitionTime: "2020-01-24T09:58:21Z"
        message: 'containers with unready status: [web-app]'
        reason: ContainersNotReady
        status: "False"
        type: ContainersReady

3. ContainerStatuses

The container statuses are also an array.

state can only be one of the following three:

  • Waiting
  • Running
  • Terminated
Inside state there are message and reason fields:

      containerStatuses:
      - containerID: docker://<hash>
        image: /path/to/image/web-app:tag
        imageID: docker-pullable:///path/to/image/web-app:tag@sha256:<hash>
        lastState:
          terminated:
            containerID: docker://<hash>
            exitCode: 0
            finishedAt: "2020-01-24T09:58:20Z"
            reason: Completed
            startedAt: "2020-01-24T09:58:19Z"
        name: web-app
        ready: false
        restartCount: 1148
        state:
          waiting:
            message: Back-off 5m0s restarting failed container=web-app pod=web-app-<hash>_<namespace>(<hash>)
            reason: CrashLoopBackOff
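
To pull out just the interesting fields (name, ready, restartCount) for every container without reading the whole manifest, something like this jsonpath sketch works (pod name and namespace are placeholders):

    kubectl get pod <pod_name> -n <namespace> \
      -o jsonpath='{range .status.containerStatuses[*]}{.name}{"\t"}{.ready}{"\t"}{.restartCount}{"\n"}{end}'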

4. What STATUS does kubectl get pod show?
    

The STATUS you see in the everyday kubectl get pod does not simply correspond to any one of the fields above! (For details, the page "Kubernetes: kubectl 上の Pod のステータス表記について" is very thorough and well worth reading.)

    kubectl get pod <pod_name> -n <namespace>
    NAME         READY   STATUS             RESTARTS   AGE
    <pod_name>   0/1     CrashLoopBackOff   1152       37d

Check: a multi-container Pod where one container's ReadinessProbe and LivenessProbe never become Ready
    

Observing the state with kubectl get pod:

The Pod kept alternating between Status:Running Ready:1/2 and Status:CrashLoopBackOff Ready:1/2. The container that never became Ready was restarted over and over, while the other container (nginx in this case) kept running without any problem.
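
The alternation is easiest to see by watching the Pods; a sketch, with the namespace as a placeholder and the Deployment below applied:

    kubectl get pod -n <namespace> -w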

    apiVersion: apps/v1 # for versions before 1.9.0 use apps/v1beta2
    kind: Deployment
    metadata:
      name: nginx-deployment
    spec:
      strategy:
        rollingUpdate:
          maxUnavailable: 0
      selector:
        matchLabels:
          app: nginx
      replicas: 2
      template:
        metadata:
          labels:
            app: nginx
        spec:
          containers:
          - name: nginx
            image: nginx:1.7.9
            resources:
              limits:
                memory: "20Mi"
              requests:
                memory: "20Mi"
            ports:
            - containerPort: 80
          - name: memory-demo-ctr
            # stress does not listen on port 8080, so these TCP probes never succeed
            readinessProbe:
              tcpSocket:
                port: 8080
              initialDelaySeconds: 30
              periodSeconds: 10
            livenessProbe:
              tcpSocket:
                port: 8080
              initialDelaySeconds: 15
              periodSeconds: 20
            image: polinux/stress
            resources:
              limits:
                memory: "15Mi"
              requests:
                memory: "10Mi"
            command: ["stress"]
            args: ["--vm", "1", "--vm-bytes", "10M", "--vm-hang", "1"]
    
    Conditions:
      Type              Status
      Initialized       True
      Ready             False
      ContainersReady   False
      PodScheduled      True
    Volumes:
      default-token-dsdk2:
        Type:        Secret (a volume populated by a Secret)
        SecretName:  default-token-dsdk2
        Optional:    false
    QoS Class:       Burstable
    Node-Selectors:  <none>
    Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                     node.kubernetes.io/unreachable:NoExecute for 300s
    Events:
      Type     Reason     Age                   From                                                       Message
      ----     ------     ----                  ----                                                       -------
      Normal   Scheduled  39m                   default-scheduler                                          Successfully assigned naka/nginx-deployment-669b897679-msndc to ip-192-168-4-159.ap-northeast-1.compute.internal
      Normal   Pulled     39m                   kubelet, ip-192-168-4-159.ap-northeast-1.compute.internal  Container image "nginx:1.7.9" already present on machine
      Normal   Created    39m                   kubelet, ip-192-168-4-159.ap-northeast-1.compute.internal  Created container nginx
      Normal   Started    39m                   kubelet, ip-192-168-4-159.ap-northeast-1.compute.internal  Started container nginx
      Normal   Killing    38m                   kubelet, ip-192-168-4-159.ap-northeast-1.compute.internal  Container memory-demo-ctr failed liveness probe, will be restarted
      Normal   Pulling    38m (x2 over 39m)     kubelet, ip-192-168-4-159.ap-northeast-1.compute.internal  Pulling image "polinux/stress"
      Normal   Pulled     38m (x2 over 39m)     kubelet, ip-192-168-4-159.ap-northeast-1.compute.internal  Successfully pulled image "polinux/stress"
      Normal   Created    38m (x2 over 39m)     kubelet, ip-192-168-4-159.ap-northeast-1.compute.internal  Created container memory-demo-ctr
      Normal   Started    38m (x2 over 39m)     kubelet, ip-192-168-4-159.ap-northeast-1.compute.internal  Started container memory-demo-ctr
      Warning  Unhealthy  37m (x8 over 39m)     kubelet, ip-192-168-4-159.ap-northeast-1.compute.internal  Readiness probe failed: dial tcp 192.168.4.201:8080: connect: connection refused
      Warning  Unhealthy  9m56s (x32 over 39m)  kubelet, ip-192-168-4-159.ap-northeast-1.compute.internal  Liveness probe failed: dial tcp 192.168.4.201:8080: connect: connection refused
      Warning  BackOff    4m52s (x79 over 28m)  kubelet, ip-192-168-4-159.ap-northeast-1.compute.internal  Back-off restarting failed container
    
    NAME                                READY   STATUS             RESTARTS   AG
    nginx-deployment-669b897679-kq84s   1/2     CrashLoopBackOff   13         42
    nginx-deployment-669b897679-msndc   1/2     CrashLoopBackOff   13         42

OOMKilled

Next, OOMKilled: this means the container was killed because it tried to use more memory than the value set in its resource limits.

    Configure Out of Resource Handling

    if a Pod container is OOM killed, it may be restarted by the kubelet based on its RestartPolicy.

    Pod is running and has one Container. Container runs out of memory.

    Container terminates in failure.
    Log OOM event.
    If restartPolicy is:
    Always: Restart Container; Pod phase stays Running.
    OnFailure: Restart Container; Pod phase stays Running.
    Never: Log failure event; Pod phase becomes Failed.
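
restartPolicy is set at the Pod spec level, not per container, and defaults to Always when omitted (as in the manifests in this article). A minimal sketch of setting it explicitly; note that a Deployment's Pod template only accepts Always:

    spec:
      restartPolicy: Always   # default; OnFailure / Never cannot be used in a Deployment's Pod template
      containers:
      - name: web-app
        image: /path/to/image/web-app:tag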

Check: a multi-container Pod where one container is constantly OOMKilled

The STATUS keeps cycling through ContainerCreating -> OOMKilled -> CrashLoopBackOff -> OOMKilled ... (with exponential back-off), but the other container keeps running without any problem.
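To confirm that the restarts really are OOM kills, check the last terminated state of the failing container; a sketch where the pod name and namespace are placeholders and memory-demo-ctr is the container name from the manifest below:

    kubectl get pod <pod_name> -n <namespace> \
      -o jsonpath='{.status.containerStatuses[?(@.name=="memory-demo-ctr")].lastState.terminated.reason}'

In this scenario it should print OOMKilled.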

    apiVersion: apps/v1 # for versions before 1.9.0 use apps/v1beta2
    kind: Deployment
    metadata:
      name: nginx-deployment
    spec:
      strategy:
        rollingUpdate:
          maxUnavailable: 0
      selector:
        matchLabels:
          app: nginx
      replicas: 2
      template:
        metadata:
          labels:
            app: nginx
        spec:
          containers:
          - name: nginx
            image: nginx:1.7.9
            resources:
              limits:
                memory: "20Mi"
              requests:
                memory: "20Mi"
            ports:
            - containerPort: 80
          - name: memory-demo-ctr
            readinessProbe:
              tcpSocket:
                port: 8080
              initialDelaySeconds: 30
              periodSeconds: 10
            livenessProbe:
              tcpSocket:
                port: 8080
              initialDelaySeconds: 15
              periodSeconds: 20
            image: polinux/stress
            resources:
              limits:
                memory: "15Mi"
              requests:
                memory: "10Mi"
            command: ["stress"]
            args: ["--vm", "1", "--vm-bytes", "15M", "--vm-hang", "1"]
    
    kubectl get pod -n <namespace>
    NAME                                READY   STATUS             RESTARTS   AG
    nginx-deployment-6bd9c88968-dssp4   1/2     CrashLoopBackOff   11         35
    nginx-deployment-6bd9c88968-xfmx6   1/2     CrashLoopBackOff   11         35
    

Describe one of the Pods:

    kubectl describe pod <pod-name> -n <namespace>
    Conditions:
      Type              Status
      Initialized       True
      Ready             False
      ContainersReady   False
      PodScheduled      True
    Volumes:
      default-token-dsdk2:
        Type:        Secret (a volume populated by a Secret)
        SecretName:  default-token-dsdk2
        Optional:    false
    QoS Class:       Burstable
    Node-Selectors:  <none>
    Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                     node.kubernetes.io/unreachable:NoExecute for 300s
    Events:
      Type     Reason     Age                    From                                                       Message
      ----     ------     ----                   ----                                                       -------
      Normal   Scheduled  33m                    default-scheduler                                          Successfully assigned naka/nginx-deployment-6bd9c88968-xfmx6 to ip-192-168-5-113.ap-northeast-1.compute.internal
      Normal   Pulled     33m                    kubelet, ip-192-168-5-113.ap-northeast-1.compute.internal  Container image "nginx:1.7.9" already present on machine
      Normal   Created    33m                    kubelet, ip-192-168-5-113.ap-northeast-1.compute.internal  Created container nginx
      Normal   Started    33m                    kubelet, ip-192-168-5-113.ap-northeast-1.compute.internal  Started container nginx
      Normal   Pulling    32m (x4 over 33m)      kubelet, ip-192-168-5-113.ap-northeast-1.compute.internal  Pulling image "polinux/stress"
      Normal   Pulled     32m (x4 over 33m)      kubelet, ip-192-168-5-113.ap-northeast-1.compute.internal  Successfully pulled image "polinux/stress"
      Normal   Created    32m (x4 over 33m)      kubelet, ip-192-168-5-113.ap-northeast-1.compute.internal  Created container memory-demo-ctr
      Normal   Started    32m (x4 over 33m)      kubelet, ip-192-168-5-113.ap-northeast-1.compute.internal  Started container memory-demo-ctr
      Warning  BackOff    3m52s (x145 over 33m)  kubelet, ip-192-168-5-113.ap-northeast-1.compute.internal  Back-off restarting failed container
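
The same back-off history can also be listed without describing the Pod, by querying the events directly (a sketch; pod name and namespace are placeholders):

    kubectl get events -n <namespace> --field-selector involvedObject.name=<pod_name>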

Service

Because a ReadinessProbe can be set per container, I had assumed that in a multi-container Pod the Service would keep serving as long as one container was fine. That turns out to be wrong.

    With the introduction of new Pod conditions, a Pod is evaluated to be ready only when both the following statements are true:

  • All containers in the Pod are ready.
  • All conditions specified in ReadinessGates are “True”.
When you go through a Service, every container in the Pod has to be ready. In the example above, even if the nginx container itself keeps working, once the co-located container dies the Pod stops being ready and is removed from the Service's endpoints.

Prepare two pairs of Deployment and Service, as follows:

    tree practice/resource/02
    practice/resource/02
    ├── README.md
    ├── mulit-container-oom.yaml
    ├── mulit-container.yaml
    ├── service-oom.yaml
    └── service.yaml
  • a Deployment where nginx is co-located with a container that gets OOMKilled, plus a Service for nginx
  • a Deployment where nginx is co-located with a container that does become Ready, plus a Service for nginx
    practice/resource/02/mulit-container-oom.yaml
    apiVersion: apps/v1 # for versions before 1.9.0 use apps/v1beta2
    kind: Deployment
    metadata:
      name: nginx-oom
      namespace: naka
    spec:
      strategy:
        rollingUpdate:
          maxUnavailable: 0
      selector:
        matchLabels:
          app: nginx-oom
      replicas: 2
      template:
        metadata:
          labels:
            app: nginx-oom
        spec:
          containers:
          - name: nginx
            image: nginx:1.7.9
            resources:
              limits:
                memory: "20Mi"
              requests:
                memory: "20Mi"
            ports:
            - containerPort: 80
          - name: memory-demo-ctr
            readinessProbe:
              tcpSocket:
                port: 8080
              initialDelaySeconds: 30
              periodSeconds: 10
            livenessProbe:
              tcpSocket:
                port: 8080
              initialDelaySeconds: 15
              periodSeconds: 20
            image: polinux/stress
            resources:
              limits:
                memory: "15Mi"
              requests:
                memory: "10Mi"
            command: ["stress"]
            args: ["--vm", "1", "--vm-bytes", "15M", "--vm-hang", "1"]
    
    practice/resource/02/mulit-container.yaml
    apiVersion: apps/v1 # for versions before 1.9.0 use apps/v1beta2
    kind: Deployment
    metadata:
      name: nginx
      namespace: naka
    spec:
      strategy:
        rollingUpdate:
          maxUnavailable: 0
      selector:
        matchLabels:
          app: nginx
      replicas: 2
      template:
        metadata:
          labels:
            app: nginx
        spec:
          containers:
          - name: nginx
            image: nginx:1.7.9
            resources:
              limits:
                memory: "20Mi"
              requests:
                memory: "20Mi"
            ports:
            - containerPort: 80
          - name: memory-demo-ctr
            image: polinux/stress
            resources:
              limits:
                memory: "15Mi"
              requests:
                memory: "10Mi"
            command: ["stress"]
            args: ["--vm", "1", "--vm-bytes", "1M", "--vm-hang", "1"]
    
    kubectl create namespace naka
    kubectl apply -n naka -f practice/resource/02/
    deployment.apps/nginx-oom created
    deployment.apps/nginx created
    service/nginx-oom created
    service/nginx created

Check the Pods
    
    kubectl get pod -n naka
    NAME                         READY   STATUS             RESTARTS   AGE
    nginx-7fb468f99f-b252k       2/2     Running            0          13m
    nginx-7fb468f99f-rf7x2       2/2     Running            0          13m
    nginx-oom-6655d46664-khbr6   1/2     CrashLoopBackOff   7          14m
    nginx-oom-6655d46664-v5d6r   1/2     CrashLoopBackOff   7          14m

Check the Service
    
    kubectl get svc -n naka
    NAME        TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)   AGE
    nginx       ClusterIP   10.100.104.244   <none>        80/TCP    15m
    nginx-oom   ClusterIP   10.100.133.104   <none>        80/TCP    15m

Check with port-forward

Port-forward the Service that is working normally:
    
    kubectl port-forward svc/nginx -n naka 8080:80
    Forwarding from 127.0.0.1:8080 -> 80
    Forwarding from [::1]:8080 -> 80
    

In another window, hit it with curl:

    curl localhost:8080
    <!DOCTYPE html>
    <title>Welcome to nginx!</title>
    <style>
        body {
            width: 35em;
            margin: 0 auto;
            font-family: Tahoma, Verdana, Arial, sans-serif;
    </style>
    </head>
    <h1>Welcome to nginx!</h1>
    <p>If you see this page, the nginx web server is successfully installed and
    working. Further configuration is required.</p>
    <p>For online documentation and support please refer to
    <a href="http://nginx.org/">nginx.org</a>.<br/>
    Commercial support is available at
    <a href="http://nginx.com/">nginx.com</a>.</p>
    <p><em>Thank you for using nginx.</em></p>
    </body>
    </html>
    

It is working fine.

Now the one co-located with the container that gets OOMKilled:
    kubectl port-forward svc/nginx-oom -n naka 8080:80
    Forwarding from 127.0.0.1:8080 -> 80
    Forwarding from [::1]:8080 -> 80
    
    curl localhost:8080
    <!DOCTYPE html>
    <title>Welcome to nginx!</title>
    <style>
        body {
            width: 35em;
            margin: 0 auto;
            font-family: Tahoma, Verdana, Arial, sans-serif;
    </style>
    </head>
    <h1>Welcome to nginx!</h1>
    <p>If you see this page, the nginx web server is successfully installed and
    working. Further configuration is required.</p>
    <p>For online documentation and support please refer to
    <a href="http://nginx.org/">nginx.org</a>.<br/>
    Commercial support is available at
    <a href="http://nginx.com/">nginx.com</a>.</p>
    <p><em>Thank you for using nginx.</em></p>
    </body>
    </html>
    

nginx is serving here as well.

Check the Endpoints
    kubectl get endpoints -n naka
    NAME        ENDPOINTS                         AGE
    nginx       192.168.5.22:80,192.168.6.60:80   18m
    nginx-oom                                     18m
    

As the documentation says, nginx-oom, whose two containers are not both Ready, has no Pod IPs assigned to its Endpoints!
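The same thing is visible from the Service side; describing the Service should show an empty Endpoints line for nginx-oom:

    kubectl describe svc nginx-oom -n naka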

Access through the Service with curl from another Pod

Start another Pod:

    kubectl run curl --image=radial/busyboxplus:curl -i --tty
    kubectl run --generator=deployment/apps.v1 is DEPRECATED and will be removed in a future version. Use kubectl run --generator=run-pod/v1 or kubectl create instead.
    If you don't see a command prompt, try pressing enter.
    [ root@curl-66bdcf564-g7cl8:/ ]$

Check the one that is working normally
    

The Service name can be resolved via DNS and accessed by name:

    [ root@curl-66bdcf564-g7cl8:/ ]$ nslookup nginx
    Server:    10.100.0.10
    Address 1: 10.100.0.10 kube-dns.kube-system.svc.cluster.local
    Name:      nginx
    Address 1: 10.100.104.244 nginx.naka.svc.cluster.local
    [ root@curl-66bdcf564-g7cl8:/ ]$ curl nginx.naka.svc.cluster.local:80
    <!DOCTYPE html>
    <title>Welcome to nginx!</title>
    <style>
        body {
            width: 35em;
            margin: 0 auto;
            font-family: Tahoma, Verdana, Arial, sans-serif;
    </style>
    </head>
    <h1>Welcome to nginx!</h1>
    <p>If you see this page, the nginx web server is successfully installed and
    working. Further configuration is required.</p>
    <p>For online documentation and support please refer to
    <a href="http://nginx.org/">nginx.org</a>.<br/>
    Commercial support is available at
    <a href="http://nginx.com/">nginx.com</a>.</p>
    <p><em>Thank you for using nginx.</em></p>
    </body>
    </html>

Check the OOM one
    

Because its endpoint has been removed from the Service, access through the Service fails and times out, as confirmed below:

    nslookup nginx-oom
    Server:    10.100.0.10
    Address 1: 10.100.0.10 kube-dns.kube-system.svc.cluster.local
    Name:      nginx-oom
    Address 1: 10.100.133.104 nginx-oom.naka.svc.cluster.local
    curl --connect-timeout 10 nginx-oom.naka.svc.cluster.local:80
    curl: (28) Connection timed out after 10001 milliseconds
    
