

Description

I run my TypeScript application in a Docker container. Sometimes the application finishes, but the container stays in the running state. How is this even possible? I thought I might have forgotten to catch a rejected Promise or to close a stream, but in that case docker top myContainer would tell me that the main process is still running, right?

Steps to reproduce the issue:

  • Build the Docker image for the TypeScript application (I used the Dockerfile below; a build-and-run sketch follows it)
  • Run the container and wait until the application finishes

    FROM node:8
    WORKDIR /v2x_communication
    COPY . /v2x_communication
    RUN npm install && npm run build
    ENTRYPOINT ["npm", "run"]
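
    For concreteness, a sketch of the build and run steps; the image name, container name, and arguments are reconstructed from the docker ps / docker inspect output below, and the exact invocation is an assumption:

    docker build -t filiprydzi/v2x_communication .
    docker run -d --name vehicle8 filiprydzi/v2x_communication \
        start ethereum 172.21.0.9:8545 run-producer 100 \
        0x0f743640f4b8c2ba5be9dc3a792c0262584bfc3c
    # wait for the application to finish, then:
    docker ps --filter name=vehicle8   # expected: not listed; observed: still "Up"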
    

    Describe the results you received:

    I observed the points below, which led me to open the issue.

  • The container still shows up in docker ps:

    docker ps
    CONTAINER ID        IMAGE                          COMMAND                  CREATED             STATUS              PORTS                NAMES
    a55228e756a8        filiprydzi/v2x_communication   "npm run start ether…"   2 hours ago         Up 2 hours                               vehicle8
    
  • docker top shows that there is no running process in the container:

    docker top vehicle8
    UID                 PID                 PPID                C                   STIME               TTY                 TIME                CMD
    
  • docker inspect says my container is still running (a cross-check of the reported Pid against /proc is sketched after this list):

    docker inspect vehicle8
            "Id": "a55228e756a8132b226308cfde0920e4b188af7e7e63503493523b6154598f5e",
            "Created": "2019-01-05T14:44:11.368198346Z",
            "Path": "npm",
            "Args": [
                "run",
                "start",
                "ethereum",
                "172.21.0.9:8545",
                "run-producer",
                "100",
                "0x0f743640f4b8c2ba5be9dc3a792c0262584bfc3c"
            "State": {
                "Status": "running",
                "Running": true,
                "Paused": false,
                "Restarting": false,
                "OOMKilled": false,
                "Dead": false,
                "Pid": 24420,
                "ExitCode": 0,
                "Error": "",
                "StartedAt": "2019-01-05T14:47:42.050501448Z",
                "FinishedAt": "0001-01-01T00:00:00Z"
    
  • If I try to execute a command in the container, it says the following:

    docker exec -i vehicle8 echo 'hello world'
    cannot exec in a stopped state: unknown
    
  • When I execute docker stop vehicle8, I see this in the Docker daemon logs:

    level=debug msg="Calling GET /_ping"
    level=debug msg="Calling POST /v1.39/containers/vehicle8/stop"
    level=debug msg="Sending kill signal 15 to container a55228e756a8132b226308cfde0920e4b188af7e7e63503493523b6154598f5e"
    level=debug msg="container kill failed because of 'container not found' or 'no such process'" action=kill container=a55228e756a81
    
  • htop doesn't show the process either.
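
    A quick way to cross-check the docker top and docker inspect observations above (a sketch; vehicle8 is the container name and 24420 the Pid reported by docker inspect):

    # Ask dockerd which PID it has on record, then ask the kernel about that PID.
    pid=$(docker inspect --format '{{.State.Pid}}' vehicle8)
    echo "dockerd has PID $pid on record"
    ps -p "$pid" -o pid,ppid,stat,cmd || echo "no such process - dockerd's state is stale"
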
    Describe the results you expected:

    Stopped container should not be listed under docker ps.

    Additional information you deem important (e.g. issue happens only occasionally):

    Output of docker version:

    Client:
     Version:           18.09.0
     API version:       1.39
     Go version:        go1.10.4
     Git commit:        4d60db4
     Built:             Wed Nov  7 00:49:01 2018
     OS/Arch:           linux/amd64
     Experimental:      false
    Server: Docker Engine - Community
     Engine:
      Version:          18.09.0
      API version:      1.39 (minimum version 1.12)
      Go version:       go1.10.4
      Git commit:       4d60db4
      Built:            Wed Nov  7 00:16:44 2018
      OS/Arch:          linux/amd64
      Experimental:     false
    

    Output of docker info:

    docker info
    Containers: 96
     Running: 45
     Paused: 0
     Stopped: 51
    Images: 150
    Server Version: 18.09.0
    Storage Driver: overlay2
     Backing Filesystem: extfs
     Supports d_type: true
     Native Overlay Diff: true
    Logging Driver: json-file
    Cgroup Driver: cgroupfs
    Plugins:
     Volume: local
     Network: bridge host macvlan null overlay
     Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
    Swarm: pending
     NodeID: sah8xlcjnxbq13uofznqrjs6e
     Is Manager: false
     Node Address: 10.132.0.5
     Manager Addresses:
      10.132.0.2:2377
    Runtimes: runc
    Default Runtime: runc
    Init Binary: docker-init
    containerd version: c4446665cb9c30056f4998ed953e6d4ff22c7c39
    runc version: 4fc53a81fb7c994640722ac585fa9ca548971871
    init version: fec3683
    Security Options:
     apparmor
     seccomp
      Profile: default
    Kernel Version: 4.15.0-1026-gcp
    Operating System: Ubuntu 18.04.1 LTS
    OSType: linux
    Architecture: x86_64
    CPUs: 48
    Total Memory: 94.41GiB
    Name: vehicle-fleet-big-1
    ID: OAVG:6QVR:EH3F:OYNO:ADC4:QDAN:R2AF:LSSV:2VSI:IJWJ:PJH2:LJVP
    Docker Root Dir: /var/lib/docker
    Debug Mode (client): false
    Debug Mode (server): true
     File Descriptors: 379
     Goroutines: 340
     System Time: 2019-01-05T17:10:38.322898981Z
     EventsListeners: 0
    Registry: https://index.docker.io/v1/
    Labels:
    Experimental: false
    Insecure Registries:
     127.0.0.0/8
    Live Restore Enabled: false
    Product License: Community Engine
    WARNING: No swap limit support
    

    Running in a Google Compute Engine VM.

    We had similar issues; they happened when the system was under heavy load and some processes got killed.

    We could use the commands below to simulate the heavy-load scenario (a more concrete version is sketched after this block).

    # dockerd-> containerd -> containerd-shim -> <runc_container_process>
    $ docker run xxx
    $ docker ps  ## correct, shows one container is up
    # simulate the heavy scenario
    $ kill -STOP <dockerd_PID>
    $ kill <docker-containerd-shim_PID> or kill <runc_container_process_PID>
    $ kill <docker-containerd_PID>
    $ kill -CONT <dockerd_PID>
    $ docker ps  ## wrong, still shows one container is up
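
    A more concrete version of the same simulation (a sketch; it assumes a throw-away busybox container and that the daemon processes are named dockerd and containerd on this version — older packages call them docker-containerd*):

    cid=$(docker run -d busybox sleep 3600)
    cpid=$(docker inspect --format '{{.State.Pid}}' "$cid")
    docker ps --filter "id=$cid"      ## correct, shows the container as Up
    kill -STOP "$(pidof dockerd)"     ## freeze dockerd so it cannot process exit events
    kill "$cpid"                      ## kill the container's init process (here: sleep)
    kill "$(pidof containerd)"        ## kill containerd, as in the recipe above
    kill -CONT "$(pidof dockerd)"     ## resume dockerd
    docker ps --filter "id=$cid"      ## wrong, may still show the container as Up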
    

    Even when docker-containerd-shim is killed, containerd can handle the container status correctly, but dockerd did not correctly handle the process exit.

    The result is a Docker status inconsistency between ps (/proc, the kernel's view) and docker ps (Docker's in-memory DB).
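
    A sketch that surfaces this inconsistency across a whole host; it only uses standard docker CLI calls, and the "stale" label is mine:

    # List containers dockerd reports as running whose recorded init PID no longer exists in /proc.
    for c in $(docker ps -q); do
      pid=$(docker inspect --format '{{.State.Pid}}' "$c")
      if [ "$pid" -eq 0 ] || [ ! -d "/proc/$pid" ]; then
        echo "stale: container $c (recorded PID: $pid)"
      fi
    done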

    $ docker version
    Client:
     Version:           18.09.2
     API version:       1.39
     Go version:        go1.10.4
     Git commit:        6247962
     Built:             Tue Feb 26 23:44:34 2019
     OS/Arch:           linux/amd64
     Experimental:      false
    Server:
     Engine:
      Version:          18.09.2
      API version:      1.39 (minimum version 1.12)
      Go version:       go1.10.4
      Git commit:       6247962
      Built:            Wed Feb 13 00:25:20 2019
      OS/Arch:          linux/amd64
      Experimental:     false
    

    We also tested on Docker 1.13. In Docker 1.13, containerd-ctr can check the Docker container status, and containerd can handle a containerd-shim even if it was created by another containerd instance.

    I ran into similar issues.

    dockerd log:

    Jun 17 14:46:47 192-168-2-140 dockerd[792]: time="2019-06-17T14:46:47.731287887+08:00" level=warning msg="container kill failed because of 'container not found' or 'no such process': Cannot kill container b6ecd20df63be055447528489a2b5d0942938f584ea92f12eb3cb6393e9b965a: rpc error: code = 2 desc = containerd: container not found"
    Jun 17 14:46:47 192-168-2-140 dockerd[792]: time="2019-06-17T14:46:47.731449788+08:00" level=warning msg="container kill failed because of 'container not found' or 'no such process': Cannot kill container d19d15c98ef62d475d1d48379717f732349b9839da265317717011b7dc54d95a: rpc error: code = 2 desc = containerd: container not found"
    Jun 17 14:46:47 192-168-2-140 dockerd[792]: time="2019-06-17T14:46:47.731595895+08:00" level=warning msg="container kill failed because of 'container not found' or 'no such process': Cannot kill container 8c02bfc9c15f86429adadf1b14298710b8e3f90f794c756bb0706ee32e2e7663: rpc error: code = 2 desc = containerd: container not found"
    Jun 17 14:46:49 192-168-2-140 dockerd[792]: time="2019-06-17T14:46:49.731506794+08:00" level=info msg="Container b6ecd20df63be055447528489a2b5d0942938f584ea92f12eb3cb6393e9b965a failed to exit within 2 seconds of signal 15 - using the force"
    Jun 17 14:46:49 192-168-2-140 dockerd[792]: time="2019-06-17T14:46:49.731618833+08:00" level=info msg="Container d19d15c98ef62d475d1d48379717f732349b9839da265317717011b7dc54d95a failed to exit within 2 seconds of signal 15 - using the force"
    Jun 17 14:46:49 192-168-2-140 dockerd[792]: time="2019-06-17T14:46:49.731931740+08:00" level=info msg="Container 8c02bfc9c15f86429adadf1b14298710b8e3f90f794c756bb0706ee32e2e7663 failed to exit within 2 seconds of signal 15 - using the force"
    Jun 17 14:46:49 192-168-2-140 dockerd[792]: time="2019-06-17T14:46:49.732038322+08:00" level=warning msg="container kill failed because of 'container not found' or 'no such process': Cannot kill container b6ecd20df63be055447528489a2b5d0942938f584ea92f12eb3cb6393e9b965a: rpc error: code = 2 desc = containerd: container not found"
    Jun 17 14:46:49 192-168-2-140 dockerd[792]: time="2019-06-17T14:46:49.732092026+08:00" level=warning msg="container kill failed because of 'container not found' or 'no such process': Cannot kill container d19d15c98ef62d475d1d48379717f732349b9839da265317717011b7dc54d95a: rpc error: code = 2 desc = containerd: container not found"
    Jun 17 14:46:49 192-168-2-140 dockerd[792]: time="2019-06-17T14:46:49.732224875+08:00" level=warning msg="container kill failed because of 'container not found' or 'no such process': Cannot kill container 8c02bfc9c15f86429adadf1b14298710b8e3f90f794c756bb0706ee32e2e7663: rpc error: code = 2 desc = containerd: container not found"
    Jun 17 14:46:50 192-168-2-140 dockerd[792]: 127.0.0.1 - - [17/Jun/2019 06:46:50] "GET / HTTP/1.1" 200 -
    Jun 17 14:46:53 192-168-2-140 dockerd[792]: 2019/06/17 06:46:53.879026 metrics.go:39: INFO Non-zero metrics in the last 30s: libbeat.logstash.call_count.PublishEvents=9 libbeat.logstash.publish.read_bytes=54 libbeat.logstash.publish.write_bytes=4931 libbeat.logstash.published_and_acked_events=34 libbeat.publisher.messages_in_worker_queues=34 libbeat.publisher.published_events=34
    Jun 17 14:46:59 192-168-2-140 dockerd[792]: time="2019-06-17T14:46:59.732246659+08:00" level=info msg="Container b6ecd20df63b failed to exit within 10 seconds of kill - trying direct SIGKILL"
    Jun 17 14:46:59 192-168-2-140 dockerd[792]: time="2019-06-17T14:46:59.732246431+08:00" level=info msg="Container d19d15c98ef6 failed to exit within 10 seconds of kill - trying direct SIGKILL"
    Jun 17 14:46:59 192-168-2-140 dockerd[792]: time="2019-06-17T14:46:59.732366320+08:00" level=info msg="Container 8c02bfc9c15f failed to exit within 10 seconds of kill - trying direct SIGKILL"
    

    docker version

    Client:
     Version:      17.03.2-ce
     API version:  1.27
     Go version:   go1.7.5
     Git commit:   f5ec1e2
     Built:        Tue Jun 27 03:35:14 2017
     OS/Arch:      linux/amd64
    Server:
     Version:      17.03.2-ce
     API version:  1.27 (minimum version 1.12)
     Go version:   go1.7.5
     Git commit:   f5ec1e2
     Built:        Tue Jun 27 03:35:14 2017
     OS/Arch:      linux/amd64
     Experimental: false
    

    kernel version:

    Linux 192-168-2-140 4.4.0-142-generic #168-Ubuntu
    

    I also deployed a MongoDB service on the same node. When Docker is unable to handle the container status correctly, the MongoDB service gets restarted constantly.

    MongoDB's systemd log showed that it was restarted after receiving SIGKILL (I checked the syslog and confirmed that MongoDB was not killed by the system OOM killer).

    I used the systemtap tool to find out that MongoDB was killed by the dockerd process.

    stap sigkill.stp -x 15418 SIGKILL
    SIGKILL was sent to mongod (pid:15418) by dockerd uid:0
    

    sigkill.stp

    #! /usr/bin/env stap
    probe signal.send {
      if (sig_name == "SIGKILL")
        printf("%s was sent to %s (pid:%d) by %s uid:%d\n",
               sig_name, pid_name, sig_pid, execname(), uid())
    }
    

    After I restarted the Docker service, the MongoDB service was back to normal.

    I believe MongoDB's restarts and dockerd are definitely related.

    I'm seeing the same issue, sporadically about once every week or two.

    The container shows no processes running in docker top, but appears as running in docker inspect and docker ps, and cannot be stopped or killed other than by killing the containerd-shim process.

    Logs: https://gist.github.com/evandam/4e8fe37b43252c9c5883e632b66b500d
    Docker stacktrace: https://gist.github.com/evandam/e6c9dd30b4d1a5dd280b4a5348bcc5b6

    # docker version
    Client: Docker Engine - Community
     Version:           19.03.8
     API version:       1.40
     Go version:        go1.12.17
     Git commit:        afacb8b7f0
     Built:             Wed Mar 11 01:25:46 2020
     OS/Arch:           linux/amd64
     Experimental:      false
    Server: Docker Engine - Community
     Engine:
      Version:          19.03.8
      API version:      1.40 (minimum version 1.12)
      Go version:       go1.12.17
      Git commit:       afacb8b7f0
      Built:            Wed Mar 11 01:24:19 2020
      OS/Arch:          linux/amd64
      Experimental:     false
     containerd:
      Version:          1.2.13
      GitCommit:        7ad184331fa3e55e52b890ea95e65ba581ae3429
     runc:
      Version:          1.0.0-rc10
      GitCommit:        dc9208a3303feef5b3839f4323d9beb36df0a9dd
     docker-init:
      Version:          0.18.0
      GitCommit:        fec3683
              

    I encountered this a few times within k8s. ctr tasks showed the task as STOPPED while docker ps thought it was up (a sketch for spotting such mismatches follows the output below).

    $ docker ps | grep 8b7
    8b7ca876b691        ca7e473bb121                                                  "/start.sh controller"   2 days ago          Up 2 days                               k8s_ovn-controller_default-host-9v5x9_onecloud_729d6a6f-4c9f-4832-8834-17f58f2c8990_2
    $ ctr -n moby c ls | grep 8b7
    8b7ca876b691149f9163b74c01f272af4e15dbf1300c098e1e53a1c4f10a722b    -        io.containerd.runtime.v1.linux
    $ ctr -n moby t ls | grep 8b7
    8b7ca876b691149f9163b74c01f272af4e15dbf1300c098e1e53a1c4f10a722b    82674     STOPPED
    $ ctr -n moby t exec -t --exec-id yy 8b7ca876b691149f9163b74c01f272af4e15dbf1300c098e1e53a1c4f10a722b /bin/sh
    ctr: cannot exec in a stopped state: unknown
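
    A sketch for finding such docker-vs-containerd mismatches without knowing the container ID up front; it assumes the moby namespace used above and the TASK/PID/STATUS columns that ctr t ls prints:

    # Flag containers that docker reports as running but whose containerd task is STOPPED.
    for c in $(docker ps -q --no-trunc); do
      status=$(ctr -n moby t ls | awk -v id="$c" '$1 == id {print $3}')
      [ "$status" = "STOPPED" ] && echo "docker says running, containerd says STOPPED: $c"
    done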
    

    Docker version

    $ docker version
    Client:
     Version:           18.09.1
     API version:       1.39
     Go version:        go1.10.6
     Git commit:        4c52b90
     Built:             Wed Jan  9 19:35:01 2019
     OS/Arch:           linux/amd64
     Experimental:      false
    Server: Docker Engine - Community
     Engine:
      Version:          18.09.1
      API version:      1.39 (minimum version 1.12)
      Go version:       go1.10.6
      Git commit:       4c52b90
      Built:            Wed Jan  9 19:06:30 2019
      OS/Arch:          linux/amd64
      Experimental:     false
    

    Containerd version

    $ ctr version
    Client:
      Version:  1.2.10
      Revision: b34a5c8af56e510852c35414db4c1f4fa6172339
    Server:
      Version:  1.2.10
      Revision: b34a5c8af56e510852c35414db4c1f4fa6172339
              

    Are you still seeing that problem on a current (19.03 at the time of writing) version of Docker? ISTR there were fixes in this area, so possibly it's resolved already. If you're unable to upgrade to 19.03, at least make sure you have the latest patch release for 18.09 installed (https://github.com/moby/moby/releases/tag/v18.09.9); 18.09.1 is really outdated and various CVEs were fixed in later patch releases.

    With the sudden new comments on this old issue, I'm wondering if there's a separate problem. Upstream containerd fixed a similar issue recently, possibly related?

    containerd/containerd#4054


    Observed the issue just now against docker-ce 19.03.9

    [root@yunion32002 ~]# docker version
    Client: Docker Engine - Community
     Version:           19.03.9
     API version:       1.40
     Go version:        go1.13.10
     Git commit:        9d988398e7
     Built:             Fri May 15 00:25:27 2020
     OS/Arch:           linux/amd64
     Experimental:      false
    Server: Docker Engine - Community
     Engine:
      Version:          19.03.9
      API version:      1.40 (minimum version 1.12)
      Go version:       go1.13.10
      Git commit:       9d988398e7
      Built:            Fri May 15 00:24:05 2020
      OS/Arch:          linux/amd64
      Experimental:     false
     containerd:
      Version:          1.2.13
      GitCommit:        7ad184331fa3e55e52b890ea95e65ba581ae3429
     runc:
      Version:          1.0.0-rc10
      GitCommit:        dc9208a3303feef5b3839f4323d9beb36df0a9dd
     docker-init:
      Version:          0.18.0
      GitCommit:        fec3683
    [root@yunion32002 ~]# ctr -n moby c ls | grep 56b322730696
    56b322730696eb78fe9e2f9534ae44560c23b01536b5ed46312eba48a71548d3    -        io.containerd.runtime.v1.linux
    [root@yunion32002 ~]# ctr -n moby t ls | grep 56b322730696
    56b322730696eb78fe9e2f9534ae44560c23b01536b5ed46312eba48a71548d3    30700     STOPPED
    [root@yunion32002 ~]# docker ps | grep 56b322730696
    56b322730696        ca7e473bb121                                              "/start.sh controller"   13 days ago         Up 13 days                              k8s_ovn-controller_default-host-4s5r8_onecloud_f15dbbc7-7357-4983-bf1d-e759eb33ca9c_0
    

    Not sure if there are any updates or what the course of action is? Will a newer version of Docker include the containerd patch for this?

    Docker currently does not maintain a fork of containerd (and builds packages from the upstream releases), so it depends whether or not that patch makes it into a 1.2.x patch release. (Assuming that patch fixes this issue.)


    I can confirm that at the bug scene containerd-shim remains, and its subprocesses gone.

    [root@yunion32002 ~]# ps auxwwf | grep -A1 56b322730696
    root      67764  0.0  0.0 112708   996 pts/4    S+   22:04   0:00          \_ grep --color=auto -A1 56b322730696
    root       5094  0.0  0.6 435652 50256 ?        Ssl  May30  18:16 /usr/sbin/rsyslogd -n
    root      30664  0.0  0.0 109096  2824 ?        Sl   Jun04   1:00  \_ containerd-shim -namespace moby -workdir /var/lib/containerd/io.containerd.runtime.v1.linux/moby/56b322730696eb78fe9e2f9534ae44560c23b01536b5ed46312eba48a71548d3 -address /run/containerd/containerd.sock -containerd-binary /usr/bin/containerd -runtime-root /var/run/docker/runtime-runc -systemd-cgroup
    root      25203  0.0  0.0 109096  6124 ?        Sl   Jun08   9:38  \_ containerd-shim -namespace moby -workdir /var/lib/containerd/io.containerd.runtime.v1.linux/moby/455bae35d225f9658e215cc87db91020b0f2ee46d8430dc5bb0ed2607ff4752d -address /run/containerd/containerd.sock -containerd-binary /usr/bin/containerd -runtime-root /var/run/docker/runtime-runc -systemd-cgroup
    [root@yunion32002 ~]# ls -l /proc/30664/exe
    lrwxrwxrwx 1 root root 0 Jun 17 22:04 /proc/30664/exe -> /usr/bin/containerd-shim
    [root@yunion32002 ~]# rpm -qf /usr/bin/containerd-shim
    containerd.io-1.2.13-3.2.el7.x86_64
    

    As for containerd/containerd#4054 (or the backport containerd/containerd#4055), I think the situation is different: according to the "docker ps" output, the container had been running for 13 days and the issue happened only when we tried to take it down. That pull request, however, is about terminating the containerd-shim process when an error happens during containerd task creation.

    I've seen this scenario with containerd 1.4.0 (09814d48d50816305a8e6c1a4ae3e2bcc4ba725a) and dockerd 19.03.6-ce, build 369ce74.

    The containerd-shim for a container X is in the S state; docker ps | grep X shows the container as running, but I can't exec into it since it's stopped. docker inspect also shows it as running, docker top shows no processes in it, and ps aux shows no such process. The dockerd log shows "trying to kill it SIGKILL" basically forever. There is nothing special about this container: no volumes, no weird networking, just a vanilla container.

    However, just cat-ing the FIFO at /var/lib/containerd/io.containerd.runtime.v1.linux/moby/$containerId/shim.stdout.log makes containerd-shim write "shim reaped" to the journalctl log, and docker ps no longer shows the container as running (a sketch of this workaround is below).
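
    A sketch of that workaround: shim.stdout.log is the path quoted above, the shim.stderr.log counterpart is an assumption, and timeout is used in case nothing is pending:

    # Drain the shim's output FIFOs for a stuck container so the blocked shim can finish and be reaped.
    cid=<full container id>
    dir=/var/lib/containerd/io.containerd.runtime.v1.linux/moby/$cid
    timeout 5 cat "$dir/shim.stdout.log" > /dev/null
    timeout 5 cat "$dir/shim.stderr.log" > /dev/null    # assumed counterpart of the stdout FIFO
    docker ps --no-trunc | grep "$cid"                  # the container should no longer be listed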

    Why does dockerd not drain the stdout and stderr FIFOs? The dockerd logs do contain the error "stream copy error: reading from a closed fifo".

    Stopped container is shown in docker ps and is unresponsive due to blocked attached output reader #41827