

Description

I run my TypeScript application in a Docker container. Sometimes the application finishes, but the container stays in the running state. How is this even possible? I thought I might have forgotten to catch a rejected Promise or to close a stream, but in that case docker top myContainer would tell me that the main process is still running, right?

Steps to reproduce the issue:

  • Build the Docker image for the TypeScript application (I used the Dockerfile below; a build-and-run sketch follows it)
  • Run the container and wait until the application finishes

    FROM node:8
    WORKDIR /v2x_communication
    COPY . /v2x_communication
    RUN npm install && npm run build
    ENTRYPOINT ["npm", "run"]
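
    For concreteness, a sketch of the build and run steps; the image name, container name, and arguments are reconstructed from the docker ps / docker inspect output below, and the exact invocation is an assumption:

    docker build -t filiprydzi/v2x_communication .
    docker run -d --name vehicle8 filiprydzi/v2x_communication \
        start ethereum 172.21.0.9:8545 run-producer 100 \
        0x0f743640f4b8c2ba5be9dc3a792c0262584bfc3c
    # wait for the application to finish, then:
    docker ps --filter name=vehicle8   # expected: not listed; observed: still "Up"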
    

    Describe the results you received:

    I observed the points below, which led me to open the issue.

  • The container still shows up in docker ps:

    docker ps
    CONTAINER ID        IMAGE                          COMMAND                  CREATED             STATUS              PORTS                NAMES
    a55228e756a8        filiprydzi/v2x_communication   "npm run start ether…"   2 hours ago         Up 2 hours                               vehicle8
    
  • docker top shows that there is no running process in the container:

    docker top vehicle8
    UID                 PID                 PPID                C                   STIME               TTY                 TIME                CMD
    
  • docker inspect says my container is still running (a cross-check of the reported Pid against /proc is sketched after this list):

    docker inspect vehicle8
            "Id": "a55228e756a8132b226308cfde0920e4b188af7e7e63503493523b6154598f5e",
            "Created": "2019-01-05T14:44:11.368198346Z",
            "Path": "npm",
            "Args": [
                "run",
                "start",
                "ethereum",
                "172.21.0.9:8545",
                "run-producer",
                "100",
                "0x0f743640f4b8c2ba5be9dc3a792c0262584bfc3c"
            "State": {
                "Status": "running",
                "Running": true,
                "Paused": false,
                "Restarting": false,
                "OOMKilled": false,
                "Dead": false,
                "Pid": 24420,
                "ExitCode": 0,
                "Error": "",
                "StartedAt": "2019-01-05T14:47:42.050501448Z",
                "FinishedAt": "0001-01-01T00:00:00Z"
    
  • If I try to execute a command in the container, it says the following:

    docker exec -i vehicle8 echo 'hello world'
    cannot exec in a stopped state: unknown
    
  • When I execute docker stop vehicle8, I see this in the Docker daemon logs:

    level=debug msg="Calling GET /_ping"
    level=debug msg="Calling POST /v1.39/containers/vehicle8/stop"
    level=debug msg="Sending kill signal 15 to container a55228e756a8132b226308cfde0920e4b188af7e7e63503493523b6154598f5e"
    level=debug msg="container kill failed because of 'container not found' or 'no such process'" action=kill container=a55228e756a81
    
  • htop doesn't show the process either.
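
    A quick way to cross-check the docker top and docker inspect observations above (a sketch; vehicle8 is the container name and 24420 the Pid reported by docker inspect):

    # Ask dockerd which PID it has on record, then ask the kernel about that PID.
    pid=$(docker inspect --format '{{.State.Pid}}' vehicle8)
    echo "dockerd has PID $pid on record"
    ps -p "$pid" -o pid,ppid,stat,cmd || echo "no such process - dockerd's state is stale"
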
    Describe the results you expected:

    Stopped container should not be listed under docker ps.

    Additional information you deem important (e.g. issue happens only occasionally):

    Output of docker version:

    Client:
     Version:           18.09.0
     API version:       1.39
     Go version:        go1.10.4
     Git commit:        4d60db4
     Built:             Wed Nov  7 00:49:01 2018
     OS/Arch:           linux/amd64
     Experimental:      false
    Server: Docker Engine - Community
     Engine:
      Version:          18.09.0
      API version:      1.39 (minimum version 1.12)
      Go version:       go1.10.4
      Git commit:       4d60db4
      Built:            Wed Nov  7 00:16:44 2018
      OS/Arch:          linux/amd64
      Experimental:     false
    

    Output of docker info:

    docker info
    Containers: 96
     Running: 45
     Paused: 0
     Stopped: 51
    Images: 150
    Server Version: 18.09.0
    Storage Driver: overlay2
     Backing Filesystem: extfs
     Supports d_type: true
     Native Overlay Diff: true
    Logging Driver: json-file
    Cgroup Driver: cgroupfs
    Plugins:
     Volume: local
     Network: bridge host macvlan null overlay
     Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
    Swarm: pending
     NodeID: sah8xlcjnxbq13uofznqrjs6e
     Is Manager: false
     Node Address: 10.132.0.5
     Manager Addresses:
      10.132.0.2:2377
    Runtimes: runc
    Default Runtime: runc
    Init Binary: docker-init
    containerd version: c4446665cb9c30056f4998ed953e6d4ff22c7c39
    runc version: 4fc53a81fb7c994640722ac585fa9ca548971871
    init version: fec3683
    Security Options:
     apparmor
     seccomp
      Profile: default
    Kernel Version: 4.15.0-1026-gcp
    Operating System: Ubuntu 18.04.1 LTS
    OSType: linux
    Architecture: x86_64
    CPUs: 48
    Total Memory: 94.41GiB
    Name: vehicle-fleet-big-1
    ID: OAVG:6QVR:EH3F:OYNO:ADC4:QDAN:R2AF:LSSV:2VSI:IJWJ:PJH2:LJVP
    Docker Root Dir: /var/lib/docker
    Debug Mode (client): false
    Debug Mode (server): true
     File Descriptors: 379
     Goroutines: 340
     System Time: 2019-01-05T17:10:38.322898981Z
     EventsListeners: 0
    Registry: https://index.docker.io/v1/
    Labels:
    Experimental: false
    Insecure Registries:
     127.0.0.0/8
    Live Restore Enabled: false
    Product License: Community Engine
    WARNING: No swap limit support
    

    Running in a Google Compute Engine VM.

    We had similar issues; they happened when the system was under heavy load and some processes got killed.

    We could use the commands below to simulate the heavy-load scenario (a more concrete version is sketched after this block).

    # dockerd-> containerd -> containerd-shim -> <runc_container_process>
    $ docker run xxx
    $ docker ps  ## correct, shows one container is up
    # simulate the heavy scenario
    $ kill -STOP <dockerd_PID>
    $ kill <docker-containerd-shim_PID> or kill <runc_container_process_PID>
    $ kill <docker-containerd_PID>
    $ kill -CONT <dockerd_PID>
    $ docker ps  ## wrong, still shows one container is up
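
    A more concrete version of the same simulation (a sketch; it assumes a throw-away busybox container and that the daemon processes are named dockerd and containerd on this version — older packages call them docker-containerd*):

    cid=$(docker run -d busybox sleep 3600)
    cpid=$(docker inspect --format '{{.State.Pid}}' "$cid")
    docker ps --filter "id=$cid"      ## correct, shows the container as Up
    kill -STOP "$(pidof dockerd)"     ## freeze dockerd so it cannot process exit events
    kill "$cpid"                      ## kill the container's init process (here: sleep)
    kill "$(pidof containerd)"        ## kill containerd, as in the recipe above
    kill -CONT "$(pidof dockerd)"     ## resume dockerd
    docker ps --filter "id=$cid"      ## wrong, may still show the container as Up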
    

    Even when docker-containerd-shim is killed, containerd can handle the container status correctly, but dockerd did not correctly handle the process exit.

    The result is a Docker status inconsistency between ps (/proc, the kernel's view) and docker ps (Docker's in-memory DB).
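
    A sketch that surfaces this inconsistency across a whole host; it only uses standard docker CLI calls, and the "stale" label is mine:

    # List containers dockerd reports as running whose recorded init PID no longer exists in /proc.
    for c in $(docker ps -q); do
      pid=$(docker inspect --format '{{.State.Pid}}' "$c")
      if [ "$pid" -eq 0 ] || [ ! -d "/proc/$pid" ]; then
        echo "stale: container $c (recorded PID: $pid)"
      fi
    done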

    $ docker version
    Client:
     Version:           18.09.2
     API version:       1.39
     Go version:        go1.10.4
     Git commit:        6247962
     Built:             Tue Feb 26 23:44:34 2019
     OS/Arch:           linux/amd64
     Experimental:      false
    Server:
     Engine:
      Version:          18.09.2
      API version:      1.39 (minimum version 1.12)
      Go version:       go1.10.4
      Git commit:       6247962
      Built:            Wed Feb 13 00:25:20 2019
      OS/Arch:          linux/amd64
      Experimental:     false
    

    We also tested on Docker 1.13. In Docker 1.13, containerd-ctr can check the Docker container status, and containerd can handle a containerd-shim even if it was created by another containerd instance.

    I ran into similar issues.

    dockerd log:

    Jun 17 14:46:47 192-168-2-140 dockerd[792]: time="2019-06-17T14:46:47.731287887+08:00" level=warning msg="container kill failed because of 'container not found' or 'no such process': Cannot kill container b6ecd20df63be055447528489a2b5d0942938f584ea92f12eb3cb6393e9b965a: rpc error: code = 2 desc = containerd: container not found"
    Jun 17 14:46:47 192-168-2-140 dockerd[792]: time="2019-06-17T14:46:47.731449788+08:00" level=warning msg="container kill failed because of 'container not found' or 'no such process': Cannot kill container d19d15c98ef62d475d1d48379717f732349b9839da265317717011b7dc54d95a: rpc error: code = 2 desc = containerd: container not found"
    Jun 17 14:46:47 192-168-2-140 dockerd[792]: time="2019-06-17T14:46:47.731595895+08:00" level=warning msg="container kill failed because of 'container not found' or 'no such process': Cannot kill container 8c02bfc9c15f86429adadf1b14298710b8e3f90f794c756bb0706ee32e2e7663: rpc error: code = 2 desc = containerd: container not found"
    Jun 17 14:46:49 192-168-2-140 dockerd[792]: time="2019-06-17T14:46:49.731506794+08:00" level=info msg="Container b6ecd20df63be055447528489a2b5d0942938f584ea92f12eb3cb6393e9b965a failed to exit within 2 seconds of signal 15 - using the force"
    Jun 17 14:46:49 192-168-2-140 dockerd[792]: time="2019-06-17T14:46:49.731618833+08:00" level=info msg="Container d19d15c98ef62d475d1d48379717f732349b9839da265317717011b7dc54d95a failed to exit within 2 seconds of signal 15 - using the force"
    Jun 17 14:46:49 192-168-2-140 dockerd[792]: time="2019-06-17T14:46:49.731931740+08:00" level=info msg="Container 8c02bfc9c15f86429adadf1b14298710b8e3f90f794c756bb0706ee32e2e7663 failed to exit within 2 seconds of signal 15 - using the force"
    Jun 17 14:46:49 192-168-2-140 dockerd[792]: time="2019-06-17T14:46:49.732038322+08:00" level=warning msg="container kill failed because of 'container not found' or 'no such process': Cannot kill container b6ecd20df63be055447528489a2b5d0942938f584ea92f12eb3cb6393e9b965a: rpc error: code = 2 desc = containerd: container not found"
    Jun 17 14:46:49 192-168-2-140 dockerd[792]: time="2019-06-17T14:46:49.732092026+08:00" level=warning msg="container kill failed because of 'container not found' or 'no such process': Cannot kill container d19d15c98ef62d475d1d48379717f732349b9839da265317717011b7dc54d95a: rpc error: code = 2 desc = containerd: container not found"
    Jun 17 14:46:49 192-168-2-140 dockerd[792]: time="2019-06-17T14:46:49.732224875+08:00" level=warning msg="container kill failed because of 'container not found' or 'no such process': Cannot kill container 8c02bfc9c15f86429adadf1b14298710b8e3f90f794c756bb0706ee32e2e7663: rpc error: code = 2 desc = containerd: container not found"
    Jun 17 14:46:50 192-168-2-140 dockerd[792]: 127.0.0.1 - - [17/Jun/2019 06:46:50] "GET / HTTP/1.1" 200 -
    Jun 17 14:46:53 192-168-2-140 dockerd[792]: 2019/06/17 06:46:53.879026 metrics.go:39: INFO Non-zero metrics in the last 30s: libbeat.logstash.call_count.PublishEvents=9 libbeat.logstash.publish.read_bytes=54 libbeat.logstash.publish.write_bytes=4931 libbeat.logstash.published_and_acked_events=34 libbeat.publisher.messages_in_worker_queues=34 libbeat.publisher.published_events=34
    Jun 17 14:46:59 192-168-2-140 dockerd[792]: time="2019-06-17T14:46:59.732246659+08:00" level=info msg="Container b6ecd20df63b failed to exit within 10 seconds of kill - trying direct SIGKILL"
    Jun 17 14:46:59 192-168-2-140 dockerd[792]: time="2019-06-17T14:46:59.732246431+08:00" level=info msg="Container d19d15c98ef6 failed to exit within 10 seconds of kill - trying direct SIGKILL"
    Jun 17 14:46:59 192-168-2-140 dockerd[792]: time="2019-06-17T14:46:59.732366320+08:00" level=info msg="Container 8c02bfc9c15f failed to exit within 10 seconds of kill - trying direct SIGKILL"
    

    docker version

    Client:
     Version:      17.03.2-ce
     API version:  1.27
     Go version:   go1.7.5
     Git commit:   f5ec1e2
     Built:        Tue Jun 27 03:35:14 2017
     OS/Arch:      linux/amd64
    Server:
     Version:      17.03.2-ce
     API version:  1.27 (minimum version 1.12)
     Go version:   go1.7.5
     Git commit:   f5ec1e2
     Built:        Tue Jun 27 03:35:14 2017
     OS/Arch:      linux/amd64
     Experimental: false
    

    kernel version:

    Linux 192-168-2-140 4.4.0-142-generic #168-Ubuntu
    

    I also deployed a MongoDB service on the same node. When Docker is unable to handle the container status correctly, the MongoDB service gets restarted constantly.

    MongoDB's systemd log showed that it was restarted after receiving SIGKILL (I checked the syslog and confirmed that MongoDB was not killed by the system OOM killer).

    I used the systemtap tool to find out that MongoDB was killed by the dockerd process.

    stap sigkill.stp -x 15418 SIGKILL
    SIGKILL was sent to mongod (pid:15418) by dockerd uid:0
    

    sigkill.stp

    #! /usr/bin/env stap
    probe signal.send {
      if (sig_name == "SIGKILL")
        printf("%s was sent to %s (pid:%d) by %s uid:%d\n",
               sig_name, pid_name, sig_pid, execname(), uid())
    }
    

    After I restarted the Docker service, the MongoDB service was back to normal.

    I believe MongoDB's restarts and dockerd are definitely related.

    I'm seeing the same issue, sporadically about once every week or two.

    The container shows no processes running in docker top, but appears as running in docker inspect and docker ps, and cannot be stopped or killed other than by killing the containerd-shim process.

    Logs: https://gist.github.com/evandam/4e8fe37b43252c9c5883e632b66b500d
    Docker stacktrace: https://gist.github.com/evandam/e6c9dd30b4d1a5dd280b4a5348bcc5b6

    # docker version
    Client: Docker Engine - Community
     Version:           19.03.8
     API version:       1.40
     Go version:        go1.12.17
     Git commit:        afacb8b7f0
     Built:             Wed Mar 11 01:25:46 2020
     OS/Arch:           linux/amd64
     Experimental:      false
    Server: Docker Engine - Community
     Engine:
      Version:          19.03.8
      API version:      1.40 (minimum version 1.12)
      Go version:       go1.12.17
      Git commit:       afacb8b7f0
      Built:            Wed Mar 11 01:24:19 2020
      OS/Arch:          linux/amd64
      Experimental:     false
     containerd:
      Version:          1.2.13
      GitCommit:        7ad184331fa3e55e52b890ea95e65ba581ae3429
     runc:
      Version:          1.0.0-rc10
      GitCommit:        dc9208a3303feef5b3839f4323d9beb36df0a9dd
     docker-init:
      Version:          0.18.0
      GitCommit:        fec3683
              

    I encountered this a few times within k8s. ctr tasks showed the task as STOPPED while docker ps thought it was up (a sketch for spotting such mismatches follows the output below).

    $ docker ps | grep 8b7
    8b7ca876b691        ca7e473bb121                                                  "/start.sh controller"   2 days ago          Up 2 days                               k8s_ovn-controller_default-host-9v5x9_onecloud_729d6a6f-4c9f-4832-8834-17f58f2c8990_2
    $ ctr -n moby c ls | grep 8b7
    8b7ca876b691149f9163b74c01f272af4e15dbf1300c098e1e53a1c4f10a722b    -        io.containerd.runtime.v1.linux
    $ ctr -n moby t ls | grep 8b7
    8b7ca876b691149f9163b74c01f272af4e15dbf1300c098e1e53a1c4f10a722b    82674     STOPPED
    $ ctr -n moby t exec -t --exec-id yy 8b7ca876b691149f9163b74c01f272af4e15dbf1300c098e1e53a1c4f10a722b /bin/sh
    ctr: cannot exec in a stopped state: unknown
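
    A sketch for finding such docker-vs-containerd mismatches without knowing the container ID up front; it assumes the moby namespace used above and the TASK/PID/STATUS columns that ctr t ls prints:

    # Flag containers that docker reports as running but whose containerd task is STOPPED.
    for c in $(docker ps -q --no-trunc); do
      status=$(ctr -n moby t ls | awk -v id="$c" '$1 == id {print $3}')
      [ "$status" = "STOPPED" ] && echo "docker says running, containerd says STOPPED: $c"
    done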
    

    Docker version

    $ docker version
    Client:
     Version:           18.09.1
     API version:       1.39
     Go version:        go1.10.6
     Git commit:        4c52b90
     Built:             Wed Jan  9 19:35:01 2019
     OS/Arch:           linux/amd64
     Experimental:      false
    Server: Docker Engine - Community
     Engine:
      Version:          18.09.1
      API version:      1.39 (minimum version 1.12)
      Go version:       go1.10.6
      Git commit:       4c52b90
      Built:            Wed Jan  9 19:06:30 2019
      OS/Arch:          linux/amd64
      Experimental:     false
    

    Containerd version

    $ ctr version
    Client:
      Version:  1.2.10
      Revision: b34a5c8af56e510852c35414db4c1f4fa6172339
    Server:
      Version:  1.2.10
      Revision: b34a5c8af56e510852c35414db4c1f4fa6172339
              

    Are you still seeing that problem on a current (19.03 at the time of writing) version of Docker? ISTR there were fixes in this area, so possibly it's resolved already. If you're unable to upgrade to 19.03, at least make sure you have the latest patch release for 18.09 installed (https://github.com/moby/moby/releases/tag/v18.09.9); 18.09.1 is really outdated and various CVEs were fixed in later patch releases.

    With the sudden new comments on this old issue, I'm wondering if there's a separate problem. Upstream containerd fixed a similar issue recently, possibly related?

    containerd/containerd#4054


    Observed the issue just now against docker-ce 19.03.9

    [root@yunion32002 ~]# docker version
    Client: Docker Engine - Community
     Version:           19.03.9
     API version:       1.40
     Go version:        go1.13.10
     Git commit:        9d988398e7
     Built:             Fri May 15 00:25:27 2020
     OS/Arch:           linux/amd64
     Experimental:      false
    Server: Docker Engine - Community
     Engine:
      Version:          19.03.9
      API version:      1.40 (minimum version 1.12)
      Go version:       go1.13.10
      Git commit:       9d988398e7
      Built:            Fri May 15 00:24:05 2020
      OS/Arch:          linux/amd64
      Experimental:     false
     containerd:
      Version:          1.2.13
      GitCommit:        7ad184331fa3e55e52b890ea95e65ba581ae3429
     runc:
      Version:          1.0.0-rc10
      GitCommit:        dc9208a3303feef5b3839f4323d9beb36df0a9dd
     docker-init:
      Version:          0.18.0
      GitCommit:        fec3683
    [root@yunion32002 ~]# ctr -n moby c ls | grep 56b322730696
    56b322730696eb78fe9e2f9534ae44560c23b01536b5ed46312eba48a71548d3    -        io.containerd.runtime.v1.linux
    [root@yunion32002 ~]# ctr -n moby t ls | grep 56b322730696
    56b322730696eb78fe9e2f9534ae44560c23b01536b5ed46312eba48a71548d3    30700     STOPPED
    [root@yunion32002 ~]# docker ps | grep 56b322730696
    56b322730696        ca7e473bb121                                              "/start.sh controller"   13 days ago         Up 13 days                              k8s_ovn-controller_default-host-4s5r8_onecloud_f15dbbc7-7357-4983-bf1d-e759eb33ca9c_0
    

    Not sure if there are any updates or what the course of action is? Will a newer version of Docker include the containerd patch for this?

    Docker currently does not maintain a fork of containerd (and builds packages from the upstream releases), so it depends whether or not that patch makes it into a 1.2.x patch release. (Assuming that patch fixes this issue.)


    I can confirm that at the bug scene containerd-shim remains, and its subprocesses gone.

    [root@yunion32002 ~]# ps auxwwf | grep -A1 56b322730696
    root      67764  0.0  0.0 112708   996 pts/4    S+   22:04   0:00          \_ grep --color=auto -A1 56b322730696
    root       5094  0.0  0.6 435652 50256 ?        Ssl  May30  18:16 /usr/sbin/rsyslogd -n
    root      30664  0.0  0.0 109096  2824 ?        Sl   Jun04   1:00  \_ containerd-shim -namespace moby -workdir /var/lib/containerd/io.containerd.runtime.v1.linux/moby/56b322730696eb78fe9e2f9534ae44560c23b01536b5ed46312eba48a71548d3 -address /run/containerd/containerd.sock -containerd-binary /usr/bin/containerd -runtime-root /var/run/docker/runtime-runc -systemd-cgroup
    root      25203  0.0  0.0 109096  6124 ?        Sl   Jun08   9:38  \_ containerd-shim -namespace moby -workdir /var/lib/containerd/io.containerd.runtime.v1.linux/moby/455bae35d225f9658e215cc87db91020b0f2ee46d8430dc5bb0ed2607ff4752d -address /run/containerd/containerd.sock -containerd-binary /usr/bin/containerd -runtime-root /var/run/docker/runtime-runc -systemd-cgroup
    [root@yunion32002 ~]# ls -l /proc/30664/exe
    lrwxrwxrwx 1 root root 0 Jun 17 22:04 /proc/30664/exe -> /usr/bin/containerd-shim
    [root@yunion32002 ~]# rpm -qf /usr/bin/containerd-shim
    containerd.io-1.2.13-3.2.el7.x86_64
    

    As for containerd/containerd#4054 (or the backport containerd/containerd#4055), I think the situation is different: according to the "docker ps" output, the container had been running for 13 days and the issue happened only when we tried to take it down. That pull request, however, is about terminating the containerd-shim process when an error happens during containerd task creation.

    I've seen this scenario with containerd 1.4.0 (09814d48d50816305a8e6c1a4ae3e2bcc4ba725a) and dockerd 19.03.6-ce, build 369ce74.

    The containerd-shim for a container X is in the S state; docker ps | grep X shows the container as running, but I can't exec into it since it's stopped. docker inspect also shows it as running, docker top shows no processes in it, and ps aux shows no such process. The dockerd log shows "trying to kill it SIGKILL" basically forever. There is nothing special about this container: no volumes, no weird networking, just a vanilla container.

    However, just cat-ing the FIFO at /var/lib/containerd/io.containerd.runtime.v1.linux/moby/$containerId/shim.stdout.log makes containerd-shim write "shim reaped" to the journalctl log, and docker ps no longer shows the container as running (a sketch of this workaround is below).
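
    A sketch of that workaround: shim.stdout.log is the path quoted above, the shim.stderr.log counterpart is an assumption, and timeout is used in case nothing is pending:

    # Drain the shim's output FIFOs for a stuck container so the blocked shim can finish and be reaped.
    cid=<full container id>
    dir=/var/lib/containerd/io.containerd.runtime.v1.linux/moby/$cid
    timeout 5 cat "$dir/shim.stdout.log" > /dev/null
    timeout 5 cat "$dir/shim.stderr.log" > /dev/null    # assumed counterpart of the stdout FIFO
    docker ps --no-trunc | grep "$cid"                  # the container should no longer be listed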

    Why does dockerd not drain the stdout and stderr FIFOs? The dockerd logs do contain the error "stream copy error: reading from a closed fifo".

    Stopped container is shown in docker ps and is unresponsive due to blocked attached output reader #41827