I run my TypeScript application in a Docker container. Sometimes the application finishes, but the container remains in the running state. How is this even possible? My first thought was that I had forgotten to catch a rejected Promise or to close a stream, but in that case docker top myContainer would tell me that the main process is still running, right?
FROM node:8
WORKDIR /v2x_communication
COPY . /v2x_communication
RUN npm install && npm run build
ENTRYPOINT ["npm", "run"]
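As an aside: with ENTRYPOINT ["npm", "run"], npm itself is PID 1 inside the container, and npm may not reliably forward signals to the node child process. That is probably unrelated to this bug, but one way to rule it out is to run the container with an init process. A minimal sketch, reusing the image and arguments shown below (not a fix for the daemon-side issue):

# --init injects docker-init (tini) as PID 1, which reaps children and forwards signals
docker run --init filiprydzi/v2x_communication start ethereum 172.21.0.9:8545 \
    run-producer 100 0x0f743640f4b8c2ba5be9dc3a792c0262584bfc3c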
Describe the results you received:
I observed the points below, which led me to open the issue.
The container is still listed by docker ps:
docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
a55228e756a8 filiprydzi/v2x_communication "npm run start ether…" 2 hours ago Up 2 hours vehicle8
docker top shows that there is no running process in the container:
docker top vehicle8
UID PID PPID C STIME TTY TIME CMD
docker inspect says my container is still running:
docker inspect vehicle8
"Id": "a55228e756a8132b226308cfde0920e4b188af7e7e63503493523b6154598f5e",
"Created": "2019-01-05T14:44:11.368198346Z",
"Path": "npm",
"Args": [
"run",
"start",
"ethereum",
"172.21.0.9:8545",
"run-producer",
"100",
"0x0f743640f4b8c2ba5be9dc3a792c0262584bfc3c"
"State": {
"Status": "running",
"Running": true,
"Paused": false,
"Restarting": false,
"OOMKilled": false,
"Dead": false,
"Pid": 24420,
"ExitCode": 0,
"Error": "",
"StartedAt": "2019-01-05T14:47:42.050501448Z",
"FinishedAt": "0001-01-01T00:00:00Z"
If I try to execute a command within the container, it says the following:
docker exec -i vehicle8 echo 'hello world'
cannot exec in a stopped state: unknown
When I execute docker stop vehicle8, I see this in the docker daemon logs:
level=debug msg="Calling GET /_ping"
level=debug msg="Calling POST /v1.39/containers/vehicle8/stop"
level=debug msg="Sending kill signal 15 to container a55228e756a8132b226308cfde0920e4b188af7e7e63503493523b6154598f5e"
level=debug msg="container kill failed because of 'container not found' or 'no such process'" action=kill container=a55228e756a81
htop doesn't show the process either.
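Cross-checking the Pid from the inspect output against /proc points the same way (a hypothetical session, using the Pid reported above):

docker inspect -f '{{.State.Pid}}' vehicle8
# 24420
ls /proc/24420 >/dev/null 2>&1 && echo "alive" || echo "no such process"
# no such process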
Describe the results you expected:
A stopped container should not be listed under docker ps.
Additional information you deem important (e.g. issue happens only occasionally):
Output of docker version:
Client:
Version: 18.09.0
API version: 1.39
Go version: go1.10.4
Git commit: 4d60db4
Built: Wed Nov 7 00:49:01 2018
OS/Arch: linux/amd64
Experimental: false
Server: Docker Engine - Community
Engine:
Version: 18.09.0
API version: 1.39 (minimum version 1.12)
Go version: go1.10.4
Git commit: 4d60db4
Built: Wed Nov 7 00:16:44 2018
OS/Arch: linux/amd64
Experimental: false
Output of docker info:
docker info
Containers: 96
Running: 45
Paused: 0
Stopped: 51
Images: 150
Server Version: 18.09.0
Storage Driver: overlay2
Backing Filesystem: extfs
Supports d_type: true
Native Overlay Diff: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
Volume: local
Network: bridge host macvlan null overlay
Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
Swarm: pending
NodeID: sah8xlcjnxbq13uofznqrjs6e
Is Manager: false
Node Address: 10.132.0.5
Manager Addresses:
10.132.0.2:2377
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: c4446665cb9c30056f4998ed953e6d4ff22c7c39
runc version: 4fc53a81fb7c994640722ac585fa9ca548971871
init version: fec3683
Security Options:
apparmor
seccomp
Profile: default
Kernel Version: 4.15.0-1026-gcp
Operating System: Ubuntu 18.04.1 LTS
OSType: linux
Architecture: x86_64
CPUs: 48
Total Memory: 94.41GiB
Name: vehicle-fleet-big-1
ID: OAVG:6QVR:EH3F:OYNO:ADC4:QDAN:R2AF:LSSV:2VSI:IJWJ:PJH2:LJVP
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): true
File Descriptors: 379
Goroutines: 340
System Time: 2019-01-05T17:10:38.322898981Z
EventsListeners: 0
Registry: https://index.docker.io/v1/
Labels:
Experimental: false
Insecure Registries:
127.0.0.0/8
Live Restore Enabled: false
Product License: Community Engine
WARNING: No swap limit support
Running in a Google Compute Engine VM.
We had similar issues; they happened when the system was under heavy load and some processes got killed. We could use the commands below to simulate the heavy-load scenario.
# dockerd -> containerd -> containerd-shim -> <runc_container_process>
$ docker run xxx
$ docker ps ## correct, shows one container is up
# simulate the heavy scenario
$ kill -STOP <dockerd_PID>
$ kill <docker-containerd-shim_PID> or kill <runc_container_process_PID>
$ kill <docker-containerd_PID>
$ kill -CONT <dockerd_PID>
$ docker ps ## wrong, still shows one container is up
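Fleshed out, the simulation might look like this (a sketch; the test image and pgrep patterns are assumptions, and on older packages the shim binary is named docker-containerd-shim rather than containerd-shim):

cid=$(docker run -d busybox sleep 3600)
dockerd_pid=$(pidof dockerd)
shim_pid=$(pgrep -f "containerd-shim.*${cid}")
kill -STOP "$dockerd_pid"    # freeze dockerd so it misses the exit event
kill "$shim_pid"             # kill the shim (or the container's process) while dockerd is frozen
kill -CONT "$dockerd_pid"    # resume dockerd
docker ps                    # wrong: may still show the container as Up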
Even when docker-containerd-shim was killed, containerd handled the container status correctly, but dockerd did not handle the process exit correctly. The result is an inconsistency between ps (the /proc view) and docker ps (dockerd's in-memory database).
$ docker version
Client:
Version: 18.09.2
API version: 1.39
Go version: go1.10.4
Git commit: 6247962
Built: Tue Feb 26 23:44:34 2019
OS/Arch: linux/amd64
Experimental: false
Server:
Engine:
Version: 18.09.2
API version: 1.39 (minimum version 1.12)
Go version: go1.10.4
Git commit: 6247962
Built: Wed Feb 13 00:25:20 2019
OS/Arch: linux/amd64
Experimental: false
We also tested on Docker 1.13. In Docker 1.13, containerd-ctr could check the Docker container status, and containerd can handle a containerd-shim even if it was created by another containerd.
I ran into similar issues.
Docker daemon log:
Jun 17 14:46:47 192-168-2-140 dockerd[792]: time="2019-06-17T14:46:47.731287887+08:00" level=warning msg="container kill failed because of 'container not found' or 'no such process': Cannot kill container b6ecd20df63be055447528489a2b5d0942938f584ea92f12eb3cb6393e9b965a: rpc error: code = 2 desc = containerd: container not found"
Jun 17 14:46:47 192-168-2-140 dockerd[792]: time="2019-06-17T14:46:47.731449788+08:00" level=warning msg="container kill failed because of 'container not found' or 'no such process': Cannot kill container d19d15c98ef62d475d1d48379717f732349b9839da265317717011b7dc54d95a: rpc error: code = 2 desc = containerd: container not found"
Jun 17 14:46:47 192-168-2-140 dockerd[792]: time="2019-06-17T14:46:47.731595895+08:00" level=warning msg="container kill failed because of 'container not found' or 'no such process': Cannot kill container 8c02bfc9c15f86429adadf1b14298710b8e3f90f794c756bb0706ee32e2e7663: rpc error: code = 2 desc = containerd: container not found"
Jun 17 14:46:49 192-168-2-140 dockerd[792]: time="2019-06-17T14:46:49.731506794+08:00" level=info msg="Container b6ecd20df63be055447528489a2b5d0942938f584ea92f12eb3cb6393e9b965a failed to exit within 2 seconds of signal 15 - using the force"
Jun 17 14:46:49 192-168-2-140 dockerd[792]: time="2019-06-17T14:46:49.731618833+08:00" level=info msg="Container d19d15c98ef62d475d1d48379717f732349b9839da265317717011b7dc54d95a failed to exit within 2 seconds of signal 15 - using the force"
Jun 17 14:46:49 192-168-2-140 dockerd[792]: time="2019-06-17T14:46:49.731931740+08:00" level=info msg="Container 8c02bfc9c15f86429adadf1b14298710b8e3f90f794c756bb0706ee32e2e7663 failed to exit within 2 seconds of signal 15 - using the force"
Jun 17 14:46:49 192-168-2-140 dockerd[792]: time="2019-06-17T14:46:49.732038322+08:00" level=warning msg="container kill failed because of 'container not found' or 'no such process': Cannot kill container b6ecd20df63be055447528489a2b5d0942938f584ea92f12eb3cb6393e9b965a: rpc error: code = 2 desc = containerd: container not found"
Jun 17 14:46:49 192-168-2-140 dockerd[792]: time="2019-06-17T14:46:49.732092026+08:00" level=warning msg="container kill failed because of 'container not found' or 'no such process': Cannot kill container d19d15c98ef62d475d1d48379717f732349b9839da265317717011b7dc54d95a: rpc error: code = 2 desc = containerd: container not found"
Jun 17 14:46:49 192-168-2-140 dockerd[792]: time="2019-06-17T14:46:49.732224875+08:00" level=warning msg="container kill failed because of 'container not found' or 'no such process': Cannot kill container 8c02bfc9c15f86429adadf1b14298710b8e3f90f794c756bb0706ee32e2e7663: rpc error: code = 2 desc = containerd: container not found"
Jun 17 14:46:50 192-168-2-140 dockerd[792]: 127.0.0.1 - - [17/Jun/2019 06:46:50] "GET / HTTP/1.1" 200 -
Jun 17 14:46:53 192-168-2-140 dockerd[792]: 2019/06/17 06:46:53.879026 metrics.go:39: INFO Non-zero metrics in the last 30s: libbeat.logstash.call_count.PublishEvents=9 libbeat.logstash.publish.read_bytes=54 libbeat.logstash.publish.write_bytes=4931 libbeat.logstash.published_and_acked_events=34 libbeat.publisher.messages_in_worker_queues=34 libbeat.publisher.published_events=34
Jun 17 14:46:59 192-168-2-140 dockerd[792]: time="2019-06-17T14:46:59.732246659+08:00" level=info msg="Container b6ecd20df63b failed to exit within 10 seconds of kill - trying direct SIGKILL"
Jun 17 14:46:59 192-168-2-140 dockerd[792]: time="2019-06-17T14:46:59.732246431+08:00" level=info msg="Container d19d15c98ef6 failed to exit within 10 seconds of kill - trying direct SIGKILL"
Jun 17 14:46:59 192-168-2-140 dockerd[792]: time="2019-06-17T14:46:59.732366320+08:00" level=info msg="Container 8c02bfc9c15f failed to exit within 10 seconds of kill - trying direct SIGKILL"
docker version
Client:
Version: 17.03.2-ce
API version: 1.27
Go version: go1.7.5
Git commit: f5ec1e2
Built: Tue Jun 27 03:35:14 2017
OS/Arch: linux/amd64
Server:
Version: 17.03.2-ce
API version: 1.27 (minimum version 1.12)
Go version: go1.7.5
Git commit: f5ec1e2
Built: Tue Jun 27 03:35:14 2017
OS/Arch: linux/amd64
Experimental: false
kernel version:
Linux 192-168-2-140 4.4.0-142-generic #168-Ubuntu
I also deployed a MongoDB service on the same node. When Docker is unable to handle the container status correctly, the MongoDB service gets restarted constantly. MongoDB's systemd log showed that it restarted after receiving SIGKILL (I checked the syslog and confirmed that MongoDB was not killed by the system OOM killer). I used the systemtap tool to find out that MongoDB was killed by the dockerd process.
stap sigkill.stp -x 15418 SIGKILL
SIGKILL was sent to mongod (pid:15418) by dockerd uid:0
sigkill.stp
#! /usr/bin/env stap
probe signal.send {
    if (sig_name == "SIGKILL")
        printf("%s was sent to %s (pid:%d) by %s uid:%d\n",
               sig_name, pid_name, sig_pid, execname(), uid())
}
After I restarted the Docker service, the MongoDB service was back to normal. I believe MongoDB's restarts and dockerd are definitely related.
I'm seeing the same issue, sporadically about once every week or two. The container shows no processes running in docker top, but appears as running in docker inspect and docker ps, and cannot be stopped or killed other than by killing the containerd-shim process.
Logs: https://gist.github.com/evandam/4e8fe37b43252c9c5883e632b66b500d
Docker stacktrace: https://gist.github.com/evandam/e6c9dd30b4d1a5dd280b4a5348bcc5b6
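For reference, a sketch of the shim-killing workaround described above (the container ID is a placeholder and the pgrep pattern is an assumption):

cid=<full container id>
pgrep -af "containerd-shim.*${cid}"           # confirm which shim belongs to the stuck container
kill $(pgrep -f "containerd-shim.*${cid}")    # killing the shim is what finally lets the container be marked as exited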
# docker version
Client: Docker Engine - Community
Version: 19.03.8
API version: 1.40
Go version: go1.12.17
Git commit: afacb8b7f0
Built: Wed Mar 11 01:25:46 2020
OS/Arch: linux/amd64
Experimental: false
Server: Docker Engine - Community
Engine:
Version: 19.03.8
API version: 1.40 (minimum version 1.12)
Go version: go1.12.17
Git commit: afacb8b7f0
Built: Wed Mar 11 01:24:19 2020
OS/Arch: linux/amd64
Experimental: false
containerd:
Version: 1.2.13
GitCommit: 7ad184331fa3e55e52b890ea95e65ba581ae3429
runc:
Version: 1.0.0-rc10
GitCommit: dc9208a3303feef5b3839f4323d9beb36df0a9dd
docker-init:
Version: 0.18.0
GitCommit: fec3683
I encountered this a few times within k8s. ctr tasks showed that the task is STOPPED while docker ps thought it was up.
$ docker ps | grep 8b7
8b7ca876b691 ca7e473bb121 "/start.sh controller" 2 days ago Up 2 days k8s_ovn-controller_default-host-9v5x9_onecloud_729d6a6f-4c9f-4832-8834-17f58f2c8990_2
$ ctr -n moby c ls | grep 8b7
8b7ca876b691149f9163b74c01f272af4e15dbf1300c098e1e53a1c4f10a722b - io.containerd.runtime.v1.linux
$ ctr -n moby t ls | grep 8b7
8b7ca876b691149f9163b74c01f272af4e15dbf1300c098e1e53a1c4f10a722b 82674 STOPPED
$ ctr -n moby t exec -t --exec-id yy 8b7ca876b691149f9163b74c01f272af4e15dbf1300c098e1e53a1c4f10a722b /bin/sh
ctr: cannot exec in a stopped state: unknown
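One thing that may be worth trying in this state is deleting the stopped task from containerd directly and then checking whether dockerd notices (a sketch, assuming ctr's tasks delete subcommand; I have not verified that dockerd reconciles afterwards):

ctr -n moby tasks delete 8b7ca876b691149f9163b74c01f272af4e15dbf1300c098e1e53a1c4f10a722b
docker ps | grep 8b7    # check whether the container is still listed as Up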
Docker version
$ docker version
Client:
Version: 18.09.1
API version: 1.39
Go version: go1.10.6
Git commit: 4c52b90
Built: Wed Jan 9 19:35:01 2019
OS/Arch: linux/amd64
Experimental: false
Server: Docker Engine - Community
Engine:
Version: 18.09.1
API version: 1.39 (minimum version 1.12)
Go version: go1.10.6
Git commit: 4c52b90
Built: Wed Jan 9 19:06:30 2019
OS/Arch: linux/amd64
Experimental: false
Containerd version
$ ctr version
Client:
Version: 1.2.10
Revision: b34a5c8af56e510852c35414db4c1f4fa6172339
Server:
Version: 1.2.10
Revision: b34a5c8af56e510852c35414db4c1f4fa6172339
Are you still seeing that problem on a current (19.03 at time of writing) version of Docker? ISTR there were fixes in this area, so possibly it's resolved already. If you're unable to upgrade to 19.03, at least make sure you have the latest patch release for 18.09 installed (https://github.com/moby/moby/releases/tag/v18.09.9); 18.09.1 is really outdated, and various CVEs were fixed in later patch releases.
With the sudden new comments on this old issue, I'm wondering if there's a separate problem. Upstream containerd fixed a similar issue recently, possibly related?
containerd/containerd#4054
Observed the issue just now against docker-ce 19.03.9
[root@yunion32002 ~]# docker version
Client: Docker Engine - Community
Version: 19.03.9
API version: 1.40
Go version: go1.13.10
Git commit: 9d988398e7
Built: Fri May 15 00:25:27 2020
OS/Arch: linux/amd64
Experimental: false
Server: Docker Engine - Community
Engine:
Version: 19.03.9
API version: 1.40 (minimum version 1.12)
Go version: go1.13.10
Git commit: 9d988398e7
Built: Fri May 15 00:24:05 2020
OS/Arch: linux/amd64
Experimental: false
containerd:
Version: 1.2.13
GitCommit: 7ad184331fa3e55e52b890ea95e65ba581ae3429
runc:
Version: 1.0.0-rc10
GitCommit: dc9208a3303feef5b3839f4323d9beb36df0a9dd
docker-init:
Version: 0.18.0
GitCommit: fec3683
[root@yunion32002 ~]# ctr -n moby c ls | grep 56b322730696
56b322730696eb78fe9e2f9534ae44560c23b01536b5ed46312eba48a71548d3 - io.containerd.runtime.v1.linux
[root@yunion32002 ~]# ctr -n moby t ls | grep 56b322730696
56b322730696eb78fe9e2f9534ae44560c23b01536b5ed46312eba48a71548d3 30700 STOPPED
[root@yunion32002 ~]# docker ps | grep 56b322730696
56b322730696 ca7e473bb121 "/start.sh controller" 13 days ago Up 13 days k8s_ovn-controller_default-host-4s5r8_onecloud_f15dbbc7-7357-4983-bf1d-e759eb33ca9c_0
Not sure if there are any updates or what the course of action is? Will a newer version of Docker include the containerd patch for this?
Docker currently does not maintain a fork of containerd (and builds packages from the upstream releases), so it depends on whether or not that patch makes it into a 1.2.x patch release. (Assuming that patch fixes this issue.)
I can confirm that at the time of the bug the containerd-shim process remains, while its child processes are gone.
[root@yunion32002 ~]# ps auxwwf | grep -A1 56b322730696
root 67764 0.0 0.0 112708 996 pts/4 S+ 22:04 0:00 \_ grep --color=auto -A1 56b322730696
root 5094 0.0 0.6 435652 50256 ? Ssl May30 18:16 /usr/sbin/rsyslogd -n
root 30664 0.0 0.0 109096 2824 ? Sl Jun04 1:00 \_ containerd-shim -namespace moby -workdir /var/lib/containerd/io.containerd.runtime.v1.linux/moby/56b322730696eb78fe9e2f9534ae44560c23b01536b5ed46312eba48a71548d3 -address /run/containerd/containerd.sock -containerd-binary /usr/bin/containerd -runtime-root /var/run/docker/runtime-runc -systemd-cgroup
root 25203 0.0 0.0 109096 6124 ? Sl Jun08 9:38 \_ containerd-shim -namespace moby -workdir /var/lib/containerd/io.containerd.runtime.v1.linux/moby/455bae35d225f9658e215cc87db91020b0f2ee46d8430dc5bb0ed2607ff4752d -address /run/containerd/containerd.sock -containerd-binary /usr/bin/containerd -runtime-root /var/run/docker/runtime-runc -systemd-cgroup
[root@yunion32002 ~]# ls -l /proc/30664/exe
lrwxrwxrwx 1 root root 0 Jun 17 22:04 /proc/30664/exe -> /usr/bin/containerd-shim
[root@yunion32002 ~]# rpm -qf /usr/bin/containerd-shim
containerd.io-1.2.13-3.2.el7.x86_64
As for containerd/containerd#4054 (or the backport containerd/containerd#4055), I think the situation is different: according to the "docker ps" output, the container had been running for 13 days, and the issue happened only when we tried to bring it down. The pull request in question, however, is about terminating the containerd-shim process when an error happens during containerd task creation.
I've seen this scenario with containerd 1.4.0 (09814d48d50816305a8e6c1a4ae3e2bcc4ba725a) and dockerd 19.03.6-ce, build 369ce74.
The containerd-shim for a container X is in the S state. docker ps | grep X shows the container as running, but I can't exec into it since it's stopped; docker inspect also shows it as running, docker top shows no processes in it, and ps aux shows no such process. The dockerd log shows "trying to kill it SIGKILL" basically forever. There is nothing special about this container: no volumes, no weird networking, just a vanilla container.
However, just cat-ing the fifo at /var/lib/containerd/io.containerd.runtime.v1.linux/moby/$containerId/shim.stdout.log makes the containerd-shim write "shim reaped" to the journalctl log, and docker ps no longer shows the container as running. Why does dockerd not drain the fifo stdout and stderr logs? The dockerd logs do contain the error "stream copy error: reading from a closed fifo".
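A sketch of that workaround (the container ID is a placeholder; the path assumes the runtime v1 shim layout mentioned above):

containerId=<full container id>
# reading the stdout fifo unblocks the shim; "shim reaped" then shows up in journalctl
cat /var/lib/containerd/io.containerd.runtime.v1.linux/moby/${containerId}/shim.stdout.log > /dev/null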
Stopped container is shown in docker ps and is unresponsive due to blocked attached output reader: #41827