
Got starting container process caused "process_linux.go:301: running exec setns process for init caused \"exit status 40\"": unknown. from time to time #1740

cui-liqiang opened this issue Feb 24, 2018 · 26 comments

centos 7.2

uname -a:
Linux iZ2ze43t8c42mqytqholpuZ 3.10.0-693.11.6.el7.x86_64 #1 SMP Thu Jan 4 01:06:37 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux

docker -v
Docker version 17.12.0-ce, build c97c6d6

docker-runc -v
runc version 1.0.0-rc4+dev
commit: b2567b3
spec: 1.0.0

When I run docker run or docker build, the following error appears from time to time. The probability is around 5%.

docker: Error response from daemon: OCI runtime create failed: container_linux.go:296: starting container process caused "process_linux.go:301: running exec setns process for init caused \"exit status 40\"": unknown.

Any clues?

@teddyking I checked this:

$ uname -a
Linux iZ2ze43t8c42mqytqholpuZ 3.10.0-693.11.6.el7.x86_64 #1 SMP Thu Jan 4 01:06:37 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
$ grep CONFIG_USER_NS /boot/config-3.10.0-693.11.6.el7.x86_64
CONFIG_USER_NS=y

BTW, if user namespaces were disabled, should it always fail, or only sometimes?
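For what it's worth, CONFIG_USER_NS=y only means the feature is compiled in; whether unprivileged user namespaces are actually usable is controlled by a separate runtime knob, which (if I understand the RHEL 7 kernels correctly) defaults to 0:

$ cat /proc/sys/user/max_user_namespaces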

@cui-liqiang make sure to go through this: https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux_atomic_host/7/html-single/getting_started_with_containers/index#user_namespaces_options

Both the kernel boot parameter and the kernel runtime parameter need to be set; roughly as sketched below.
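If I am reading the linked guide correctly, the steps look roughly like this (namespace.unpriv_enable and the 15076 value come from that Red Hat document; treat this as a sketch and verify against your RHEL version):

# enable unprivileged user namespaces via the kernel boot parameter
grubby --args="namespace.unpriv_enable=1" --update-kernel="$(grubby --default-kernel)"
# raise the runtime limit, which defaults to 0 on RHEL 7
echo "user.max_user_namespaces=15076" >> /etc/sysctl.conf
reboot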

Hi @frezbo
I checked the link. As I understand it, it talks about enabling user namespace mapping. I am actually not using this feature. Do I still need to follow the steps in the link?

My docker daemon options:

$ systemctl status docker
● docker.service - Docker Application Container Engine
   Loaded: loaded (/usr/lib/systemd/system/docker.service; enabled; vendor preset: disabled)
   Active: active (running) since Fri 2018-02-02 18:06:12 CST; 3 weeks 2 days ago
     Docs: https://docs.docker.com
 Main PID: 11877 (dockerd)
   Memory: 17.7G
   CGroup: /system.slice/docker.service
           ├─11877 /usr/bin/dockerd
           ├─11889 docker-containerd --config /var/run/docker/containerd/containerd.toml
           └─19417 docker-containerd-shim -namespace moby -workdir /var/lib/docker/containerd/daemon/io.containerd.runtime.v1.linux/moby/1139c89d09ff03e8f01daf4d37b58c97146dc58e0a81b640438d717d0a2074d2...
          

Oh, you're using CentOS 7.2! Older RHEL kernels deny creation of mount namespaces inside a user namespace because of an out-of-tree patch. See #1513 -- apparently RHEL 7.5 will fix this.

In fact, looking at this again, this looks like a duplicate of #1513 -- while that issue is about not being able to run Docker with --userns-remap, the underlying problem is the same AFAICS. Can you check whether this command fails:

% sudo unshare -Um

And whether this command works:

% sudo unshare -U

However, this part of the bug report still doesn't make sense to me (the above explanation would make containers always fail to start; I don't understand how it could be probabilistic):

the following error appears from time to time. The probability is around 5%.

@cyphar Neither works. (The output below translates to: "unshare: unshare failed: invalid argument".)

$ sudo unshare -Um
unshare: unshare 失败: 无效的参数
$ sudo unshare -U
unshare: unshare 失败: 无效的参数
$ cat /etc/docker/daemon.json
{
  "registry-mirrors": ["https://srgc54k8.mirror.aliyuncs.com"]
}
$ docker info
Containers: 8
 Running: 0
 Paused: 0
 Stopped: 8
Images: 1203
Server Version: 17.12.0-ce
Storage Driver: overlay
 Backing Filesystem: extfs
 Supports d_type: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
 Volume: local
 Network: bridge host macvlan null overlay
 Log: awslogs fluentd gcplogs gelf journald json-file logentries splunk syslog
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 89623f28b87a6004d4b785663257362d1658a729
runc version: b2567b37d7b75eb4cf325b77297b140ea686ce8f
init version: 949e6fa
Security Options:
 seccomp
  Profile: default
Kernel Version: 3.10.0-693.11.6.el7.x86_64
Operating System: CentOS Linux 7 (Core)
OSType: linux
Architecture: x86_64
CPUs: 16
Total Memory: 31.26GiB
Name: iZ2ze43t8c42mqytqholpuZ
ID: TNVJ:PNUD:XTYK:WZGD:HLST:VZG3:JT5A:UNFM:JVDY:VVDK:Y4HR:S22Y
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
Labels:
Experimental: false
Insecure Registries:
 127.0.0.0/8
Registry Mirrors:
 https://srgc54k8.mirror.aliyuncs.com/
Live Restore Enabled: false
          

It turns out this is not a problem with Docker.
The machine was doing a lot of file reading, so most of the memory was consumed by the page cache, which I could tell from the free -m command.

After I started periodically cleaning the page cache by running echo 1 > /proc/sys/vm/drop_caches, the problem disappeared.
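(For reference, the "periodically" part is just a root cron job; the 15-minute interval below is a made-up example. Note that drop_caches=1 only frees clean page-cache pages, so it is safe, at the cost of re-reading data from disk later.)

# root crontab entry: drop the clean page cache every 15 minutes (example interval)
*/15 * * * * /bin/sh -c 'echo 1 > /proc/sys/vm/drop_caches'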



I use GitLab CI to build my app image on a GitLab runner and hit the same problem. Thank you for your answer; I ran echo 1 > /proc/sys/vm/drop_caches on the GitLab runner server, and it works!

For me it was a memory allocation error, as described here: https://serverfault.com/questions/236170/page-allocation-failure-am-i-running-out-of-memory

Mine was a 24GB RAM server with over 15GB allocated to the page cache and only 600-800MB of free RAM. I noticed Docker failed to start containers when "free" memory dropped below 1GB, so I set vm.min_free_kbytes to 1GB:

# change value for this boot
sysctl -w vm.min_free_kbytes=1048576
# change value for subsequent boots
echo "vm.min_free_kbytes=1048576" >> /etc/sysctl.conf

Now it will allocate less to the page cache, and I won't have to continuously purge it with drop_caches.

Hope it helps anyone.

EDIT:

Forget the above; it's a dirty hack which might lead to other issues on the system. I restarted the server over a month ago. The restart reset the fragmented memory, and there haven't been any issues since.

We have been fighting an issue where this was the main error we observed. Compare total system memory against cat /proc/meminfo | grep Commit. When swap is disabled, we see the CommitLimit being half of the server memory. We are currently testing the parameters vm.overcommit_memory=2 and vm.overcommit_ratio=200.
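For anyone wanting to reproduce the check: with vm.overcommit_memory=2 the kernel enforces CommitLimit = swap + RAM * vm.overcommit_ratio / 100, so with no swap and the default ratio of 50 the limit is half of physical RAM, which matches what we see. A sketch of the check and the (non-persistent) settings we are testing:

# committed memory vs. the hard limit, in kB
grep -i commit /proc/meminfo
# strict accounting; the limit becomes swap + RAM * 200 / 100 = 2x RAM here
sysctl -w vm.overcommit_memory=2
sysctl -w vm.overcommit_ratio=200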

Hey, we have that issue as well, and it's related to kernel memory fragmentation in CentOS/RHEL:

cat /sys/kernel/debug/extfrag/extfrag_index
Node 0, zone      DMA -1.000 -1.000 -1.000 -1.000 -1.000 -1.000 -1.000 -1.000 -1.000 -1.000 -1.000 
Node 0, zone    DMA32 -1.000 -1.000 -1.000 -1.000 -1.000 -1.000 -1.000 0.968 0.984 0.992 0.996 
Node 0, zone   Normal -1.000 -1.000 0.747 0.874 0.937 0.969 0.985 0.993 0.997 0.999 0.999 
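
To read this output: an index approaching 1.000 means allocation failures at that order stem from fragmentation rather than lack of memory, and -1.000 means the allocation would currently succeed. If fragmentation is the cause, kernels built with compaction support can be asked to defragment on demand; a sketch:

# indices near 1.000 => fragmentation; near 0 => low memory; -1.000 => allocation would succeed
cat /sys/kernel/debug/extfrag/extfrag_index
# trigger manual compaction of all zones (requires CONFIG_COMPACTION)
echo 1 > /proc/sys/vm/compact_memory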
          

I'm having the same issue, as shown below:

:/home/ubuntu# docker run --runtime=nvidia --rm nvidia/cuda:9.0-base nvidia-smi
Unable to find image 'nvidia/cuda:9.0-base' locally
9.0-base: Pulling from nvidia/cuda
9ff7e2e5f967: Pull complete
59856638ac9f: Pull complete
6f317d6d954b: Pull complete
a9dde5e2a643: Pull complete
3dab314fc51e: Pull complete
1a4e7e8b3753: Pull complete
388ed6e4a282: Pull complete
Digest: sha256:09ee586c314e599f7b82317fccfbf4717e037e5b83a9c9a9d7a5ccfe810a3071
Status: Downloaded newer image for nvidia/cuda:9.0-base
docker: Error response from daemon: OCI runtime create failed: container_linux.go:345: starting container process caused "process_linux.go:430: container init caused "process_linux.go:413: running prestart hook 1 caused \"error running hook: exit status 1, stdout: , stderr: exec command: [/usr/bin/nvidia-container-cli --load-kmods configure --ldconfig=@/sbin/ldconfig.real --device=all --compute --utility --require=cuda>=9.0 --pid=13505 /var/lib/docker/overlay2/88efd338b570e5d09dd7f18fd0b508f962b0060bcdbb448a538b43f9b0b50b66/merged]\\nnvidia-container-cli: initialization error: cuda error: invalid device ordinal\\n\""": unknown.

Most of these errors that I found are related to badly mounted volumes.

In my case, I was mounting a file onto a folder (see the corrected mapping below):

- ./kibana/config.yml:/usr/share/kibana/config/
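The fix is to map file to file or directory to directory; a sketch, assuming the standard Kibana config filename:

- ./kibana/config.yml:/usr/share/kibana/config/kibana.yml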


I'm using minikube with the same issue. I tried to ssh into minikube and run the drop_caches command, but it was denied even with sudo. Any workarounds here that you would recommend?
Thanks!


Following up on my earlier comment: I was sharing folders and forgot the slash at the end.


Can anyone please help me with how to stop cleaning the cache after running echo 1 > /proc/sys/vm/drop_caches?