$ uname -a
Linux iZ2ze43t8c42mqytqholpuZ 3.10.0-693.11.6.el7.x86_64 #1 SMP Thu Jan 4 01:06:37 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
$ grep CONFIG_USER_NS /boot/config-3.10.0-693.11.6.el7.x86_64
CONFIG_USER_NS=y
BTW, if user namespaces are disabled, should it always fail, or only sometimes?
@cui-liqiang make sure to go through this: https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux_atomic_host/7/html-single/getting_started_with_containers/index#user_namespaces_options
Both the kernel boot parameter and the kernel parameter need to be set.
Hi @frezbo
I checked the link. As I understand, it talks about enabling user namespaces mapping. I am actually not using this feature. Do I still need to follow the steps in the link?
My docker daemon options:
$ systemctl status docker
● docker.service - Docker Application Container Engine
Loaded: loaded (/usr/lib/systemd/system/docker.service; enabled; vendor preset: disabled)
Active: active (running) since Fri 2018-02-02 18:06:12 CST; 3 weeks 2 days ago
Docs: https://docs.docker.com
Main PID: 11877 (dockerd)
Memory: 17.7G
CGroup: /system.slice/docker.service
├─11877 /usr/bin/dockerd
├─11889 docker-containerd --config /var/run/docker/containerd/containerd.toml
└─19417 docker-containerd-shim -namespace moby -workdir /var/lib/docker/containerd/daemon/io.containerd.runtime.v1.linux/moby/1139c89d09ff03e8f01daf4d37b58c97146dc58e0a81b640438d717d0a2074d2...
Oh, you're using CentOS 7.2! Older RHEL kernels deny creation of mount namespaces inside a user namespace because of an out-of-tree patch. See #1513 -- apparently RHEL 7.5 will fix this.
In fact, looking at this again, this looks like a duplicate of #1513 -- while that issue is about running Docker with --userns-remap, the underlying problem is the same AFAICS. Can you check whether this command fails:
% sudo unshare -Um
And whether this command works:
% sudo unshare -U
However, this part of the bug report still doesn't make sense to me (the above explanation would make containers always fail to start; I don't understand how it could be probabilistic):
the following error appears from time to time. The probability is around 5%.
@cyphar Neither works. (The output says: "unshare failed: invalid argument".)
$ sudo unshare -Um
unshare: unshare failed: Invalid argument
$ sudo unshare -U
unshare: unshare failed: Invalid argument
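For what it's worth, on RHEL/CentOS 7 kernels CONFIG_USER_NS=y alone is not sufficient. A minimal sketch of the runtime check (the sysctl path below exists only on kernels that carry the user-namespace limit):

```shell
# If this limit is 0 (the RHEL 7 default where it exists), unshare -U fails
# with EINVAL ("invalid argument") even though CONFIG_USER_NS=y.
cat /proc/sys/user/max_user_namespaces 2>/dev/null \
  || echo "limit sysctl not present on this kernel"
# Per the Red Hat doc linked above, enabling user namespaces also needs the
# user_namespace.enable=1 kernel boot parameter plus a non-zero limit, e.g.:
#   echo "user.max_user_namespaces=15000" > /etc/sysctl.d/99-userns.conf
```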
$ cat /etc/docker/daemon.json
{
  "registry-mirrors": ["https://srgc54k8.mirror.aliyuncs.com"]
}
$ docker info
Containers: 8
Running: 0
Paused: 0
Stopped: 8
Images: 1203
Server Version: 17.12.0-ce
Storage Driver: overlay
Backing Filesystem: extfs
Supports d_type: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
Volume: local
Network: bridge host macvlan null overlay
Log: awslogs fluentd gcplogs gelf journald json-file logentries splunk syslog
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 89623f28b87a6004d4b785663257362d1658a729
runc version: b2567b37d7b75eb4cf325b77297b140ea686ce8f
init version: 949e6fa
Security Options:
seccomp
Profile: default
Kernel Version: 3.10.0-693.11.6.el7.x86_64
Operating System: CentOS Linux 7 (Core)
OSType: linux
Architecture: x86_64
CPUs: 16
Total Memory: 31.26GiB
Name: iZ2ze43t8c42mqytqholpuZ
ID: TNVJ:PNUD:XTYK:WZGD:HLST:VZG3:JT5A:UNFM:JVDY:VVDK:Y4HR:S22Y
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
Labels:
Experimental: false
Insecure Registries:
127.0.0.0/8
Registry Mirrors:
https://srgc54k8.mirror.aliyuncs.com/
Live Restore Enabled: false
It turns out this is not a problem with Docker.
The machine was doing a lot of file-reading work, so most of the memory was consumed by the page cache, which I could tell from the free -m command.
After I periodically clean the page cache by running echo 1 > /proc/sys/vm/drop_caches, the problem disappears.
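For anyone diagnosing the same pattern, the relevant numbers can be read straight from /proc/meminfo; a minimal sketch (field names as on stock Linux):

```shell
# Show how much memory is page cache vs. genuinely available.
# A low MemFree alone is not a problem: MemAvailable already accounts for
# reclaimable page cache, which the kernel frees under memory pressure.
awk '/^(MemTotal|MemFree|MemAvailable|Cached):/ { printf "%-13s %6d MiB\n", $1, $2/1024 }' /proc/meminfo
```

On kernels without MemAvailable the filter simply prints the other fields.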
I use GitLab CI to build my own app image on a GitLab runner, and I got the same problem. Thank you for your answer; I ran echo 1 > /proc/sys/vm/drop_caches on the GitLab runner server, and it works!
For me it was a memory allocation error as described here - https://serverfault.com/questions/236170/page-allocation-failure-am-i-running-out-of-memory
Mine was a 24GB RAM server with over 15GB allocated to the page cache and only 600-800MB of free RAM. I noticed Docker failed to start containers when the "free" memory dropped below 1GB, so I set vm.min_free_kbytes to 1GB:
# change the value for this boot
sysctl -w vm.min_free_kbytes=1048576
# change the value for subsequent boots
echo "vm.min_free_kbytes=1048576" >> /etc/sysctl.conf
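For anyone adapting the value: vm.min_free_kbytes is in KiB, so the 1GB reserve above works out as (the 1 GiB reserve is just the example from this comment):

```shell
# Convert a reserve in GiB to the KiB value vm.min_free_kbytes expects.
reserve_gib=1
echo $(( reserve_gib * 1024 * 1024 ))   # 1048576
```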
Now the kernel allocates less to the page cache and I won't have to continuously purge it with drop_caches.
Hope it helps anyone.
EDIT:
Forget the above; it's a dirty hack that might lead to other issues on the system. I restarted the server over a month ago. The restart reset the fragmented memory, and there haven't been any issues since.
We have been fighting an issue where this was the main error we observed. Please compare the total system memory with cat /proc/meminfo | grep Commit. When swap is disabled, we see the CommitLimit being half of the server memory. We are currently testing the parameters vm.overcommit_memory=2 and vm.overcommit_ratio=200.
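For context, with vm.overcommit_memory=2 the kernel computes CommitLimit = Swap + RAM * overcommit_ratio / 100; the default ratio of 50 is why the limit shows up as half of RAM when swap is disabled. A quick sketch with example numbers (the 32 GiB figure is only illustrative):

```shell
# CommitLimit (in KiB) under vm.overcommit_memory=2:
ram_kib=$(( 32 * 1024 * 1024 ))   # 32 GiB machine, expressed in KiB
swap_kib=0                        # swap disabled
ratio=50                          # default vm.overcommit_ratio
echo $(( swap_kib + ram_kib * ratio / 100 ))   # 16777216 KiB = 16 GiB
```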
Hey, we have that issue as well, and it's related to kernel memory fragmentation on CentOS/RHEL:
cat /sys/kernel/debug/extfrag/extfrag_index
Node 0, zone DMA -1.000 -1.000 -1.000 -1.000 -1.000 -1.000 -1.000 -1.000 -1.000 -1.000 -1.000
Node 0, zone DMA32 -1.000 -1.000 -1.000 -1.000 -1.000 -1.000 -1.000 0.968 0.984 0.992 0.996
Node 0, zone Normal -1.000 -1.000 0.747 0.874 0.937 0.969 0.985 0.993 0.997 0.999 0.999
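To read that output: each column is an allocation order; -1.000 means an allocation of that order would succeed, and an index approaching 1.000 means failures at that order are due to fragmentation rather than lack of memory. A small sketch that flags such zones, run here against a copy of the Normal line above:

```shell
# Flag zones whose highest-order extfrag index approaches 1.0, i.e. zones
# where large contiguous allocations fail because memory is fragmented.
printf 'Node 0, zone   Normal -1.000 -1.000 0.747 0.874 0.937 0.969 0.985 0.993 0.997 0.999 0.999\n' > /tmp/extfrag_sample
awk '$NF > 0.9 { printf "zone %s: fragmented (highest-order index %s)\n", $4, $NF }' /tmp/extfrag_sample
```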
I'm having the same issue, as shown below:
:/home/ubuntu# docker run --runtime=nvidia --rm nvidia/cuda:9.0-base nvidia-smi
Unable to find image 'nvidia/cuda:9.0-base' locally
9.0-base: Pulling from nvidia/cuda
9ff7e2e5f967: Pull complete
59856638ac9f: Pull complete
6f317d6d954b: Pull complete
a9dde5e2a643: Pull complete
3dab314fc51e: Pull complete
1a4e7e8b3753: Pull complete
388ed6e4a282: Pull complete
Digest: sha256:09ee586c314e599f7b82317fccfbf4717e037e5b83a9c9a9d7a5ccfe810a3071
Status: Downloaded newer image for nvidia/cuda:9.0-base
docker: Error response from daemon: OCI runtime create failed: container_linux.go:345: starting container process caused "process_linux.go:430: container init caused "process_linux.go:413: running prestart hook 1 caused \"error running hook: exit status 1, stdout: , stderr: exec command: [/usr/bin/nvidia-container-cli --load-kmods configure --ldconfig=@/sbin/ldconfig.real --device=all --compute --utility --require=cuda>=9.0 --pid=13505 /var/lib/docker/overlay2/88efd338b570e5d09dd7f18fd0b508f962b0060bcdbb448a538b43f9b0b50b66/merged]\\nnvidia-container-cli: initialization error: cuda error: invalid device ordinal\\n\""": unknown.
Most of these errors that I found are related to badly mounted volumes.
In my case I was mounting a file to a folder:
- ./kibana/config.yml:/usr/share/kibana/config/
It turns out this is not a problem with Docker.
The machine was doing a lot of file-reading work, so most of the memory was consumed by the page cache, which I could tell from the free -m command.
After I periodically clean the page cache by running echo 1 > /proc/sys/vm/drop_caches, the problem disappears.
I'm using minikube with the same issue. I tried to ssh into minikube and run this command, but it was denied even with sudo. Are there any workarounds you would recommend?
Thanks!
Most of these errors that I found are related to badly mounted volumes.
In my case I was mounting a file to a folder:
- ./kibana/config.yml:/usr/share/kibana/config/
I was sharing folders and forgot the slash at the end.
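A minimal illustration of that slip, reusing the paths from this comment (the image name is only a placeholder, not from the original report):

```shell
# Bind mounts must match types on both sides: mounting a host *file* onto a
# container *directory* path (note the trailing slash) makes the container
# fail to start with an OCI runtime create error.
bad='./kibana/config.yml:/usr/share/kibana/config/'             # file -> directory
good='./kibana/config.yml:/usr/share/kibana/config/kibana.yml'  # file -> file
echo "docker run -v $good kibana-image"   # $bad kept above only for contrast
```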
It turns out this is not a problem with Docker.
The machine was doing a lot of file-reading work, so most of the memory was consumed by the page cache, which I could tell from the free -m command.
After I periodically clean the page cache by running echo 1 > /proc/sys/vm/drop_caches, the problem disappears.
Can anyone please help me: how do I stop cleaning the cache after running echo 1 > /proc/sys/vm/drop_caches?