
[BUG] Unable to provision downstream k8s v1.25.2-rancher1-1 cluster using Oracle Linux 8.6 #39988

rishabhmsra opened this issue Dec 23, 2022 · 5 comments

Labels: kind/bug-qa (issues that have not yet hit a real release; bugs introduced by a new feature or enhancement), team/hostbusters (the team that is responsible for provisioning/managing downstream clusters + K8s version support)
  • Rancher version: v2.7-head (5398b07)
  • Installation option (Docker install/Helm Chart): Helm Chart
  • If Helm Chart, Kubernetes Cluster and version (RKE1, RKE2, k3s, EKS, etc): RKE1, v1.24.8-rancher1-1
  • Information about the Cluster

  • Kubernetes version: v1.25.2-rancher1-1
  • Cluster Type (Local/Downstream): Downstream
  • If downstream, what type of cluster? (Custom/Imported or specify provider for Hosted/Infrastructure Provider): Custom
  • User Information

  • What is the role of the user logged in? (Admin/Cluster Owner/Cluster Member/Project Owner/Project Member/Custom) Admin
  • If custom, define the set of permissions:
  • Describe the bug

  • Provisioned a k8s v1.25.2-rancher1-1 cluster with OL 8.6 (used ami-0131316000f02f99f). The cluster came into Active state, but after some time it went into Error state with the following error (the failing check can be reproduced by hand; see the sketch after this list): Cluster health check failed: Failed to communicate with API server during namespace check: Get "https://<REDACTED>:6443/api/v1/namespaces/kube-system?timeout=45s": context deadline exceeded
  • Also, after the error it is not possible to SSH into the control plane node.
  • There is already an issue logged for high CPU usage on OL 8.6 with k8s v1.24.
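For reference, the failing health check can be reproduced by hand from any host that can reach the control plane node. A minimal sketch, with <CP_NODE_IP> as a placeholder for the node address; the endpoint and timeout simply mirror the error message above:

    # Hit the same kube-apiserver endpoint the health check uses.
    # <CP_NODE_IP> is a placeholder for the control plane node address.
    curl -k --max-time 45 \
      "https://<CP_NODE_IP>:6443/api/v1/namespaces/kube-system?timeout=45s"

    # Equivalent check with a kubeconfig for the downstream cluster:
    kubectl get namespace kube-system --request-timeout=45s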
  • To Reproduce

  • Create 3 VMs on AWS using an OL 8.6 AMI (in this case ami-0131316000f02f99f was used).
  • Open the ports and run the firewall commands as mentioned here (a firewalld sketch follows these steps).
  • Create a custom cluster (1 control plane, 1 etcd, 1 worker node) by running the registration commands on the nodes and wait for the cluster to come into Active state.
  • After some time the cluster will go into Error state with error: Failed to communicate with API server during namespace check: Get "https://:6443/api/v1/namespaces/kube-system?timeout=45s": context deadline exceeded
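The exact commands live in the linked documentation; as a convenience, here is a firewalld sketch for step 2, assuming the standard documented RKE1 node port list (the linked docs remain authoritative):

    # Open the usual RKE1 ports with firewalld; run on each node.
    # Port list assumed from Rancher's documented RKE1 port requirements.
    for p in 22/tcp 80/tcp 443/tcp 2376/tcp 2379/tcp 2380/tcp 6443/tcp \
             9099/tcp 10250/tcp 10254/tcp 30000-32767/tcp 8472/udp; do
      sudo firewall-cmd --permanent --add-port="$p"
    done
    sudo firewall-cmd --reload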
  • Result

  • Cluster fails to provision using the OL 8.6 AMI.
  • Expected Result

  • Cluster should provision successfully and come into Active state.

    @sowmyav27, k8s v1.24 with OL 8.6 already has a known CPU usage issue, for which the issues below are logged (a quick node-side CPU check is sketched after this list):

  • Unable to create k8s v1.24 downstream clusters using Oracle Linux 8.4 #38214 (comment)
  • [BUG] Abnormally high CPU usage on Kubernetes 1.24.4 #38816
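Before a node becomes unreachable, the high-CPU symptom can be confirmed with plain shell on each node. A minimal sketch, nothing Rancher-specific:

    # Snapshot the top CPU consumers; repeat a few times to spot runaways.
    ps aux --sort=-%cpu | head -n 10

    # Sample overall CPU usage and load five times, one second apart.
    vmstat 1 5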
  • Yes, I'm seeing the same health check error with OL 8.6 and k8s v1.24:

  • Created a k8s v1.24.8-rancher1-1 custom cluster using the steps mentioned here.
  • Wait for the cluster to come into Active state, then observe that after some time the cluster goes into Error state with the following error:
    Cluster health check failed: Failed to communicate with API server during namespace check: Get "https://<REDACTED>:6443/api/v1/namespaces/kube-system?timeout=45s": context deadline exceeded
  • Wait some more time and observe that the cluster goes into Unavailable state with error: Cluster agent is not connected (an agent-side check is sketched below).
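When the cluster reports "Cluster agent is not connected", the agent side can be inspected from the downstream cluster. A minimal sketch, using the namespace and label Rancher's agent deployment normally carries:

    # Check the Rancher cluster agent pods and their recent logs.
    kubectl -n cattle-system get pods -l app=cattle-cluster-agent
    kubectl -n cattle-system logs -l app=cattle-cluster-agent --tail=50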
  • docker.service logs (a retrieval sketch follows the excerpt):
  • docker.service - Docker Application Container Engine
       Loaded: loaded (/usr/lib/systemd/system/docker.service; disabled; vendor preset: disabled)
       Active: active (running) since Thu 2023-01-05 13:35:25 GMT; 28min ago
         Docs: https://docs.docker.com
     Main PID: 11781 (dockerd)
        Tasks: 66
       Memory: 2.1G
       CGroup: /system.slice/docker.service
               └─11781 /usr/bin/dockerd -H fd:// --containerd=/run/containerd/containerd.sock
    time="2023-01-05T13:51:36.631002475Z" level=error msg="Not continuing with pull after error: context canceled"
    time="2023-01-05T13:56:55.537716495Z" level=error msg="Handler for GET /v1.41/containers/b8a2e57d0060c0dc9067ea5796668f1678c85e0e13bd071aa4e7c80890628ded/json returned error: write unix /var/run/docker.sock->@: write: broken pipe"
    http: superfluous response.WriteHeader call from github.com/docker/docker/api/server/httputils.WriteJSON (httputils_write_json.go:11)
    time="2023-01-05T13:57:29.060444869Z" level=error msg="Handler for GET /v1.41/containers/bb29e9dd266aa3fda19405fdf715317230a161f2d465fb58e3c9f54ee705dccc/json returned error: write unix /var/run/docker.sock->@: write: broken pipe"
    time="2023-01-05T13:57:29.432542906Z" level=error msg="Handler for GET /v1.41/containers/8f2adce05071064c9be961270c05d2d7dfebff1ef5055932b866dd782a3fea03/json returned error: write unix /var/run/docker.sock->@: write: broken pipe"
    http: superfluous response.WriteHeader call from github.com/docker/docker/api/server/httputils.WriteJSON (httputils_write_json.go:11)
    http: superfluous response.WriteHeader call from github.com/docker/docker/api/server/httputils.WriteJSON (httputils_write_json.go:11)
    time="2023-01-05T13:58:21.956038247Z" level=error msg="Handler for GET /v1.41/containers/json returned error: write unix /var/run/docker.sock->@: write: broken pipe"
    http: superfluous response.WriteHeader call from github.com/docker/docker/api/server/httputils.WriteJSON (httputils_write_json.go:11)
    time="2023-01-05T13:58:29.271516967Z" level=error msg="exit event" container=2d5a3ab4e391cf84cfcc312181749bb6c10bbd08017d2b26f0889536e671e4e7 error="no such exec" module=libcontainerd namespace=moby process=bccfc45bf28efe6aa58765af789ae8c500766b06e2204600c3ba61f590951c73
    
  • Kubelet logs on the CP node (a retrieval sketch follows the excerpt):
  • time="2023-01-05T13:52:27Z" level=error msg="operation timeout: context deadline exceeded Failed to get stats from container bb29e9dd266aa3fda19405fdf715317230a161f2d465fb58e3c9f54ee705dccc"
    time="2023-01-05T13:53:45Z" level=error msg="unable to inspect docker image \"sha256:59daef946c8c6f1a1152d05726e87a4677e8a196ab3045249faad95181f6fafa\" while inspecting docker container \"9cc33f7c5355b8fcfcbb5d1542954c0e3fd925a718c08717b0f4b27543b3f53b\": operation timeout: context deadline exceeded Failed to get stats from container 9cc33f7c5355b8fcfcbb5d1542954c0e3fd925a718c08717b0f4b27543b3f53b"

    "ExecSync cmd from runtime service failed" err="rpc error: code = Unknown desc = deadline exceeded (\"DeadlineExceeded\"): context deadline exceeded" containerID="2d5a3ab4e391cf84cfcc312181749bb6c10bbd08017d2b26f0889536e671e4e7" cmd=[/bin/calico-node -felix-live]

  • Also, after the error it is not possible to SSH into the control plane node.