[BUG] Unable to provision downstream k8s v1.25.2-rancher1-1 cluster using Oracle Linux 8.6 #39988

rishabhmsra opened this issue Dec 23, 2022 · 5 comments

Labels: kind/bug-qa (Issues that have not yet hit a real release. Bugs introduced by a new feature or enhancement), team/hostbusters (The team that is responsible for provisioning/managing downstream clusters + K8s version support)

  • Rancher version: v2.7-head (5398b07)
  • Installation option (Docker install/Helm Chart): Helm Chart
  • If Helm Chart, Kubernetes Cluster and version (RKE1, RKE2, k3s, EKS, etc): RKE1, v1.24.8-rancher1-1
  • Information about the Cluster

  • Kubernetes version: v1.25.2-rancher1-1
  • Cluster Type (Local/Downstream): Downstream
  • If downstream, what type of cluster? (Custom/Imported or specify provider for Hosted/Infrastructure Provider): Custom
  • User Information

  • What is the role of the user logged in? (Admin/Cluster Owner/Cluster Member/Project Owner/Project Member/Custom): Admin
  • If custom, define the set of permissions:
  • Describe the bug

  • Provisioned a k8s v1.25.2-rancher1-1 cluster on OL 8.6 (AMI ami-0131316000f02f99f). The cluster came into the Active state, but after some time it went into the Error state with the following error: Cluster health check failed: Failed to communicate with API server during namespace check: Get "https://<REDACTED>:6443/api/v1/namespaces/kube-system?timeout=45s": context deadline exceeded
  • After the error appears, the control plane node is also no longer reachable over SSH.
  • There is already an issue logged for high CPU usage on OL 8.6 with k8s v1.24.
  • To Reproduce

  • Create 3 VMs on AWS using an OL 8.6 AMI (ami-0131316000f02f99f in this case).
  • Open the required ports and run the firewall commands as described here.
  • Create a custom cluster (1 control plane, 1 etcd, 1 worker) by running the registration commands on the nodes, and wait for the cluster to reach the Active state.
  • After some time the cluster goes into the Error state with the error: Failed to communicate with API server during namespace check: Get "https://:6443/api/v1/namespaces/kube-system?timeout=45s": context deadline exceeded
  • Result

  • The cluster fails to provision using the OL 8.6 AMI.
  • Expected Result

  • The cluster should provision successfully and reach the Active state.
The team/hostbusters label (the team responsible for provisioning/managing downstream clusters + K8s version support) was added on Dec 23, 2022.
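To check whether the API server is answering at all once the cluster flips to Error, the request quoted in the error message can be replayed from a machine that can reach the control plane node. The sketch below is a hypothetical helper (the names `namespace_check_url` and `check_kube_system` are made up), not Rancher's health-check code; it only mirrors the URL from the error and assumes a bearer token that is allowed to read namespaces plus, optionally, the cluster CA file for TLS verification.

```python
import ssl
import urllib.request
from typing import Optional


def namespace_check_url(host: str, port: int = 6443, timeout_s: int = 45) -> str:
    """Build the same URL that appears in the health check error (hypothetical helper)."""
    return f"https://{host}:{port}/api/v1/namespaces/kube-system?timeout={timeout_s}s"


def check_kube_system(host: str, token: str, cafile: Optional[str] = None,
                      timeout_s: int = 45) -> int:
    """GET the kube-system namespace and return the HTTP status code.

    Assumption: `token` is a bearer token with permission to read namespaces.
    Non-2xx responses raise urllib.error.HTTPError. If the API server does not
    answer within `timeout_s` seconds, an OSError (urllib.error.URLError or a
    socket timeout) is raised; that is the client-side counterpart of the
    "context deadline exceeded" seen in the cluster error.
    """
    # Verify the server certificate against `cafile` if given, else system CAs.
    ctx = ssl.create_default_context(cafile=cafile)
    req = urllib.request.Request(
        namespace_check_url(host, timeout_s=timeout_s),
        headers={"Authorization": f"Bearer {token}"},
    )
    with urllib.request.urlopen(req, timeout=timeout_s, context=ctx) as resp:
        return resp.status
```

For example, `check_kube_system("<control-plane-IP>", token, cafile="ca.pem")` returning 200 means the API server responded in time, while a raised timeout reproduces the symptom reported above (all arguments here are placeholders).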

    @sowmyav27, k8s v1.24 on OL 8.6 already has a known high CPU usage problem, tracked in the following issues:

  • Unable to create k8s v1.24 downstream clusters using Oracle Linux 8.4 #38214 (comment)
  • [BUG] Abnormally high CPU usage on Kubernetes 1.24.4 #38816
  • Yes, I'm seeing the same health check error with OL 8.6 on k8s v1.24:

  • Created a k8s v1.24.8-rancher1-1 custom cluster using the steps described here.
  • Wait for the cluster to reach the Active state, then observe that after some time it goes into the Error state with the following error:
    Cluster health check failed: Failed to communicate with API server during namespace check: Get "https://<REDACTED>:6443/api/v1/namespaces/kube-system?timeout=45s": context deadline exceeded
  • Wait some more and observe that the cluster goes into the Unavailable state with the error: Cluster agent is not connected
  • docker.service status and logs:
    ● docker.service - Docker Application Container Engine
       Loaded: loaded (/usr/lib/systemd/system/docker.service; disabled; vendor preset: disabled)
       Active: active (running) since Thu 2023-01-05 13:35:25 GMT; 28min ago
         Docs: https://docs.docker.com
     Main PID: 11781 (dockerd)
        Tasks: 66
       Memory: 2.1G
       CGroup: /system.slice/docker.service
               └─11781 /usr/bin/dockerd -H fd:// --containerd=/run/containerd/containerd.sock
    time="2023-01-05T13:51:36.631002475Z" level=error msg="Not continuing with pull after error: context canceled"
    time="2023-01-05T13:56:55.537716495Z" level=error msg="Handler for GET /v1.41/containers/b8a2e57d0060c0dc9067ea5796668f1678c85e0e13bd071aa4e7c80890628ded/json returned error: write unix /var/run/docker.sock->@: write: broken pipe"
    http: superfluous response.WriteHeader call from github.com/docker/docker/api/server/httputils.WriteJSON (httputils_write_json.go:11)
    time="2023-01-05T13:57:29.060444869Z" level=error msg="Handler for GET /v1.41/containers/bb29e9dd266aa3fda19405fdf715317230a161f2d465fb58e3c9f54ee705dccc/json returned error: write unix /var/run/docker.sock->@: write: broken pipe"
    time="2023-01-05T13:57:29.432542906Z" level=error msg="Handler for GET /v1.41/containers/8f2adce05071064c9be961270c05d2d7dfebff1ef5055932b866dd782a3fea03/json returned error: write unix /var/run/docker.sock->@: write: broken pipe"
    http: superfluous response.WriteHeader call from github.com/docker/docker/api/server/httputils.WriteJSON (httputils_write_json.go:11)
    http: superfluous response.WriteHeader call from github.com/docker/docker/api/server/httputils.WriteJSON (httputils_write_json.go:11)
    time="2023-01-05T13:58:21.956038247Z" level=error msg="Handler for GET /v1.41/containers/json returned error: write unix /var/run/docker.sock->@: write: broken pipe"
    http: superfluous response.WriteHeader call from github.com/docker/docker/api/server/httputils.WriteJSON (httputils_write_json.go:11)
    time="2023-01-05T13:58:29.271516967Z" level=error msg="exit event" container=2d5a3ab4e391cf84cfcc312181749bb6c10bbd08017d2b26f0889536e671e4e7 error="no such exec" module=libcontainerd namespace=moby process=bccfc45bf28efe6aa58765af789ae8c500766b06e2204600c3ba61f590951c73
  • Kubelet logs on the CP node:
    time="2023-01-05T13:52:27Z" level=error msg="operation timeout: context deadline exceeded Failed to get stats from container bb29e9dd266aa3fda19405fdf715317230a161f2d465fb58e3c9f54ee705dccc"
    time="2023-01-05T13:53:45Z" level=error msg="unable to inspect docker image \"sha256:59daef946c8c6f1a1152d05726e87a4677e8a196ab3045249faad95181f6fafa\" while inspecting docker container \"9cc33f7c5355b8fcfcbb5d1542954c0e3fd925a718c08717b0f4b27543b3f53b\": operation timeout: context deadline exceeded Failed to get stats from container 9cc33f7c5355b8fcfcbb5d1542954c0e3fd925a718c08717b0f4b27543b3f53b"
    "ExecSync cmd from runtime service failed" err="rpc error: code = Unknown desc = deadline exceeded (\"DeadlineExceeded\"): context deadline exceeded" containerID="2d5a3ab4e391cf84cfcc312181749bb6c10bbd08017d2b26f0889536e671e4e7" cmd=[/bin/calico-node -felix-live]

  • After the error appears, the control plane node is also no longer reachable over SSH.
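The Docker and kubelet excerpts above point at the same containers: bb29e9dd266a shows up both in a Docker broken-pipe error and in a kubelet stats timeout, and 2d5a3ab4e391 (the container in which kubelet ran `/bin/calico-node -felix-live`) appears in both the Docker "exit event" error and the kubelet ExecSync timeout. When collecting more data, a quick way to line the two sources up is to intersect the 64-character IDs they mention. The helper below is a hypothetical sketch (the function names are made up and are not part of Docker, kubelet, or Rancher tooling); it assumes both logs have been saved as plain text, for example from `journalctl -u docker` and, on RKE1 nodes where kubelet runs as a container, `docker logs kubelet`.

```python
import re
from typing import List, Set

# Docker container and exec IDs are 64 lowercase hex characters. The pattern
# also matches image digests such as "sha256:<64 hex>"; those only end up in
# the intersection if the same digest appears in both logs.
HEX_ID = re.compile(r"\b[0-9a-f]{64}\b")


def ids_in(log_text: str) -> Set[str]:
    """Return every 64-character hex ID mentioned in a block of log text."""
    return set(HEX_ID.findall(log_text))


def shared_ids(docker_log: str, kubelet_log: str) -> Set[str]:
    """IDs mentioned in both logs, i.e. containers both sides reported errors for."""
    return ids_in(docker_log) & ids_in(kubelet_log)


def lines_mentioning(log_text: str, cid: str) -> List[str]:
    """All log lines that contain the given ID, with surrounding whitespace trimmed."""
    return [line.strip() for line in log_text.splitlines() if cid in line]
```

Matching on raw IDs rather than parsing each line keeps the sketch independent of the different line formats seen above (logrus-style `time=... level=...` lines next to structured kubelet messages).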