-- Logs begin at Mon 2023-10-30 17:34:13 UTC. --
Rancher Server logs
2023/10/30 18:20:27 [ERROR] error syncing 'c-m-<REDACTED>': handler cluster-deploy: apiserver not ready, requeuing
2023/10/30 18:20:27 [INFO] [planner] rkecluster fleet-default/jkeslar-k3s-new: configuring bootstrap node(s) jkeslar-k3s-new-pool1-<REDACTED>: error applying plan -- check rancher-system-agent.service logs on node for more information, waiting for probes: kube-apiserver, kube-controller-manager, kube-scheduler, kubelet
2023/10/30 18:20:38 [ERROR] error syncing 'fleet-default/jkeslar-k3s-new-bootstrap-template-<REDACTED>': handler rke-bootstrap-cluster-name: apiserver not ready, requeuing
W1030 18:21:39.268748 38 warnings.go:80] cluster.x-k8s.io/v1alpha3 MachineSet is deprecated; use cluster.x-k8s.io/v1beta1 MachineSet
W1030 18:22:23.206043 38 warnings.go:80] cluster.x-k8s.io/v1alpha3 Machine is deprecated; use cluster.x-k8s.io/v1beta1 Machine
2023/10/30 18:22:27 [ERROR] error syncing 'c-m-<REDACTED>': handler cluster-deploy: apiserver not ready, requeuing
2023/10/30 18:22:27 [INFO] [planner] rkecluster fleet-default/jkeslar-k3s-new: configuring bootstrap node(s) jkeslar-k3s-new-pool1-<REDACTED>: error applying plan -- check rancher-system-agent.service logs on node for more information, waiting for probes: kube-apiserver, kube-controller-manager, kube-scheduler, kubelet
2023/10/30 18:22:38 [ERROR] error syncing 'fleet-default/jkeslar-k3s-new-bootstrap-template-<REDACTED>': handler rke-bootstrap-cluster-name: apiserver not ready, requeuing
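The planner message above points at the node-side agent logs. On the affected bootstrap node, those can be followed with the standard systemd tooling, e.g.:

```sh
# On the node: follow the rancher-system-agent service logs
# referenced by the planner error above.
journalctl -u rancher-system-agent.service -f
```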
This is currently believed to be caused by #43230, which was an intended fix for #43097. That fix moved the safe etcd member removal to before the node is drained, so the node is removed as a participating etcd member but remains a node in the cluster. However, because k3s does not run etcd as a static pod, removing the node from etcd before draining appears to cause a complete node failure, which in turn prevents the safe node removal from ever succeeding.
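For anyone reproducing this, one way to observe whether a node has already been dropped from the etcd cluster is to query the embedded etcd member list on a surviving server node. This is a minimal sketch assuming a default k3s install; the certificate paths below are the k3s defaults and will differ if a custom data-dir is used:

```sh
# List etcd members on a k3s server node (default k3s embedded-etcd
# client cert paths; adjust if the data-dir is not /var/lib/rancher/k3s).
ETCDCTL_API=3 etcdctl \
  --endpoints=https://127.0.0.1:2379 \
  --cacert=/var/lib/rancher/k3s/server/tls/etcd/server-ca.crt \
  --cert=/var/lib/rancher/k3s/server/tls/etcd/server-client.crt \
  --key=/var/lib/rancher/k3s/server/tls/etcd/server-client.key \
  member list -w table
```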
With revert PR #43330 merged, this will be available to test on v2.8-head once https://drone-publish.rancher.io/rancher/rancher/10922 (or a later build) passes. This issue can be moved to "To Test" after that.
Edit: Rerunning CI - https://drone-publish.rancher.io/rancher/rancher/10925/1/1
@Josh-Diamond, this is ready to test now that the build has passed. If the issue is no longer reproducible, please close it and also remove the milestone: technically this issue was never present in any released version, so it doesn't seem right to close it with the v2.8.0 milestone, as it wasn't "fixed" in that milestone. FYI @daviswill2 @Jono-SUSE-Rancher
The following scenario was successfully executed on 5 clusters:

1. Fresh install of Rancher v2.8-head
2. Provision a single-node, all-roles downstream K3s AWS node driver cluster w/ k8s v1.27.7+k3s1
3. Once active, take a manual snapshot
4. Once captured, scale the cluster up by 1
   - Verified - cluster successfully scales up; total nodes now 2
5. Scale the cluster up by 1, once more
   - Verified - cluster successfully scales up; total nodes now 3
6. Scale the cluster down by 1
   - Verified - cluster successfully scales down; total nodes now 2
7. Scale the cluster down by 1, once more
   - Verified - cluster successfully scales down, as expected (node counts checked as sketched below)
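The per-step node counts are straightforward to confirm from the downstream cluster itself; a minimal check, assuming kubectl is pointed at the downstream cluster's kubeconfig, looks like:

```sh
# Inspect the downstream cluster's nodes after each scale operation.
kubectl get nodes

# Or report just the node count.
kubectl get nodes --no-headers | wc -l
```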
kind/bug
status/release-blocker
team/hostbusters