Normal Created 2m kubelet Created container queue-proxy
Normal Started 2m kubelet Started container queue-proxy
Normal Killing 60s kubelet Container user-container failed liveness probe, will be restarted
Warning Unhealthy 60s (x3 over 90s) kubelet Liveness probe failed: HTTP probe failed with statuscode: 500
Normal Started 9s (x2 over 2m) kubelet Started container user-container
Normal Created 9s (x2 over 2m) kubelet Created container user-container
Normal Pulled 9s kubelet Container image "shashankft/healthprobes@sha256:cc1cb4323e4d1cdee62e3ae7623ff6a213db72e89c17ad4772c39fe841b98fb9" already present on machine
Warning Unhealthy 0s (x6 over 50s) kubelet Readiness probe failed: HTTP probe failed with statuscode: 503
I tried to run a k8s deployment with the same image and liveness probe configuration, and it behaved as I hoped: whenever I explicitly hit the /start-failing path, the liveness probe reports a failure within a few seconds, the container restarts, and it starts serving traffic again.
I can see the following logs in kubelet:
I0208 01:54:37.906694 12942 http.go:137] Probe failed for http://10.233.65.50:8012/ with request headers map[Accept:[*/*] K-Network-Probe:[queue] User-Agent:[kube-probe/1.20]], response body: shutting down
I0208 01:54:37.906752 12942 prober.go:117] Readiness probe for "healthprobes-go-00001-deployment-55cf484684-dhgct_default(b0570c26-41ea-4156-9367-372f5775415e):queue-proxy" failed (failure): HTTP probe failed with statuscode: 503
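For context, the test image's behavior can be imagined roughly like this. This is a hypothetical sketch based only on the paths mentioned in this thread (/start-failing, /healthz/liveness, /healthz/readiness), not the actual shashankft/healthprobes source:

```go
package main

import (
	"log"
	"net/http"
	"sync/atomic"
)

func main() {
	// failing flips to 1 once /start-failing is hit; the health endpoints
	// then return 500 so the kubelet probes start to fail.
	var failing int32

	http.HandleFunc("/start-failing", func(w http.ResponseWriter, r *http.Request) {
		atomic.StoreInt32(&failing, 1)
		w.WriteHeader(http.StatusOK)
	})

	health := func(w http.ResponseWriter, r *http.Request) {
		if atomic.LoadInt32(&failing) == 1 {
			w.WriteHeader(http.StatusInternalServerError)
			return
		}
		w.WriteHeader(http.StatusOK)
	}
	http.HandleFunc("/healthz/liveness", health)
	http.HandleFunc("/healthz/readiness", health)

	// Everything else just serves traffic normally.
	http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
		w.Write([]byte("ok"))
	})

	log.Fatal(http.ListenAndServe(":8080", nil))
}
```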
In the comments it should be "503 shutting down". DefaultDrainTimeout = 45s.
The K8s defaults for readiness probes are here: .Spec.Containers[0].ReadinessProbe.FailureThreshold: 3
Knative applies some revision defaults, but not in this case (in the ksvc above only liveness probes are set): serving/pkg/apis/serving/v1/revision_defaults.go, lines 136 to 158 (52f07b0)
One thing to consider, although it does not seem to be the case here, is that:
Caution: Liveness probes do not wait for readiness probes to succeed. If you want to wait before executing a liveness probe you should use initialDelaySeconds or a startupProbe.
What happens if you increase the readiness failure threshold to 5? Right now it seems the kubelet will try for 3*10 = 30 seconds and then fail, which is shorter than the draining period. That probably explains what you see in the events: 6 failures and 2 container restarts. My rough guess.
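To make the arithmetic concrete, a quick sketch (assuming the Kubernetes defaults of periodSeconds=10 and failureThreshold=3, and the 45s DefaultDrainTimeout mentioned above):

```go
package main

import (
	"fmt"
	"time"
)

func main() {
	period := 10 * time.Second            // k8s default periodSeconds
	failureThreshold := 3                 // k8s default failureThreshold
	drainQuietPeriod := 45 * time.Second  // network.DefaultDrainTimeout

	// Time until the kubelet gives up on the readiness probe.
	window := time.Duration(failureThreshold) * period
	fmt.Printf("%v < %v: %v\n", window, drainQuietPeriod, window < drainQuietPeriod) // 30s < 45s: true
}
```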
Hey, thanks for digging in. Just to be sure - are you suggesting that I should try changing the readiness failure threshold of the queue-proxy container? If so, I can see what you are saying: basically, since the 3 checks end before the draining period, queue-proxy keeps returning 503 for the whole duration of those 3 checks, correct?
Another scenario I tested that might be relevant - this time I added a readiness probe in the ksvc:
```yaml
readinessProbe:
  httpGet:
    path: /healthz/readiness
    port: 0
  periodSeconds: 15
  failureThreshold: 5
```
and I expected the same failure log in kubelet, but I got this:
I0209 08:54:41.457686 12942 http.go:137] Probe failed for http://10.233.65.149:8012/ with request headers map[Accept:[*/*] K-Network-Probe:[queue] User-Agent:[kube-probe/1.20]], response body:
I0209 08:54:41.457730 12942 prober.go:117] Readiness probe for "healthprobes-go-00001-deployment-6877485457-cw7lz_default(2f04f8af-f9aa-4bb9-938e-a2a1d5b8cb28):queue-proxy" failed (failure): HTTP probe failed with statuscode: 503
I will try changing the default queue-proxy readiness probe's failureThreshold in the serving code.
@Shashankft9 I meant the user container. Let me explain though, because the underlying issue turns out to be different. I debugged the scenario with the following settings (using your code):
```yaml
readinessProbe:
  httpGet:
    path: /healthz/readiness
    port: 0
  periodSeconds: 15
  failureThreshold: 5
livenessProbe:
  httpGet:
    path: /healthz/liveness
    port: 0
  periodSeconds: 15
  failureThreshold: 1
```
Here is what happens after a while:
$ kubectl get pods
NAME READY STATUS RESTARTS AGE
healthprobes-go-00001-deployment-74f6787d8f-sz6mq 1/2 Running 1 3m43s
Pod description:
Normal Scheduled 5m48s default-scheduler Successfully assigned default/healthprobes-go-00001-deployment-74f6787d8f-sz6mq to minikube
Normal Pulling 5m46s kubelet Pulling image "shashankft/healthprobes@sha256:cc1cb4323e4d1cdee62e3ae7623ff6a213db72e89c17ad4772c39fe841b98fb9"
Normal Pulled 5m28s kubelet Successfully pulled image "shashankft/healthprobes@sha256:cc1cb4323e4d1cdee62e3ae7623ff6a213db72e89c17ad4772c39fe841b98fb9" in 17.522624817s
Normal Pulling 5m27s kubelet Pulling image "docker.io/skonto/queue-39be6f1d08a095bd076a71d288d295b6@sha256:c75f19f83cddb57026535d0069689760554e867c32aea20e51850cc9b894fbd1"
Normal Pulled 5m15s kubelet Successfully pulled image "docker.io/skonto/queue-39be6f1d08a095bd076a71d288d295b6@sha256:c75f19f83cddb57026535d0069689760554e867c32aea20e51850cc9b894fbd1" in 12.338262103s
Normal Created 5m14s kubelet Created container queue-proxy
Normal Started 5m14s kubelet Started container queue-proxy
Normal Killing 4m33s kubelet Container user-container failed liveness probe, will be restarted
Warning Unhealthy 4m33s (x3 over 5m3s) kubelet Liveness probe failed: HTTP probe failed with statuscode: 500
Normal Pulled 4m3s kubelet Container image "shashankft/healthprobes@sha256:cc1cb4323e4d1cdee62e3ae7623ff6a213db72e89c17ad4772c39fe841b98fb9" already present on machine
Normal Started 4m2s (x2 over 5m27s) kubelet Started container user-container
Normal Created 4m2s (x2 over 5m28s) kubelet Created container user-container
Warning Unhealthy 33s (x19 over 5m3s) kubelet Readiness probe failed: HTTP probe failed with statuscode: 503
So if you check the link above, the implementation makes sure the readiness probe is set only on the queue container (on behalf of the user container), while the liveness probe is set on the user container, although all HTTP probe calls initially go through queue-proxy anyway.
This is important because the only container that is restarted is the user container: after the initial call to the /start-failing path it is restarted due to the liveness failure, and once restarted the liveness check no longer fails. However, the queue container is stuck in the draining phase, and readiness probes will keep failing until the pod is unhealthy and eventually removed. Here is why (btw, here are the latest entries of the kubelet log):
The drain function only resets the timer when a normal request is made; otherwise it keeps returning "shutting down" to probes (and a normal request can't happen here, because the unhealthy container cannot be accessed).
serving/vendor/knative.dev/pkg/network/handlers/drain.go, lines 88 to 111 (52f07b0)
I modified the func above by adding some comments, and as you can see from the queue-proxy logs it is stuck.
Btw, readiness probes are managed by queue-proxy, which polls the user container every 50ms against the readiness URL it was given. It respects the readiness probe settings, and these are passed via an env var to the queue container, which is later used to set up the prober logic:
SERVING_READINESS_PROBE: {"httpGet":{"path":"/healthz/readiness","port":8080,"host":"127.0.0.1","scheme":"HTTP","httpHeaders":[{"name":"K-Kubelet-Probe","value":"queue"}]},"timeoutSeconds":1,"periodSeconds":15,"successThreshold":1,"failureThreshold":3}
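For illustration, that payload is essentially a serialized corev1.Probe, so it can be decoded along these lines (a sketch, not the actual queue-proxy readiness code):

```go
package main

import (
	"encoding/json"
	"fmt"
	"log"
	"os"

	corev1 "k8s.io/api/core/v1"
)

func main() {
	// The serialized probe that the revision reconciler puts on the
	// queue-proxy container as an env var.
	raw := os.Getenv("SERVING_READINESS_PROBE")

	var probe corev1.Probe
	if err := json.Unmarshal([]byte(raw), &probe); err != nil {
		log.Fatalf("decoding SERVING_READINESS_PROBE: %v", err)
	}

	// The prober logic is then configured from these fields.
	if probe.HTTPGet != nil {
		fmt.Println("path:", probe.HTTPGet.Path, "port:", probe.HTTPGet.Port.String())
	}
	fmt.Println("periodSeconds:", probe.PeriodSeconds, "failureThreshold:", probe.FailureThreshold)
}
```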
/cc @julz is this behavior expected (by design)?
The draining check never returns false: once d.timer is set, it is never set back to nil, so queue-proxy stays in drain mode forever because that condition is always true.
If the idea of drain mode is to last for some QuietPeriod, which by default is 45s, would it make sense to actually add d.timer = nil after that period has elapsed? After this line:
serving/vendor/knative.dev/pkg/network/handlers/drain.go, line 129 (52f07b0)
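A rough sketch of that suggestion, using a simplified stand-in for the Drainer rather than the vendored drain.go code; only the timer/QuietPeriod idea is taken from the discussion above:

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// drainer is a simplified stand-in for the real Drainer, just to show the idea.
type drainer struct {
	sync.Mutex
	QuietPeriod time.Duration
	timer       *time.Timer
}

// draining mirrors the condition discussed above: it is true for as long as
// the timer is non-nil, which today means "forever once Drain has been called".
func (d *drainer) draining() bool {
	d.Lock()
	defer d.Unlock()
	return d.timer != nil
}

// Drain arms the quiet-period timer; the suggestion is to clear it once the
// period has elapsed, so that health checks can pass again after a restart.
func (d *drainer) Drain() {
	d.Lock()
	defer d.Unlock()
	if d.timer == nil {
		d.timer = time.AfterFunc(d.QuietPeriod, func() {
			d.Lock()
			d.timer = nil // the proposed "d.timer = nil" after QuietPeriod elapses
			d.Unlock()
		})
	}
}

func main() {
	d := &drainer{QuietPeriod: 45 * time.Second}
	d.Drain()
	fmt.Println("draining:", d.draining()) // true now; with the reset above it
	// would flip back to false 45 seconds after the last reset.
}
```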
The quiet period is overridden on the queue-proxy side. knative/pkg sets the default to 45s here: serving/vendor/knative.dev/pkg/network/handlers/drain.go, line 123 (52f07b0), and the override happens in https://github.com/knative/serving/blob/52f07b02254b42f18fa0f71dbb0462410a5bc7b1/cmd/queue/main.go
Some more context: the drain call here is what we experience: serving/cmd/queue/main.go, line 354 (52f07b0). See line 16 in the queue-proxy logs: https://gist.github.com/skonto/749b5a34b464a786d2d3f473da0453d2
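Put together, the wiring looks roughly like this. This is a sketch, not the actual cmd/queue/main.go; the admin port and /drain path are placeholders, and the Drainer field names (Inner, QuietPeriod) are as I recall them from knative.dev/pkg, so treat the details as approximate:

```go
package main

import (
	"log"
	"net/http"
	"time"

	"knative.dev/pkg/network/handlers"
)

func main() {
	// Stand-in for the proxy to the user container; the real handler chain
	// in cmd/queue/main.go is much longer.
	userHandler := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.Write([]byte("proxied to user container"))
	})

	// Wrap the chain in the Drainer and set the quiet period
	// (knative.dev/pkg defaults it to network.DefaultDrainTimeout, 45s).
	drainer := &handlers.Drainer{
		Inner:       userHandler,
		QuietPeriod: 45 * time.Second,
	}

	// The preStop hook on the user container hits an admin endpoint that calls
	// Drain(); Drain blocks until no requests have been seen for QuietPeriod,
	// and probes get "shutting down" responses in the meantime.
	adminMux := http.NewServeMux()
	adminMux.HandleFunc("/drain", func(w http.ResponseWriter, r *http.Request) {
		drainer.Drain()
	})

	go func() { log.Fatal(http.ListenAndServe(":8022", adminMux)) }() // admin port (illustrative)
	log.Fatal(http.ListenAndServe(":8012", drainer))                  // main serving port
}
```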
This drain is called because we use preStop handlers on the user container: serving/pkg/reconciler/revision/resources/deploy.go, line 80 (9fb7372).
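For illustration, this is roughly what such a preStop hook looks like when built with corev1 types. The "/drain" path and port 8022 are placeholders for the constants referenced in deploy.go above, and older k8s.io/api releases name LifecycleHandler simply Handler:

```go
package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/util/intstr"
)

// userContainerLifecycle returns a lifecycle whose preStop hook calls a
// queue-proxy admin endpoint, which in turn triggers the blocking Drain().
func userContainerLifecycle() *corev1.Lifecycle {
	return &corev1.Lifecycle{
		PreStop: &corev1.LifecycleHandler{
			HTTPGet: &corev1.HTTPGetAction{
				Port: intstr.FromInt(8022),
				Path: "/drain",
			},
		},
	}
}

func main() {
	c := corev1.Container{
		Name:      "user-container",
		Lifecycle: userContainerLifecycle(),
	}
	fmt.Println(c.Lifecycle.PreStop.HTTPGet.Path)
}
```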
The user container was restarted. From the Kubernetes docs: "Kubernetes sends the preStop event immediately before the Container is terminated. Kubernetes' management of the Container blocks until the preStop handler completes, unless the Pod's grace period expires."
So we have a blocking situation: preStop does return at some point (as described next), but it neither clears the draining condition nor sets the timer back to nil.
Now, when we drain for the first time we set a timer, and that timer is continuously reset if any user request arrives in the meantime. This is the key idea:
// When the Drainer is told to Drain, it will immediately start to fail
// probes with a "500 shutting down", and the call will block until no
// requests have been received for QuietPeriod (defaults to
// network.DefaultDrainTimeout).
Now, if we set that timer to nil after draining is done, I suspect we will be able to move on with the readiness probes.
The invariant is that pods should finish in-flight requests before shutting down; in our case that means draining all requests before restarting, and not accepting any new ones before the container is ready.
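To make that key idea concrete, here is a simplified sketch of the request path. It is not the vendored drain.go code; isProbe stands in for the real kube-probe/K-Network-Probe detection:

```go
package main

import (
	"net/http"
	"strings"
	"sync"
	"time"
)

// simpleDrainer is a simplified illustration of the behaviour described above.
type simpleDrainer struct {
	sync.Mutex
	inner       http.Handler
	quietPeriod time.Duration
	timer       *time.Timer
	drained     chan struct{}
}

// isProbe stands in for the real kube-probe / K-Network-Probe detection.
func isProbe(r *http.Request) bool {
	return strings.HasPrefix(r.Header.Get("User-Agent"), "kube-probe/") ||
		r.Header.Get("K-Network-Probe") != ""
}

func (d *simpleDrainer) ServeHTTP(w http.ResponseWriter, r *http.Request) {
	d.Lock()
	draining := d.timer != nil
	if draining && !isProbe(r) {
		d.timer.Reset(d.quietPeriod) // any real user request pushes the quiet period out
	}
	d.Unlock()

	if draining && isProbe(r) {
		http.Error(w, "shutting down", http.StatusServiceUnavailable) // the 503 kubelet sees
		return
	}
	d.inner.ServeHTTP(w, r) // real traffic is still served while draining
}

// Drain blocks until quietPeriod has elapsed with no user requests.
func (d *simpleDrainer) Drain() {
	d.Lock()
	if d.timer == nil {
		d.drained = make(chan struct{})
		d.timer = time.AfterFunc(d.quietPeriod, func() {
			d.Lock()
			defer d.Unlock()
			select {
			case <-d.drained: // already signalled by an earlier firing
			default:
				close(d.drained)
			}
		})
	}
	ch := d.drained
	d.Unlock()
	<-ch
}

func main() {
	d := &simpleDrainer{
		inner: http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
			w.Write([]byte("ok"))
		}),
		quietPeriod: 45 * time.Second,
	}
	go d.Drain() // what the preStop hook effectively triggers; blocks until quiet
	http.ListenAndServe(":8012", d)
}
```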
I am wondering if there is a design doc for the behavior.
We are also running into this issue and were wondering if there was any recent update? (@Shashankft9)
Thanks.
/triage accepted
Thanks for the deep breakdown everyone - I think what we really need is a clear signal to distinguish between the Pod shutting down and either container (queue-proxy/user-container) restarting.
I'm not sure that's straightforward. Will dig into this for the next release.
I worry about setting the timer to nil, because I believe that in the case where the Pod is shutting down it may let a request through rather than returning 503 as expected.
/assign @dprotaso
It would be very interesting to know how Google Cloud got around this issue.
I tried a quick setup of Cloud Run for Anthos and saw that it does not exhibit this issue.
Anyway, on Cloud Run for Anthos they provide their own queue-proxy: gke.gcr.io/knative/queue@sha256:2148fc4af5f5cda0594d97c3027b316bcba3c8f400d3d57087807f90e80641d3
I tried using this queue-proxy image with my Knative 1.3.1 on a self-managed cluster and it seems to solve the issue.
I hope this helps.
Hey @faxioman - the 1.6 release is today. I'll be taking a look at this for 1.7.
It would be very interesting to know how Google Cloud got around this issue.
Google's Cloud Run is a separate implementation, and they've stopped contributing to upstream.
**What this PR does / why we need it**:
Update the end-to-end test to use Knative 1.7.4 to test its compatibility against Merlin and KServe 0.9.0. The PR also updates the Kubernetes version used for testing from 1.21 to 1.22.
We need to update to Knative 1.7.4 because the current version (1.3.3) suffers from knative/serving#12571.
**Does this PR introduce a user-facing change?**:
**Checklist**
- [X] Added unit test, integration, and/or e2e tests
- [X] Tested locally