Increase timeout for pod lifecycle test to reach pod status=ready #96691
Conversation
@hh: This issue is currently awaiting triage. If a SIG or subproject determines this is a relevant issue, they will accept it by applying the `triage/accepted` label.

The instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
/release-note-none
/sig architecture
/kind flake
/test pull-kubernetes-e2e-gce-100-performance

ref #96565
```go
@@ -59,7 +59,7 @@ const (
	maxBackOffTolerance = time.Duration(1.3 * float64(kubelet.MaxContainerBackOff))
	podRetryPeriod      = 1 * time.Second
	podRetryTimeout     = 1 * time.Minute
	podReadyTimeout     = 1 * time.Minute
```
I think it's worthwhile figuring out why this started timing out
I've only seen this once. We do see the Pod was scheduled quite quickly, but it never progressed to Running. It may have been a slow image pull, but it's not clear that we can confirm that from the logs.
```
Nov 18 13:28:04.312: INFO: observed Pod pod-test in namespace pods-5897 in phase Pending conditions []
Nov 18 13:28:04.360: INFO: observed Pod pod-test in namespace pods-5897 in phase Pending conditions
[{PodScheduled True 0001-01-01 00:00:00 +0000 UTC 2020-11-18 13:28:04 +0000 UTC }]
Nov 18 13:29:04.265: FAIL: failed to see Pod pod-test in namespace pods-5897 running
```
Checking the graph metrics for this e2e test, the run that flaked looks to be due to fluctuations in the cluster environment; three other tests also failed in the same run. The current timeout is a lot more optimistic than the values set in wait.go, and waiting a bit longer will help the test tolerate fluctuations in the cluster's state.
Interestingly, the failures at this point in the test have been due to watch timeouts:
kubernetes/test/e2e/node/pods.go
Lines 307 to 317 in 5e44d8e
```go
select {
case events, ok = <-ch:
	if !ok {
		continue
	}
	if len(events) < 2 {
		framework.Fail("only got a single event")
	}
case <-time.After(5 * time.Minute):
	framework.Failf("timed out waiting for watch events for %s", pod.Name)
}
```
And that select has a timeout of 5 minutes.
Sorry, realizing that this PR and #96565 are actually referring to two different tests, but the root cause seems likely similar.
/test pull-kubernetes-e2e-gce-100-performance

/test pull-kubernetes-node-crio-e2e

the timeout change is fine for me, we can always come back and tighten if total execution time starts to grow.
/approve

/assign
[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: derekwaynecarr, hh

The full list of commands accepted by this bot can be found here. The pull request process is described here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing `/approve` in a comment.
We had a single flake in a couple of weeks. It seems we were able to reach PodScheduled, but just needed a bit more time to reach the Running state.