New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Ensure that Reason and Message are preserved on pod status #103785
Ensure that Reason and Message are preserved on pod status #103785
Conversation
The Kubelet always clears reason and message in generateAPIPodStatus even when the phase is unchanged. It is reasonable that we preserve the previous values when the phase does not change, and clear it when the phase does change. When a pod is evicted, this ensurse that the eviction message and reason are propagated even in the face of subsequent updates. It also preserves the message and reason if components beyond the Kubelet choose to set that value. To preserve the value we need to know the old phase, which requires a change to convertStatusToAPIStatus so that both methods have access to it.
Noticed while reviewing this code path. We can assume the temporary slice should be about the same size as it was previously.
The list of status conditions should be calculated all together, this made review more complex. Readability only.
/assign @Random-Liu Since we are not explicitly refreshing the pod status cache after #102344, the sequential status updates in syncTerminatedPod (once at beginning before volume detach, and once after in TerminatePod) result in clearing the message and reason during eviction. This preserves Reason/Message in the This also makes the Kubelet more tolerant to other controllers writing reason/message, although that's not a primary goal of this change. The third commit is a cleanup change but I can move that to the next release if it's concerning. |
Also note the first commit ensures that soft eviction can't cause the Kubelet to attempt to change a |
/test pull-node-kubelet-eviction |
@endocrimes: The specified target(s) for
Use
In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
/test pull-kubernetes-node-kubelet-eviction |
@smarterclayton: The following test failed, say
Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here. |
Looks like it fixes the related eviction failures 👍 (remaining ones need unrelated investigation) |
Looking at some of the tests, my fixes may have corrected previous race bugs that might not have been visible:
Let me try to reproduce locally here to eliminate my changes as the cause (or conversely, identify whether this is now fixed). |
The fix makes sense to me. :) /lgtm |
/triage accepted |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[sig-node] InodeEviction [Slow] [Serial] [Disruptive][NodeFeature:Eviction] when we run containers that should cause DiskPressure should eventually evict all of the correct pods
did not fail.
/lgtm
/milestone v1.22
oldPodStatus, found := kl.statusManager.GetPodStatus(pod.UID) | ||
if !found { | ||
oldPodStatus = pod.Status | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 Lifting this up one call makes a ton of sense to me.
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: ehashman, Random-Liu, smarterclayton The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
/skip |
The Kubelet always clears reason and message in generateAPIPodStatus even when the phase is unchanged. It is reasonable that we preserve the previous values when the phase does not change, and clear it when the phase does change.
When a pod is evicted, this ensures that the eviction message and reason are propagated even in the face of subsequent updates. It also preserves the message and reason if components beyond the Kubelet choose to set that value. If reason/message are changed (due to preemption -> eviction) the most recent value is preserved.
To preserve the value we need to know the old phase, which requires a change to convertStatusToAPIStatus so that both methods have access to it.
/kind bug
Which issue(s) this PR fixes:
Fixes #103623
Does this PR introduce a user-facing change?
Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.: