Delete static pod gracefully and fix mirrorPodTerminationMap leak #98103
Conversation
@gjkim42: GitHub didn't allow me to request PR reviews from the following users: harche. Note that only kubernetes members and repo collaborators can review this PR, and authors cannot review their own PRs. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
@gjkim42: This issue is currently awaiting triage. If a SIG or subproject determines this is a relevant issue, they will accept it by applying the appropriate label and provide further guidance. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
Hi @gjkim42. Thanks for your PR. I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test. Once the patch is verified, the new status will be reflected by the ok-to-test label. I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
	if _, ok := kl.podManager.GetMirrorPodByPod(pod); ok {
		kl.podKiller.MarkMirrorPodPendingTermination(pod)
	}
	kl.podKiller.KillPod(&podPair)
For reviewers: this ensures that podKiller.KillPod is called.
/ok-to-test
Nice! We will give this a test today or Monday.
I think this LGTM...
/assign @harche
@@ -2103,9 +2110,6 @@ func (kl *Kubelet) HandlePodRemoves(pods []*v1.Pod) {
 			kl.handleMirrorPod(pod, start)
 			continue
 		}
-		if _, ok := kl.podManager.GetMirrorPodByPod(pod); ok {
-			kl.podKiller.MarkMirrorPodPendingTermination(pod)
-		}
Ahh, I think I see what's happening here. This code never even gets executed, because we'll hit the continue above before ever reaching it.
My next question was "should this then go in handleMirrorPod instead?", but that code looks like it's been in need of a refactor since 2015... #17251
The problem I am pointing out is that this code was executed, but kl.deletePod below may return an error before podKiller.KillPod runs. In that case we never get a chance to remove the pending entry from podKiller.mirrorPodTerminationMap, which is a leak. My fix makes sure podKiller.KillPod is always called.
kubernetes/pkg/kubelet/kubelet.go
Lines 1738 to 1772 in b6958b2
func (kl *Kubelet) deletePod(pod *v1.Pod) error {
	if pod == nil {
		return fmt.Errorf("deletePod does not allow nil pod")
	}
	if !kl.sourcesReady.AllReady() {
		// If the sources aren't ready, skip deletion, as we may accidentally delete pods
		// for sources that haven't reported yet.
		return fmt.Errorf("skipping delete because sources aren't ready yet")
	}
	kl.podWorkers.ForgetWorker(pod.UID)

	// make sure our runtimeCache is at least as fresh as the last container started event we observed.
	// this ensures we correctly send graceful deletion signals to all containers we've reported started.
	if lastContainerStarted, ok := kl.lastContainerStartedTime.Get(pod.UID); ok {
		if err := kl.runtimeCache.ForceUpdateIfOlder(lastContainerStarted); err != nil {
			return fmt.Errorf("error updating containers: %v", err)
		}
	}

	// Runtime cache may not have been updated with the pod, but it's okay
	// because the periodic cleanup routine will attempt to delete again later.
	runningPods, err := kl.runtimeCache.GetPods()
	if err != nil {
		return fmt.Errorf("error listing containers: %v", err)
	}
	runningPod := kubecontainer.Pods(runningPods).FindPod("", pod.UID)
	if runningPod.IsEmpty() {
		return fmt.Errorf("pod not found")
	}
	podPair := kubecontainer.PodPair{APIPod: pod, RunningPod: &runningPod}

	kl.podKiller.KillPod(&podPair)

	// We leave the volume/directory cleanup to the periodic cleanup routine.
	return nil
}
@ehashman: GitHub didn't allow me to assign the following users: harche. Note that only kubernetes members, repo collaborators and people who have commented on this issue/PR can be assigned. Additionally, issues/PRs can only have 10 assignees at the same time. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
I verified this PR using the pods mentioned in #97722 and it seems to be fixing the issue. I do not see
/lgtm
(force-pushed from 751cf2a to 1563fb6)
/retest
/lgtm
/approve
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: gjkim42, sjenning. The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing /approve in a comment.
/test pull-kubernetes-e2e-gce-100-performance debian iptables flake
/cherry-pick release-1.19
/cherry-pick release-1.20
doh! I forgot the bot is not setup for cherry picks.
@rphillips should this also be pulled into 1.18? is it affected?
@rphillips echoing @ehashman's question here. I believe it is applicable to 1.18, but please correct me if I am incorrect.
I will TAL (take a look)
…103-upstream-release-1.19 Automated cherry pick of #98103: kubelet: Delete static pods gracefully
…103-upstream-release-1.20 Automated cherry pick of #98103: kubelet: Delete static pods gracefully
What type of PR is this?
/kind bug
What this PR does / why we need it:
Which issue(s) this PR fixes:
Fixes #97722
Special notes for your reviewer:
Does this PR introduce a user-facing change?:
Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:
/cc @ehashman @rphillips @harche