
kubelet: Handle UID reuse in pod worker #104847

Merged
merged 2 commits into kubernetes:master on Sep 16, 2021

Conversation

smarterclayton
Contributor

@smarterclayton smarterclayton commented Sep 8, 2021

If a pod is killed (no longer wanted) and a subsequent create/add/update event is then seen in the pod worker, assume that the pod UID was reused (as it can be for static pods). Have the next SyncKnownPods after the pod terminates remove the worker history so that the config loop can restart the static pod, and return to the caller the fact that this termination was not final.

  1. If the pod worker sees the update sequence KILL -> CREATE, it sets a flag on the worker status, restartRequested = true
  2. When SyncKnownPods runs, it reports any terminated worker with that flag as TemporarilyTerminatedWork
  3. The pod housekeeping loop (which reconciles the pod worker with the config state) checks during the sync which pods were terminated but may need a restart (TemporarilyTerminatedWork), and starts them if they are not terminal (i.e. still desired and admitted)

A pod that restarts this way will wait at most one housekeeping loop period (2s) between being terminated and starting again; a rough sketch of this flow follows.
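To make the flow above concrete, here is a minimal, self-contained sketch in Go. Only restartRequested, SyncKnownPods, and TemporarilyTerminatedWork come from the description; podSyncStatus, UpdateType, PodWorkType, and the desired-pods argument are simplified stand-ins for illustration and do not reproduce the actual pkg/kubelet/pod_workers.go types.

```go
package main

import "fmt"

// UID stands in for k8s.io/apimachinery/pkg/types.UID.
type UID string

// UpdateType is a simplified stand-in for the pod worker's update kinds.
type UpdateType int

const (
	UpdateTypeCreate UpdateType = iota
	UpdateTypeUpdate
	UpdateTypeKill
)

// PodWorkType mirrors the per-UID state that SyncKnownPods reports in this sketch.
type PodWorkType int

const (
	SyncPodWork PodWorkType = iota
	TerminatedPodWork
	// TemporarilyTerminatedWork marks a worker that finished terminating but whose
	// UID was reused, so the housekeeping loop should start the pod again.
	TemporarilyTerminatedWork
)

// podSyncStatus is a simplified view of the per-UID record the pod worker keeps.
type podSyncStatus struct {
	terminating      bool // a kill was requested
	finished         bool // the worker has fully terminated
	restartRequested bool // a create/update arrived after a kill: the UID was reused
}

// podWorkers tracks one status record per pod UID.
type podWorkers struct {
	podSyncStatuses map[UID]*podSyncStatus
}

// UpdatePod records an incoming update. If a create/update is observed for a UID
// that is already being killed, the worker assumes the UID was reused (as happens
// with static pods) and remembers that a restart is wanted.
func (p *podWorkers) UpdatePod(uid UID, update UpdateType) {
	status, ok := p.podSyncStatuses[uid]
	if !ok {
		status = &podSyncStatus{}
		p.podSyncStatuses[uid] = status
	}
	switch update {
	case UpdateTypeKill:
		status.terminating = true
	case UpdateTypeCreate, UpdateTypeUpdate:
		if status.terminating && !status.finished {
			status.restartRequested = true // KILL -> CREATE for the same UID
		}
	}
}

// SyncKnownPods prunes workers for pods that are no longer desired and reports the
// state of the rest. Workers that finished terminating with restartRequested set
// are reported as TemporarilyTerminatedWork and their history is dropped so the
// config loop can recreate the pod.
func (p *podWorkers) SyncKnownPods(desired map[UID]bool) map[UID]PodWorkType {
	result := make(map[UID]PodWorkType)
	for uid, status := range p.podSyncStatuses {
		switch {
		case status.finished && status.restartRequested:
			result[uid] = TemporarilyTerminatedWork
			delete(p.podSyncStatuses, uid) // forget the old incarnation
		case status.finished:
			result[uid] = TerminatedPodWork
			if !desired[uid] {
				delete(p.podSyncStatuses, uid)
			}
		default:
			result[uid] = SyncPodWork
		}
	}
	return result
}

func main() {
	pw := &podWorkers{podSyncStatuses: map[UID]*podSyncStatus{}}
	pw.UpdatePod("static-pod-uid", UpdateTypeCreate)
	pw.UpdatePod("static-pod-uid", UpdateTypeKill)   // manifest deleted
	pw.UpdatePod("static-pod-uid", UpdateTypeCreate) // manifest recreated with the same UID
	pw.podSyncStatuses["static-pod-uid"].finished = true // pretend termination completed
	fmt.Println(pw.SyncKnownPods(map[UID]bool{"static-pod-uid": true}))
	// prints map[static-pod-uid:2], i.e. TemporarilyTerminatedWork
}
```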

/kind bug
/sig node

Fixes #104648

TODO:

  • verifying this fixes the race
Fix a 1.22 regression where, if a static pod file is deleted and recreated while using a fixed UID, the pod was not properly restarted.

@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. kind/bug Categorizes issue or PR as related to a bug. size/S Denotes a PR that changes 10-29 lines, ignoring generated files. sig/node Categorizes an issue or PR as relevant to SIG Node. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. labels Sep 8, 2021
@k8s-ci-robot k8s-ci-robot added area/kubelet approved Indicates a PR has been approved by an approver from all required OWNERS files. labels Sep 8, 2021
@smarterclayton smarterclayton changed the title kubelet: Handle UID reuse in pod worker WIP: kubelet: Handle UID reuse in pod worker Sep 8, 2021
@k8s-ci-robot k8s-ci-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Sep 8, 2021
@smarterclayton
Contributor Author

smarterclayton commented Sep 8, 2021

In theory this fixes the problem but you have to wait for a reconcile loop of ~90s. A slightly more complex impl might queue the next pod. I'll take a look at that.

Ideally SyncKnownPods would be able to restart workers, but SyncKnownPods doesn't get passed "the pods the pod worker should know about" today. It needs to be "the admitted pods that should be running" which is a subset of what is in the pod manager.

@ehashman ehashman added this to Waiting on Author in SIG Node PR Triage Sep 9, 2021
@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Sep 9, 2021
@smarterclayton
Contributor Author

Ok, after thinking through this some more I'm fairly convinced this is safe. Builds on top of #104817 (which renames a bit of the admission logic):

  1. If the pod worker sees the update sequence KILL -> CREATE, it sets a flag on the worker status, restartRequested = true
  2. When SyncKnownPods runs, it reports any terminated worker with that flag as TemporarilyTerminatedWork
  3. The pod housekeeping loop (which reconciles the pod worker with the config state) checks during the sync which pods were terminated but may need a restart, and starts them if they are not terminal (i.e. still desired and admitted)

A pod that restarts this way will wait at most one housekeeping loop period (2s) between being terminated and starting again; the housekeeping side of step 3 is sketched below.
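A rough sketch of that housekeeping step, continuing the earlier sketch. The housekeeping function, the trimmed Pod type, and the admit callback are hypothetical stand-ins for the kubelet's cleanup and admission machinery, not its real signatures; they only illustrate the "restart if still desired and admitted" rule.

```go
package main

import "fmt"

// UID stands in for k8s.io/apimachinery/pkg/types.UID.
type UID string

// PodWorkType and TemporarilyTerminatedWork mirror the values in the previous sketch.
type PodWorkType int

const (
	SyncPodWork PodWorkType = iota
	TerminatedPodWork
	TemporarilyTerminatedWork
)

// Pod is a trimmed-down stand-in for v1.Pod with only what this sketch needs.
type Pod struct {
	UID      UID
	Name     string
	Terminal bool // phase is Succeeded or Failed
}

// housekeeping reconciles the pod worker view with the desired (config) state.
// Any pod whose worker was reported as TemporarilyTerminatedWork is started again,
// but only if it is still desired, not terminal, and would be (re)admitted.
func housekeeping(workers map[UID]PodWorkType, desired map[UID]*Pod, admit func(*Pod) bool) []*Pod {
	var restart []*Pod
	for uid, work := range workers {
		if work != TemporarilyTerminatedWork {
			continue
		}
		pod, ok := desired[uid]
		if !ok || pod.Terminal || !admit(pod) {
			// No longer wanted, already terminal, or rejected by admission:
			// leave the pod terminated.
			continue
		}
		restart = append(restart, pod)
	}
	return restart
}

func main() {
	workers := map[UID]PodWorkType{"static-pod-uid": TemporarilyTerminatedWork}
	desired := map[UID]*Pod{"static-pod-uid": {UID: "static-pod-uid", Name: "etcd-node1"}}
	admitAll := func(*Pod) bool { return true }
	for _, pod := range housekeeping(workers, desired, admitAll) {
		// In the kubelet this would dispatch the pod back to the pod worker.
		fmt.Println("restarting", pod.Name)
	}
}
```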

@249043822
Member

/priority critical-urgent
/triage accepted

@k8s-ci-robot k8s-ci-robot added priority/critical-urgent Highest priority. Must be actively worked on as someone's top priority right now. triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed needs-priority Indicates a PR lacks a `priority/foo` label and requires one. labels Sep 10, 2021
@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Sep 16, 2021
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: derekwaynecarr, smarterclayton

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@derekwaynecarr
Member

/test pull-kubernetes-node-kubelet-serial-containerd
/test pull-kubernetes-node-kubelet-serial

@rphillips
Member

/hold cancel

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Sep 16, 2021
@k8s-ci-robot
Contributor

@smarterclayton: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Rerun command
pull-kubernetes-node-kubelet-serial-containerd d571980 link /test pull-kubernetes-node-kubelet-serial-containerd
pull-kubernetes-node-kubelet-serial d571980 link /test pull-kubernetes-node-kubelet-serial
pull-kubernetes-integration d571980 link /test pull-kubernetes-integration

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@rphillips
Member

TestCustomResourceCascadingDeletion flake in pull-kubernetes-integration

/test pull-kubernetes-integration

@k8s-ci-robot k8s-ci-robot merged commit 51384aa into kubernetes:master Sep 16, 2021
SIG Node CI/Test Board automation moved this from Archive-it to Done Sep 16, 2021
SIG Node PR Triage automation moved this from Waiting on Author to Done Sep 16, 2021
@k8s-ci-robot k8s-ci-robot added this to the v1.23 milestone Sep 16, 2021
k8s-ci-robot added a commit that referenced this pull request Sep 17, 2021
…04847-upstream-release-1.22

Automated cherry pick of #104847: kubelet: Handle UID reuse in pod worker
ehashman pushed a commit to ehashman/kubernetes that referenced this pull request Nov 24, 2021
A pod that has been rejected by admission will have status manager
set the phase to Failed locally, which may take some time to
propagate to the apiserver. The rejected pod will be included in
admission until the apiserver propagates the change back, which
was an unintended regression when checking pod worker state as
authoritative.

A pod that is terminal in the API may still be consuming resources
on the system, so it should still be included in admission.

[ehashman] Rebased on top of kubernetes#104847.
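The commit message above describes which pods should still count toward admission. One way to read that rule is illustrated below; filterForAdmission, localPhase, and couldHaveRunningContainers are invented names for this sketch and do not reproduce the kubelet's actual admission filtering code.

```go
package main

import "fmt"

// UID stands in for k8s.io/apimachinery/pkg/types.UID.
type UID string

// PodPhase mirrors v1.PodPhase for this sketch.
type PodPhase string

const (
	PodRunning   PodPhase = "Running"
	PodSucceeded PodPhase = "Succeeded"
	PodFailed    PodPhase = "Failed"
)

// Pod is a trimmed stand-in for v1.Pod carrying only the API-reported phase.
type Pod struct {
	UID      UID
	Name     string
	APIPhase PodPhase
}

func isTerminalPhase(phase PodPhase) bool {
	return phase == PodSucceeded || phase == PodFailed
}

// filterForAdmission returns the pods that should count against node resources
// when admitting a new pod (hypothetical helper, not the kubelet's real code):
//
//   - localPhase is the kubelet's own status view, which can be ahead of the
//     apiserver: a pod rejected by admission is Failed locally before the API
//     copy reflects it, so it stops counting immediately.
//   - couldHaveRunningContainers is the pod worker's view: a pod that is
//     terminal in the API may still have containers consuming resources on the
//     node, so it keeps counting until the worker says otherwise.
func filterForAdmission(pods []*Pod, localPhase map[UID]PodPhase, couldHaveRunningContainers map[UID]bool) []*Pod {
	var active []*Pod
	for _, pod := range pods {
		terminal := isTerminalPhase(pod.APIPhase)
		if p, ok := localPhase[pod.UID]; ok && isTerminalPhase(p) {
			terminal = true // the kubelet already knows this pod is done
		}
		if terminal && !couldHaveRunningContainers[pod.UID] {
			continue // done and holding no resources: drop from admission math
		}
		active = append(active, pod)
	}
	return active
}

func main() {
	pods := []*Pod{
		{UID: "a", Name: "rejected-pod", APIPhase: PodRunning},    // rejection only recorded locally so far
		{UID: "b", Name: "finishing-pod", APIPhase: PodSucceeded}, // API says done, node still tearing down
	}
	localPhase := map[UID]PodPhase{"a": PodFailed}
	couldRun := map[UID]bool{"b": true}
	for _, p := range filterForAdmission(pods, localPhase, couldRun) {
		fmt.Println("counted for admission:", p.Name) // prints only finishing-pod
	}
}
```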
@liggitt liggitt added the kind/regression Categorizes issue or PR as related to a regression from a prior release. label Apr 27, 2022
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/kubelet area/test cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/bug Categorizes issue or PR as related to a bug. kind/regression Categorizes issue or PR as related to a regression from a prior release. lgtm "Looks good to me", indicates that a PR is ready to be merged. priority/critical-urgent Highest priority. Must be actively worked on as someone's top priority right now. release-note Denotes a PR that will be considered when it comes time to generate release notes. sig/node Categorizes an issue or PR as relevant to SIG Node. sig/testing Categorizes an issue or PR as relevant to SIG Testing. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. triage/accepted Indicates an issue or PR is ready to be actively worked on.
Development

Successfully merging this pull request may close these issues.

1.22 regression: removing and recreating static pod manifest leaves pod in error state
10 participants