Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Automated cherry pick of #108366 (release-1.23): Delay writing a terminal phase until the pod is terminated #108723

Conversation

bobbypage
Copy link
Member

@bobbypage bobbypage commented Mar 16, 2022

Cherry pick of #108366 on release-1.23.

For details on the cherry pick process, see the cherry pick requests page.

Fixed a 1.22 regression that could incorrectly reject pods with OutOfCpu errors if they were rapidly scheduled after other pods were reported as complete in the API. The Kubelet now waits to report the phase of a pod as terminal in the API until all running containers are guaranteed to have stopped and no new containers can be started.  Short-lived pods may take slightly longer (~1s) to report Succeeded or Failed after this change.

@k8s-ci-robot k8s-ci-robot added the do-not-merge/cherry-pick-not-approved Indicates that a PR is not yet approved to merge into a release branch. label Mar 16, 2022
@k8s-ci-robot k8s-ci-robot added this to the v1.23 milestone Mar 16, 2022
@k8s-ci-robot k8s-ci-robot added size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/needs-kind Indicates a PR lacks a `kind/foo` label and requires one. do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. area/kubelet labels Mar 16, 2022
@k8s-ci-robot k8s-ci-robot added area/test sig/apps Categorizes an issue or PR as relevant to SIG Apps. sig/node Categorizes an issue or PR as relevant to SIG Node. sig/testing Categorizes an issue or PR as relevant to SIG Testing. and removed do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Mar 16, 2022
@bobbypage
Copy link
Member Author

Doing a early cherrypick of #108366 on release-1.23 so we can get CI signal on release branch.

@bobbypage bobbypage changed the title Automated cherry pick of #108366 upstream release 1.23 wip: Automated cherry pick of #108366 upstream release 1.23 Mar 16, 2022
@k8s-ci-robot k8s-ci-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Mar 16, 2022
@SergeyKanzhelev SergeyKanzhelev moved this from Triage to PRs Waiting on Author in SIG Node CI/Test Board Mar 16, 2022
@SergeyKanzhelev SergeyKanzhelev moved this from Triage to Waiting on Author in SIG Node PR Triage Mar 16, 2022
@liggitt
Copy link
Member

liggitt commented Mar 16, 2022

/retest

was the cherry-pick onto 1.23 clean?

smarterclayton and others added 4 commits March 16, 2022 12:07
Other components must know when the Kubelet has released critical
resources for terminal pods. Do not set the phase in the apiserver
to terminal until all containers are stopped and cannot restart.

As a consequence of this change, the Kubelet must explicitly transition
a terminal pod to the terminating state in the pod worker which is
handled by returning a new isTerminal boolean from syncPod.

Finally, if a pod with init containers hasn't been initialized yet,
don't default container statuses or not yet attempted init containers
to the unknown failure state.
Exploring termination revealed we have race conditions in certain
parts of pod initialization and termination. To better catch these
issues refactor the existing test so it can be reused, and then test
a number of alternate scenarios.
Create an E2E test that creates a job that spawns a pod that should
succeed. The job reserves a fixed amount of CPU and has a large number
of completions and parallelism. Use to repro github.com/kubernetes/issues/106884

Signed-off-by: David Porter <david@porter.me>
Signed-off-by: David Porter <david@porter.me>
@bobbypage bobbypage force-pushed the automated-cherry-pick-of-#108366-upstream-release-1.23 branch from fb1c768 to 7f02733 Compare March 16, 2022 19:07
@liggitt
Copy link
Member

liggitt commented Mar 17, 2022

/lgtm
/approve
/hold for master periodic serial job signal

@k8s-ci-robot k8s-ci-robot added do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. and removed needs-priority Indicates a PR lacks a `priority/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Mar 17, 2022
@k8s-ci-robot k8s-ci-robot added lgtm "Looks good to me", indicates that a PR is ready to be merged. approved Indicates a PR has been approved by an approver from all required OWNERS files. labels Mar 17, 2022
@bobbypage
Copy link
Member Author

Removing WIP since master PR has merged.

@bobbypage bobbypage changed the title wip: Automated cherry pick of #108366 (release-1.23): Delay writing a terminal phase until the pod is terminated Automated cherry pick of #108366 (release-1.23): Delay writing a terminal phase until the pod is terminated Mar 17, 2022
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Mar 17, 2022
@liggitt
Copy link
Member

liggitt commented Mar 18, 2022

/hold cancel

multiple green runs of post-submit serial test jobs on master with this fix

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Mar 18, 2022
@saschagrunert saschagrunert added cherry-pick-approved Indicates a cherry-pick PR into a release branch has been approved by the release branch manager. release-note-none Denotes a PR that doesn't merit a release note. and removed do-not-merge/cherry-pick-not-approved Indicates that a PR is not yet approved to merge into a release branch. do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. labels Mar 18, 2022
@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. and removed release-note-none Denotes a PR that doesn't merit a release note. labels Mar 18, 2022
SIG Node CI/Test Board automation moved this from PRs Waiting on Author to PRs - Needs Approver Mar 18, 2022
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: bobbypage, liggitt, saschagrunert

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@rphillips
Copy link
Member

/test pull-kubernetes-dependencies

@k8s-ci-robot k8s-ci-robot merged commit 302af17 into kubernetes:release-1.23 Mar 18, 2022
SIG Node CI/Test Board automation moved this from PRs - Needs Approver to Done Mar 18, 2022
SIG Node PR Triage automation moved this from Waiting on Author to Done Mar 18, 2022
@liggitt liggitt removed the kind/regression Categorizes issue or PR as related to a regression from a prior release. label Sep 9, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/kubelet area/test cherry-pick-approved Indicates a cherry-pick PR into a release branch has been approved by the release branch manager. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/bug Categorizes issue or PR as related to a bug. lgtm "Looks good to me", indicates that a PR is ready to be merged. priority/critical-urgent Highest priority. Must be actively worked on as someone's top priority right now. release-note Denotes a PR that will be considered when it comes time to generate release notes. sig/apps Categorizes an issue or PR as relevant to SIG Apps. sig/node Categorizes an issue or PR as relevant to SIG Node. sig/testing Categorizes an issue or PR as relevant to SIG Testing. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. triage/accepted Indicates an issue or PR is ready to be actively worked on.
Development

Successfully merging this pull request may close these issues.

None yet

7 participants