
Fix cgroup handling for systemd with cgroup v2 #98365

Merged
merged 1 commit into kubernetes:master on Mar 4, 2021

Conversation

odinuge
Member

@odinuge odinuge commented Jan 25, 2021

What type of PR is this?

/kind bug

What this PR does / why we need it:

This fixes an issue where the kubelet enforces qos and nodeAllocatable limits on the wrong cgroup hierarchy. The kubelet would create the files

/sys/fs/cgroup/kubepods/{burstable,besteffort,}/pod-xyz

even when running with systemd as the cgroup driver, making it impossible to enforce
the limits on nodeAllocatable.
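For context, the systemd driver uses cumulative slice names rather than the plain cgroupfs paths shown above. The sketch below is a hypothetical helper (not the kubelet's actual implementation) illustrating that naming convention; systemd's escaping of special characters is deliberately ignored here:

```go
package main

import (
	"fmt"
	"path/filepath"
)

// toSystemdSlicePath converts a list of cgroup components into the
// cumulative slice naming that systemd expects, e.g.
// ["kubepods", "burstable"] -> "kubepods.slice/kubepods-burstable.slice".
// Illustrative only: real systemd names also need character escaping.
func toSystemdSlicePath(components []string) string {
	parts := make([]string, 0, len(components))
	prefix := ""
	for _, c := range components {
		if prefix == "" {
			prefix = c
		} else {
			prefix = prefix + "-" + c
		}
		parts = append(parts, prefix+".slice")
	}
	return filepath.Join(parts...)
}

func main() {
	comps := []string{"kubepods", "burstable", "podxyz"}
	// What the bug produced (cgroupfs naming) vs. what systemd manages:
	fmt.Println("cgroupfs: /sys/fs/cgroup/" + filepath.Join(comps...))
	fmt.Println("systemd:  /sys/fs/cgroup/" + toSystemdSlicePath(comps))
}
```

With the systemd driver, limits written under the cgroupfs-style path have no effect because systemd places the pods under the slice hierarchy instead.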

Do we have any e2e-tests running with systemd/and-or cgroup v2? If not, we should probably have that.

Which issue(s) this PR fixes:

Fixes #

Special notes for your reviewer:

Does this PR introduce a user-facing change?:

Fix cgroup handling for systemd with cgroup v2

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:


@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. kind/bug Categorizes issue or PR as related to a bug. size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. labels Jan 25, 2021
@odinuge
Member Author

odinuge commented Jan 25, 2021

/sig node
/assign @giuseppe
/priority important-longterm

@k8s-ci-robot k8s-ci-robot added sig/node Categorizes an issue or PR as relevant to SIG Node. priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete. area/kubelet and removed do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. labels Jan 25, 2021
Member

@neolit123 neolit123 left a comment

LGTM

Do we have any e2e-tests running with systemd/and-or cgroup v2? If not, we should probably have that.

AFAIK e2e tests were a requirement when we added the v2 support.

i found one cri-o based test here that is not running:
https://k8s-testgrid.appspot.com/sig-node-cri-o#pr-crio-cgrpv2-gce-e2e

@odinuge
Member Author

odinuge commented Jan 25, 2021

AFAIK e2e tests were a requirement when we added the v2 support.

Yeah, that was what I thought as well, but I couldn't find much tbh. The cgroup handling is quite obscure and hard to reason about, with a lot of duplicated work between k8s/k8s and opencontainers/runc/libcontainer, and some of it is also handled by the CRI. Hopefully we can get it battle tested with some solid e2e tests to avoid regressions.

I am testing some cgroup v2 stuff now, and there are quite a few rough edges still (including this, and opencontainers/runc#2474 (comment)), so it is definitely not stable enough to run in production imo. I will continue to work on it tho, and I guess @giuseppe is working a bit on it as well.

/retest

@odinuge
Member Author

odinuge commented Jan 25, 2021

/test pull-kubernetes-node-crio-cgrpv2-e2e

@odinuge
Member Author

odinuge commented Jan 25, 2021

Hmm. The failure on pull-kubernetes-node-crio-cgrpv2-e2e is the same as in #83541 (just tested it on another random PR). Not sure what is causing those errors, though.

We should run these as periodic jobs, and not just presubmit. Also, the https://testgrid.k8s.io/sig-node-kubelet#node-kubelet-serial tests are more relevant for cgroup handling, but that testgrid tab looks quite red already 😅

@ehashman ehashman added this to Needs Reviewer in SIG Node PR Triage Jan 25, 2021
@ehashman
Member

/triage accepted

@k8s-ci-robot k8s-ci-robot added triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Jan 25, 2021
@SergeyKanzhelev SergeyKanzhelev moved this from Needs Reviewer to Waiting on Author in SIG Node PR Triage Feb 4, 2021
@odinuge
Member Author

odinuge commented Feb 10, 2021

Ping @giuseppe

@giuseppe
Member

Ping @giuseppe

we run e2e tests in the CRI-O CI. Could you open a PR for CRI-O and set the remote for Kubernetes to your branch so we can see what happens?

@odinuge
Member Author

odinuge commented Feb 12, 2021

Hi @giuseppe ,

we run e2e tests in the CRI-O CI.

Yeah, that makes sense, but I don't think a CRI implementation has tests to verify node allocatable or pod limits, though I might be incorrect. If you do have tests running that catch this, it would be nice to get them into the testgrid.

Could you open a PR for CRI-O and set the remote for Kubernetes to your branch so we can see what happens?

Can you clarify what you mean by "set the remote for kubernetes"? This file is not used as a go dep, so I'm not sure what you want me to do...

giuseppe added a commit to giuseppe/cri-o that referenced this pull request Feb 12, 2021
test: kubernetes/kubernetes#98365

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
@giuseppe
Member

Hi @giuseppe ,

we run e2e tests in the CRI-O CI.

Yeah, that makes sense, but I don't think a CRI implementation has tests to verify node allocatable or pod limits, though I might be incorrect. If you do have tests running that catch this, it would be nice to get them into the testgrid.

Could you open a PR for CRI-O and set the remote for Kubernetes to your branch so we can see what happens?

Can you clarify what you mean by "set the remote for kubernetes"? This file is not used as a go dep, so I'm not sure what you want me to do...

I've opened a test PR cri-o/cri-o#4569

The only reason for opening it is to make sure there are no regressions in the e2e when running on cgroup v2

I think @harche is already working on adding these tests to kubernetes

Member

@giuseppe giuseppe left a comment

tests pass

LGTM

@harche
Contributor

harche commented Feb 12, 2021

/test pull-kubernetes-node-crio-cgrpv2-e2e

@harche
Contributor

harche commented Feb 12, 2021

We should run these as periodic jobs, and not just presubmit. Also, the https://testgrid.k8s.io/sig-node-kubelet#node-kubelet-serial tests are more relevant for cgroup handling, but that testgrid tab looks quite red already

I wanted to fix the known failures in pull-kubernetes-node-crio-cgrpv2-e2e before making it periodic.

@odinuge
Member Author

odinuge commented Feb 12, 2021

@giuseppe: I've opened a test PR cri-o/cri-o#4569

I see. That makes sense, thanks! 😄

@giuseppe: I think @harche is already working on adding these tests to kubernetes

Ahh, nice!

@harche: I wanted to fix the known failures in pull-kubernetes-node-crio-cgrpv2-e2e before making it periodic.

Ahh, that makes sense. Do you have an issue where that is tracked? I didn't find one, so I thought the errors were new (and therefore I guess I wasted some time :/)... Is fixing those errors being tracked as well, and is it being worked on?

@harche
Contributor

harche commented Feb 12, 2021

@odinuge I am working on fixing those 2 failures in pull-kubernetes-node-crio-cgrpv2-e2e; right now there is no issue to track the errors.

@odinuge
Member Author

odinuge commented Feb 17, 2021

/assign @sjenning

@harche
Contributor

harche commented Feb 19, 2021

@giuseppe: I've opened a test PR cri-o/cri-o#4569

I see. That makes sense, thanks!

@giuseppe: I think @harche is already working on adding these tests to kubernetes

Ahh, nice!

@harche: I wanted to fix the known failures in pull-kubernetes-node-crio-cgrpv2-e2e before making it periodic.

Ahh, that makes sense. Do you have an issue where that is tracked? I didn't find one, so I thought the errors were new (and therefore I guess I wasted some time :/)... Is fixing those errors being tracked as well, and is it being worked on?

@odinuge I created an issue to track the failures we are seeing with NodeConformance and NodeFeatures on cgroup v2, #99230

@ehashman ehashman moved this from Waiting on Author to Needs Reviewer in SIG Node PR Triage Feb 19, 2021
Member

@ehashman ehashman left a comment

/lgtm

@ehashman ehashman moved this from Needs Reviewer to Needs Approver in SIG Node PR Triage Feb 19, 2021
@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Feb 19, 2021
@mrunalp
Contributor

mrunalp commented Mar 4, 2021

/approve

@mrunalp mrunalp moved this from Needs Approver to Done in SIG Node PR Triage Mar 4, 2021
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: giuseppe, mrunalp, odinuge

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Mar 4, 2021
@fejta-bot

/retest
This bot automatically retries jobs that failed/flaked on approved PRs (send feedback to fejta).

Review the full test history for this PR.

Silence the bot with an /lgtm cancel or /hold comment for consistent failures.

@k8s-ci-robot
Contributor

@odinuge: The following test failed, say /retest to rerun all failed tests:

Test name: pull-kubernetes-node-crio-cgrpv2-e2e
Commit: 124de52
Rerun command: /test pull-kubernetes-node-crio-cgrpv2-e2e

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@k8s-ci-robot k8s-ci-robot merged commit 413ff67 into kubernetes:master Mar 4, 2021
@k8s-ci-robot k8s-ci-robot added this to the v1.21 milestone Mar 4, 2021