Do not attempt to overwrite higher system (sysctl) values #103174

Merged
merged 1 commit into from Sep 16, 2021

Napsty
Contributor

@Napsty Napsty commented Jun 25, 2021

What type of PR is this?

/kind bug

What this PR does / why we need it:

With this commit kube-proxy accepts current system values (retrieved by sysctl) which are higher than the internally known and expected values.

When Kubernetes runs on a Node which is itself a container (e.g. LXC), and the value is changed on the (LXC) host, kube-proxy fails at the next start: it does not recognize the current value and attempts to overwrite it with the previously known one. This results in:

I0624 07:38:23.053960      54 conntrack.go:103] Set sysctl 'net/netfilter/nf_conntrack_max' to 524288
F0624 07:38:23.053999      54 server.go:495] open /proc/sys/net/netfilter/nf_conntrack_max: permission denied

However, a sysctl overwrite only makes sense if the current value is lower than the previously known and expected value. If the value was increased on the host, kube-proxy should not be bothered by that and should simply carry on.

Signed-off-by: Claudio Kuenzler ck@claudiokuenzler.com

Which issue(s) this PR fixes:

Fixes rancher/rancher#33360

Special notes for your reviewer:

The code change was mistakenly created as PR in the k3s project (see k3s-io/k3s#3505).
A real life use case is described in Rancher issue rancher/rancher#33360.

Does this PR introduce a user-facing change?

Changes the behaviour of kube-proxy at startup: it no longer attempts to set specific sysctl values (which no longer works in non-init namespaces on recent kernel versions) when the current sysctl values are already set higher.

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:


@k8s-ci-robot k8s-ci-robot added release-note-none Denotes a PR that doesn't merit a release note. kind/bug Categorizes issue or PR as related to a bug. size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Jun 25, 2021
@k8s-ci-robot
Contributor

@Napsty: This issue is currently awaiting triage.

If a SIG or subproject determines this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added the needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. label Jun 25, 2021
@k8s-ci-robot
Contributor

Welcome @Napsty!

It looks like this is your first PR to kubernetes/kubernetes 🎉. Please refer to our pull request process documentation to help your PR have a smooth ride to approval.

You will be prompted by a bot to use commands during the review process. Do not be afraid to follow the prompts! It is okay to experiment. Here is the bot commands documentation.

You can also check if kubernetes/kubernetes has its own contribution guidelines.

You may want to refer to our testing guide if you run into trouble with your tests not passing.

If you are having difficulty getting your pull request seen, please follow the recommended escalation practices. Also, for tips and tricks in the contribution process you may want to read the Kubernetes contributor cheat sheet. We want to make sure your contribution gets all the attention it needs!

Thank you, and welcome to Kubernetes. 😃

@k8s-ci-robot
Contributor

Hi @Napsty. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added needs-priority Indicates a PR lacks a `priority/foo` label and requires one. needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Jun 25, 2021
@k8s-ci-robot k8s-ci-robot added sig/network Categorizes an issue or PR as relevant to SIG Network. and removed do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Jun 25, 2021
@Napsty
Contributor Author

Napsty commented Jul 23, 2021

ping @andrewsykim and @dcbw . Not really sure who else to ping. Let me know if someone else needs to do something first so this gets rolling. thx

@brandond

brandond commented Jul 30, 2021

Might this be sig-node since it's kubelet sysctl stuff?

Worked around in downstream projects:

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Aug 6, 2021
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Aug 25, 2021
@Napsty
Contributor Author

Napsty commented Aug 25, 2021

ping @andrewsykim and @dcbw , who should be assigned to this?

@khenidak
Contributor

khenidak commented Sep 3, 2021

/assign @khenidak

Is it about running the proxy with lower permissions, or about preventing the proxy from setting a value that might have an impact outside the LXC container?

@brandond

brandond commented Sep 3, 2021

It's about the kernel no longer allowing these sysctls to be set within non-root namespaces.

@Napsty
Contributor Author

Napsty commented Sep 3, 2021

@khenidak as @brandond said, or, to be more precise, within non "net init" namespaces, which is the case for all started containers (LXC, Docker, ...).
The PR does not solve this 100%, but it allows a workaround: the server admin can set certain sysctl values high enough that kube-proxy accepts them (which is actually already the case on e.g. Ubuntu 20.04). In the current situation, kube-proxy tries to set sysctl values to a certain pre-defined value, even if that value is smaller than the current sysctl value.

@khenidak
Contributor

@Napsty ACK. Can you add a release note?

@khenidak
Contributor

/retest

@Napsty
Contributor Author

Napsty commented Sep 15, 2021

@khenidak

I'm sorry, but what exactly is meant by a release note? I read https://github.com/kubernetes/community/blob/master/contributors/guide/release-notes.md but I still don't understand whether this involves an additional file or just a comment in the commit. Do you have an example at hand, or could you point to another PR for comparison? Thank you!

@brandond

brandond commented Sep 15, 2021

@Napsty there's a bit in the PR template where it says Does this PR introduce a user-facing change and you've responded with NONE. You should replace this with the actual user-facing change, as it will be worded in the release notes. You can look at pretty much any other PR for an example.

@Napsty
Contributor Author

Napsty commented Sep 15, 2021

@brandond correct. I understood "user facing" as something requiring user input - which is not the case here. In fact, the PR is without any user interaction.

We could still mention the different behavior when Kernel sys values are already set higher than the kube-proxy expected value. But I fail to understand where this needs to be done. I read the contributors release notes twice now and I'm still none the wiser ;-)

A bit of help/guidance for a first timer please :-)

@brandond

brandond commented Sep 15, 2021

It doesn't need to be something that the user has to take action on, just something that they should know about when upgrading. Think of it from a user or administrator's perspective - what would you like to know about this change? Would you like to know that you will no longer have to manually set sysctls before starting kube-proxy?

Here's an example of a PR with an informational changelog entry:
#104997

@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. and removed release-note-none Denotes a PR that doesn't merit a release note. labels Sep 16, 2021
@khenidak
Contributor

khenidak commented Sep 16, 2021

/retest
/lgtm
/approve

The user facing section has been filled. I think we are good to go. Thanks @Napsty for this.

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Sep 16, 2021
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: khenidak, Napsty

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Sep 16, 2021
@k8s-ci-robot k8s-ci-robot merged commit 16823fc into kubernetes:master Sep 16, 2021
@k8s-ci-robot k8s-ci-robot added this to the v1.23 milestone Sep 16, 2021
@Napsty Napsty deleted the rancher-33360 branch September 17, 2021 05:09
jrosser pushed a commit to jrosser/ansible-collection-kubernetes that referenced this pull request Sep 5, 2023
In a containerised environment (docker/LXC/LXD...) with a non
net-init namespace it is not possible to adjust conntrack settings.
kube-proxy attempts to do this and fails to start unless it is
configured not to do so.

The previous logic which detected the use of a docker type ansible
connection plugin is converted to an overridable variable to allow
a deployment tool to specify that conntrack should not be adjusted.

See kubernetes/kubernetes#103174
@@ -96,7 +96,7 @@ func (realConntracker) setIntSysCtl(name string, value int) error {
 	entry := "net/netfilter/" + name
 
 	sys := sysctl.New()
-	if val, _ := sys.GetSysctl(entry); val != value {
+	if val, _ := sys.GetSysctl(entry); val != value && val < value {
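
For readers who want to see the effect of the changed condition in isolation, here is a minimal, runnable sketch. It is not the kube-proxy source: the `sysctlStore` interface and the `readOnlySysctl` type are hypothetical stand-ins for the object returned by `sysctl.New()` in the diff, and they simulate a container in which reads succeed and every write is denied. The numeric values are illustrative.

```go
package main

import (
	"errors"
	"fmt"
)

// sysctlStore is a hypothetical stand-in for the sysctl helper used in the diff.
type sysctlStore interface {
	GetSysctl(name string) (int, error)
	SetSysctl(name string, value int) error
}

// readOnlySysctl simulates a containerised node: reads work, writes are refused.
type readOnlySysctl struct {
	values map[string]int
}

func (s readOnlySysctl) GetSysctl(name string) (int, error) {
	v, ok := s.values[name]
	if !ok {
		return 0, errors.New("no such sysctl")
	}
	return v, nil
}

func (s readOnlySysctl) SetSysctl(name string, value int) error {
	return errors.New("permission denied")
}

// setIntSysCtl follows the post-PR condition: only attempt a write when the
// current value is lower than the wanted one. "val < value" is logically
// equivalent to the diff's "val != value && val < value".
func setIntSysCtl(sys sysctlStore, name string, value int) error {
	entry := "net/netfilter/" + name
	if val, _ := sys.GetSysctl(entry); val < value {
		return sys.SetSysctl(entry, value)
	}
	return nil
}

func main() {
	key := "net/netfilter/nf_conntrack_max"

	// Host value raised above the wanted value: no write, no error.
	raised := readOnlySysctl{values: map[string]int{key: 1048576}}
	fmt.Println(setIntSysCtl(raised, "nf_conntrack_max", 524288)) // <nil>

	// Host value below the wanted value: a write is still attempted and fails.
	low := readOnlySysctl{values: map[string]int{key: 262144}}
	fmt.Println(setIntSysCtl(low, "nf_conntrack_max", 524288)) // permission denied
}
```

The second call shows the limit of the workaround described in the thread: the change only helps when the admin has already set the host value at or above what kube-proxy wants.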
Member

I have doubts this is correct

However, a sysctl overwrite only makes sense if the current value is lower than the previously known and expected value. If the value was increased on the host, kube-proxy should not be bothered by that and should simply carry on.

@aojea
Member

aojea commented Sep 5, 2023

However, a sysctl overwrite only makes sense if the current value is lower than the previously known and expected value. If the value was increased on the host, kube-proxy should not be bothered by that and should simply carry on.

I do not agree with this PR. kube-proxy tries to set a sysctl, the system is read-only, and hence it fails. This is a workaround based on the fact that the value already set is higher than the configured one. That does not apply to all sysctl variables, which may be boolean or have distinct values, like rp_filter.
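
The objection above can be illustrated with a small sketch. It is purely illustrative and does not reflect how kube-proxy is structured: the `policy` type and the `needsWrite` function are hypothetical names. The point is that "higher is good enough" is a reasonable rule for a limit such as `nf_conntrack_max`, but not for a mode-style sysctl such as `rp_filter`, whose values 0, 1 and 2 select different behaviours rather than larger or smaller amounts.

```go
package main

import "fmt"

// policy describes how a wanted value should be compared with the current one.
// Both the type and its values are hypothetical, for illustration only.
type policy int

const (
	raiseOnly policy = iota // limits and counters: a higher current value is acceptable
	exact                   // flags and modes: only the exact value is acceptable
)

// needsWrite reports whether a write would be required under the given policy.
func needsWrite(p policy, current, desired int) bool {
	switch p {
	case raiseOnly:
		return current < desired
	default:
		return current != desired
	}
}

func main() {
	// A limit such as nf_conntrack_max: the host value is higher, nothing to do.
	fmt.Println(needsWrite(raiseOnly, 1048576, 524288)) // false

	// A mode such as rp_filter: current 2 (loose) is not a "better" 1 (strict).
	fmt.Println(needsWrite(exact, 2, 1)) // true

	// Applying the "lower than" rule to a mode would silently skip the mismatch.
	fmt.Println(needsWrite(raiseOnly, 2, 1)) // false
}
```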

@aroradaman
Member

@khenidak as @brandond said, or, to be more precise, within non "net init" namespaces, which is the case for all started containers (LXC, Docker, ...). The PR does not solve this 100%, but it allows a workaround: the server admin can set certain sysctl values high enough that kube-proxy accepts them (which is actually already the case on e.g. Ubuntu 20.04). In the current situation, kube-proxy tries to set sysctl values to a certain pre-defined value, even if that value is smaller than the current sysctl value.

#103174 (comment)

I guess simply configuring KubeProxyConntrackConfiguration to exactly match the values that the server admin has configured for the host would have solved the issue without any code change.
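
A rough sketch of what that suggestion amounts to, under stated assumptions. The thread does not say how kube-proxy derives the value it wants; the sketch assumes it is the per-core maximum multiplied by the CPU count, floored at a minimum, with a non-positive per-core value meaning "leave the sysctl alone". The function names, the CPU count of 16 and the host value are hypothetical; 32768 and 131072 are assumed defaults, chosen here because 32768 × 16 reproduces the 524288 seen in the log above.

```go
package main

import "fmt"

// conntrackTarget is an assumed model of how the wanted nf_conntrack_max is
// derived from the conntrack configuration: maxPerCore scaled by the CPU
// count, but never below floor. A non-positive maxPerCore yields 0, meaning
// "do not touch the sysctl".
func conntrackTarget(maxPerCore, floor, numCPU int) int {
	if maxPerCore <= 0 {
		return 0
	}
	scaled := maxPerCore * numCPU
	if scaled > floor {
		return scaled
	}
	return floor
}

// wouldWrite applies the pre-PR comparison from the diff: any mismatch
// between the current and the wanted value triggers a write attempt.
func wouldWrite(current, target int) bool {
	return target != 0 && current != target
}

func main() {
	hostValue := 1048576 // hypothetical value set by the admin on the host
	numCPU := 16         // hypothetical core count

	// Assumed defaults: the target differs from the host value, so a write is attempted.
	def := conntrackTarget(32768, 131072, numCPU)
	fmt.Println(def, wouldWrite(hostValue, def)) // 524288 true

	// Configuration chosen so the target equals the host value: no write attempted.
	matched := conntrackTarget(1, hostValue, numCPU)
	fmt.Println(matched, wouldWrite(hostValue, matched)) // 1048576 false
}
```

Under these assumptions the configuration has to be kept in sync with the host by hand, whereas the merged change tolerates any host value at or above the target.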

Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/bug Categorizes issue or PR as related to a bug. lgtm "Looks good to me", indicates that a PR is ready to be merged. needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. release-note Denotes a PR that will be considered when it comes time to generate release notes. sig/network Categorizes an issue or PR as relevant to SIG Network. size/XS Denotes a PR that changes 0-9 lines, ignoring generated files.
Development

Successfully merging this pull request may close these issues.

Rancher 2.5 (Single Install) not starting after nf_conntrack_max value adjustment
6 participants