Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Do not set sysctlRouteLocalnet (CVE-2020-8558) #92938

Merged
merged 1 commit into from Jan 12, 2021

Conversation

lbernail
Copy link
Contributor

@lbernail lbernail commented Jul 9, 2020

What type of PR is this?
/kind bug
/sig network
/area ipvs

What this PR does / why we need it:
Don't set sysctlRouteLocalnet when running in IPVS mode to avoid CVE-2020-8558
Accessing a Nodeport from the host network using localhost:Nodeport never worked in IPVS (see #67730)

Which issue(s) this PR fixes:
Fixes #92315 (for IPVS mode)

Does this PR introduce a user-facing change?:

[ACTION REQUIRED] kube-proxy's IPVS proxy mode no longer sets the net.ipv4.conf.all.route_localnet sysctl parameter. Nodes upgrading will have net.ipv4.conf.all.route_localnet set to 1 but new nodes will inherit the system default (usually 0). If you relied on any behavior requiring net.ipv4.conf.all.route_localnet, you must set ensure it is enabled as kube-proxy will no longer set it automatically. This change helps to further mitigate CVE-2020-8558.

@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. kind/bug Categorizes issue or PR as related to a bug. size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. sig/network Categorizes an issue or PR as relevant to SIG Network. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. area/ipvs needs-priority Indicates a PR lacks a `priority/foo` label and requires one. labels Jul 9, 2020
@@ -353,7 +353,7 @@ func NewProxier(ipt utiliptables.Interface,
kernelHandler KernelHandler,
) (*Proxier, error) {
// Set the route_localnet sysctl we need for
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: update comment with context?

@andrewsykim
Copy link
Member

/test pull-kubernetes-e2e-gci-gce-ipvs

@andrewsykim
Copy link
Member

(some failures expected)

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jul 9, 2020
@thockin
Copy link
Member

thockin commented Jul 10, 2020

Isn't #92315 fixed by #91569 ?

@lbernail
Copy link
Contributor Author

Isn't #92315 fixed by #91569 ?
@thockin it is yes but the mitigation based on the kubelet patch will take a lot longer to rollout. A patch in kube-proxy is easier to roll out in most environments because you can just update the dameonset.
In addition, in the case of IPVS we don't need this sysctl (as far as I know) so there is no point in setting it

@lbernail
Copy link
Contributor Author

/retest

1 similar comment
@andrewsykim
Copy link
Member

/retest

@thockin
Copy link
Member

thockin commented Jul 10, 2020

I don't object to unsetting this in ipvs. I'd like to figure out how to properly fix it in iptables mode too. Reopened #90259 to discuss.

I can't locally repro the attack, so looking for other clever ideas

@lbernail
Copy link
Contributor Author

/test pull-kubernetes-e2e-gci-gce-ipvs

@andrewsykim
Copy link
Member

/retest

(some of the test failures are actually expected #92708)

@lbernail
Copy link
Contributor Author

/test pull-kubernetes-e2e-gci-gce-ipvs

@tabbysable
Copy link
Member

@thockin if you have root on the (linux) attacker machine, the following should suffice to reproduce:

ip addr add 127.0.0.2/8 dev lo
ip addr del 127.0.0.1/8 dev lo
ip route add 127.0.0.1/32 via YOUR-TARGET-HERE
sysctl net.ipv4.conf.all.route_localnet=1

With more packet gymnastics, it's also possible to reproduce using only CAP_NET_RAW; let me know on Slack or Twitter if you want to see that code before it's released.

@tabbysable
Copy link
Member

@lbernail I'm sure these things have been discussed, but it's probably good to bring them into the PR for future/public reference...

IMO, the right way to conceptualize CVE-2020-8558 is that it makes localhost-bound services unexpectedly available outside the host.

Back in RFC1122, they say that 127/8 addresses "MUST NOT appear outside a host" and more recently in 5735 that they "do not legitimately appear on any network anywhere." When route_localnet=1 is set, Linux goes out of compliance with RFC5735. We've all taken that RFC so deeply to heart that the consequences of violating it take us by surprise.

A key issue with this fix (and the existing fix from #91569) is that it assumes kube-proxy is the only entity with an interest in setting route_localnet=1. End-users may, for reasons they believe are valid, be setting that sysctl themselves.

Forcing the sysctl to zero in IPVS mode gives a great remediation experience for end users who do not need route_localnet=1, but it also prevents any user who is using route_localnet=1 on purpose from upgrading to the newest kube-proxy. Such users will have to fork kube-proxy or reimplement whatever clever thing they're doing.

There's the rub: We all have been breathing RFC5735 since we first learned networking. Whatever someone's doing with route_localnet=1 is probably filled with security holes they don't realize. If we can discourage that practice, we probably should, but that conflicts with the principle of letting users configure kubernetes to meet their own needs.

Adding a knob is painful when your system already has too many knobs, but I think it should be seriously considered. My suggestion, trying to balance all these concerns is as follows:

  • Add a commandline option --route-localnet-compatible
  • In IPVS mode
    • By default, force route_localnet=0 as you have proposed
    • With --route-localnet-compatible, leave the route_localnet state unchanged.
  • In IPTables mode
    • By default, do whatever mitigation is agreed upon. (I favor removing the localhost-nodePorts feature and setting route_localnet=0, but the block-non-loopback-127/8-traffic-via-IPTables mitigation is acceptable, too.)
    • With --route-localnet-compatible, do not apply mitigations, because they could impact whatever the end-user is attempting to do. We trust them to apply appropriate mitigations for their situation.
  • Deprecate --route-localnet-compatible in 1.20
  • Remove all appropriate code (depending on what's chosen WRT localhost-nodePorts in IPTables mode) when enough time has passed that CVE-2020-8558 is no longer a concern.

WDYT?

@lbernail
Copy link
Contributor Author

@tabbysable Thanks a lot for your input (for transparency, we work together and had this discussion on slack and figured it was worth having in public).

A few thoughts:

  • Yes this patch and kubelet: block non-forwarded packets from crossing the localhost boundary #91569 could break existing behaviors. We actually have a use case relying on this behavior but are working on changing it (use-case: shared service on the host that we don't want exposed outside of the node. It binds 127.0.0.1 and we redirect a specific IP to localhost with iptables so pods can access it)
  • I agree we shouldn't touch a sysctl that was set outside of Kubernetes but here we felt that this would allow for a very easy patch when kube-proxy is running in a daemonset
  • I also agree that localhost:nodeport should probably not be supported (but this will be hard to deprecate if users rely on it. We are lucky with IPVS because it never worked)

The flag approach is probably the best as it would allow for simple patching and users could disable it if needed. This is very similar to what we did with strictARP (see #75295). But of course, if we could avoid adding flags it would be better.

@andrewsykim We also discussed this on slack but we should probably discuss it here again. What do you think? (also do you know why the test is failing?)

@andrewsykim
Copy link
Member

/retest

(fixed failing test in kubernetes/test-infra#18339)

@andrewsykim
Copy link
Member

I agree with @tabbysable that a flag to toggle route_localnet makes sense, we may not have use-cases with IPVS but we do with iptables and such a flag should probably apply for both proxy modes anyways.

At this point though, even if we add a flag to skip setting route_localnet, the kubelet patch in #91569 already breaks this behavior regardless of what kube-proxy is setting. I think would have to undo the kubelet iptable rules first before we make these changes to kube-proxy?

@andrewsykim
Copy link
Member

andrewsykim commented Jul 17, 2020

Side note around sysctl flags: maybe it's time we consider a --sysctl-override flag that toggles all sysctl overrides from kube-proxy (default behavior is "true" which overrides sysctls like today, "false" means skip all). A flag per sysctl is probably not sustainable and an admin that wants to override any of the default sysctl is probably willing to configure all of them on their own. Documentation around all the sysctls we override would help as well.

@tabbysable
Copy link
Member

@andrewsykim Yes, I agree that the kubelet patch from #91569 should be taken out for the same reasons. It would be ideal if that and this could both happen together before 1.19 so that the behavior of 1.19 could be "CVE-2020-8558 is fixed in kube-proxy and can be switched on and off."

How to handle the kubelet patch in maintenance branches where it has already been released is another difficult question. :-(

A "don't touch my sysctls" flag sounds like a good, sustainable solution compared to separate ones per sysctl. The docs update is a very important part, so that users can make an informed decision.

@andrewsykim
Copy link
Member

@thockin any thoughts on the above?

@k8s-ci-robot k8s-ci-robot added size/S Denotes a PR that changes 10-29 lines, ignoring generated files. area/release-eng Issues or PRs related to the Release Engineering subproject sig/release Categorizes an issue or PR as relevant to SIG Release. and removed approved Indicates a PR has been approved by an approver from all required OWNERS files. labels Jan 11, 2021
Signed-off-by: Laurent Bernaille <laurent.bernaille@datadoghq.com>
@k8s-ci-robot k8s-ci-robot added size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. and removed size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Jan 11, 2021
@lbernail
Copy link
Contributor Author

As discussed with @andrewsykim and @tabbysable : let's not set the sysctl (because we don't need it in IPVS mode). Deploying kube-proxy won't fix the CVE but at least it will not introduce it on new nodes (note that if the version of the kubelet in recent enough, the CVE is mitigated)

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: lbernail

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jan 11, 2021
@andrewsykim
Copy link
Member

LGTM but release notes need an update

@lbernail lbernail changed the title Disable sysctlRouteLocalnet to address CVE-2020-8558 Do not set sysctlRouteLocalnet (CVE-2020-8558) Jan 11, 2021
@lbernail
Copy link
Contributor Author

LGTM but release notes need an update

Is the description clear enough? I could make it longer but release notes tend to be very short

@andrewsykim
Copy link
Member

andrewsykim commented Jan 11, 2021

I think this one probably warrants a longer one with an "ACTION REQUIRED" in it, maybe something like:

[ACTION REQUIRED] kube-proxy's IPVS proxy mode no longer enables the net.ipv4.conf.all.route_localnet sysctl parameter. Nodes upgrading will have net.ipv4.conf.all.route_localnet set to 1 but new nodes will inherit the system default (usually 0). If you relied on any behavior requiring net.ipv4.conf.all.route_localnet, you must set ensure it is enabled as kube-proxy will no longer set it automatically. This change helps to further mitigate CVE-2020-8558.

@k8s-ci-robot k8s-ci-robot added release-note-action-required Denotes a PR that introduces potentially breaking changes that require user action. and removed release-note Denotes a PR that will be considered when it comes time to generate release notes. labels Jan 11, 2021
Copy link
Member

@andrewsykim andrewsykim left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jan 11, 2021
@fejta-bot
Copy link

/retest
This bot automatically retries jobs that failed/flaked on approved PRs (send feedback to fejta).

Review the full test history for this PR.

Silence the bot with an /lgtm cancel or /hold comment for consistent failures.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/ipvs area/release-eng Issues or PRs related to the Release Engineering subproject cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/bug Categorizes issue or PR as related to a bug. lgtm "Looks good to me", indicates that a PR is ready to be merged. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. release-note-action-required Denotes a PR that introduces potentially breaking changes that require user action. sig/network Categorizes an issue or PR as relevant to SIG Network. sig/release Categorizes an issue or PR as relevant to SIG Release. size/XS Denotes a PR that changes 0-9 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

CVE-2020-8558: Node setting allows for neighboring hosts to bypass localhost boundary