Make kube-proxy check if IPv6 is really supported before assuming dual-stack #99127

Merged
k8s-ci-robot merged 1 commit into kubernetes:master from the non-dual-stack-proxy branch on Feb 18, 2021

danwinship
Contributor

What type of PR is this?

/kind bug

What this PR does / why we need it:

Now that the dual stack feature gate is enabled by default, kube-proxy defaults to trying to run dual stack (regardless of the cluster configuration), but some nodes have no IPv6 support, resulting in lots of spurious errors.

Which issue(s) this PR fixes:

Fixes #99031

Special notes for your reviewer:

@aojea I wrote this before noticing #99066... but this version is much smaller and more back-port-able. Maybe we want to merge pieces of both PRs together

Does this PR introduce a user-facing change?

Fixes spurious errors about IPv6 in kube-proxy logs on nodes with IPv6 disabled.

/priority important-soon
/sig network

@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. kind/bug Categorizes issue or PR as related to a bug. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. sig/network Categorizes an issue or PR as relevant to SIG Network. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Feb 16, 2021
@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Feb 16, 2021
@k8s-ci-robot k8s-ci-robot added area/kubelet sig/node Categorizes an issue or PR as relevant to SIG Node. and removed approved Indicates a PR has been approved by an approver from all required OWNERS files. labels Feb 16, 2021
@aojea
Member

aojea commented Feb 16, 2021

/test pull-kubernetes-gci-gce-ipvs

@k8s-ci-robot
Contributor

@aojea: The specified target(s) for /test were not found.
The following commands are available to trigger jobs:

  • /test pull-kubernetes-bazel-build
  • /test pull-kubernetes-bazel-test
  • /test pull-kubernetes-conformance-image-test
  • /test pull-kubernetes-conformance-kind-ipv6-parallel
  • /test pull-kubernetes-dependencies
  • /test pull-kubernetes-dependencies-canary
  • /test pull-kubernetes-e2e-ipvs-azure-dualstack
  • /test pull-kubernetes-e2e-iptables-azure-dualstack
  • /test pull-kubernetes-e2e-aws-eks-1-13-correctness
  • /test pull-kubernetes-files-remake
  • /test pull-kubernetes-e2e-gce
  • /test pull-kubernetes-e2e-gce-no-stage
  • /test pull-kubernetes-e2e-gce-kubetest2
  • /test pull-kubernetes-e2e-gce-canary
  • /test pull-kubernetes-e2e-gce-ubuntu
  • /test pull-kubernetes-e2e-gce-ubuntu-containerd
  • /test pull-kubernetes-e2e-gce-ubuntu-containerd-canary
  • /test pull-kubernetes-e2e-gce-rbe
  • /test pull-kubernetes-e2e-gce-alpha-features
  • /test pull-kubernetes-e2e-gce-device-plugin-gpu
  • /test pull-kubernetes-integration
  • /test pull-kubernetes-cross
  • /test pull-kubernetes-e2e-kind
  • /test pull-kubernetes-e2e-kind-canary
  • /test pull-kubernetes-e2e-kind-ipv6
  • /test pull-kubernetes-e2e-kind-ipv6-canary
  • /test pull-kubernetes-conformance-kind-ga-only
  • /test pull-kubernetes-conformance-kind-ga-only-parallel
  • /test pull-kubernetes-e2e-kops-aws
  • /test pull-kubernetes-bazel-build-canary
  • /test pull-kubernetes-bazel-test-canary
  • /test pull-kubernetes-bazel-test-integration-canary
  • /test pull-kubernetes-local-e2e
  • /test pull-publishing-bot-validate
  • /test pull-kubernetes-e2e-gce-network-proxy-http-connect
  • /test pull-kubernetes-e2e-gce-network-proxy-grpc
  • /test pull-kubernetes-e2e-gci-gce-autoscaling
  • /test pull-kubernetes-e2e-aks-engine-azure
  • /test pull-kubernetes-e2e-azure-disk
  • /test pull-kubernetes-e2e-azure-disk-vmss
  • /test pull-kubernetes-e2e-azure-file
  • /test pull-kubernetes-e2e-kind-dual-canary
  • /test pull-kubernetes-e2e-kind-ipvs-dual-canary
  • /test pull-kubernetes-e2e-ubuntu-gce-network-policies
  • /test pull-kubernetes-e2e-gci-gce-ipvs
  • /test pull-kubernetes-node-e2e
  • /test pull-kubernetes-node-e2e-podutil
  • /test pull-kubernetes-e2e-containerd-gce
  • /test pull-kubernetes-node-e2e-containerd
  • /test pull-kubernetes-node-e2e-alpha
  • /test pull-kubernetes-node-kubelet-serial-cpu-manager
  • /test pull-kubernetes-node-kubelet-serial-topology-manager
  • /test pull-kubernetes-node-kubelet-serial-hugepages
  • /test pull-kubernetes-node-crio-cgrpv2-e2e
  • /test pull-kubernetes-node-crio-e2e
  • /test pull-kubernetes-node-kubelet-serial-memory-manager
  • /test pull-kubernetes-e2e-gce-100-performance
  • /test pull-kubernetes-e2e-gce-big-performance
  • /test pull-kubernetes-e2e-gce-correctness
  • /test pull-kubernetes-e2e-gce-large-performance
  • /test pull-kubernetes-kubemark-e2e-gce-big
  • /test pull-kubernetes-kubemark-e2e-gce-scale
  • /test pull-kubernetes-e2e-gce-storage-slow
  • /test pull-kubernetes-e2e-gce-storage-snapshot
  • /test pull-kubernetes-e2e-gce-storage-slow-rbe
  • /test pull-kubernetes-e2e-gce-csi-serial
  • /test pull-kubernetes-e2e-gce-iscsi
  • /test pull-kubernetes-e2e-gce-iscsi-serial
  • /test pull-kubernetes-e2e-gce-storage-disruptive
  • /test pull-kubernetes-e2e-aks-engine-azure-windows
  • /test pull-kubernetes-e2e-aks-engine-windows-contianerd
  • /test pull-kubernetes-e2e-azure-disk-windows
  • /test pull-kubernetes-e2e-azure-file-windows
  • /test pull-kubernetes-e2e-aks-engine-windows-gpu
  • /test pull-kubernetes-e2e-azure-disk-windows-containerd
  • /test pull-kubernetes-e2e-azure-file-windows-containerd
  • /test pull-kubernetes-typecheck
  • /test pull-kubernetes-verify-govet-levee
  • /test pull-kubernetes-verify
  • /test pull-kubernetes-e2e-windows-gce

Use /test all to run the following jobs:

  • pull-kubernetes-bazel-build
  • pull-kubernetes-bazel-test
  • pull-kubernetes-dependencies
  • pull-kubernetes-e2e-gce-ubuntu-containerd
  • pull-kubernetes-integration
  • pull-kubernetes-e2e-kind
  • pull-kubernetes-e2e-kind-ipv6
  • pull-kubernetes-conformance-kind-ga-only-parallel
  • pull-kubernetes-node-e2e
  • pull-kubernetes-e2e-gce-100-performance
  • pull-kubernetes-typecheck
  • pull-kubernetes-verify-govet-levee
  • pull-kubernetes-verify

In response to this:

/test pull-kubernetes-gci-gce-ipvs

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@aojea
Member

aojea commented Feb 16, 2021

/test pull-kubernetes-e2e-gci-gce-ipvs
/test pull-kubernetes-e2e-kind-ipvs-dual-canary

@aojea
Member

aojea commented Feb 16, 2021

@aojea I wrote this before noticing #99066... but this version is much smaller and more back-port-able. Maybe we want to merge pieces of both PRs together

This is much better 👍 ... the service IP family detection from the other PR can be done as a follow-up, and then we can get rid of the bindAddress hack.

	}
} else {
	ipt[0] = utiliptables.New(execer, utiliptables.ProtocolIPv4)
	ipt[1] = iptInterface
Member

Do you think we'll see the day that this fails because IPv4 is disabled on the host? 🙃

Contributor Author

I doubt any of this code will still be around then. The kernel does not currently support disabling IPv4.

@ehashman ehashman added this to Triage in SIG Node PR Triage Feb 17, 2021
@aojea
Member

aojea commented Feb 17, 2021

/test pull-kubernetes-e2e-gci-gce-ipvs
This is the job that reports the error, because it runs in a different cluster whose nodes blacklist the kernel module for IPv6 NAT.

@dcbw
Member

dcbw commented Feb 17, 2021

I presume there's no way to do unit tests for this?

@aojea
Member

aojea commented Feb 17, 2021

@danwinship it works 👏

I0217 14:53:18.118226 1 server_others.go:174] DetectLocalMode: 'NodeCIDR'
I0217 14:53:18.151595 1 iptables.go:624] DEBUG: ChainExists %#v output
%s
%v[POSTROUTING -t nat]ip6tables v1.8.5 (legacy): can't initialize ip6tables table `nat': Table does not exist (do you need to insmod?)
Perhaps ip6tables or your kernel needs to be upgraded.
exit status 3
W0217 14:53:18.151639 1 server_others.go:194] No iptables support for IPv6: exit status 3
I0217 14:53:18.151659 1 server_others.go:205] kube-proxy running in single-stack IPv4 mode
I0217 14:53:18.151672 1 server_others.go:271] Using ipvs Proxier.
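
The log above shows the probe failing with `exit status 3` while the warning line drops the ip6tables stderr text. A standalone sketch of such a probe using `os/exec` is given below. It assumes the check is equivalent to running `ip6tables -t nat -S POSTROUTING`; `probeNATTable` is a hypothetical helper, not part of kube-proxy or utiliptables, and the command typically needs root privileges.

```go
package main

import (
	"errors"
	"fmt"
	"os/exec"
)

// probeNATTable runs "<binary> -t nat -S POSTROUTING" and reports whether it
// succeeded. exitCode is -1 when the binary could not be started at all.
// The returned error keeps the tool's output so the cause stays visible.
func probeNATTable(binary string) (supported bool, exitCode int, err error) {
	out, err := exec.Command(binary, "-t", "nat", "-S", "POSTROUTING").CombinedOutput()
	if err == nil {
		return true, 0, nil
	}
	var exitErr *exec.ExitError
	if errors.As(err, &exitErr) {
		// The tool ran but exited non-zero (for example, table not available).
		return false, exitErr.ExitCode(), fmt.Errorf("%s: %w: %s", binary, err, out)
	}
	// The tool could not be executed (for example, not installed).
	return false, -1, fmt.Errorf("%s: %w", binary, err)
}

func main() {
	ok, code, err := probeNATTable("ip6tables")
	if !ok {
		fmt.Printf("IPv6 NAT not usable (exit code %d): %v\n", code, err)
		return
	}
	fmt.Println("IPv6 NAT table is available")
}
```

Wrapping the combined output into the error is one way to avoid a bare `exit status 3` in the warning; whether that fits utiliptables is a separate question.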

@danwinship
Contributor Author

W0217 14:53:18.151639 1 server_others.go:194] No iptables support for IPv6: exit status 3

Great error reporting there, but that's a general utiliptables problem.

Repushed without the debug commit. Should be ready to go now.

@danwinship
Contributor Author

I presume there's no way to do unit tests for this?

It's possible... we'd have to refactor kube-proxy startup to make this subpart of it independently testable... but it would be kind of vacuous. "If we hack it so iptables returns exactly the specific error we're testing for, then kube-proxy will fall back to single-stack". It doesn't really prove that the code actually does the right thing in any real environment.

@SergeyKanzhelev SergeyKanzhelev moved this from Triage to Needs Reviewer in SIG Node PR Triage Feb 17, 2021
@SergeyKanzhelev
Member

/triage accepted

@k8s-ci-robot k8s-ci-robot added triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Feb 17, 2021
@aojea
Member

aojea commented Feb 17, 2021

/lgtm
/retest

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Feb 17, 2021
@SergeyKanzhelev SergeyKanzhelev moved this from Needs Reviewer to Needs Approver in SIG Node PR Triage Feb 17, 2021
@aojea
Member

aojea commented Feb 17, 2021

/retest
The failures are in unrelated sig-storage tests.

@aojea
Member

aojea commented Feb 18, 2021

ping @dcbw for final approval

@dcbw
Member

dcbw commented Feb 18, 2021

/approve

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: danwinship, dcbw

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Feb 18, 2021
@k8s-ci-robot
Contributor

k8s-ci-robot commented Feb 18, 2021

@danwinship: The following test failed, say /retest to rerun all failed tests:

Test name Commit Details Rerun command
pull-kubernetes-e2e-kind-ipvs-dual-canary 1c9c23303c644c06a787f90da91c4e09d30ddb81 link /test pull-kubernetes-e2e-kind-ipvs-dual-canary

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@dcbw
Member

dcbw commented Feb 18, 2021

CI system issues...

Init container initupload not ready: (state: terminated, reason: "Error", message: ".com/token\\\": dial tcp: i/o timeout writer close error: Post \\\"https://storage.googleapis.com/upload/storage/v1/b/kubernetes-jenkins/o?alt=json\\u0026name=pr-logs%2Fpull%2F99127%2Fpull-kubernetes-bazel-build%2F1362409247526096896%2Fclone-log.txt\\u0026prettyPrint=false\\u0026projection=full\\u0026uploadType=multipart\\\": oauth2: cannot fetch token: Post \\\"https://oauth2.googleapis.com/token\\\": dial tcp: i/o timeout writer close error: Post \\\"https://storage.googleapis.com/upload/storage/v1/b/kubernetes-jenkins/o?alt=json\\u0026name=pr-logs%2Fpull%2F99127%2Fpull-kubernetes-bazel-build%2F1362409247526096896%2Fstarted.json\\u0026prettyPrint=false\\u0026projection=full\\u0026uploadType=multipart\\\": oauth2: cannot fetch token: Post \\\"https://oauth2.googleapis.com/token\\\": dial tcp: i/o timeout writer close error: Post \\\"https://storage.googleapis.com/upload/storage/v1/b/kubernetes-jenkins/o?alt=json\\u0026name=pr-logs%2Fdirectory%2Fpull-kubernetes-bazel-build%2F1362409247526096896.txt\\u0026prettyPrint=false\\u0026projection=full\\u0026uploadType=multipart\\\": oauth2: cannot fetch token: Post \\\"https://oauth2.googleapis.com/token\\\": dial tcp: i/o timeout writer close error: Post \\\"https://storage.googleapis.com/upload/storage/v1/b/kubernetes-jenkins/o?alt=json\\u0026name=pr-logs%2Fpull%2F99127%2Fpull-kubernetes-bazel-build%2F1362409247526096896%2Ffinished.json\\u0026prettyPrint=false\\u0026projection=full\\u0026uploadType=multipart\\\": oauth2: cannot fetch token: Post \\\"https://oauth2.googleapis.com/token\\\": dial tcp: i/o timeout writer close error: Post \\\"https://storage.googleapis.com/upload/storage/v1/b/kubernetes-jenkins/o?alt=json\\u0026name=pr-logs%2Fdirectory%2Fpull-kubernetes-bazel-build%2Flatest-build.txt\\u0026prettyPrint=false\\u0026projection=full\\u0026uploadType=multipart\\\": oauth2: cannot fetch token: Post 
\\\"https://oauth2.googleapis.com/token\\\": dial tcp: i/o timeout]\",\"file\":\"prow/cmd/initupload/main.go:45\",\"func\":\"main.main\",\"level\":\"fatal\",\"msg\":\"Failed to initialize job\",\"severity\":\"fatal\",\"time\":\"2021-02-18T14:47:26Z\"}\n") Init container place-entrypoint not ready: (state: waiting, reason: "PodInitializing", message: "")

/retest

@k8s-ci-robot k8s-ci-robot merged commit 9fb1aa9 into kubernetes:master Feb 18, 2021
SIG Node PR Triage automation moved this from Needs Approver to Done Feb 18, 2021
@k8s-ci-robot k8s-ci-robot added this to the v1.21 milestone Feb 18, 2021
@danwinship danwinship deleted the non-dual-stack-proxy branch November 19, 2021 14:26
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/kubelet cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/bug Categorizes issue or PR as related to a bug. lgtm "Looks good to me", indicates that a PR is ready to be merged. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. release-note Denotes a PR that will be considered when it comes time to generate release notes. sig/network Categorizes an issue or PR as relevant to SIG Network. sig/node Categorizes an issue or PR as relevant to SIG Node. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. triage/accepted Indicates an issue or PR is ready to be actively worked on.
Projects
Development

Successfully merging this pull request may close these issues.

weird error message for ip6tables-save / restore in ipvs kube proxy logs
5 participants