Adding support for Topology Aware Hints #99522

Merged
merged 4 commits into from Mar 9, 2021

robscott
Member

@robscott robscott commented Feb 27, 2021

What type of PR is this?

/kind feature
/kind api-change

What this PR does / why we need it:

This adds initial alpha support for Topology Aware Hints.

Does this PR introduce a user-facing change?

Topology Aware Hints are now available in alpha and can be enabled with the `TopologyAwareHints` feature gate.

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:

/sig network
/priority important-soon

@k8s-ci-robot added the release-note, kind/feature, size/XXL, kind/api-change, sig/network, priority/important-soon, cncf-cla: yes, needs-triage, and sig/apps labels on Feb 27, 2021
@robscott changed the title from "Adding support for Topology Aware Hints" to "WIP: Adding support for Topology Aware Hints" on Feb 27, 2021
@k8s-ci-robot added the do-not-merge/work-in-progress label on Feb 27, 2021
@aojea
Member

aojea commented Feb 27, 2021

/cc

@robscott force-pushed the topology-hints branch 3 times, most recently from df43282 to c6e2ebe, on March 1, 2021 20:18
@robscott changed the title from "WIP: Adding support for Topology Aware Hints" to "Adding support for Topology Aware Hints" on Mar 1, 2021
@k8s-ci-robot removed the do-not-merge/work-in-progress label on Mar 1, 2021
@robscott
Member Author

robscott commented Mar 1, 2021

/retest

@fejta-bot

This PR may require API review.

If so, when the changes are ready, complete the pre-review checklist and request an API review.

Status of requested reviews is tracked in the API Review project.

@robscott force-pushed the topology-hints branch 3 times, most recently from 8445c39 to 584d277, on March 2, 2021 07:59
@robscott
Member Author

robscott commented Mar 2, 2021

This may be a noisy PR, so I'm removing the reviewers that were auto-assigned. Feel free to add yourself back if you're interested.
/uncc @mikedanese @MrHohn

@robscott
Member Author

robscott commented Mar 2, 2021

I'm continuing to work on this PR, but I think we've reached a point where review would be valuable. The most significant and complex part of this PR is the controller logic. That is largely done now, although I still need to significantly improve test coverage. Feedback on the structure and logic here would be much appreciated.

I still need to work on:

  • API strategy and validation for new fields
  • Filtering endpoints in kube-proxy based on these hints when they are present
  • Add metrics
  • Update kube-proxy to support multiple hints per endpoint.
  • Potentially adding a way to opt in, instead of having the feature gate enable the feature for all Services. Discussions are ongoing as to whether this should integrate with traffic policy fields or be standalone.
  • Improved test coverage
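
As a rough illustration of the two kube-proxy items in the list above (filtering on hints, and multiple hints per endpoint), here is a minimal Go sketch. The `Endpoint` type, its `ZoneHints` field, and `filterByZoneHints` are hypothetical names invented for this example; they are not the API or code added in this PR. Falling back to all endpoints when hints are missing or nothing matches is also an assumption of the sketch.

```go
package main

import "fmt"

// Endpoint is a hypothetical, simplified stand-in for an EndpointSlice
// endpoint. It is NOT the real Kubernetes API type.
type Endpoint struct {
	IP        string
	ZoneHints []string // zones that should consume this endpoint; empty means no hint
}

// filterByZoneHints returns the endpoints hinted for the given zone.
// Assumed fallback: if the zone is unknown, any endpoint lacks hints,
// or nothing matches, the full list is returned unchanged.
func filterByZoneHints(endpoints []Endpoint, zone string) []Endpoint {
	if zone == "" {
		return endpoints
	}
	var filtered []Endpoint
	for _, ep := range endpoints {
		if len(ep.ZoneHints) == 0 {
			// Hints are incomplete, so do not filter at all.
			return endpoints
		}
		// An endpoint may carry several hints; one match is enough.
		for _, hint := range ep.ZoneHints {
			if hint == zone {
				filtered = append(filtered, ep)
				break
			}
		}
	}
	if len(filtered) == 0 {
		return endpoints
	}
	return filtered
}

func main() {
	eps := []Endpoint{
		{IP: "10.0.0.1", ZoneHints: []string{"zone-a"}},
		{IP: "10.0.0.2", ZoneHints: []string{"zone-a", "zone-b"}},
		{IP: "10.0.0.3", ZoneHints: []string{"zone-b"}},
	}
	for _, ep := range filterByZoneHints(eps, "zone-b") {
		fmt.Println(ep.IP)
	}
}
```

Treating incomplete hints as "no hints" keeps traffic flowing during partial rollouts, at the cost of losing zone affinity until every endpoint is hinted.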

/cc @andrewsykim @bowei @dcbw @wojtek-t
/assign @thockin

@robscott
Member Author

robscott commented Mar 7, 2021

Now that EndpointSlice GA API and Controller PRs are in, I've rebased this PR one more time. Leaving the hold in place because I want to make sure Tim is OK with the annotation approach I've taken here. Happy to change it if not.

@robscott
Member Author

robscott commented Mar 7, 2021

/retest

@robscott
Member Author

robscott commented Mar 8, 2021

Today's updates:

  • A couple rebases
  • Refactoring + better testing for how endpoints are allocated to different zones
  • A new run of make update that resulted in some updates to staging/src/k8s.io/api/testdata/HEAD for EndpointSlice resources

givingZone, numToGive := getMost(givingZonesDesired)
receivingZone, numToReceive := getMost(receivingZonesDesired)

if (numToGive < 1.0 && numToReceive < 1.0) || numToGive < 0.0 || numToReceive < 0.0 {
Member

Is this guaranteed to break? 😄

Member Author

Yeah that's a good question. I can't find any edge cases where it wouldn't, but I may have missed one. I've added some better test coverage that includes some unexpected/invalid inputs. I've also slightly expanded the conditions that would cause this to break out of the loop. Let me know if you can think of any edge cases I'm missing.
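
To make the termination question concrete, here is a minimal Go sketch built around the guard quoted above. Only the guard condition and the two `getMost` calls come from the diff. The body of `getMost`, the `move` type, the `redistribute` wrapper, the one-endpoint-per-iteration update, and the up-front input validation are all assumptions made for illustration, not the PR's implementation.

```go
package main

import (
	"fmt"
	"math"
)

// move records one endpoint transfer (hypothetical type).
type move struct{ from, to string }

// getMost returns the zone with the largest value. This body is an
// assumed reconstruction; ties go to the lexically smallest zone name
// so the result does not depend on map iteration order.
func getMost(zones map[string]float64) (string, float64) {
	zone, most, found := "", 0.0, false
	for z, v := range zones {
		if !found || v > most || (v == most && z < zone) {
			zone, most, found = z, v, true
		}
	}
	return zone, most
}

// redistribute moves one endpoint per iteration from the zone with the
// most to give to the zone with the most to receive. It mutates both maps.
func redistribute(giving, receiving map[string]float64) []move {
	if len(giving) == 0 || len(receiving) == 0 {
		return nil
	}
	// Assumed validation: NaN and Inf would defeat the guard below.
	for _, m := range []map[string]float64{giving, receiving} {
		for _, v := range m {
			if math.IsNaN(v) || math.IsInf(v, 0) {
				return nil
			}
		}
	}
	var moves []move
	for {
		givingZone, numToGive := getMost(giving)
		receivingZone, numToReceive := getMost(receiving)
		// Guard taken from the diff under review.
		if (numToGive < 1.0 && numToReceive < 1.0) || numToGive < 0.0 || numToReceive < 0.0 {
			break
		}
		// Each pass lowers one entry in each map by a whole endpoint, and a
		// negative entry is only selected once it is the maximum, which
		// triggers the guard. For finite inputs the loop therefore ends.
		giving[givingZone] -= 1.0
		receiving[receivingZone] -= 1.0
		moves = append(moves, move{from: givingZone, to: receivingZone})
	}
	return moves
}

func main() {
	giving := map[string]float64{"zone-a": 2.0, "zone-b": 0.5}
	receiving := map[string]float64{"zone-c": 2.5}
	fmt.Println(len(redistribute(giving, receiving))) // prints 2
}
```

Under these assumptions, the inputs that could keep the loop alive are NaN (every comparison in the guard is false) and infinite values (subtracting 1.0 changes nothing), which is why the sketch rejects them before looping.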

@k8s-ci-robot
Contributor

k8s-ci-robot commented Mar 9, 2021

@robscott: The following tests failed, say /retest to rerun all failed tests:

| Test name | Commit | Details | Rerun command |
| --- | --- | --- | --- |
| pull-kubernetes-bazel-build | 5cf0840b8af7d1703d1b6133de7c5111d86a8822 | link | /test pull-kubernetes-bazel-build |
| pull-kubernetes-bazel-test | 5cf0840b8af7d1703d1b6133de7c5111d86a8822 | link | /test pull-kubernetes-bazel-test |

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@robscott
Member Author

robscott commented Mar 9, 2021

/test pull-kubernetes-integration
(was the TestWebhookTimeoutWithWatchCache flake I've seen several times)

/test pull-kubernetes-e2e-kind-ipv6
(Probing container should be ready immediately after startupProbe succeeds)

@robscott
Member Author

robscott commented Mar 9, 2021

Removing the hold now that @thockin has looked at the annotation config. I think this is good to go now. The PR still needs an LGTM if anyone is able to add one.

@aojea
Member

aojea commented Mar 9, 2021

/lgtm

@k8s-ci-robot added the lgtm label on Mar 9, 2021
@robscott
Member Author

robscott commented Mar 9, 2021

/hold cancel

Labels
approved, area/apiserver, area/ipvs, area/test, cncf-cla: yes, kind/api-change, kind/feature, lgtm, priority/important-soon, release-note, sig/api-machinery, sig/apps, sig/auth, sig/instrumentation, sig/network, sig/testing, size/XXL, triage/accepted

7 participants