Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Simplify and de-layer Service REST implementation #96684

Merged
merged 79 commits into from Sep 11, 2021

Conversation

thockin
Copy link
Member

@thockin thockin commented Nov 18, 2020

Simplify Service's REST implementation by using the Begin* hooks introduced in #96393. This is a large change overall, so the PR is a series of commits that are each designed to be easy to read and review, even if they are incomplete on their own.

Background:

Service has an "outer" and "inner" REST handler. This is because of how we do IP and port allocations synchronously, but since we don't have API transactions, we need to roll those back in case of a failure. Both layers use the same Strategy, but the outer calls into the inner, which causes a lot of complexity in the code (including an open-coded partial reimplementation of a date-unknown snapshot of the generic REST code) and results in Prepare and Validate hooks being called twice.

This PR aims to merge those layers and use mostly generic REST code. This DOES NOT attempt to change or fix the way IPs and ports are allocated in the Service REST stack.

One net result of this is that the validation code can get a bit stricter, since it will NEVER see a service before allocation. This is left for later.

For posterity:

The "normal" REST flow seems to be:

mutating webhooks
generic REST store Create {
    cleanup = BeginCreate
    BeforeCreate {
        strategy.PrepareForCreate {
            dropDisabledFields
        }
        strategy.Validate
        strategy.Canonicalize
    }
    createValidation (validating webhooks)
    storage Create
    cleanup
    AfterCreate
    Decorator
}

Service (before this PR) does:

mutating webhooks
svc custom Create {
    BeforeCreate {
        strategy.PrepareForCreate {
            dropDisabledFields
        }
        strategy.Validate
        strategy.Canonicalize
    }
    Allocations
    inner (generic) Create {
        cleanup = BeginCreate
        BeforeCreate {
            strategy.PrepareForCreate {
                dropDisabledFields
            }
            strategy.Validate
            strategy.Canonicalize
        }
        createValidation (validating webhooks)
        storage Create
        cleanup
        AfterCreate
        Decorator
    }
}

After this PR:

mutating webhooks
generic REST store Create {
    cleanup = BeginCreate
        Allocations
    BeforeCreate {
        strategy.PrepareForCreate {
            dropDisabledFields
        }
        strategy.Validate
        strategy.Canonicalize
    }
    createValidation (validating webhooks)
    storage Create
    cleanup
    AfterCreate
        Rollback allocations on error
    Decorator
}

/kind bug
/kind cleanup

Fixes #87603
Fixes #98994

The `Service.spec.ipFamilyPolicy` field is now *required* in order to create or update a Service as dual-stack.  This is a breaking change from the beta behavior.  Previously the server would try to infer the value of that field from either `ipFamilies` or `clusterIPs`, but that caused ambiguity on updates.  Users who want a dual-stack Service MUST specify `ipFamilyPolicy` as either "PreferDualStack" or "RequireDualStack".  

cc @khenidak @aojea
FYI @liggitt @deads2k @lavalamp

For followup PR(s):

  • Validation overhaul (TODO comments "validation should not pass with empty clusterIP")
    • lots of things verified twice in update path
    • lots of things not verified at all
    • hard to follow
    • tests should state which fields expect errors
    • tests should have a "base" object + list of mutations
  • Audit strategy code and tests
    • use svctest
  • Test perf (move test boilerplate to outer loops?)
  • rename IPFamilyPolicyType and InternalTraffic. - remove Type suffix
  • rename ServiceExternalTrafficPolicyTypeLocal -> ServiceTrafficPolicyLocal ?
  • rename api.IPv4Protocol
  • Logging cleanup and internal Error for IP failures
  • make DryRun work the same in NodePort as in IP allocator?
  • make dryRun return valid, in-range values ?
  • TODO in ResourceLocation()
  • Other TODOs
  • Scrub allocator code - refactor, renames, comments.
  • testcase for dryrun for IPs, NPs, and HCNP (HCNP missing now). Lean more on {type=LB, ETP=Local} as a case.
  • tests for create-on-update
  • more feature tests

@k8s-ci-robot k8s-ci-robot added release-note-none Denotes a PR that doesn't merit a release note. do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. kind/bug Categorizes issue or PR as related to a bug. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. area/apiserver sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. sig/apps Categorizes an issue or PR as relevant to SIG Apps. and removed do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Nov 18, 2020
@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Nov 18, 2020
@harche
Copy link
Contributor

harche commented Nov 19, 2020

/test pull-kubernetes-node-crio-e2e

@fedebongio
Copy link
Contributor

/assign @caesarxuchao
would you mind taking a look?
/triage accepted

@k8s-ci-robot k8s-ci-robot added triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Nov 19, 2020
@thockin
Copy link
Member Author

thockin commented Nov 20, 2020

@fedebongio @caesarxuchao no need for anyone to look at this yet except morbid curiosity :)

This is detrimental to future `blame` but makes it so much morereadable
I convinced myself it was worthwhile.
This is consistent with every other registry.  Service is no longer the
oddball.
In all the places we pass (old, new) or (new, old), use wrapper-types to
make sure that we don't flip the order by accident.
This was a dumb placeholder name.
Identified in review, these funcs are now more reslient to future
changes.
@thockin
Copy link
Member Author

thockin commented Sep 11, 2021

I have squashed FIXUP commits, updated all commit messages, and pushed - no code changes.

@aojea
Copy link
Member

aojea commented Sep 11, 2021

/test pull-kubernetes-e2e-ipvs-azure-dualstack
/test pull-kubernetes-e2e-iptables-azure-dualstack
/test pull-kubernetes-e2e-kind-dual-canary
/test pull-kubernetes-e2e-kind-ipvs-dual-canary
/lgtm
/hold
holding waiting for CI and Kal
/honk

@k8s-ci-robot
Copy link
Contributor

@aojea:
goose image

In response to this:

/test pull-kubernetes-e2e-ipvs-azure-dualstack
/test pull-kubernetes-e2e-iptables-azure-dualstack
/test pull-kubernetes-e2e-kind-dual-canary
/test pull-kubernetes-e2e-kind-ipvs-dual-canary
/lgtm
/hold
holding waiting for CI and Kal
/honk

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Sep 11, 2021
@khenidak
Copy link
Contributor

/lgtm
/approve
/unhold

here we go.. take cover!

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Sep 11, 2021
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: khenidak, thockin

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@khenidak
Copy link
Contributor

/test pull-kubernetes-e2e-ubuntu-gce-network-policies

@k8s-ci-robot
Copy link
Contributor

@thockin: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Rerun command
pull-kubernetes-bazel-test 61be0b3d3dd884a86248e37579f974f2694422ab link /test pull-kubernetes-bazel-test
pull-kubernetes-bazel-build 61be0b3d3dd884a86248e37579f974f2694422ab link /test pull-kubernetes-bazel-build
pull-kubernetes-e2e-ubuntu-gce-network-policies 009aa36 link /test pull-kubernetes-e2e-ubuntu-gce-network-policies
pull-kubernetes-e2e-iptables-azure-dualstack 009aa36 link /test pull-kubernetes-e2e-iptables-azure-dualstack

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/apiserver area/test cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/api-change Categorizes issue or PR as related to adding, removing, or otherwise changing an API kind/bug Categorizes issue or PR as related to a bug. kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. lgtm "Looks good to me", indicates that a PR is ready to be merged. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. release-note Denotes a PR that will be considered when it comes time to generate release notes. sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. sig/apps Categorizes an issue or PR as relevant to SIG Apps. sig/network Categorizes an issue or PR as relevant to SIG Network. sig/testing Categorizes an issue or PR as relevant to SIG Testing. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. triage/accepted Indicates an issue or PR is ready to be actively worked on.
Projects
None yet
8 participants