Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

webhook config manager: HasSynced returns true when the manager is synced with existing webhookconfig objects at startup #95783

Merged

Conversation

roycaihw
Copy link
Member

@roycaihw roycaihw commented Oct 22, 2020

What type of PR is this?
/kind bug

What this PR does / why we need it:
Fix a webhook unittest flake. The test waits for informers to be synced, which guarantees the watch events being sent (the delta fifo queue in the informer has no initial object left), but it doesn't guarantee the webhook configuration manager finishing processing those watch events. Adding a marker to wait for webhook registration, like what we do in the e2e tests.

The original race can be reliably reproduced by adding a sleep to the configuration managers.

What happened: webhook config manager HasSynced() function returned true if the informer was synced, which could cause requests being allowed without existing webhook configurations being effective at startup.

[EDIT] Thanks to @liggitt who pointed out that this isn't just a test issue and suggested the proper fix-- on API server startup, the webhook waits to admit anything until it is synced with existing webhookconfig objects, meaning validatingWebhookConfigurationManager#HasSynced and mutatingWebhookConfigurationManager#HasSynced shouldn't return true until the informer is synced and either has no items or updateConfiguration() has completed. This fix guarantees that at no point should requests be allowed into a restarted API server without those previously created webhook configurations being effective.

Which issue(s) this PR fixes:

Fixes #94810, #99911, #100070

Does this PR introduce a user-facing change?:

Fixed a race condition on API server startup ensuring previously created webhook configurations are effective before the first write request is admitted.

/sig api-machinery
/cc @sttts

@k8s-ci-robot k8s-ci-robot added release-note-none Denotes a PR that doesn't merit a release note. kind/bug Categorizes issue or PR as related to a bug. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. area/apiserver labels Oct 22, 2020
@fedebongio
Copy link
Contributor

/triage accepted

@k8s-ci-robot k8s-ci-robot added triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Oct 22, 2020
@BenTheElder
Copy link
Member

/cc @liggitt

@roycaihw roycaihw force-pushed the flake/wait-for-webhook-registration branch 2 times, most recently from d3a21d8 to e124292 Compare March 17, 2021 21:43
@k8s-ci-robot k8s-ci-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. release-note Denotes a PR that will be considered when it comes time to generate release notes. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. release-note-none Denotes a PR that doesn't merit a release note. labels Mar 17, 2021
@roycaihw roycaihw changed the title webhook unittest: wait for webhook registration before running test cases webhook config manager: HasSynced returns true when the manager is synced with existing webhookconfig objects at startup Mar 17, 2021
@roycaihw roycaihw force-pushed the flake/wait-for-webhook-registration branch from e124292 to e0bc203 Compare March 18, 2021 16:50
@roycaihw
Copy link
Member Author

/retest

@liggitt
Copy link
Member

liggitt commented Mar 18, 2021

one comment on making this more efficient for servers without any webhooks configured, functionally lgtm otherwise.

…nced with existing webhookconfig objects at startup
@roycaihw roycaihw force-pushed the flake/wait-for-webhook-registration branch from e0bc203 to 37d171e Compare March 19, 2021 19:40
@roycaihw
Copy link
Member Author

/retest

@liggitt
Copy link
Member

liggitt commented Mar 23, 2021

/lgtm
/approve

/milestone v1.21
Fixes 3 of the top 10 unit test flakes

@k8s-ci-robot k8s-ci-robot added this to the v1.21 milestone Mar 23, 2021
@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Mar 23, 2021
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: liggitt, pacoxu, roycaihw

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Mar 23, 2021
@k8s-ci-robot k8s-ci-robot merged commit ae6ceaa into kubernetes:master Mar 23, 2021
@roycaihw roycaihw deleted the flake/wait-for-webhook-registration branch March 23, 2021 17:28
k8s-ci-robot added a commit that referenced this pull request Mar 23, 2021
…783-upstream-release-1.19

Automated cherry pick of #95783: webhook config manager: HasSynced returns true when the
k8s-ci-robot added a commit that referenced this pull request Mar 23, 2021
…783-upstream-release-1.20

Automated cherry pick of #95783: webhook config manager: HasSynced returns true when the
k8s-ci-robot added a commit that referenced this pull request Mar 23, 2021
…783-upstream-release-1.18

Automated cherry pick of #95783: webhook config manager: HasSynced returns true when the
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/apiserver cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/bug Categorizes issue or PR as related to a bug. lgtm "Looks good to me", indicates that a PR is ready to be merged. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. release-note Denotes a PR that will be considered when it comes time to generate release notes. sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. triage/accepted Indicates an issue or PR is ready to be actively worked on.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[flaky unit test] plugin/webhook/validating
6 participants