DaemonSet controller respects MaxSurge during update #96441
Conversation
/sig apps
Force-pushed from c6422a6 to 6c69080
Added
/assign @kubernetes/sig-apps-pr-reviews
Force-pushed from 6c69080 to ccf4697
Force-pushed from ccf4697 to 2811d52
Force-pushed from 2811d52 to 63ae01c
Force-pushed from 91fb5f6 to d3cfd7f
/test pull-kubernetes-e2e-gce-alpha-features
Ok, this is ready for review (all known issues are addressed, the test is stable, logging should be accurate and at the right levels). I'm fairly confident that this is in alpha shape at least.
/remove-sig api-machinery |
/triage accepted |
This lgtm, leaving final tag to @kow3ns
It is too easy to omit checking the return value of the syncAndValidateDaemonSet test helper in large suites. Switch the method to a test helper that fatals/errors directly. Also rename a method that referenced the old name 'Rollback' instead of 'RollingUpdate'.
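A minimal sketch of the helper pattern described above, with hypothetical internals (the real helper drives the controller's sync loop); the point is that taking *testing.T and calling t.Helper()/t.Fatalf makes an unchecked return value impossible:

```go
package daemonset

import "testing"

// Before (hypothetical shape): callers could silently drop the returned error.
//
//	func syncAndValidateDaemonSet(...) error { ... }
//
// After: the helper fails the test itself, so a forgotten check is impossible.
func syncAndValidateDaemonSet(t *testing.T, expectedCreates, expectedDeletes int) {
	t.Helper() // report failures at the caller's line, not here

	creates, deletes := runFakeSync() // hypothetical stand-in for the real sync
	if creates != expectedCreates {
		t.Fatalf("expected %d creates, got %d", expectedCreates, creates)
	}
	if deletes != expectedDeletes {
		t.Fatalf("expected %d deletes, got %d", expectedDeletes, deletes)
	}
}

func runFakeSync() (int, int) { return 0, 0 } // placeholder for illustration
```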
While this is correct in order of operations, it is harder to read and masks the intent of the user without the parentheses.
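The expression under review isn't shown in this excerpt, but a hypothetical example illustrates the point: both forms compute the same value under Go's precedence rules, and only one states the intent.

```go
package main

import "fmt"

func main() {
	desired, maxSurgePercent := 10, 25

	// Correct, but the reader must recall operator precedence:
	a := desired * maxSurgePercent / 100
	// Same result; the parentheses make multiply-then-divide explicit:
	b := (desired * maxSurgePercent) / 100

	fmt.Println(a, b) // 2 2
}
```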
If MaxSurge is set, the controller will attempt to double up nodes with a new pod, up to the allowed limit, and then trigger deletion of the old pod once the most recent (by hash) pod is ready. If the old pod goes unready before the new pod is ready, the old pod is deleted immediately. If an old pod goes unready before a new pod has been placed on that node, a new pod is added for that node immediately, even past the MaxSurge limit. The backoff clock is used consistently throughout the DaemonSet controller as an injectable clock for testing purposes.
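A simplified sketch of the per-node decision described above, with invented types and helper names (the actual controller works over pod lists and node maps); surgeBudget stands in for however many more nodes may still be doubled up:

```go
package main

import "fmt"

type nodeState struct {
	hasOldPod, oldPodReady bool
	hasNewPod, newPodReady bool
}

// decide returns whether to create an updated pod and/or delete the old pod
// on one node during a MaxSurge rolling update.
func decide(s nodeState, surgeBudget int) (create, deleteOld bool) {
	switch {
	case s.hasOldPod && !s.oldPodReady && !s.hasNewPod:
		// Old pod already unready: replace it immediately, even past MaxSurge.
		return true, true
	case s.hasOldPod && !s.hasNewPod && surgeBudget > 0:
		// Double up this node within the surge budget.
		return true, false
	case s.hasNewPod && (s.newPodReady || !s.oldPodReady):
		// New pod ready (or old pod went unready first): drop the old pod.
		return false, s.hasOldPod
	}
	return false, false
}

func main() {
	healthy := nodeState{hasOldPod: true, oldPodReady: true}
	fmt.Println(decide(healthy, 1)) // true false: surge a new pod onto the node

	surged := nodeState{hasOldPod: true, oldPodReady: true, hasNewPod: true, newPodReady: true}
	fmt.Println(decide(surged, 0)) // false true: new pod ready, delete the old one
}
```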
In order to maintain the correct invariants, the existing maxUnavailable logic calculated the same data several times in different ways. Leverage the simpler structure from maxSurge and calculate pod availability only once, performing a single pass over all the pods in the daemonset. This changes no behavior of the current controller, and the resulting structure is almost identical to maxSurge.
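A minimal sketch of the single-pass classification this describes, with hypothetical types (the real code works on *v1.Pod and the template hash label): every pod is bucketed exactly once, and both the surge and unavailability decisions read from the resulting counts.

```go
package daemonset

import "time"

type pod struct {
	updatedHash bool // pod was created from the updated template
	ready       bool
	readySince  time.Time
}

type counts struct {
	oldAvailable, oldUnavailable int
	newAvailable, newUnavailable int
}

// classify walks the daemonset's pods exactly once, computing availability
// a single time per pod instead of re-deriving it for each decision.
func classify(pods []pod, minReady time.Duration, now time.Time) counts {
	var c counts
	for _, p := range pods {
		available := p.ready && now.Sub(p.readySince) >= minReady
		switch {
		case p.updatedHash && available:
			c.newAvailable++
		case p.updatedHash:
			c.newUnavailable++
		case available:
			c.oldAvailable++
		default:
			c.oldUnavailable++
		}
	}
	return c
}
```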
The nodeShouldRunDaemonPod method does not need to return an error because there are no scenarios under which it can fail. Remove the error return path from its direct callers as well.
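An illustrative before/after of that signature cleanup, with placeholder types; the exact parameters and return values in the real controller may differ:

```go
package daemonset

type (
	node      struct{}
	daemonSet struct{}
)

// Before (illustrative): an error that no code path could produce forced
// every caller to handle it.
//
//	func nodeShouldRunDaemonPod(n *node, ds *daemonSet) (shouldRun, shouldContinueRunning bool, err error)
//
// After: the impossible error return is dropped and direct callers simplify.
func nodeShouldRunDaemonPod(n *node, ds *daemonSet) (shouldRun, shouldContinueRunning bool) {
	// ...predicate checks that cannot fail...
	return true, true
}
```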
Force-pushed from d3cfd7f to 8d8884a
/refresh
Having heard no objections for the past few weeks, I'm tagging this as-is.
/lgtm
/retest
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: smarterclayton, soltysh.
/retest
If MaxSurge is set, the controller will attempt to launch updated pods on up to MaxSurge nodes and wait for them to be ready before deleting the old pods. If any old pod goes unready during update, a new pod is added to the node (regardless of MaxSurge) and the old pod is deleted.
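For context, a sketch of how a DaemonSet opts into this behavior using the k8s.io/api types; in the release this PR targets, maxSurge on DaemonSets is alpha and sits behind a feature gate, and validation restricts how it combines with maxUnavailable (assumed here to be 0):

```go
package main

import (
	"fmt"

	appsv1 "k8s.io/api/apps/v1"
	"k8s.io/apimachinery/pkg/util/intstr"
)

func main() {
	surge := intstr.FromString("25%") // may also be an absolute count, e.g. intstr.FromInt(1)
	zero := intstr.FromInt(0)         // leave no unavailability budget when surging

	strategy := appsv1.DaemonSetUpdateStrategy{
		Type: appsv1.RollingUpdateDaemonSetStrategyType,
		RollingUpdate: &appsv1.RollingUpdateDaemonSet{
			MaxSurge:       &surge,
			MaxUnavailable: &zero,
		},
	}
	fmt.Printf("%+v\n", strategy)
}
```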
This implements the logic behind #96375
/kind feature