New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Logarithmic timestamp comparison for downscaling #99212
Logarithmic timestamp comparison for downscaling #99212
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/cc @alculquicondor
d70ae93
to
088c236
Compare
/retest |
088c236
to
d6ec6a6
Compare
You forgot to add the KEP to the description :) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We are missing the feature gate
@@ -808,6 +809,9 @@ func (s ActivePods) Less(i, j int) bool { | |||
// 7. If the pods' creation times differ, the pod that was created more recently | |||
// comes before the older pod. | |||
// | |||
// In 5 and 7, times are compared in a logarithmic scale. This allows a level | |||
// of randomness among equivalent Pods when sorting. | |||
// | |||
// If none of these rules matches, the second pod comes before the first pod. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about the UUID comparison that was suggested in the KEP?
@@ -494,6 +495,45 @@ func TestSpecReplicasChange(t *testing.T) { | |||
} | |||
} | |||
|
|||
func TestLogarithmicScaleDown(t *testing.T) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This test is ensuring that the existing behavior somewhat still holds. That is good. We should run it with the feature gate enabled and disabled (I would remove the Logarithmic
part from the test name)
Another test we could have is rather opposite: have all the pods run at roughly the same time, downscale, and see if some level of randomness holds. Do you think something like this is possible? Maybe we can run the same test X times and ensure that at least in one of them, the removed pod was not the absolute youngest.
In the test plan we also suggested emulating the story https://github.com/kubernetes/enhancements/tree/master/keps/sig-apps/2185-random-pod-select-on-replicaset-downscale#test-plan
We can do that in a follow up.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
have all the pods run at roughly the same time, downscale, and see if some level of randomness holds ... Maybe we can run the same test X times
There is an inherent problem with randomness in that we can never be certain it will be the randomness we expect :) Something like this will introduce flakes, which can be minimized by increasing the number of test runs, but still always possible. Are there examples of other tests which take a similar approach?
What we really wish to show from such a test is that pods created on a similar logarithmic time scale have an equal chance of being chosen for downscaling. In other words, that their ranks are calculated properly. I believe the unit tests cover this level of detail sufficiently.
The integration test is simply showing that the basic logic still works when removed from the vacuum of unit tests. I think that the user story you linked will make a good e2e, and I could actually add that to this PR (I don't see any reason to wait, and that will flesh out test cases here more comfortably). Wdyt?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe the unit tests cover this level of detail sufficiently.
I agree, but thanks for giving it a thought.
Still, modify the test to run for feature gate enabled and disabled.
Fine with me to add the 2e2 test for the user story in this PR.
d6ec6a6
to
d99be17
Compare
@alculquicondor I added the UID sorting right before the call to actually sort the pods in 5ffacd0a98bb855bcbe067b2560a6f56c8775254. Let me know if that looks good, it should at least give a pseudo-random start before pods of similar ranks are sorted. I am still working on how to add the e2e. I'm thinking these steps:
My goal is to create 2 groups of pods, spaced far enough apart that they will have different ranks. However I add the 30s waits to reduce flakes caused by latency (in other words, the difference between a base-2 rank What do you think of this approach? I want to avoid adding too much "scheduling" dependency into the test (ie, assuming that pods will be evenly distributed among nodes) because that introduces flakes unless the nodes are evenly balanced and I think it's irrelevant to this anyway. |
@@ -802,6 +802,10 @@ func getPodsToDelete(filteredPods, relatedPods []*v1.Pod, diff int) []*v1.Pod { | |||
// diff will always be <= len(filteredPods), so not need to handle > case. | |||
if diff < len(filteredPods) { | |||
podsWithRanks := getPodsRankedByRelatedPodsOnSameNode(filteredPods, relatedPods) | |||
// First obtain a pseudo-random ordering by sorting by pod UID | |||
sort.Slice(podsWithRanks.Pods, func(i, j int) bool { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not just add the comparison in the existing Less
? I feel like that should perform better.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should that come before or after the timestamp comparison?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it's a comparison criteria with the lowest priority, so after.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in that case I should also update logarithmicOrAfterZero
to check if the 2 ranks are equal (and continue), rather than just returning... or I can add the UID check right into logarithmicOrAfterZero
. Wdyt?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
probably better to be explicit about UIDs by having that comparison outside of timestamp comparison.
Re: E2E I think we should do what was suggested in the KEP: simulate the scenario from the story and make sure spreading is good, with some allowance for skew. |
c43188e
to
469e24d
Compare
30808c2
to
bb80f72
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good for squash
bb80f72
to
d5054e2
Compare
/assign @kow3ns |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/approve
Some minor nits.
pkg/controller/controller_utils.go
Outdated
@@ -807,6 +808,10 @@ func (s ActivePods) Less(i, j int) bool { | |||
// 8. If the pods' creation times differ, the pod that was created more recently | |||
// comes before the older pod. | |||
// | |||
// In 5 and 8, times are compared in a logarithmic scale. This allows a level |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// In 5 and 8, times are compared in a logarithmic scale. This allows a level | |
// In 6 and 8, times are compared in a logarithmic scale. This allows a level |
pkg/controller/controller_utils.go
Outdated
if rankDiff > 0 { | ||
return false | ||
} | ||
return s.Pods[i].UID < s.Pods[j].UID |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The logic is identical for both pieces, how about making this a function rather than repeating the code?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The idea is to keep the UID comparison in this method, for visibility. But a shorter alternative would be
diff := logarithmicRankDiff(s.Pods[i].CreationTimestamp, s.Pods[j].CreationTimestamp, s.Now)
if diff == 0 {
return s.Pods[i].UID < s.Pods[j].UID
}
return diff < 0
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I did it my way to keep the UID comparison physically last since it has the lowest priority. But if this is cleaner, we can do that too
r2 = -1 | ||
} else { | ||
r2 = int64(math.Log2(float64(d2))) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
r1 := int64(-1)
r2 := int64(-1)
if d1 > 0 {
r1 = int64(math.Log2(float64(d1)))
}
if d2 > 0 {
r2 = int64(math.Log2(float64(d2)))
}
Seems simpler, no?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
71c0029
to
15fc96f
Compare
Thanks for the review @soltysh. I pushed and rebased with the feedback, and also added the feature gate @alculquicondor pointed out from #99212 (review). Please let me know if there is anything else I need to do |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You need to enable the feature gate for the tests.
I had long forgotten about the gate 😂
Change-Id: I0657ea0ce41b98fdee1a5307b5826a10deaff98c
15fc96f
to
a8d105a
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/lgtm
/triage accepted
/priority backlog
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: damemi, soltysh The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
/retest Review the full test history for this PR. Silence the bot with an |
1 similar comment
/retest Review the full test history for this PR. Silence the bot with an |
What type of PR is this?
/kind feature
What this PR does / why we need it:
Implements logarithmically-compared timestamps for replica scale downs, from #96898:
Which issue(s) this PR fixes:
Ref kubernetes/enhancements#2185 and #96748
Special notes for your reviewer:
Does this PR introduce a user-facing change?
Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.: