Surface more information about plugin scores in scheduler #99411
Conversation
@damemi: This issue is currently awaiting triage. If a SIG or subproject determines this is a relevant issue, they will accept it by applying the appropriate label.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: damemi
The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing
// Summarize all scores.
result := make(framework.NodeScoreList, 0, len(nodes))
totalPluginScores := make(map[string]int64)

for i := range nodes {
	result = append(result, framework.NodeScore{Name: nodes[i].Name, Score: 0})
	for j := range scoresMap {
please rename j to something more meaningful. I think it refers to plugin here?
Come to think of it, this is not a very efficient implementation of nested loops, for two reasons:
- We can save map lookups by iterating as: for plugin, nodeScores := range scoresMap
- Iterating over the map in the outer loop and nodes in the inner loop would keep the slices closer to the CPU's faster caches.
Can you fix this in a separate PR? It would be nice if it showed up in performance benchmarks, but plugins are probably way slower to run. Maybe it will show in some profiles.
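To illustrate the restructuring suggested in the comment above, here is a minimal, self-contained sketch. The `NodeScore`/`NodeScoreList` types are simplified stand-ins for the scheduler framework types, and `sumScores` is a hypothetical helper name, not the PR's actual code. It assumes each plugin's score list is ordered the same way as the node list, which is how the scheduler populates `scoresMap`.

```go
package main

import "fmt"

// Simplified stand-ins for the scheduler framework types
// (the real ones live in k8s.io/kubernetes/pkg/scheduler/framework).
type NodeScore struct {
	Name  string
	Score int64
}
type NodeScoreList []NodeScore

// sumScores iterates plugins in the outer loop and nodes in the inner loop,
// so each inner pass walks one contiguous NodeScoreList slice
// (cache-friendly) and the map is looked up only once per plugin.
// It assumes scoresMap[plugin] is ordered the same as nodeNames.
func sumScores(nodeNames []string, scoresMap map[string]NodeScoreList) NodeScoreList {
	result := make(NodeScoreList, 0, len(nodeNames))
	for _, name := range nodeNames {
		result = append(result, NodeScore{Name: name, Score: 0})
	}
	for plugin, nodeScores := range scoresMap {
		_ = plugin // plugin name is available here if per-plugin totals are needed
		for i := range nodeScores {
			result[i].Score += nodeScores[i].Score
		}
	}
	return result
}

func main() {
	nodes := []string{"node-a", "node-b"}
	scores := map[string]NodeScoreList{
		"NodeResourcesFit": {{"node-a", 50}, {"node-b", 30}},
		"ImageLocality":    {{"node-a", 10}, {"node-b", 40}},
	}
	// Prints [{node-a 60} {node-b 70}]
	fmt.Println(sumScores(nodes, scores))
}
```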
Yeah, I'll open another PR for that
Don't forget this please :)
/lgtm
Opened here: #99807
// Summarize all scores.
result := make(framework.NodeScoreList, 0, len(nodes))
totalPluginScores := make(map[string]int64)
This is adding to the production runtime even when V(4) is not enabled. You have to do it inside the conditional, even if it repeats code.
+1, so that totalPluginScores[j] += scoresMap[j][i].Score below does not always incur a cost.
I moved totalPluginScores[j] += scoresMap[j][i].Score under a V(4) check, but the variable initialization doesn't work if it's put into a conditional context.
I was suggesting you do the variable initialization and the iteration over scoresMap inside the big conditional below, to keep all the changes related to logging in a single location.
this is still unresolved
Oh, I think I get what you mean. I moved this to logPluginScores with its own iteration over scoresMap. Is that what you meant?
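The resolution discussed above — doing all the bookkeeping inside the verbosity check — could look roughly like the following sketch. The types are simplified, a plain `verbosity` variable stands in for klog's `-v` flag (the real code checks `klog.V(4).Enabled()`), and `logPluginScores` is just the helper name mentioned in the thread, not the PR's exact code.

```go
package main

import "fmt"

// Simplified stand-in for the scheduler framework's node score type.
type NodeScore struct {
	Name  string
	Score int64
}

// verbosity stands in for klog's -v level in this sketch;
// the real scheduler code gates on klog.V(4).Enabled().
var verbosity = 4

// logPluginScores does all the score-summarizing work (map allocation and
// iteration over scoresMap) inside the verbosity check, so none of it
// costs anything on the normal scheduling path.
func logPluginScores(scoresMap map[string][]NodeScore) map[string]int64 {
	if verbosity < 4 {
		return nil // below V(4): skip the extra bookkeeping entirely
	}
	totalPluginScores := make(map[string]int64)
	for plugin, nodeScores := range scoresMap {
		for _, ns := range nodeScores {
			totalPluginScores[plugin] += ns.Score
		}
	}
	return totalPluginScores
}

func main() {
	scores := map[string][]NodeScore{
		"NodeResourcesFit": {{"node-a", 50}, {"node-b", 30}},
		"ImageLocality":    {{"node-a", 10}, {"node-b", 40}},
	}
	// Prints 80 (50 + 30 for NodeResourcesFit across both nodes)
	fmt.Println(logPluginScores(scores)["NodeResourcesFit"])
}
```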
for _, node := range nodes {
	nodeToPluginScores[node.Name] = make(framework.NodeScoreList, len(scoresMap))
}
// Convert the scoresMap (which contains Plugins->NodeScores) to the Nodes->PluginScores map
we don't really have a Nodes->PluginScores map, right?
}

// Enhanced plugin score logging. The goal of this block is to show the highest scoring plugins on each node,
// and the average score for those plugins across all nodes.
I want to argue that providing such information is not always helpful; in some cases, a low-scoring plugin may be the culprit for why the scheduler doesn't choose the node.
I agree, there are some cases where seeing the lowest-scoring plugins will be more helpful. I chose to go with the top scoring plugins because there are a lot of cases where someone asks, "why didn't the pod go on X node when I have this plugin enabled?" and the answer is because another plugin outscored their desired plugin (usually balanced resource allocation). On the flip side, that also means their plugin scored low. So maybe there is some way to show both?
I think top scores are almost always enough. At least if their plugin of interest is missing, they can figure out that it's scoring low. And if they really want to know everything, they can enable V(10).
And if they really want to know everything, they can enable V(10)
That's a good point too
@@ -492,7 +535,7 @@ func (g *genericScheduler) prioritizeNodes(
	}
}

-	if klog.V(10).Enabled() {
+	if klog.V(4).Enabled() {
why change this?
I think this information matches well with the info that I'm exposing in this PR. The highest scoring plugins themselves are a direct contributor to the total score for that node, so I'm lowering it from V(10), which is the level at which we log every score for every plugin.
sgtm
/retest
just nits left :)
Bump, any more feedback on this?
What type of PR is this?
/kind feature
What this PR does / why we need it:
Often when a pod does not end up on the node a user expects, the instinct is to assume some bug in the scheduler code. In reality, it is likely that other plugins are influencing the scheduling decision by increasing the score of a certain node. While the scheduler currently can log every score for every plugin at V(10), this is usually much more information than is needed, at a verbosity that is much higher than practical, especially in high-usage or production clusters. Thus, the key information can be distilled to show:

This adds a section to the scheduler's prioritizeNodes function that breaks down the top N (currently 3) scoring plugins for each node, and shows that information alongside the average score for each of those plugins across all nodes.

The benefit to this is that it is less verbose than what we currently log at V(10) (every plugin score for every node), so it can be shown at a lower log level while still providing insights into scheduling decisions.

The output looks like this:
Which issue(s) this PR fixes:
Ref #91633
Special notes for your reviewer:
Does this PR introduce a user-facing change?
Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:
/sig scheduling
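As a rough illustration of the top-N selection the PR description refers to, here is a minimal, self-contained sketch. The `PluginScore` type and `topN` helper are hypothetical names for this example only, not the PR's actual code.

```go
package main

import (
	"fmt"
	"sort"
)

// PluginScore pairs a plugin name with its score for one node
// (a simplified stand-in for the framework's score types).
type PluginScore struct {
	Plugin string
	Score  int64
}

// topN returns the n highest-scoring plugins for one node, mirroring the
// PR's idea of surfacing only the top few contributors to a node's total
// score instead of logging every plugin score.
func topN(scores []PluginScore, n int) []PluginScore {
	// Copy so the caller's slice is left in its original order.
	sorted := append([]PluginScore(nil), scores...)
	sort.Slice(sorted, func(i, j int) bool { return sorted[i].Score > sorted[j].Score })
	if n > len(sorted) {
		n = len(sorted)
	}
	return sorted[:n]
}

func main() {
	scores := []PluginScore{
		{"ImageLocality", 12},
		{"NodeResourcesBalancedAllocation", 96},
		{"TaintToleration", 100},
		{"NodeResourcesFit", 47},
	}
	// Prints [{TaintToleration 100} {NodeResourcesBalancedAllocation 96} {NodeResourcesFit 47}]
	fmt.Println(topN(scores, 3))
}
```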