
Surface more information about plugin scores in scheduler #99411

Merged
1 commit merged on Mar 4, 2021

Conversation

damemi (Contributor) commented on Feb 24, 2021

What type of PR is this?

/kind feature

What this PR does / why we need it:

Often when a pod does not end up on the node a user expects, the instinct is to assume a bug in the scheduler code. In reality, it is more likely that other plugins influenced the scheduling decision by raising the score of a particular node. The scheduler can already log every score for every plugin at V(10), but that is far more information than is usually needed, at a verbosity level that is impractical in high-usage or production clusters. The key information can instead be distilled to show:

  • what plugins were most influential in this scheduling decision
  • how those plugins compare to other influential plugins
  • why one node's score may be less than another's even under assumed ideal conditions

This adds a section to the scheduler's prioritizeNodes function that breaks down the top N (currently 3) scoring plugins for each node, and shows that information alongside the average score for each of those plugins across all nodes.

The benefit is that this output is less verbose than what we currently log at V(10) (every plugin score for every node), so it can be shown at a lower log level while still providing insight into scheduling decisions.
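
For illustration, here is a minimal Go sketch of the kind of top-N selection described above. The helper name, types, and the sample values in main are assumptions made for this sketch, not the actual code added to prioritizeNodes:

package main

import (
	"fmt"
	"sort"
)

// pluginScore pairs a plugin name with its score on one node and that
// plugin's average score across all nodes (illustrative type only).
type pluginScore struct {
	Plugin       string
	Score        int64
	AverageScore float64
}

// topNPluginScores returns the n highest-scoring plugins on a single node,
// annotated with each plugin's average score across all nodes. nodeScores
// maps plugin name -> this node's score; averages maps plugin name -> that
// plugin's average score over all nodes.
func topNPluginScores(nodeScores map[string]int64, averages map[string]float64, n int) []pluginScore {
	out := make([]pluginScore, 0, len(nodeScores))
	for plugin, score := range nodeScores {
		out = append(out, pluginScore{Plugin: plugin, Score: score, AverageScore: averages[plugin]})
	}
	// Sort descending by this node's score and keep only the top n entries.
	sort.Slice(out, func(i, j int) bool { return out[i].Score > out[j].Score })
	if len(out) > n {
		out = out[:n]
	}
	return out
}

func main() {
	// Sample values loosely based on the log output shown below.
	nodeScores := map[string]int64{"NodePreferAvoidPods": 1000000, "PodTopologySpread": 200, "InterPodAffinity": 100, "TaintToleration": 33}
	averages := map[string]float64{"NodePreferAvoidPods": 1e6, "PodTopologySpread": 200, "InterPodAffinity": 52.3, "TaintToleration": 66.7}
	fmt.Printf("%+v\n", topNPluginScores(nodeScores, averages, 3))
}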

The output looks like this:

$ kubectl create -f ~/demo-pod.yaml
pod/mypodcn84p created

$ kubectl logs pod/kube-scheduler-kind-control-plane2 -n kube-system -c kube-scheduler | grep mypodcn84p
I0224 21:04:52.359490       1 eventhandlers.go:164] "Add event for unscheduled pod" pod="default/mypodcn84p"
I0224 21:04:53.475906       1 scheduling_queue.go:805] "About to try and schedule pod" pod="default/mypodcn84p"
I0224 21:04:53.475919       1 scheduler.go:457] "Attempting to schedule pod" pod="default/mypodcn84p"
I0224 21:04:53.479045       1 generic_scheduler.go:490] "Top 3 plugins for pod on node" pod="default/mypodcn84p" node="kind-worker2" scores=[{Plugin:NodePreferAvoidPods Score:1000000 AverageScore:1e+06} {Plugin:PodTopologySpread Score:200 AverageScore:200} {Plugin:InterPodAffinity Score:100 AverageScore:52.333333333333336}]
I0224 21:04:53.479440       1 generic_scheduler.go:490] "Top 3 plugins for pod on node" pod="default/mypodcn84p" node="kind-worker" scores=[{Plugin:NodePreferAvoidPods Score:1000000 AverageScore:1e+06} {Plugin:PodTopologySpread Score:200 AverageScore:200} {Plugin:TaintToleration Score:100 AverageScore:66.66666666666667}]
I0224 21:04:53.479495       1 generic_scheduler.go:490] "Top 3 plugins for pod on node" pod="default/mypodcn84p" node="kind-worker3" scores=[{Plugin:NodePreferAvoidPods Score:1000000 AverageScore:1e+06} {Plugin:PodTopologySpread Score:200 AverageScore:200} {Plugin:TaintToleration Score:100 AverageScore:66.66666666666667}]
I0224 21:04:53.479508       1 generic_scheduler.go:540] "Calculated node's final score for pod" pod="default/mypodcn84p" node="kind-worker2" score=1000486
I0224 21:04:53.479914       1 generic_scheduler.go:540] "Calculated node's final score for pod" pod="default/mypodcn84p" node="kind-worker" score=1000554
I0224 21:04:53.479949       1 generic_scheduler.go:540] "Calculated node's final score for pod" pod="default/mypodcn84p" node="kind-worker3" score=1000529
I0224 21:04:53.480367       1 default_binder.go:51] "Attempting to bind pod to node" pod="default/mypodcn84p" node="kind-worker"
I0224 21:04:53.492108       1 eventhandlers.go:201] "Delete event for unscheduled pod" pod="default/mypodcn84p"
I0224 21:04:53.492157       1 eventhandlers.go:221] "Add event for scheduled pod" pod="default/mypodcn84p"
I0224 21:04:53.510055       1 scheduler.go:602] "Successfully bound pod to node" pod="default/mypodcn84p" node="kind-worker" evaluatedNodes=6 feasibleNodes=3

Which issue(s) this PR fixes:

Ref #91633

Special notes for your reviewer:

Does this PR introduce a user-facing change?

kube-scheduler now logs plugin scoring summaries at --v=4

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:


/sig scheduling

@k8s-ci-robot added the do-not-merge/work-in-progress, release-note, kind/feature, size/M, sig/scheduling, and cncf-cla: yes labels on Feb 24, 2021
k8s-ci-robot (Contributor) commented:

@damemi: This issue is currently awaiting triage.

If a SIG or subproject determines this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot added the needs-triage and needs-priority labels on Feb 24, 2021
k8s-ci-robot (Contributor) commented:

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: damemi

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot added the approved label on Feb 24, 2021
@damemi changed the title from "[WIP] Surface more information about plugin scores in scheduler" to "Surface more information about plugin scores in scheduler" on Feb 24, 2021
@k8s-ci-robot removed the do-not-merge/work-in-progress label on Feb 24, 2021
@damemi force-pushed the scheduler-score-logging branch 2 times, most recently from a6f83f0 to 3f54936, on February 24, 2021 at 20:57
// Summarize all scores.
result := make(framework.NodeScoreList, 0, len(nodes))
totalPluginScores := make(map[string]int64)

for i := range nodes {
	result = append(result, framework.NodeScore{Name: nodes[i].Name, Score: 0})
	for j := range scoresMap {
Member:

please rename j to something more meaningful. I think it refers to plugin here?

Member:

Come to think of it, this is not a very efficient implementation of these nested loops, for two reasons:

  1. we can save lookups by doing:
    for plugin, nodeScores := range scoresMap
    
  2. Iterating over the map in the outer loop and nodes in the inner loop would keep the slices closer to faster caches in the CPU.

Can you fix this in a separate PR? It would be nice if it showed up in performance benchmarks, but the plugins are probably far slower to run. Maybe it will show up in some profiles.
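
A minimal sketch of the restructuring being suggested here, written as a hypothetical helper (the function name, factoring, and package line are assumptions; the actual cleanup was deferred to a follow-up PR):

package core // assumed: same package as generic_scheduler.go

import (
	v1 "k8s.io/api/core/v1"
	"k8s.io/kubernetes/pkg/scheduler/framework"
)

// sumScores accumulates per-node totals and per-plugin totals with plugins in
// the outer loop: the scoresMap lookup happens once per plugin rather than
// once per (node, plugin) pair, and each inner loop walks one contiguous
// NodeScoreList, which is friendlier to CPU caches. It relies on
// scoresMap[plugin] being index-aligned with nodes, as in the original code.
func sumScores(nodes []*v1.Node, scoresMap framework.PluginToNodeScores) (framework.NodeScoreList, map[string]int64) {
	result := make(framework.NodeScoreList, len(nodes))
	for i := range nodes {
		result[i] = framework.NodeScore{Name: nodes[i].Name, Score: 0}
	}
	totalPluginScores := make(map[string]int64)
	for plugin, nodeScores := range scoresMap {
		for i := range nodeScores {
			result[i].Score += nodeScores[i].Score
			totalPluginScores[plugin] += nodeScores[i].Score
		}
	}
	return result, totalPluginScores
}

The follow-up PR referenced below (#99807) is where this cleanup was actually proposed.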

Contributor Author:

Yeah, I'll open another PR for that

Member:

Don't forget this please :)

/lgtm

Contributor Author:

Opened here: #99807

// Summarize all scores.
result := make(framework.NodeScoreList, 0, len(nodes))
totalPluginScores := make(map[string]int64)
Member:

this is adding overhead to the production code path even when V(4) is not satisfied. You have to do it inside the conditional, even if it repeats code.

Member:

+1, so that totalPluginScores[j] += scoresMap[j][i].Score below does not always cost.

Contributor Author:

I moved totalPluginScores[j] += scoresMap[j][i].Score under a V(4) check, but the variable initialization doesn't work if it's put into a conditional context

Member:

I was suggesting you do the variable initialization and the iteration over scoresMap inside the big conditional below, to keep all the changes related to logging in a single location.

Member:

this is still unresolved

Contributor Author:

Oh, I think I get what you mean. I moved this to logPluginScores with its own iteration over scoresMap. Is that what you meant?
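
As a sketch of what such a helper could look like: the logPluginScores name comes from the comment above, but the signature and body below are assumptions rather than the merged code. The point is that all of the summary work, including the extra iteration over scoresMap, sits behind the verbosity check:

package core // assumed: same package as generic_scheduler.go

import (
	v1 "k8s.io/api/core/v1"
	"k8s.io/klog/v2"
	"k8s.io/kubernetes/pkg/scheduler/framework"
)

// logPluginScores is a hypothetical helper: nothing below the early return is
// computed unless V(4) logging is enabled.
func logPluginScores(pod *v1.Pod, nodes []*v1.Node, scoresMap framework.PluginToNodeScores) {
	if !klog.V(4).Enabled() {
		return
	}
	// Average each plugin's score across all nodes, inside the verbosity check
	// so the extra pass over scoresMap costs nothing at lower verbosity.
	pluginAverages := make(map[string]float64)
	for plugin, nodeScores := range scoresMap {
		var total int64
		for _, ns := range nodeScores {
			total += ns.Score
		}
		if len(nodeScores) > 0 {
			pluginAverages[plugin] = float64(total) / float64(len(nodeScores))
		}
	}
	for i, node := range nodes {
		// Collect this node's score for every plugin (scoresMap[plugin] is
		// index-aligned with nodes); the merged change would trim this to the
		// top 3 before logging.
		nodeScores := make(map[string]int64, len(scoresMap))
		for plugin, list := range scoresMap {
			nodeScores[plugin] = list[i].Score
		}
		klog.V(4).InfoS("Plugin scores for pod on node",
			"pod", klog.KObj(pod), "node", node.Name, "scores", nodeScores, "averages", pluginAverages)
	}
}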

pkg/scheduler/core/generic_scheduler.go (outdated review thread, resolved)
for _, node := range nodes {
	nodeToPluginScores[node.Name] = make(framework.NodeScoreList, len(scoresMap))
}
// Convert the scoresMap (which contains Plugins->NodeScores) to the Nodes->PluginScores map
Member:

we don't really have a Nodes->PluginScores map, right?

}

// Enhanced plugin score logging. The goal of this block is to show the highest scoring plugins on each node,
// and the average score for those plugins across all nodes.
Member:

I want to argue that providing this information is not always helpful; in some cases the lower-scoring plugins may be the culprit for why the scheduler doesn't choose the node.

Contributor Author:

I agree, there are some cases where seeing the lowest-scoring plugins will be more helpful. I chose to go with the top scoring plugins because there are a lot of cases where someone asks, "why didn't the pod go on X node when I have this plugin enabled?" and the answer is because another plugin outscored their desired plugin (usually balanced resource allocation). On the flip side, that also means their plugin scored low. So maybe there is some way to show both?

Member:

I think the top scores are almost always enough. At least if their plugin of interest is missing, they can figure out that it's scoring low. And if they really want to know everything, they can enable V(10).

Contributor Author:

And if they really want to know everything, they can enable V(10)

That's a good point too

pkg/scheduler/core/generic_scheduler.go (outdated review thread, resolved)
@@ -492,7 +535,7 @@ func (g *genericScheduler) prioritizeNodes(
 		}
 	}
 
-	if klog.V(10).Enabled() {
+	if klog.V(4).Enabled() {
Member:

why change this?

Contributor Author:

I think this information matches well with the info that I'm exposing in this PR. The highest scoring plugins themselves are a direct contributor to the total score for that node, so I'm lowering it from V(10), which is the level at which we log every score for every plugin.

Member:

sgtm

@damemi force-pushed the scheduler-score-logging branch 2 times, most recently from 741b545 to 9f20ba2, on February 25, 2021 at 17:57
damemi (Contributor Author) commented on Mar 2, 2021

/retest

alculquicondor (Member) left a comment:

just nits left :)

pkg/scheduler/core/generic_scheduler.go (three additional outdated review threads, resolved)
damemi (Contributor Author) commented on Mar 4, 2021

Bump, any more feedback on this?

@k8s-ci-robot added the lgtm label on Mar 4, 2021