Skip check for all topology labels when using system default spreading #105046

Merged
merged 1 commit into from Sep 16, 2021

Conversation

alculquicondor
Member

@alculquicondor alculquicondor commented Sep 15, 2021

What type of PR is this?

/kind bug

What this PR does / why we need it:

Skip check for all topology labels when using system default spreading

Checking for all topology labels is not backwards compatible. Clusters where nodes don't have zone labels effectively have default spreading disabled.

Change only applies to system defaults.
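
As a rough illustration of the rule described above, here is a minimal sketch, not the plugin's actual code. The helper names (`requireAllTopologies`, `hasAllKeys`, `nodeParticipates`) are hypothetical and chosen for this example; only the two label keys are real well-known labels. The assumption is that a node must carry every topology key only when the pod defines its own constraints or the defaults are not the system ones.

```go
package main

import "fmt"

// requireAllTopologies reports whether a node must carry every topology key
// to take part in spreading. Hypothetical helper: with system defaults and no
// pod-level constraints, the check is skipped.
func requireAllTopologies(podHasOwnConstraints, systemDefaulted bool) bool {
	return podHasOwnConstraints || !systemDefaulted
}

// hasAllKeys reports whether labels contains every key in keys.
func hasAllKeys(labels map[string]string, keys []string) bool {
	for _, k := range keys {
		if _, ok := labels[k]; !ok {
			return false
		}
	}
	return true
}

// nodeParticipates decides whether a node is considered for spreading.
func nodeParticipates(labels map[string]string, keys []string, requireAll bool) bool {
	return !requireAll || hasAllKeys(labels, keys)
}

func main() {
	keys := []string{"kubernetes.io/hostname", "topology.kubernetes.io/zone"}
	// A node that only carries the hostname label, as in a cluster without zones.
	node := map[string]string{"kubernetes.io/hostname": "node-a"}

	// System defaults, no pod-level constraints: the node still participates.
	fmt.Println(nodeParticipates(node, keys, requireAllTopologies(false, true)))
	// Pod-level constraints: the node is skipped because the zone label is missing.
	fmt.Println(nodeParticipates(node, keys, requireAllTopologies(true, false)))
}
```

Under these assumptions, a hostname-only node participates with system defaults and is ignored as soon as explicit constraints are involved, which matches "Change only applies to system defaults."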

/sig scheduling

Which issue(s) this PR fixes:

Fixes #102136

Special notes for your reviewer:

This replaces #102383, which the contributor abandoned.

One thing to note is the behavior when only some nodes have zone labels: the scoring would favor nodes with no zone. We should consider this undefined behavior.
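
To see why a mixed cluster would favor unlabeled nodes, consider a toy model. This is an assumption made for illustration, not the scheduler's real scoring formula: each node's cost is the number of matching pods on the node plus, if the node has a zone label, the number of matching pods in that zone. The `node` type and `spreadCost` function are hypothetical.

```go
package main

import "fmt"

// node is a simplified stand-in for a cluster node.
type node struct {
	name string
	zone string // empty means the node has no zone label
	pods int    // matching pods already running on this node
}

// spreadCost returns a toy cost for placing one more pod on n; lower is better.
// A node without a zone label contributes nothing for the zone term.
func spreadCost(n node, podsPerZone map[string]int) int {
	cost := n.pods
	if n.zone != "" {
		cost += podsPerZone[n.zone]
	}
	return cost
}

func main() {
	nodes := []node{
		{name: "node-a", zone: "zone-1", pods: 1},
		{name: "node-b", zone: "zone-1", pods: 1},
		{name: "node-c", zone: "", pods: 1},
	}

	// Count matching pods per zone, skipping nodes without a zone label.
	podsPerZone := map[string]int{}
	for _, n := range nodes {
		if n.zone != "" {
			podsPerZone[n.zone] += n.pods
		}
	}

	for _, n := range nodes {
		fmt.Printf("%s cost=%d\n", n.name, spreadCost(n, podsPerZone))
	}
}
```

In this model node-c ends up with the lowest cost even though it runs as many pods as the others, simply because it has no zone term. That is the kind of skew the note above asks reviewers to treat as undefined behavior.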

Does this PR introduce a user-facing change?

Fix regression in 1.19+ in scheduler system default topology spreading when nodes don't have zone labels. Pods correctly spread by default now.

@k8s-ci-robot k8s-ci-robot added kind/bug Categorizes issue or PR as related to a bug. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. sig/scheduling Categorizes an issue or PR as relevant to SIG Scheduling. release-note Denotes a PR that will be considered when it comes time to generate release notes. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. labels Sep 15, 2021
@alculquicondor
Member Author

/assign @Huang-Wei

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Sep 15, 2021
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: alculquicondor

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Sep 15, 2021
Member

@Huang-Wei Huang-Wei left a comment

Some nits. LGTM overall.

Comment on lines +453 to +456
st.MakeNode().Name("node-a").Label(v1.LabelHostname, "node-a").Obj(),
st.MakeNode().Name("node-b").Label(v1.LabelHostname, "node-b").Obj(),
st.MakeNode().Name("node-c").Label(v1.LabelHostname, "node-c").Obj(),
st.MakeNode().Name("node-d").Label(v1.LabelHostname, "node-d").Obj(),
Member

Out of curiosity: in addition to the zone label, should we consider the case where some or all nodes don't carry the standard hostname label?

Member Author

Is that even possible? It seems that the kubelet takes care of it: https://kubernetes.io/docs/reference/labels-annotations-taints/#kubernetesiohostname

Interestingly, given that we special-case hostname to do the count in Score (instead of PreScore), we could make it work. However, I don't think we need to support that.

Member

> Is that even possible?

I was just curious. Personally, I agree with you that it's a sane default for every cluster.

@Huang-Wei
Member

/retest
/lgtm

@cpanato
Member

cpanato commented Nov 23, 2021

/triage accepted
/priority important-soon

@k8s-ci-robot k8s-ci-robot added triage/accepted Indicates an issue or PR is ready to be actively worked on. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Nov 23, 2021
@k8s-ci-robot k8s-ci-robot removed the needs-priority Indicates a PR lacks a `priority/foo` label and requires one. label Nov 23, 2021
k8s-ci-robot added a commit that referenced this pull request Nov 23, 2021
…of-#105046-upstream-release-1.22

Automated cherry pick of #105046: Skip check for all topology labels when using system default
k8s-ci-robot added a commit that referenced this pull request Nov 23, 2021
…of-#105046-upstream-release-1.20

Automated cherry pick of #105046: Skip check for all topology labels when using system default
k8s-ci-robot added a commit that referenced this pull request Nov 23, 2021
…of-#105046-upstream-release-1.21

Automated cherry pick of #105046: Skip check for all topology labels when using system default
@liggitt liggitt added the kind/regression Categorizes issue or PR as related to a regression from a prior release. label Apr 27, 2022
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/bug Categorizes issue or PR as related to a bug. kind/regression Categorizes issue or PR as related to a regression from a prior release. lgtm "Looks good to me", indicates that a PR is ready to be merged. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. release-note Denotes a PR that will be considered when it comes time to generate release notes. sig/scheduling Categorizes an issue or PR as relevant to SIG Scheduling. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. triage/accepted Indicates an issue or PR is ready to be actively worked on.
Development

Successfully merging this pull request may close these issues.

"scheduler: non-compatible change in default topology spread constraints"
5 participants