
fix nodelifecyle controller not add NoExecute taint bug #96876

Merged
merged 1 commit into kubernetes:master from howieyuen:no-execute-taint-missing Jan 13, 2021

Conversation

howieyuen
Member

@howieyuen howieyuen commented Nov 26, 2020

What type of PR is this?
/kind bug

What this PR does / why we need it:
PR #89059 tried to fix a reconcile problem: every 5s, monitorNodeHealth() runs processTaintBaseEviction(), which adds nodes to zoneNoExecuteTainter when their status is Unknown or False.

However, since untainted nodes need to be added to the RateLimitedTimedQueue on every pass, that PR deletes each node first so that it can re-enter the queue. The delete action uses an additional func SetRemove() (shown below) instead of Remove():

func (q *UniqueQueue) SetRemove(value string) {
	q.lock.Lock()
	defer q.lock.Unlock()
	// only deletes the set value, leaving the queue data behind
	if q.set.Has(value) {
		q.set.Delete(value)
	}
}

// Remove deletes the value from both the set and the queue.
func (q *UniqueQueue) Remove(value string) bool {
	q.lock.Lock()
	defer q.lock.Unlock()

	if !q.set.Has(value) {
		return false
	}
	q.set.Delete(value)
	for i, val := range q.queue {
		if val.Value == value {
			heap.Remove(&q.queue, i)
			return true
		}
	}
	return true
}
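
To make the asymmetry concrete, here is a small self-contained toy (hypothetical code, not from this PR or the scheduler package) that mirrors the UniqueQueue invariant: the set and the queue must stay in sync, and SetRemove() breaks that by touching only the set:

package main

import "fmt"

// toyQueue mirrors UniqueQueue's two structures: a FIFO of values plus a set
// used for deduplication on Add.
type toyQueue struct {
	queue []string
	set   map[string]struct{}
}

// add appends a value only if the set does not already contain it,
// analogous to UniqueQueue.Add.
func (q *toyQueue) add(v string) {
	if _, ok := q.set[v]; ok {
		return
	}
	q.set[v] = struct{}{}
	q.queue = append(q.queue, v)
}

// setRemove mimics the buggy SetRemove: it clears the set entry only,
// leaving the queue entry behind.
func (q *toyQueue) setRemove(v string) {
	delete(q.set, v)
}

func main() {
	q := &toyQueue{set: map[string]struct{}{}}
	q.add("node0")
	q.setRemove("node0") // set cleared, queue entry left behind
	q.add("node0")       // passes the set check, appends a duplicate
	fmt.Println(q.queue) // [node0 node0] -- the "dirty data" in the scenario below
}

Because Add() consults only the set, clearing the set without draining the queue lets the same node be appended again on every monitor period.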

When the taintManager starts working (doNoExecuteTaintingPass()) with its QPS defaulting to 0.1, the following scenario can leave nodes without the NoExecute taint forever, unless kube-controller-manager is restarted and reconstructs its queue data:

  1. Suppose the status of 3 nodes is Unknown; monitorNodeHealth() observes them and adds them to the queue, so the UniqueQueue inside RateLimitedTimedQueue looks like this:
     queue: TimedValue{node0, id0}, TimedValue{node1, id1}, TimedValue{node2, id2}
     set:   node0, node1, node2
  2. doNoExecuteTaintingPass() does not finish the taint job, and monitorNodeHealth() runs again in the next period, enqueueing the 3 nodes once more: SetRemove() deleted their set entries, but the old queue entries were left behind, so the UniqueQueue now looks like this:
     queue: TimedValue{node0, id0} (dirty), TimedValue{node1, id1} (dirty), TimedValue{node2, id2} (dirty), TimedValue{node0, id0}, TimedValue{node1, id1}, TimedValue{node2, id2}
     set:   node0, node1, node2
  3. doNoExecuteTaintingPass() continues to process these untainted nodes. It fetches entries from the queue (not the set) one by one, starting with the dirty data. Suppose that before the duplicated "node0" entry is handled, node0 returns to normal (and likewise node1 and node2): the ActionFunc in nc.zoneNoExecuteTainter[k].Try(fn ActionFunc) returns true, so Try() calls q.queue.RemoveFromQueue(val.Value), but nothing is removed because the set value no longer exists. The queue's head therefore can never be removed; every subsequent cycle fetches the same dirty data, and the taint job is stuck forever (see the Try() sketch after the RemoveFromQueue listing below):
// RemoveFromQueue removes the queue entry for the value, but only if the set
// still contains it; otherwise it returns false and leaves the queue untouched.
func (q *UniqueQueue) RemoveFromQueue(value string) bool {
	q.lock.Lock()
	defer q.lock.Unlock()

	if !q.set.Has(value) {
		return false
	}
	for i, val := range q.queue {
		if val.Value == value {
			heap.Remove(&q.queue, i)
			return true
		}
	}
	return false
}
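
For context, here is a simplified sketch of Try() (paraphrased from the scheduler package; logging and the injectable now() helper are simplified to time.Now(), so treat it as an approximation rather than the verbatim upstream source). When fn succeeds but RemoveFromQueue() returns false because the set entry is gone, the head is never popped, so this loop, and every later pass, re-reads the same dirty entry:

func (q *RateLimitedTimedQueue) Try(fn ActionFunc) {
	val, ok := q.queue.Head()
	q.limiterLock.Lock()
	defer q.limiterLock.Unlock()
	for ok {
		if !q.limiter.TryAccept() {
			break // rate limited; QPS defaults to 0.1 here
		}
		if time.Now().Before(val.ProcessAt) {
			break // head is not due for processing yet
		}
		if ok, wait := fn(val); !ok {
			// action failed: reschedule the head and try again later
			val.ProcessAt = time.Now().Add(wait + 1)
			q.queue.Replace(val)
		} else {
			// action succeeded: pop the head. If SetRemove already deleted
			// the set entry, this silently removes nothing, so the next
			// Head() call (and the next pass) returns the same dirty value.
			q.queue.RemoveFromQueue(val.Value)
		}
		val, ok = q.queue.Head()
	}
}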

Which issue(s) this PR fixes:
Fix: #94183 #96183

Special notes for your reviewer:
I wrote a helper func to print the values inside RateLimitedTimedQueue; the unit test logs below show the dirty data in the queue field.
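
The helper is not part of the merged diff; a possible reconstruction (assuming it lives in the same package as UniqueQueue, since the lock, queue, and set fields are unexported, and using the standard fmt and strings packages) could look like this:

func dumpUniqueQueue(q *UniqueQueue) string {
	q.lock.Lock()
	defer q.lock.Unlock()

	var b strings.Builder
	b.WriteString("q.queue: ")
	for _, tv := range q.queue {
		b.WriteString(tv.Value + ",")
	}
	// q.set is a string set, which prints like map[node0:{} node2:{}]
	fmt.Fprintf(&b, "\nq.set: %v", q.set)
	return b.String()
}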

Before replacing SetRemove() with Remove():

=== RUN   TestApplyNoExecuteTaintsToNodesEnqueueTwice
I1126 20:27:44.590489   60621 node_lifecycle_controller.go:380] Sending events to api server.
I1126 20:27:44.591383   60621 taint_manager.go:163] Sending events to api server.
I1126 20:27:44.591550   60621 node_lifecycle_controller.go:508] Controller will reconcile labels.
I1126 20:27:44.591803   60621 node_lifecycle_controller.go:1428] Initializing eviction metric for zone: region1::zone1
W1126 20:27:44.591895   60621 node_lifecycle_controller.go:1043] Missing timestamp for Node node1. Assuming now as a timestamp.
W1126 20:27:44.592092   60621 node_lifecycle_controller.go:1043] Missing timestamp for Node node2. Assuming now as a timestamp.
W1126 20:27:44.592187   60621 node_lifecycle_controller.go:1043] Missing timestamp for Node node0. Assuming now as a timestamp.
I1126 20:27:44.592273   60621 node_lifecycle_controller.go:1244] Controller detected that zone region1::zone1 is now in state Normal.
----------------------------------dirty data-----------------------------------
q.queue: node2,node0,node2,node0,
q.set: map[node0:{} node2:{}]
----------------------------------dirty data-----------------------------------
W1126 20:27:44.592663   60621 node_lifecycle_controller.go:1043] Missing timestamp for Node node3. Assuming now as a timestamp.
W1126 20:27:44.592753   60621 node_lifecycle_controller.go:1043] Missing timestamp for Node node4. Assuming now as a timestamp.
W1126 20:27:44.592817   60621 node_lifecycle_controller.go:1043] Missing timestamp for Node node5. Assuming now as a timestamp.
----------------------------------dirty data-----------------------------------
q.queue: node0,node0,node2,node5,node3,node0,
q.set: map[node0:{} node3:{} node5:{}]
map[node0:{} node3:{} node5:{}]
node0 true
map[node0:{} node3:{} node5:{}]
node2 false
----------------------------------dirty data-----------------------------------
    node_lifecycle_controller_test.go:291: Not found taint &Taint{Key:node.kubernetes.io/unreachable,Value:,Effect:NoExecute,TimeAdded:<nil>,} in [], which should be present in node3
    node_lifecycle_controller_test.go:299: Not found taint &Taint{Key:node.kubernetes.io/not-ready,Value:,Effect:NoExecute,TimeAdded:<nil>,} in [], which should be present in node5
--- FAIL: TestApplyNoExecuteTaintsToNodesEnqueueTwice (0.01s)
FAIL
FAIL	k8s.io/kubernetes/pkg/controller/nodelifecycle	0.030s
FAIL

After:

=== RUN   TestApplyNoExecuteTaintsToNodesEnqueueTwice
I1126 20:27:56.167537   36972 node_lifecycle_controller.go:380] Sending events to api server.
I1126 20:27:56.167790   36972 taint_manager.go:163] Sending events to api server.
I1126 20:27:56.167879   36972 node_lifecycle_controller.go:508] Controller will reconcile labels.
I1126 20:27:56.167988   36972 node_lifecycle_controller.go:1429] Initializing eviction metric for zone: region1::zone1
W1126 20:27:56.168036   36972 node_lifecycle_controller.go:1044] Missing timestamp for Node node0. Assuming now as a timestamp.
W1126 20:27:56.168097   36972 node_lifecycle_controller.go:1044] Missing timestamp for Node node1. Assuming now as a timestamp.
W1126 20:27:56.168139   36972 node_lifecycle_controller.go:1044] Missing timestamp for Node node2. Assuming now as a timestamp.
I1126 20:27:56.168184   36972 node_lifecycle_controller.go:1245] Controller detected that zone region1::zone1 is now in state Normal.
----------------------------------clean data-----------------------------------
q.queue: node0,node2,
q.set: map[node0:{} node2:{}]
----------------------------------clean data-----------------------------------
W1126 20:27:56.168407   36972 node_lifecycle_controller.go:1044] Missing timestamp for Node node3. Assuming now as a timestamp.
W1126 20:27:56.168447   36972 node_lifecycle_controller.go:1044] Missing timestamp for Node node4. Assuming now as a timestamp.
W1126 20:27:56.168488   36972 node_lifecycle_controller.go:1044] Missing timestamp for Node node5. Assuming now as a timestamp.
----------------------------------clean data-----------------------------------
q.queue: node2,node3,node5,
q.set: map[node2:{} node3:{} node5:{}]
node2 true
node3 true
node5 true
----------------------------------clean data-----------------------------------
--- PASS: TestApplyNoExecuteTaintsToNodesEnqueueTwice (0.00s)
PASS
ok  	k8s.io/kubernetes/pkg/controller/nodelifecycle	0.027s

Does this PR introduce a user-facing change?

Fix a regression in 1.19+ where a failed node may not have the NoExecute taint set correctly.

@k8s-ci-robot k8s-ci-robot added kind/bug Categorizes issue or PR as related to a bug. size/S Denotes a PR that changes 10-29 lines, ignoring generated files. do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. labels Nov 26, 2020
@k8s-ci-robot k8s-ci-robot added sig/apps Categorizes an issue or PR as relevant to SIG Apps. and removed do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Nov 26, 2020
@JornShen
Member

/ok-to-test

@k8s-ci-robot k8s-ci-robot added the ok-to-test Indicates a non-member PR verified by an org member that is safe to test. label Nov 26, 2020
@howieyuen
Member Author

/triage accepted

@k8s-ci-robot k8s-ci-robot added triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Nov 26, 2020
@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Nov 26, 2020
@howieyuen howieyuen force-pushed the no-execute-taint-missing branch 3 times, most recently from 0c3a714 to c4fd122 Compare November 26, 2020 10:09
@howieyuen
Member Author

/retest

@k8s-ci-robot
Contributor

@howieyuen: The label(s) area/kube-controller-manager cannot be applied, because the repository doesn't have them

In response to this:

/area kube-controller-manager
/milestone v1.20

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot
Contributor

@howieyuen: You must be a member of the kubernetes/milestone-maintainers GitHub team to set the milestone. If you believe you should be able to issue the /milestone command, please contact the kubernetes/milestone-maintainers team and have them propose you as an additional delegate for this responsibility.

In response to this:

/area kube-controller-manager
/milestone v1.20

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@howieyuen
Member Author

/area node-lifecycle

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. label Jan 8, 2021
@ehashman ehashman moved this from Triage to Needs Reviewer in SIG Node PR Triage Jan 8, 2021
@derekwaynecarr
Member

thank you for the detail.

i have to admit the issue upon review was hard to follow without it, so the test and detail are much appreciated!

/approve
/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jan 13, 2021
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: derekwaynecarr, howieyuen

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jan 13, 2021
@k8s-ci-robot k8s-ci-robot merged commit 1209c59 into kubernetes:master Jan 13, 2021
SIG Node PR Triage automation moved this from Needs Reviewer to Done Jan 13, 2021
@k8s-ci-robot k8s-ci-robot added this to the v1.21 milestone Jan 13, 2021
CKchen0726 pushed a commit to CKchen0726/kubernetes that referenced this pull request Jan 19, 2021
…issing

fix nodelifecyle controller not add NoExecute taint bug
CKchen0726 pushed a commit to CKchen0726/kubernetes that referenced this pull request Jan 20, 2021
…issing

fix nodelifecyle controller not add NoExecute taint bug
@LuckySB

LuckySB commented Jan 22, 2021

1.18.12 has this issue too, can you cherry-pick to 1.18?

@pacoxu
Member

pacoxu commented Feb 21, 2021

Not sure if there's a reason that the cherry-pick PRs are not merged to 1.18-1.20.
/cc

@k8s-ci-robot k8s-ci-robot requested review from pacoxu and removed request for gmarek February 21, 2021 14:41
k8s-ci-robot added a commit that referenced this pull request Mar 5, 2021
Cherry pick #96876 in controller to 1.18: fix nodelifecyle controller not add NoExecute taint bug
k8s-ci-robot added a commit that referenced this pull request Mar 5, 2021
Cherry pick #96876 in controller to 1.20: fix nodelifecyle controller not add NoExecute taint bug
k8s-ci-robot added a commit that referenced this pull request Mar 5, 2021
Cherry pick #96876 in controller to 1.19: fix nodelifecyle controller not add NoExecute taint bug
@liggitt liggitt added the kind/regression Categorizes issue or PR as related to a regression from a prior release. label Apr 27, 2022
Labels
approved - Indicates a PR has been approved by an approver from all required OWNERS files.
area/node-lifecycle - Issues or PRs related to Node lifecycle.
cncf-cla: yes - Indicates the PR's author has signed the CNCF CLA.
kind/bug - Categorizes issue or PR as related to a bug.
kind/regression - Categorizes issue or PR as related to a regression from a prior release.
lgtm - "Looks good to me", indicates that a PR is ready to be merged.
ok-to-test - Indicates a non-member PR verified by an org member that is safe to test.
priority/important-soon - Must be staffed and worked on either currently, or very soon, ideally in time for the next release.
release-note - Denotes a PR that will be considered when it comes time to generate release notes.
sig/apps - Categorizes an issue or PR as relevant to SIG Apps.
sig/node - Categorizes an issue or PR as relevant to SIG Node.
size/L - Denotes a PR that changes 100-499 lines, ignoring generated files.
triage/accepted - Indicates an issue or PR is ready to be actively worked on.

Successfully merging this pull request may close these issues.

nodelifecycle controller does not set NoExecute taint to NotReady node when kubelet has been stopped