
kubeadm: do not check if the /etc/kubernetes/manifests folder is empty on joining worker nodes during preflight #104942

Merged
merged 1 commit into kubernetes:master on Sep 15, 2021

Conversation

@SataQiu (Member) commented Sep 12, 2021

What type of PR is this?

/kind feature
/kind bug

What this PR does / why we need it:

kubeadm: do not check if the /etc/kubernetes/manifests folder is empty on joining worker nodes during preflight

Which issue(s) this PR fixes:

Fixes kubernetes/kubeadm#2564

Special notes for your reviewer:

Does this PR introduce a user-facing change?

kubeadm: do not check if the '/etc/kubernetes/manifests' folder is empty on joining worker nodes during preflight

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:

NONE

@k8s-ci-robot added the release-note, kind/feature, size/XS, kind/bug, do-not-merge/needs-kind, cncf-cla: yes, do-not-merge/needs-sig, needs-triage, and needs-priority labels, and removed the do-not-merge/needs-kind label, on Sep 12, 2021
@k8s-ci-robot (Contributor)

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: SataQiu

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot added the approved, area/kubeadm, and sig/cluster-lifecycle labels, and removed the do-not-merge/needs-sig label, on Sep 12, 2021
@pacoxu (Member) left a comment


LGTM
It makes sense.
/cc neolit123

@pacoxu (Member) commented Sep 13, 2021

/assign fabriziopandini
Per kubernetes/kubeadm#2564 (comment)

@fabriziopandini (Member) left a comment


@SataQiu thanks for this PR!
Q: as per kubernetes/kubeadm#2564 (comment), should we also make these checks consistent between init and join?

FileAvailableCheck{Path: kubeadmconstants.GetStaticPodFilepath(kubeadmconstants.KubeAPIServer, manifestsDir)},
FileAvailableCheck{Path: kubeadmconstants.GetStaticPodFilepath(kubeadmconstants.KubeControllerManager, manifestsDir)},
FileAvailableCheck{Path: kubeadmconstants.GetStaticPodFilepath(kubeadmconstants.KubeScheduler, manifestsDir)},
FileAvailableCheck{Path: kubeadmconstants.GetStaticPodFilepath(kubeadmconstants.Etcd, manifestsDir)},
@neolit123 (Member) Sep 13, 2021


these checks are control-plane checks, so it makes more sense to have them under cfg.ControlPlane != nil below.
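
For illustration, a minimal sketch of that placement (assuming the surrounding checks slice and manifestsDir variable in RunJoinNodeChecks; this is a sketch of the suggestion, not the diff that was eventually merged):

// Sketch: gate the control-plane manifest checks on joining a control plane,
// so plain worker joins never run them.
if cfg.ControlPlane != nil {
	checks = append(checks,
		FileAvailableCheck{Path: kubeadmconstants.GetStaticPodFilepath(kubeadmconstants.KubeAPIServer, manifestsDir)},
		FileAvailableCheck{Path: kubeadmconstants.GetStaticPodFilepath(kubeadmconstants.KubeControllerManager, manifestsDir)},
		FileAvailableCheck{Path: kubeadmconstants.GetStaticPodFilepath(kubeadmconstants.KubeScheduler, manifestsDir)},
		FileAvailableCheck{Path: kubeadmconstants.GetStaticPodFilepath(kubeadmconstants.Etcd, manifestsDir)},
	)
}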

@neolit123 (Member)

but something else to note here... looking at phases/join/preflight.go, RunInitNodeChecks is already called for control plane nodes:

if j.Cfg().ControlPlane != nil {
	// Checks if the cluster configuration supports
	// joining a new control plane instance and if all the necessary certificates are provided
	hasCertificateKey := len(j.CertificateKey()) > 0
	if err := checkIfReadyForAdditionalControlPlane(&initCfg.ClusterConfiguration, hasCertificateKey); err != nil {
		// outputs the not ready for hosting a new control plane instance message
		ctx := map[string]string{
			"Error": err.Error(),
		}
		var msg bytes.Buffer
		notReadyToJoinControlPlaneTemp.Execute(&msg, ctx)
		return errors.New(msg.String())
	}
	// run kubeadm init preflight checks for checking all the prerequisites
	fmt.Println("[preflight] Running pre-flight checks before initializing the new control plane instance")
	if err := preflight.RunInitNodeChecks(utilsexec.New(), initCfg, j.IgnorePreflightErrors(), true, hasCertificateKey); err != nil {
		return err
	}
}

i.e. secondary control-plane nodes already should have the same checks as the first control plane node (mostly).

one difference is the isSecondaryControlPlane flag which is true on join.

so i think the correct patch is to just remove this check:

DirAvailableCheck{Path: filepath.Join(kubeadmconstants.KubernetesDir, kubeadmconstants.ManifestsSubDirName)},

on joining worker nodes.
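
A sketch of what RunJoinNodeChecks would look like after that removal (assuming the rest of the checks slice stays as-is; the one-line deletion is the actual change in this PR):

checks := []Checker{
	// DirAvailableCheck{Path: filepath.Join(kubeadmconstants.KubernetesDir, kubeadmconstants.ManifestsSubDirName)},
	// ^ removed: joining worker nodes no longer requires an empty /etc/kubernetes/manifests
	FileAvailableCheck{Path: filepath.Join(kubeadmconstants.KubernetesDir, kubeadmconstants.KubeletKubeConfigFileName)},
	// ... remaining node-level checks unchanged ...
}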

@dlipovetsky (Contributor)

Thanks for the review @neolit123! I agree with your summary.

@SataQiu (Member, Author)

@neolit123 @dlipovetsky Thanks for your review.

But what about a scenario like this?

The user first joins a node as a secondary control-plane node, successfully:

kubeadm join xxx --token xxx \
    --discovery-token-ca-cert-hash xxx \
    --control-plane --certificate-key xxx

The user then realizes they intended to join the machine as a worker node, not a control-plane node.
So they delete the node from the cluster and run the join command again (forgetting to run the kubeadm reset command):

kubectl delete node xxx
kubeadm join xxx --token xxx --discovery-token-ca-cert-hash xxx

Not surprisingly, this command outputs errors similar to the following (even without the DirAvailableCheck):

[preflight] Running pre-flight checks
error execution phase preflight: [preflight] Some fatal errors occurred:
	[ERROR FileAvailable--etc-kubernetes-kubelet.conf]: /etc/kubernetes/kubelet.conf already exists
	[ERROR Port-10250]: Port 10250 is in use
	[ERROR FileAvailable--etc-kubernetes-pki-ca.crt]: /etc/kubernetes/pki/ca.crt already exists

Based on these error messages, the user manually cleans up the relevant files:

rm -rf /etc/kubernetes/kubelet.conf 
rm -rf /etc/kubernetes/pki/ca.crt
systemctl stop kubelet

Then the user runs the join command again, and this time it succeeds:

[preflight] Running pre-flight checks
[preflight] Reading configuration from the cluster...
[preflight] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -o yaml'
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Starting the kubelet
[kubelet-start] Waiting for the kubelet to perform the TLS Bootstrap...

This node has joined the cluster:
* Certificate signing request was sent to apiserver and a response was received.
* The Kubelet was informed of the new secure connection details.

Run 'kubectl get nodes' on the control-plane to see this node join the cluster.

At this point, however, the Pods that should run only on control-plane nodes are mistakenly running on the worker node (their static Pod manifests were never removed from /etc/kubernetes/manifests). This is not the desired state of a healthy cluster.

# kubectl get node
NAME                      STATUS     ROLES                  AGE    VERSION
izhp3acgtbq11k6ldsw593z   NotReady   control-plane,master   32m    v1.22.1
izhp3acgtbq11k6ldsw594z   NotReady   <none>                 114s   v1.22.1

# kubectl get pod -A
NAMESPACE     NAME                                              READY   STATUS    RESTARTS        AGE
kube-system   coredns-7f6cbbb7b8-2l5j9                          0/1     Pending   0               32m
kube-system   coredns-7f6cbbb7b8-sglnx                          0/1     Pending   0               32m
kube-system   etcd-izhp3acgtbq11k6ldsw593z                      1/1     Running   2               33m
kube-system   etcd-izhp3acgtbq11k6ldsw594z                      1/1     Running   1               2m9s
kube-system   kube-apiserver-izhp3acgtbq11k6ldsw593z            1/1     Running   2               33m
kube-system   kube-apiserver-izhp3acgtbq11k6ldsw594z            1/1     Running   1               2m8s
kube-system   kube-controller-manager-izhp3acgtbq11k6ldsw593z   1/1     Running   2 (3m18s ago)   33m
kube-system   kube-controller-manager-izhp3acgtbq11k6ldsw594z   1/1     Running   1               2m8s
kube-system   kube-proxy-wxxpw                                  1/1     Running   1               29m
kube-system   kube-proxy-zc2zj                                  1/1     Running   0               32m
kube-system   kube-scheduler-izhp3acgtbq11k6ldsw593z            1/1     Running   4 (3m18s ago)   33m
kube-system   kube-scheduler-izhp3acgtbq11k6ldsw594z            1/1     Running   1               2m8s

@neolit123 (Member)

I guess not calling kubeadm reset after the initial control plane node join would mean that the control plane containers would continue running even after the kubectl delete of the node. Reset is sort of mandatory in all cases, and if we are still seeing users that do not know about it, perhaps it's time to start documenting it in the CLI output?

I think I agree with your assessment that worker join should error on the presence of kubeadm-managed manifests. One potentially weird use case that this would break is if someone is joining an etcd node to the cluster, i.e. a worker with an etcd static pod.

Frankly, i would have designed this slightly differently:

  • error out on the cp manifests only on cp nodes
  • depending on the kubeadm command (init / join / join cp), print a list of what additional files are in the manifests directory, in the hope that the user can adjust if something unwanted ended up there.

@neolit123 (Member) Sep 14, 2021

Q: If we remove DirAvailableCheck would users that were skipping it previously now get an error that there is no such preflight check?

Trying to refresh my memory if we had that.

EDIT: ok, it looks like kubeadm init --ignore-preflight-errors=foo does not error out that there is no such check called foo. That is good in a way, since deleting a check will never break users, but it's also not ideal: if the user makes a typo in a preflight check name they will not get any indication of that...
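
For context, a user who was skipping this check would have been passing the check's error label as the name, something like the following (my assumption about what such invocations look like; DirAvailable--etc-kubernetes-manifests is the label kubeadm prints in the preflight error):

kubeadm join xxx --token xxx --discovery-token-ca-cert-hash xxx \
    --ignore-preflight-errors=DirAvailable--etc-kubernetes-manifests

After this PR that name simply matches nothing on worker joins, and per the EDIT above kubeadm will not complain about the unknown check name.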

@dlipovetsky (Contributor)

I think I agree with your assessment that worker join should error on the presence of kubeadm-managed manifests.

+1

One potentially weird use case that this would break is if someone is joining an etcd node to the cluster, i.e. a worker with an etcd static pod.

In hindsight, kubeadm could have avoided conflicts with user-defined manifests by using a prefix, e.g. kubeadm-etcd.yaml. But even in the absence of such a prefix, I think it's reasonable for kubeadm to treat the manifests it writes, including etcd.yaml, as "reserved."

I can follow up with a docs PR to make it clear that kubeadm-managed manifests are "reserved."

@neolit123 (Member) Sep 14, 2021

@dlipovetsky are you happy with the current state of the diff?
it just removes the directory check, and one can still start a pod called etcd.yaml on a worker without any error / warning.

from your comment above i think i'm reading that join of workers should complain about reserved control-plane pod names as well.

@dlipovetsky (Contributor)

are you happy with the current state of the diff?

Yes.

from your comment above i think i'm reading that join of workers should complain about reserved control-plane pod names as well.

I was thinking out loud. I do think it's a good idea for kubeadm to treat filenames it writes as "reserved." That does more than address this bug, though, and can be a separate PR.
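
A hypothetical sketch of such a reserved-filename check on worker join (illustrative only; this is not part of this PR, and the placement inside the join checks slice is assumed):

// Hypothetical follow-up: even on worker join, fail preflight if a file with a
// kubeadm-reserved static Pod name already exists in the manifests directory.
for _, component := range []string{
	kubeadmconstants.KubeAPIServer,
	kubeadmconstants.KubeControllerManager,
	kubeadmconstants.KubeScheduler,
	kubeadmconstants.Etcd,
} {
	checks = append(checks, FileAvailableCheck{
		Path: kubeadmconstants.GetStaticPodFilepath(component, manifestsDir),
	})
}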

@neolit123 (Member)

ok, understood.

@neolit123 (Member)

kubeadm: init/join preflight check is limited to the static Pod manifest files generated by kubeadm instead of the whole manifests directory, custom static Pod manifest files will be ignored in the preflight check now

we should change the release note depending on #104942 (comment)

e.g.

kubeadm: do not check if the /etc/kubernetes/manifests folder is empty on joining worker nodes during preflight

@SataQiu (Member, Author) commented Sep 14, 2021

@SataQiu thanks for this PR!
Q: as per kubernetes/kubeadm#2564 (comment), should we also make these checks consistent between init and join?

@fabriziopandini

Yes, the checks in the init phase are defined here:

// RunInitNodeChecks executes all individual, applicable to control-plane node checks.
// The boolean flag 'isSecondaryControlPlane' controls whether we are running checks in a --join-control-plane scenario.
// The boolean flag 'downloadCerts' controls whether we should skip checks on certificates because we are downloading them.
// If the flag is set to true we should skip checks already executed by RunJoinNodeChecks.
func RunInitNodeChecks(execer utilsexec.Interface, cfg *kubeadmapi.InitConfiguration, ignorePreflightErrors sets.String, isSecondaryControlPlane bool, downloadCerts bool) error {
	if !isSecondaryControlPlane {
		// First, check if we're root separately from the other preflight checks and fail fast
		if err := RunRootCheckOnly(ignorePreflightErrors); err != nil {
			return err
		}
	}
	manifestsDir := filepath.Join(kubeadmconstants.KubernetesDir, kubeadmconstants.ManifestsSubDirName)
	checks := []Checker{
		NumCPUCheck{NumCPU: kubeadmconstants.ControlPlaneNumCPU},
		// Linux only
		// TODO: support other OS, if control-plane is supported on it.
		MemCheck{Mem: kubeadmconstants.ControlPlaneMem},
		KubernetesVersionCheck{KubernetesVersion: cfg.KubernetesVersion, KubeadmVersion: kubeadmversion.Get().GitVersion},
		FirewalldCheck{ports: []int{int(cfg.LocalAPIEndpoint.BindPort), kubeadmconstants.KubeletPort}},
		PortOpenCheck{port: int(cfg.LocalAPIEndpoint.BindPort)},
		PortOpenCheck{port: kubeadmconstants.KubeSchedulerPort},
		PortOpenCheck{port: kubeadmconstants.KubeControllerManagerPort},
		FileAvailableCheck{Path: kubeadmconstants.GetStaticPodFilepath(kubeadmconstants.KubeAPIServer, manifestsDir)},
		FileAvailableCheck{Path: kubeadmconstants.GetStaticPodFilepath(kubeadmconstants.KubeControllerManager, manifestsDir)},
		FileAvailableCheck{Path: kubeadmconstants.GetStaticPodFilepath(kubeadmconstants.KubeScheduler, manifestsDir)},
		FileAvailableCheck{Path: kubeadmconstants.GetStaticPodFilepath(kubeadmconstants.Etcd, manifestsDir)},
		HTTPProxyCheck{Proto: "https", Host: cfg.LocalAPIEndpoint.AdvertiseAddress},
	}

In the init preflight check, we only check the specific manifest files, not the whole /etc/kubernetes/manifests folder.

@SataQiu changed the title from "kubeadm: init/join preflight check is limited to the static Pod manifest files generated by kubeadm instead of the whole manifests directory, custom static Pod manifest files will be ignored in the preflight check now" to "kubeadm: do no check if the /etc/kubernetes/manifests folder is empty on joining worker nodes during preflight" on Sep 14, 2021
@SataQiu changed the title from "kubeadm: do no check if the /etc/kubernetes/manifests folder is empty on joining worker nodes during preflight" to "kubeadm: do not check if the /etc/kubernetes/manifests folder is empty on joining worker nodes during preflight" on Sep 14, 2021
kubeadm: do not check if the /etc/kubernetes/manifests folder is empty on joining worker nodes during preflight

Signed-off-by: SataQiu <shidaqiu2018@gmail.com>
@dlipovetsky (Contributor)

Thanks @SataQiu for thinking about the details and fixing the issue!

@neolit123 (Member) left a comment


/lgtm
/triage accepted

@k8s-ci-robot added the triage/accepted label and removed the needs-triage label on Sep 15, 2021
@k8s-ci-robot added the lgtm label on Sep 15, 2021
@k8s-ci-robot merged commit 8975906 into kubernetes:master on Sep 15, 2021
@k8s-ci-robot added this to the v1.23 milestone on Sep 15, 2021