Convert cmd/kubelet/app/server.go to structured logging #98334

Merged
merged 1 commit into from Mar 17, 2021

Conversation

wawa0210
Contributor

@wawa0210 wawa0210 commented Jan 24, 2021

What type of PR is this?

/kind bug

What this PR does / why we need it:

When flag validation fails, the kubelet now outputs only the error message instead of a full stack trace.
This improves the user experience and is consistent with other components (such as kube-proxy).

Which issue(s) this PR fixes:

Fixes #98292

Special notes for your reviewer:
For compatibility with klog.Fatalf, the exit code remains 255.

Does this PR introduce a user-facing change?:

NONE

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:


/sig node
/area kubelet

@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. kind/bug Categorizes issue or PR as related to a bug. size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. sig/node Categorizes an issue or PR as relevant to SIG Node. area/kubelet cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. labels Jan 24, 2021
@wawa0210
Contributor Author

/assign @vishh

@ehashman ehashman added this to Needs Reviewer in SIG Node PR Triage Jan 25, 2021
@ehashman
Member

/hold

Bug wasn't accepted and I'm not sure this is an improved user experience.

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jan 25, 2021
@ehashman
Member

Note: I think Fatal logs are going away with the structured logging migration this release, so this PR will be replaced soon anyway. See: https://github.com/kubernetes/community/blob/master/contributors/devel/sig-instrumentation/migration-to-structured-logging.md#replacing-fatal-calls

It would be better to migrate this entire file to use structured logging rather than just making this change.

@ehashman ehashman moved this from Needs Reviewer to Waiting on Author in SIG Node PR Triage Jan 25, 2021
@wawa0210
Contributor Author

> It would be better to migrate this entire file to use structured logging rather than just making this change.

Understood. If possible, I would like this PR to fix only this bug (to keep it focused), and then open a separate PR to migrate the entire file to structured logging. What do you think?

@ehashman
Member

> > It would be better to migrate this entire file to use structured logging rather than just making this change.
>
> Understood. If possible, I would like this PR to fix only this bug (to keep it focused), and then open a separate PR to migrate the entire file to structured logging. What do you think?

I would prefer you fix it all in one go, because migrating away from using Fatalf is part of the log migration, and we have limited reviewer/approver bandwidth.

@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. labels Jan 29, 2021
Member

@ehashman ehashman left a comment

Thanks for doing this! I have a bit of feedback on the log changes.

@@ -117,7 +117,8 @@ func NewKubeletCommand() *cobra.Command {
     kubeletConfig, err := options.NewKubeletConfiguration()
     // programmer error
     if err != nil {
-        klog.Fatal(err)
+        klog.ErrorS(err, "Failed create a new kubelet configuration")
Member

Suggested change
- klog.ErrorS(err, "Failed create a new kubelet configuration")
+ klog.ErrorS(err, "Failed to create a new kubelet configuration")

@@ -117,7 +117,8 @@ func NewKubeletCommand() *cobra.Command {
     kubeletConfig, err := options.NewKubeletConfiguration()
     // programmer error
     if err != nil {
-        klog.Fatal(err)
+        klog.ErrorS(err, "Failed create a new kubelet configuration")
+        os.Exit(255)
Member

Please use exit code 1 throughout this file, and not 255, to be consistent with the rest of the files in cmd/: https://cs.k8s.io/?q=os.Exit&i=nope&files=cmd%2F.*&repos=kubernetes/kubernetes

Contributor Author

@wawa0210 wawa0210 Jan 30, 2021

klog.Fatalf exits with code 255, so I kept 255 here to avoid breaking compatibility.
Does it need to be changed to 1? Does this need to be discussed?

Member

I'll ask in Slack about it.

Member

@neolit123 neolit123 Feb 1, 2021

we should not be using 255, but rather 1.
also ideally applications should have a single point of os.Exit() instead of multiple ones.

but as mentioned on slack this change would need an ACTION REQUIRED release note if users are expecting an exact value and not just != 0

Contributor Author

> also ideally applications should have a single point of os.Exit() instead of multiple ones.

os.Exit(1) is called for various reasons in this file. If there should be only one os.Exit() call, do you have a suggestion for how to structure that?

Member

@neolit123 neolit123 Feb 13, 2021

deferring to the kubelet maintainers, but ideally errors should bubble up to a single point of os.Exit(x).
and there, depending on the different error types different exit codes can be returned.

kubernetes components tend to just os.Exit at arbitrary locations with fixed error codes, which is not a great pattern.

     }

     // check if there are non-flag arguments in the command line
     cmds := cleanFlagSet.Args()
     if len(cmds) > 0 {
         cmd.Usage()
-        klog.Fatalf("unknown command: %s", cmds[0])
+        klog.InfoS("Unknown command", "command", cmds[0])
Member

@@ -534,7 +548,7 @@ func run(ctx context.Context, s *options.KubeletServer, kubeDeps *kubelet.Depend
         return err
     }
     if cloud != nil {
-        klog.V(2).Infof("Successfully initialized cloud provider: %q from the config file: %q\n", s.CloudProvider, s.CloudConfigFile)
+        klog.V(2).InfoS("Successfully initialized cloud provider", "cloud provider", s.CloudProvider, "config file", s.CloudConfigFile)
Member

Keys should not have spaces. I think the convention is to use camelCase (i.e. "cloudProvider")
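
One way to catch keys such as "cloud provider" early is a small check in a unit test. The sketch below assumes the convention is lowerCamelCase with letters and digits only; `validKey` is a hypothetical helper and is not part of klog.

```go
package main

import (
	"fmt"
	"unicode"
)

// validKey reports whether key looks like a lowerCamelCase structured-logging
// key: non-empty, starting with a lowercase letter, letters and digits only.
// Hypothetical helper for illustration; it is not part of klog.
func validKey(key string) bool {
	if key == "" {
		return false
	}
	for i, r := range key {
		if i == 0 && !unicode.IsLower(r) {
			return false
		}
		if !unicode.IsLetter(r) && !unicode.IsDigit(r) {
			return false
		}
	}
	return true
}

func main() {
	// Keys from the line under review, followed by their suggested forms.
	for _, key := range []string{"cloud provider", "cloudProvider", "config file", "configFile"} {
		fmt.Printf("%q valid=%v\n", key, validKey(key))
	}
}
```

A check like this cannot judge acronym casing (it accepts both "nodeIp" and "nodeIP"), so it complements review rather than replacing it.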

@@ -611,14 +625,14 @@ func run(ctx context.Context, s *options.KubeletServer, kubeDeps *kubelet.Depend
     cgroupRoots = append(cgroupRoots, nodeAllocatableRoot)
     kubeletCgroup, err := cm.GetKubeletContainer(s.KubeletCgroups)
     if err != nil {
-        klog.Warningf("failed to get the kubelet's cgroup: %v. Kubelet system container metrics may be missing.", err)
+        klog.InfoS("Failed to get the kubelet's cgroup. Kubelet system container metrics may be missing.", "error", err)
Member

@@ -677,15 +691,15 @@ func run(ctx context.Context, s *options.KubeletServer, kubeDeps *kubelet.Depend

     if reservedSystemCPUs.Size() > 0 {
         // at cmd option valication phase it is tested either --system-reserved-cgroup or --kube-reserved-cgroup is specified, so overwrite should be ok
-        klog.Infof("Option --reserved-cpus is specified, it will overwrite the cpu setting in KubeReserved=\"%v\", SystemReserved=\"%v\".", s.KubeReserved, s.SystemReserved)
+        klog.InfoS("Option --reserved-cpus is specified, it will overwrite the cpu setting in KubeReserved=\"%v\", SystemReserved=\"%v\".", s.KubeReserved, s.SystemReserved)
Member

This log line needs to be updated; it still uses string formatting.

         if s.KubeReserved != nil {
             delete(s.KubeReserved, "cpu")
         }
         if s.SystemReserved == nil {
             s.SystemReserved = make(map[string]string)
         }
         s.SystemReserved["cpu"] = strconv.Itoa(reservedSystemCPUs.Size())
-        klog.Infof("After cpu setting is overwritten, KubeReserved=\"%v\", SystemReserved=\"%v\"", s.KubeReserved, s.SystemReserved)
+        klog.InfoS("After cpu setting is overwritten, KubeReserved=\"%v\", SystemReserved=\"%v\"", s.KubeReserved, s.SystemReserved)
Member

Also needs its string formatting updated

@@ -791,7 +805,7 @@ func run(ctx context.Context, s *options.KubeletServer, kubeDeps *kubelet.Depend
     go wait.Until(func() {
         err := http.ListenAndServe(net.JoinHostPort(s.HealthzBindAddress, strconv.Itoa(int(s.HealthzPort))), mux)
         if err != nil {
-            klog.Errorf("Starting healthz server failed: %v", err)
+            klog.ErrorS(err, "Failed to starting healthz server")
Member

Suggested change
- klog.ErrorS(err, "Failed to starting healthz server")
+ klog.ErrorS(err, "Failed to start healthz server")

@@ -1093,7 +1107,7 @@ func RunKubelet(kubeServer *options.KubeletServer, kubeDeps *kubelet.Dependencie
     for _, ip := range strings.Split(kubeServer.NodeIP, ",") {
         parsedNodeIP := net.ParseIP(strings.TrimSpace(ip))
         if parsedNodeIP == nil {
-            klog.Warningf("Could not parse --node-ip value %q; ignoring", ip)
+            klog.InfoS("Could not parse --node-ip ignoring", "nodeIp", ip)
Member

Suggested change
- klog.InfoS("Could not parse --node-ip ignoring", "nodeIp", ip)
+ klog.InfoS("Could not parse --node-ip ignoring", "nodeIP", ip)

@@ -1114,7 +1128,7 @@ func RunKubelet(kubeServer *options.KubeletServer, kubeDeps *kubelet.Dependencie
     })

     credentialprovider.SetPreferredDockercfgPath(kubeServer.RootDirectory)
-    klog.V(2).Infof("Using root directory: %v", kubeServer.RootDirectory)
+    klog.V(2).InfoS("Using root directory", "directory", kubeServer.RootDirectory)
Member

@serathius do you have an opinion on directory vs. dir?

Contributor

"path" :P

@ehashman
Member

/priority important-longterm
/triage accepted
/retitle Convert cmd/kubelet/app/server.go to structured logging

@k8s-ci-robot k8s-ci-robot added the priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete. label Jan 29, 2021
@k8s-ci-robot k8s-ci-robot changed the title Fix kubelet flag verification failed, output error message instead of stack error Convert cmd/kubelet/app/server.go to structured logging Jan 29, 2021
@k8s-ci-robot k8s-ci-robot added triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed needs-priority Indicates a PR lacks a `priority/foo` label and requires one. labels Jan 29, 2021
@k8s-ci-robot k8s-ci-robot added this to the v1.21 milestone Mar 11, 2021
@ehashman ehashman moved this from Waiting on Author to Needs Approver in SIG Node PR Triage Mar 11, 2021
@ehashman ehashman moved this from Waiting on Author to Needs Approver in Structured Logging Migration for Kubelet, 1.21 Mar 11, 2021
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Mar 11, 2021
@ehashman
Member

/remove-kind bug
/kind cleanup

@k8s-ci-robot k8s-ci-robot added kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. and removed kind/bug Categorizes issue or PR as related to a bug. labels Mar 11, 2021
@mrunalp
Contributor

mrunalp commented Mar 16, 2021

/approve

@mrunalp mrunalp moved this from Needs Approver to Done in SIG Node PR Triage Mar 16, 2021
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: mrunalp, wawa0210

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Mar 16, 2021
@ehashman ehashman moved this from Needs Approver to Done in Structured Logging Migration for Kubelet, 1.21 Mar 16, 2021
@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Mar 16, 2021
@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Mar 17, 2021
Member

@ehashman ehashman left a comment

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Mar 17, 2021
@pacoxu
Member

pacoxu commented Mar 17, 2021

/retest

@pacoxu
Member

pacoxu commented Mar 17, 2021

/remove-label needs-rebase

@k8s-ci-robot
Contributor

@pacoxu: The label(s) /remove-label needs-rebase cannot be applied. These labels are supported: api-review, tide/merge-method-merge, tide/merge-method-rebase, tide/merge-method-squash

In response to this:

/remove-label needs-rebase

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Mar 17, 2021
@pacoxu
Member

pacoxu commented Mar 17, 2021

/retest

@wawa0210
Contributor Author

/test pull-kubernetes-e2e-kind-ipv6

@k8s-ci-robot k8s-ci-robot merged commit 1dce898 into kubernetes:master Mar 17, 2021
@wawa0210 wawa0210 deleted the fix-98292 branch March 17, 2021 11:20
@k8s-ci-robot k8s-ci-robot added release-note-none Denotes a PR that doesn't merit a release note. and removed release-note Denotes a PR that will be considered when it comes time to generate release notes. labels Mar 29, 2021
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/kubelet cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. lgtm "Looks good to me", indicates that a PR is ready to be merged. priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete. release-note-none Denotes a PR that doesn't merit a release note. sig/node Categorizes an issue or PR as relevant to SIG Node. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. triage/accepted Indicates an issue or PR is ready to be actively worked on.
Development

Successfully merging this pull request may close these issues.

In the kubelet flags validity check, is it reasonable to use klog.Fatal?
9 participants