Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feature: add CSIVolumeHealth feature and gate #99284

Merged

Conversation

fengzixu
Copy link
Contributor

@fengzixu fengzixu commented Feb 21, 2021

What type of PR is this?

/kind feature

What this PR does / why we need it:

why we need it

According to the change of kep, we decide to move the responsibility (checking volume condition and reporting abnormal volume event to pod ) from external-health-monitor-agent to kubelet.

what this PR does

  1. Modify GetMetrics function to return the volumeStats objet which includes the volumeCondition info
  2. Sending event to the corresponding pod which is using an abnormal volume
  3. Make compatibility change on functions which called GetMetrics function

Which issue(s) this PR fixes:

Not yet.

Special notes for your reviewer:

Does this PR introduce a user-facing change?

Enables Kubelet to check volume condition and log events to corresponding pods.

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:

- [KEP]: https://github.com/kubernetes/enhancements/pull/2286

@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. kind/feature Categorizes issue or PR as related to a new feature. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. labels Feb 21, 2021
@k8s-ci-robot
Copy link
Contributor

Hi @fengzixu. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added area/kubelet sig/apps Categorizes an issue or PR as relevant to SIG Apps. sig/instrumentation Categorizes an issue or PR as relevant to SIG Instrumentation. sig/node Categorizes an issue or PR as relevant to SIG Node. sig/storage Categorizes an issue or PR as relevant to SIG Storage. and removed do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Feb 21, 2021
@fengzixu
Copy link
Contributor Author

/assign @xing-yang @NickrenREN @fengzixu

@NickrenREN
Copy link
Contributor

/triage accepted

@k8s-ci-robot k8s-ci-robot added triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Feb 22, 2021
@xing-yang
Copy link
Contributor

/ok-to-test

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Feb 22, 2021
@fengzixu fengzixu force-pushed the support-external-health-monitor branch from e862e69 to bd8748c Compare February 22, 2021 14:54
@ehashman ehashman added this to Triage in SIG Node PR Triage Feb 22, 2021
@fengzixu
Copy link
Contributor Author

/test pull-kubernetes-e2e-kind

@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Mar 9, 2021
@fengzixu
Copy link
Contributor Author

fengzixu commented Mar 9, 2021

This was hard to review for me because there is so much noise in the diff due to adding the EventRecorder and renaming stat to metrics. It is pretty difficult to extract the actual KEP related functional change from this PR.

Please break this out into 2-3 commits which isolates the actual KEP work from the non-functional changes.

Maybe:

  • rename stat to metrics
  • add EventRecorder to ResourceAnalyzer
  • add CSIVolumeHealth feature

Just thinking about the feature versions of ourselves and having a useful change history

Thanks for your suggestion. I split this PR into 2 commits. Because add CSIVolumeHealth feature needs eventRecorder support.

@fengzixu
Copy link
Contributor Author

fengzixu commented Mar 9, 2021

@sjenning You can check it now

@fengzixu fengzixu force-pushed the support-external-health-monitor branch from 866f5d3 to f87d229 Compare March 9, 2021 15:53
@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels Mar 9, 2021
@@ -74,38 +86,53 @@ func (c *fakeCsiDriverClient) NodeGetInfo(ctx context.Context) (
}

func (c *fakeCsiDriverClient) NodeGetVolumeStats(ctx context.Context, volID string, targetPath string) (
usageCountMap *volume.Metrics, err error) {
usageCountMap *volume.Stats, err error) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there's still renaming in the first commit

@fengzixu fengzixu force-pushed the support-external-health-monitor branch from f87d229 to fdc348a Compare March 9, 2021 16:14
1. add EventRecorder to ResourceAnalyzer
2. add CSIVolumeHealth feature and gate
@fengzixu fengzixu force-pushed the support-external-health-monitor branch from fdc348a to edc1c62 Compare March 9, 2021 16:17
@sjenning
Copy link
Contributor

sjenning commented Mar 9, 2021

/approve

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: fengzixu, msau42, sjenning

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Mar 9, 2021
@msau42
Copy link
Member

msau42 commented Mar 9, 2021

/milestone v1.21

@k8s-ci-robot k8s-ci-robot added this to the v1.21 milestone Mar 9, 2021
@xing-yang
Copy link
Contributor

/test pull-kubernetes-integration

@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Mar 9, 2021
@xing-yang
Copy link
Contributor

/test pull-kubernetes-integration

@k8s-ci-robot
Copy link
Contributor

k8s-ci-robot commented Mar 9, 2021

@fengzixu: The following test failed, say /retest to rerun all failed tests:

Test name Commit Details Rerun command
pull-kubernetes-e2e-azure-file-windows-containerd fd4d3b1cf1cbd9f1dbf6a14705d25af34ad4ec80 link /test pull-kubernetes-e2e-azure-file-windows-containerd

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@xing-yang
Copy link
Contributor

/test pull-kubernetes-integration

@xing-yang
Copy link
Contributor

Test failure is not related to this change.

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Mar 9, 2021
@xing-yang
Copy link
Contributor

@fengzixu Please rename the PR to: feature: add CSIVolumeHealth feature and gate

@fengzixu fengzixu changed the title feature: sending corresponding event of abnormal volume in kubelet feature: add CSIVolumeHealth feature and gate Mar 9, 2021
@k8s-ci-robot k8s-ci-robot merged commit dcb3c56 into kubernetes:master Mar 10, 2021
SIG Node PR Triage automation moved this from Needs Reviewer to Done Mar 10, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/kubelet cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/feature Categorizes issue or PR as related to a new feature. lgtm "Looks good to me", indicates that a PR is ready to be merged. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. release-note Denotes a PR that will be considered when it comes time to generate release notes. sig/apps Categorizes an issue or PR as relevant to SIG Apps. sig/instrumentation Categorizes an issue or PR as relevant to SIG Instrumentation. sig/node Categorizes an issue or PR as relevant to SIG Node. sig/storage Categorizes an issue or PR as relevant to SIG Storage. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. triage/accepted Indicates an issue or PR is ready to be actively worked on.
Projects
Development

Successfully merging this pull request may close these issues.

None yet