Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Recover CSI volumes from dangling attachments #96617

Merged
merged 1 commit into from Dec 16, 2020

Conversation

yuga711
Copy link
Contributor

@yuga711 yuga711 commented Nov 16, 2020

Change-Id: I72105d67d8a4069ab19bfa4638a7ac365cf4194c

What type of PR is this?

/kind bug

What this PR does / why we need it:
This PR recovers from certain type of dangling CSI attachments by detaching them from the undesired nodes. In the current design, if kube-controller-manager crashes/restarts while attach/detach is in-progress, the volume may be left attached to an undesired node (eg., when the pod using the volume is deleted during the restart window). This PR fixes such dangling attachments by using the presence of VolumeAttachment objects during Attach/Detach Controller initialization.

This solution relies on the existence of the VA object. While this solution would recover dangling volumes caused because of certain races, it will not recover from other dangling scenarios where VA object itself is gone

Which issue(s) this PR fixes:
Partially addresses #94912
Partially addresses #80488
Fixes #77324

Special notes for your reviewer:
Verified the fix through reproduce steps in #94912

Does this PR introduce a user-facing change?:
NONE

Fix to recover CSI volumes from certain dangling attachments

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:


@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. kind/bug Categorizes issue or PR as related to a bug. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Nov 16, 2020
@k8s-ci-robot
Copy link
Contributor

Hi @yuga711. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. labels Nov 16, 2020
@k8s-ci-robot k8s-ci-robot added sig/apps Categorizes an issue or PR as relevant to SIG Apps. and removed do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Nov 16, 2020
@yuga711
Copy link
Contributor Author

yuga711 commented Nov 16, 2020

/assign @msau42

err = adc.actualStateOfWorld.MarkVolumeAsUncertain(volumeName, volumeSpec, nodeName)
if err != nil {
klog.Errorf("MarkVolumeAsUncertain fail to add the volume %q (%q) to ASW. err: %s", volumeName, nodeName, err)
continue
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just a nit: is this continue needed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ack

@gnufied
Copy link
Member

gnufied commented Nov 16, 2020

/assign

@msau42
Copy link
Member

msau42 commented Nov 16, 2020

/ok-to-test

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Nov 16, 2020
@msau42
Copy link
Member

msau42 commented Nov 16, 2020

/assign @jingxu97

@gnufied
Copy link
Member

gnufied commented Nov 16, 2020

Have we considered if we can use following error from CSI:

| Volume published to another node | 9 FAILED_PRECONDITION | Indicates that a volume corresponding to the specified volume_id has already been published at another node and does not have MULTI_NODE volume capability. If this error code is returned, the Plugin SHOULD specify the node_id of the node at which the volume is published as part of the gRPC status.message. | Caller SHOULD ensure the specified volume is not published at any other node before retrying with exponential back off. |

and use it to return dangling volume errors so as all plugins that implement ControllerPublishVolume correctly can benefit. Using VA to track attachment status is okay but not 100% fullproof.

@msau42
Copy link
Member

msau42 commented Nov 16, 2020

@kubernetes/sig-storage-pr-reviews

I think we'll need to look at both solutions. Relying on attach failure code means that the pod had to be rescheduled.

@k8s-ci-robot k8s-ci-robot added the sig/storage Categorizes an issue or PR as relevant to SIG Storage. label Nov 16, 2020
@yuga711
Copy link
Contributor Author

yuga711 commented Nov 16, 2020

@kubernetes/sig-storage-pr-reviews

I think we'll need to look at both solutions. Relying on attach failure code means that the pod had to be rescheduled.

Yes, that's right. We need both solutions as they have their own limitations. Attach failure code approach will not fix until next attach and may not fix in multi-attach case.

@k8s-ci-robot
Copy link
Contributor

@yuga711: Reiterating the mentions to trigger a notification:
@kubernetes/sig-storage-pr-reviews

In response to this:

@kubernetes/sig-storage-pr-reviews

I think we'll need to look at both solutions. Relying on attach failure code means that the pod had to be rescheduled.

Yes, that's right. We need both solutions as they have their own limitations. Attach failure code approach will not fix until next attach and may not fix in multi-attach case.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@jingxu97
Copy link
Contributor

Have we considered if we can use following error from CSI:

| Volume published to another node | 9 FAILED_PRECONDITION | Indicates that a volume corresponding to the specified volume_id has already been published at another node and does not have MULTI_NODE volume capability. If this error code is returned, the Plugin SHOULD specify the node_id of the node at which the volume is published as part of the gRPC status.message. | Caller SHOULD ensure the specified volume is not published at any other node before retrying with exponential back off. |

and use it to return dangling volume errors so as all plugins that implement ControllerPublishVolume correctly can benefit. Using VA to track attachment status is okay but not 100% fullproof.

Here I think the code does not rely on VA attachment status. As long as there is a VA, no matter the status is attached or not, the code will add volume as uncertain status to make sure following attach or detach will be triffered.

@yuga711
Copy link
Contributor Author

yuga711 commented Dec 10, 2020

The tests seems to be failing because a function name (IsVolumeAttachedToNode) was replaced after this PR was posted. I will need to rework a bit and will repost the change.

@fejta-bot
Copy link

/retest
This bot automatically retries jobs that failed/flaked on approved PRs (send feedback to fejta).

Review the full test history for this PR.

Silence the bot with an /lgtm cancel or /hold comment for consistent failures.

@msau42
Copy link
Member

msau42 commented Dec 10, 2020

/lgtm cancel

to fix merge conflict

@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Dec 10, 2020
Change-Id: I72105d67d8a4069ab19bfa4638a7ac365cf4194c
@msau42
Copy link
Member

msau42 commented Dec 16, 2020

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Dec 16, 2020
@k8s-ci-robot k8s-ci-robot merged commit efb9489 into kubernetes:master Dec 16, 2020
@jingxu97
Copy link
Contributor

consider to cherrypick this PR?

@msau42
Copy link
Member

msau42 commented Jan 13, 2021

I think that's a good idea. @gnufied any concerns?

@gnufied
Copy link
Member

gnufied commented Jan 20, 2021

@msau42 should be fine to cherry-pick.

k8s-ci-robot added a commit that referenced this pull request Feb 4, 2021
…17-upstream-release-1.18

Automated cherry pick of #94599: Fixes Attach Detach Controller reconciler race reading #96617: Recover CSI volumes from dangling attachments
k8s-ci-robot added a commit that referenced this pull request Feb 4, 2021
…17-upstream-release-1.20

Automated cherry pick of #94599: Fixes Attach Detach Controller reconciler race reading #96617: Recover CSI volumes from dangling attachments
k8s-ci-robot added a commit that referenced this pull request Feb 4, 2021
…17-upstream-release-1.19

Automated cherry pick of #94599: Fixes Attach Detach Controller reconciler race reading #96617: Recover CSI volumes from dangling attachments
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/bug Categorizes issue or PR as related to a bug. lgtm "Looks good to me", indicates that a PR is ready to be merged. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. release-note Denotes a PR that will be considered when it comes time to generate release notes. sig/apps Categorizes an issue or PR as relevant to SIG Apps. sig/storage Categorizes an issue or PR as relevant to SIG Storage. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. triage/accepted Indicates an issue or PR is ready to be actively worked on.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Garbage collect VolumeAttachment objects to trigger detach
9 participants