Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Generic ephemeral volume enablement #99446

Merged
merged 5 commits into from Mar 4, 2021

Conversation

pohly
Copy link
Contributor

@pohly pohly commented Feb 25, 2021

What type of PR is this?

/kind bug

What this PR does / why we need it:

This adds further tests that ensure that the GenericEphemeralVolume feature works as intended.

Handling of the new volume type was slightly broken in the kube-apiserver and didn't preserve existing instances of the type when storing a pod update.

Does this PR introduce a user-facing change?

kube-apiserver: an update of a pod with a generic ephemeral volume dropped that volume if the feature had been disabled since creating the pod with such a volume

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:

- [KEP]: https://github.com/kubernetes/enhancements/blob/master/keps/sig-storage/1698-generic-ephemeral-volumes/README.md

@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. kind/bug Categorizes issue or PR as related to a bug. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. labels Feb 25, 2021
@pohly
Copy link
Contributor Author

pohly commented Feb 25, 2021

/sig storage

@k8s-ci-robot k8s-ci-robot added sig/storage Categorizes an issue or PR as relevant to SIG Storage. area/kubelet sig/apps Categorizes an issue or PR as relevant to SIG Apps. sig/node Categorizes an issue or PR as relevant to SIG Node. and removed do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Feb 25, 2021
@pohly pohly force-pushed the generic-ephemeral-enablement branch from 6d0be0e to ff8d553 Compare February 25, 2021 10:36
@pohly
Copy link
Contributor Author

pohly commented Feb 25, 2021

/retest

@ehashman ehashman added this to Triage in SIG Node PR Triage Feb 25, 2021
@pohly
Copy link
Contributor Author

pohly commented Feb 26, 2021

/retest

@gnufied
Copy link
Member

gnufied commented Mar 1, 2021

/assign

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Mar 1, 2021
@pohly pohly force-pushed the generic-ephemeral-enablement branch from ff8d553 to fb88540 Compare March 1, 2021 18:22
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Mar 1, 2021
@troy0820
Copy link
Member

troy0820 commented Mar 1, 2021

/priority important-longterm
/triage accepted

@k8s-ci-robot k8s-ci-robot added priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete. triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed needs-priority Indicates a PR lacks a `priority/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Mar 1, 2021
@troy0820
Copy link
Member

troy0820 commented Mar 1, 2021

/ok-to-test

@k8s-ci-robot k8s-ci-robot added the ok-to-test Indicates a non-member PR verified by an org member that is safe to test. label Mar 1, 2021
@troy0820 troy0820 moved this from Triage to Needs Reviewer in SIG Node PR Triage Mar 1, 2021
@pohly
Copy link
Contributor Author

pohly commented Mar 1, 2021

@troy0820: this is needed for a feature which is meant to go beta in 1.21. Can we bump the priority a bit? "longterm" might be too late 😅

@troy0820
Copy link
Member

troy0820 commented Mar 1, 2021

/priority important-soon

@k8s-ci-robot k8s-ci-robot added the priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. label Mar 1, 2021
@troy0820
Copy link
Member

troy0820 commented Mar 1, 2021

/retest

}

func testUpdatePod(t *testing.T, oldPod *api.Pod, labelValue string) *api.Pod {
updatedPod := oldPod.DeepCopyObject().(*api.Pod)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not use oldObject.DeepCopy ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed, that's much simpler. Just didn't occur to me. Fixed.

expectedPod := pod.DeepCopyObject().(*api.Pod)

Strategy.PrepareForCreate(context.Background(), pod)
require.Equal(t, expectedPod.Spec, pod.Spec, "pod spec")
Copy link
Member

@gnufied gnufied Mar 2, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we have tests that check if feature gate is disabled then then field is dropped?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No. I added one.

"DSW PodExistsInVolume returned incorrect value. Expected: <false> Actual: <%v>",
podExistsInVolume)
}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: This is bit hard to follow. Should we have used table style tests here?

Also for last condition that we are checking, is there something special about ephemeral volumes? Seems like that is just how cleanup from DSOW behaves. Why do we have to unit test it separately?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I derived this from the other, already existing tests (for example, TestFindAndRemoveDeletedPodsWithActualState) and just added the feature enable/disable steps and generic ephemeral volume type instead of a persistent volume. That last check probably could be removed, but I'd prefer to keep it, just to be on the safe side and verify that using an ephemeral volume really doesn't change something compared to the persistent volume.

I don't think that a table driven approach would work better because some steps depend on the previous ones.

The implementation should have preserved an existing ephemeral volume
source during an update even when the feature gate is currently
disabled, but due to a cut-and-paste error it was checking for CSI
volumes instead.

The new test detected that. It's based on
https://github.com/kubernetes/kubernetes/pull/97058/files#diff-7826f7adbc1996a05ab52e3f5f02429e94b68ce6bce0dc534d1be636154fded3
Silently ignoring the unsupported volume type leads to:

  Warning  FailedMount       8s    kubelet            Unable to attach or mount volumes: unmounted volumes=[my-csi-volume default-token-bsnbz], unattached volumes=[my-csi-volume default-token-bsnbz]: failed to get Plugin from volumeSpec for volume "my-csi-volume" err=no volume plugin matched

The new message is easier to understand:
  Warning  FailedMount       6s (x5 over 49s)  kubelet            Unable to attach or mount volumes: unmounted volumes=[my-csi-volume], unattached volumes=[my-csi-volume default-token-rwlpp]: volume my-csi-volume is a generic ephemeral volume, but that feature is disabled in kubelet
This simulates various error scenarios (PVC not created for pod,
feature disabled) and switching between feature disabled and enabled.
Without this error, kube-scheduler was simply ignoring the special
volume source and scheduled the pod. This was unlikely to work in
practice because the volume might have needed binding or the feature
is also disabled on kubelet which then doesn't know what to do with
the volume.
This covers some failure scenarios and feature gate enablement.
@pohly pohly force-pushed the generic-ephemeral-enablement branch from fb88540 to 512401a Compare March 3, 2021 09:13
@pohly
Copy link
Contributor Author

pohly commented Mar 3, 2021

@gnufied I've pushed the update, please take another look.

@k8s-ci-robot
Copy link
Contributor

k8s-ci-robot commented Mar 3, 2021

@pohly: The following tests failed, say /retest to rerun all failed tests:

Test name Commit Details Rerun command
pull-kubernetes-bazel-test fb8854082c86a08c8fce6173779401b4a55ee1a0 link /test pull-kubernetes-bazel-test
pull-kubernetes-bazel-build fb8854082c86a08c8fce6173779401b4a55ee1a0 link /test pull-kubernetes-bazel-build

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@pohly
Copy link
Contributor Author

pohly commented Mar 3, 2021

/retest

@gnufied
Copy link
Member

gnufied commented Mar 3, 2021

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Mar 3, 2021
@pohly
Copy link
Contributor Author

pohly commented Mar 3, 2021

/assign @smarterclayton

@SergeyKanzhelev SergeyKanzhelev moved this from Needs Reviewer to Needs Approver in SIG Node PR Triage Mar 3, 2021
@smarterclayton
Copy link
Contributor

/approve

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: pohly, smarterclayton

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Mar 4, 2021
@derekwaynecarr derekwaynecarr moved this from Needs Approver to Done in SIG Node PR Triage Mar 4, 2021
@k8s-ci-robot k8s-ci-robot merged commit a238698 into kubernetes:master Mar 4, 2021
@k8s-ci-robot k8s-ci-robot added this to the v1.21 milestone Mar 4, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/kubelet cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/bug Categorizes issue or PR as related to a bug. lgtm "Looks good to me", indicates that a PR is ready to be merged. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. release-note Denotes a PR that will be considered when it comes time to generate release notes. sig/apps Categorizes an issue or PR as relevant to SIG Apps. sig/node Categorizes an issue or PR as relevant to SIG Node. sig/storage Categorizes an issue or PR as relevant to SIG Storage. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. triage/accepted Indicates an issue or PR is ready to be actively worked on.
Projects
Development

Successfully merging this pull request may close these issues.

None yet

5 participants