New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Generic ephemeral volume enablement #99446
Generic ephemeral volume enablement #99446
Conversation
/sig storage |
6d0be0e
to
ff8d553
Compare
/retest |
/retest |
/assign |
ff8d553
to
fb88540
Compare
/priority important-longterm |
/ok-to-test |
@troy0820: this is needed for a feature which is meant to go beta in 1.21. Can we bump the priority a bit? "longterm" might be too late 😅 |
/priority important-soon |
/retest |
} | ||
|
||
func testUpdatePod(t *testing.T, oldPod *api.Pod, labelValue string) *api.Pod { | ||
updatedPod := oldPod.DeepCopyObject().(*api.Pod) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not use oldObject.DeepCopy
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed, that's much simpler. Just didn't occur to me. Fixed.
expectedPod := pod.DeepCopyObject().(*api.Pod) | ||
|
||
Strategy.PrepareForCreate(context.Background(), pod) | ||
require.Equal(t, expectedPod.Spec, pod.Spec, "pod spec") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we have tests that check if feature gate is disabled then then field is dropped?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No. I added one.
"DSW PodExistsInVolume returned incorrect value. Expected: <false> Actual: <%v>", | ||
podExistsInVolume) | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: This is bit hard to follow. Should we have used table style tests here?
Also for last condition that we are checking, is there something special about ephemeral volumes? Seems like that is just how cleanup from DSOW behaves. Why do we have to unit test it separately?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I derived this from the other, already existing tests (for example, TestFindAndRemoveDeletedPodsWithActualState
) and just added the feature enable/disable steps and generic ephemeral volume type instead of a persistent volume. That last check probably could be removed, but I'd prefer to keep it, just to be on the safe side and verify that using an ephemeral volume really doesn't change something compared to the persistent volume.
I don't think that a table driven approach would work better because some steps depend on the previous ones.
The implementation should have preserved an existing ephemeral volume source during an update even when the feature gate is currently disabled, but due to a cut-and-paste error it was checking for CSI volumes instead. The new test detected that. It's based on https://github.com/kubernetes/kubernetes/pull/97058/files#diff-7826f7adbc1996a05ab52e3f5f02429e94b68ce6bce0dc534d1be636154fded3
Silently ignoring the unsupported volume type leads to: Warning FailedMount 8s kubelet Unable to attach or mount volumes: unmounted volumes=[my-csi-volume default-token-bsnbz], unattached volumes=[my-csi-volume default-token-bsnbz]: failed to get Plugin from volumeSpec for volume "my-csi-volume" err=no volume plugin matched The new message is easier to understand: Warning FailedMount 6s (x5 over 49s) kubelet Unable to attach or mount volumes: unmounted volumes=[my-csi-volume], unattached volumes=[my-csi-volume default-token-rwlpp]: volume my-csi-volume is a generic ephemeral volume, but that feature is disabled in kubelet
This simulates various error scenarios (PVC not created for pod, feature disabled) and switching between feature disabled and enabled.
Without this error, kube-scheduler was simply ignoring the special volume source and scheduled the pod. This was unlikely to work in practice because the volume might have needed binding or the feature is also disabled on kubelet which then doesn't know what to do with the volume.
This covers some failure scenarios and feature gate enablement.
fb88540
to
512401a
Compare
@gnufied I've pushed the update, please take another look. |
@pohly: The following tests failed, say
Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here. |
/retest |
/lgtm |
/assign @smarterclayton |
/approve |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: pohly, smarterclayton The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
What type of PR is this?
/kind bug
What this PR does / why we need it:
This adds further tests that ensure that the
GenericEphemeralVolume
feature works as intended.Handling of the new volume type was slightly broken in the kube-apiserver and didn't preserve existing instances of the type when storing a pod update.
Does this PR introduce a user-facing change?
Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.: