New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Preparation for Topology migration to GA for CSI migration #97823
Conversation
@Jiawei0227: This issue is currently awaiting triage. If a SIG or subproject determines this is a relevant issue, they will accept it by applying the The Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey @msau42 , thanks for your review! Sorry I have to close the old one and open this new one.
I think to make it cleaner, I am focusing this PR on the csi-translation-lib fix first and I also include some preparation work for topology label GA promotion. I will work on the actual label migration for gcepd as soon as this PR merges.
I have copied your comment and replied below. Please take a look. Thanks!
@@ -286,7 +291,10 @@ func (g *gcePersistentDiskCSITranslator) TranslateCSIPVToInTree(pv *v1.Persisten | |||
gceSource.Partition = int32(partInt) | |||
} | |||
|
|||
// TODO: Take the zone/regional information and stick it into the label. | |||
// translate CSI topology to In-tree topology for rollback compatibility | |||
if err := translateTopologyFromCSIToInTree(pv, GCEPDTopologyKey); err != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's not just removing the old key. We need to add in the kubernetes topology keys.
The PV label admission controller is going to be removed in the future, so we cannot depend on the admission controller adding it for us.
Done.
staging/src/k8s.io/csi-translation-lib/plugins/in_tree_volume.go
Outdated
Show resolved
Hide resolved
@@ -135,7 +138,7 @@ func SelectZonesForVolume(zoneParameterPresent, zonesParameterPresent bool, zone | |||
} | |||
|
|||
if (len(allowedTopologies) > 0) && (allowedZones.Len() == 0) { | |||
return nil, fmt.Errorf("no matchLabelExpressions with %s key found in allowedTopologies. Please specify matchLabelExpressions with %s key", v1.LabelFailureDomainBetaZone, v1.LabelFailureDomainBetaZone) | |||
return nil, fmt.Errorf("no matchLabelExpressions with %s key found in allowedTopologies. Please specify matchLabelExpressions with %s key", v1.LabelTopologyZone, v1.LabelTopologyZone) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this logic shared by other clouds too? Will this cause other clouds' Provision() to use GA labels?
This function SelectZoneForVolume
is shared by all in-tree plugins and is being called in in-tree code. This function only picks a zone value from the labels. So it does not matter if the value comes from GA or beta label. For example:
If a node has failure-domain.beta.kubernetes.io/zone=us-west-2
and topology.kubernetes.io/zone=us-west-2
. This function will just return the us-west-2
so it should not matter here.
staging/src/k8s.io/csi-translation-lib/plugins/in_tree_volume.go
Outdated
Show resolved
Hide resolved
plugin/pkg/admission/storage/persistentvolume/label/admission.go
Outdated
Show resolved
Hide resolved
staging/src/k8s.io/csi-translation-lib/plugins/in_tree_volume.go
Outdated
Show resolved
Hide resolved
@@ -286,7 +291,10 @@ func (g *gcePersistentDiskCSITranslator) TranslateCSIPVToInTree(pv *v1.Persisten | |||
gceSource.Partition = int32(partInt) | |||
} | |||
|
|||
// TODO: Take the zone/regional information and stick it into the label. | |||
// translate CSI topology to In-tree topology for rollback compatibility | |||
if err := translateTopologyFromCSIToInTree(pv, GCEPDTopologyKey); err != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we may want to add the labels too in case people have tooling on the kubernetes labels.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The labels of migrated PV already exist in the current case even before it goes to label admission controller.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The PV.Labels are set?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oops, it is not... Sorry I had a misunderstanding. Will add it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I add the zone labels in if it does not exist. But I dont know if we can derive region info from Zone values for all the plugins. So I did not add the region label.
An alternative solution is to add a generateRegionFromZone
func as a parameter to let each plugin pass a region parser.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We only have 5 plugins to support. I think it's not that critical to have a common util function and each plugin can parse region their own ways.
staging/src/k8s.io/csi-translation-lib/plugins/in_tree_volume.go
Outdated
Show resolved
Hide resolved
555f064
to
aed3975
Compare
// In-tree topology has precedence over zone labels. | ||
func translateTopology(pv *v1.PersistentVolume, topologyKey string) error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
btw do you know come this function only translated zone and not region? it looks like azuredd and cinder provisioners do set region, so the info gets lost on translation, potentially it is a bug for them.
OTOH, your new translateTopologyFromCSIToInTree does handle region makes more sense to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, I am not familiar with those two so I could not tell if this is going to be a problem. But I guess their driver has its own CSI migration test and if those tests are working fine. Then it should be good?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah let's leave it as is, I may open a separate issue if necessary but I was just curious since now there is some asymmetry between translateTopologyFromCSIToInTree and translateTopologyFromInTreeToCSI.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cc @andyzhangx @gnufied for region questions on azure and openstack.
At least in aws and gcp cases, you can derive the region from the zone.
@@ -286,7 +291,10 @@ func (g *gcePersistentDiskCSITranslator) TranslateCSIPVToInTree(pv *v1.Persisten | |||
gceSource.Partition = int32(partInt) | |||
} | |||
|
|||
// TODO: Take the zone/regional information and stick it into the label. | |||
// translate CSI topology to In-tree topology for rollback compatibility | |||
if err := translateTopologyFromCSIToInTree(pv, GCEPDTopologyKey); err != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The PV.Labels are set?
staging/src/k8s.io/csi-translation-lib/plugins/in_tree_volume.go
Outdated
Show resolved
Hide resolved
staging/src/k8s.io/csi-translation-lib/plugins/in_tree_volume.go
Outdated
Show resolved
Hide resolved
staging/src/k8s.io/csi-translation-lib/plugins/in_tree_volume.go
Outdated
Show resolved
Hide resolved
// In-tree topology has precedence over zone labels. | ||
func translateTopology(pv *v1.PersistentVolume, topologyKey string) error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cc @andyzhangx @gnufied for region questions on azure and openstack.
At least in aws and gcp cases, you can derive the region from the zone.
staging/src/k8s.io/csi-translation-lib/plugins/in_tree_volume.go
Outdated
Show resolved
Hide resolved
} | ||
} | ||
} else { | ||
// nothing is in the label, we try our best to add the CSI topology zone in |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should prioritize CSI first before the PV label. Someone would have to manually create their own PV or have their own CSI driver to get into the scenario where a CSI PV object would have a intree label on it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See comments above.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry which comment are you referring to?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry wrong reply, lol. I thought I was replying to another comment. NVM
So here, you mean if there is a CSI topology and also PV labels. We should use the CSI topology to generate Kubernetes topology first?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, we should prefer using what's in CSI topology over PV labels because in the normal case we expect only CSI topology to exist. If PV label exists, something unusual is going on.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
staging/src/k8s.io/csi-translation-lib/plugins/in_tree_volume.go
Outdated
Show resolved
Hide resolved
} | ||
} | ||
} else { | ||
// nothing is in the label, we try our best to add the CSI topology zone in |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry which comment are you referring to?
b0d313b
to
b0e5532
Compare
/retest |
for i, nodeSelectorRequirement := range nsrequirements { | ||
if nodeSelectorRequirement.Key == topologyKey { | ||
index = i | ||
break |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
okay, I changed the logic to remove the key from:
- Multiple terms(This will be the regional PD case I think)
- Single term with duplicate key match expression(although I doubt this will happen)
Also add removeTopology test
// Check the NodeAffinity first for the topology version | ||
zoneGA := TopologyKeyExist(v1.LabelTopologyZone, pv.Spec.NodeAffinity) | ||
regionGA := TopologyKeyExist(v1.LabelTopologyRegion, pv.Spec.NodeAffinity) | ||
if zoneGA || regionGA { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This would be very unexpected, but what if the zone label was beta and the region label was GA?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, firstly it will be very unexpected... But it could happen, my criteria here is that if it has a GA version topology, then it should be treated as GA supported.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess what if zone label is beta, but region label is GA. Then we will fail to fine the zone because we're looking for GA zone label?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, so do you suggest we return zone label version and region label version separately? Like we can return GA version region label and beta version zone label in the case you described?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a good idea. I think it will simplify the logic too
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As we discussed offline, it would be good to not overcomplicate this, so I will keep the old logic which will assume Zone and Region labels are in the same version.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry coming back to this again. Seeing zone || region
makes me asks questions like "what if zone was false?". Since we care about zone most importantly, what if we only check zone, and then return the corresponding region version based off of what we found? (ie, if we found beta zone, then return beta region without checking)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
topologyRegionValue := getTopologyValues(pv, regionLabel) | ||
if len(topologyRegionValue) == 0 { | ||
// 2.1 Let's try to use CSI topology to recover the region topology first | ||
if len(csiTopologyZoneValues) > 0 && regionParser != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need csi zone values to be set? Or just regionParser?
@msau42 I thought about this more offline. And I thought you are right, it would be easier if we use replaceTopology instead of remove/add. I changed all the corresponding part and write completely new function in the new commit. To summarize the behavior in
Please let me know if this model is easier to understand and follow. |
continue | ||
} | ||
// If no regionLabel found, generate region label from the zoneLabel we collect from this term | ||
regionVal, err := getRegionFromZones(zoneVals) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I actually think a lot of other cloud-provider like aws/azure an reuse this function if we allow getRegionFromZones function to be a parameter. The other logic should be the same for other cloud-providers. But I guess it is fine we keep it this way. We can let them to decide if they want to use this or not.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That sounds like a good idea.
continue | ||
} | ||
// If no regionLabel found, generate region label from the zoneLabel we collect from this term | ||
regionVal, err := getRegionFromZones(zoneVals) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That sounds like a good idea.
if zoneBeta := TopologyKeyExist(v1.LabelFailureDomainBetaZone, pv.Spec.NodeAffinity); zoneBeta { | ||
return v1.LabelFailureDomainBetaZone, v1.LabelFailureDomainBetaRegion | ||
} | ||
if regionGA := TopologyKeyExist(v1.LabelTopologyRegion, pv.Spec.NodeAffinity); regionGA { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was thinking we don't need to check for region at all. Because if zone is missing, then we're in trouble.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good. I can remove the region related checks and only check zone
staging/src/k8s.io/csi-translation-lib/plugins/in_tree_volume.go
Outdated
Show resolved
Hide resolved
/test pull-kubernetes-e2e-azure-disk-vmss |
c78bffb
to
12e8d47
Compare
/lgtm Thanks! |
/assign @andrewsykim |
This also includes a change on CSI migration TranslateCSIToInTree where we remove the CSI topology and add Kubernetes Topology to the NodeAffinity
- Instead of using a remove/add way to deal with the topology label, go directly with the replace topology way so that we dont have to care all the corner case. Also, since the translateTopologyFromCSIToInTree only get called from csi-provisioner, there should not be very odd PV object spec. - Change the beheavior translateTopologyFromInTreeToCSI so that it will replace zone label to CSI Topology label and also replace any beta region label to GA label
12e8d47
to
f83df5d
Compare
I squashed commits to make it cleaner. No new changes are introduced. |
/test pull-kubernetes-e2e-azure-disk-vmss |
/lgtm |
@andrewsykim Gentle ping. Hey Andrew, can you help to take a look? This is blocking the following steps of updating the topology label to GA. Thanks a lot |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/approve
for k8s.io/cloud-provider, thanks!
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: andrewsykim, Jiawei0227, msau42 The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
This also includes a change on CSI migration TranslateCSIToInTree
where we remove the CSI topology and add Kubernetes Topology to
the NodeAffinity
What type of PR is this?
/kind bug
/kind cleanup
What this PR does / why we need it:
This PR is a subset work from #97746 with some comments addressed and test.
Specifically, this PR prepares for the migration from
failure-domain.beta.kubernetes.io/zone
topology to GA version which istopology.kubernetes.io/zone
.Besides, this PR also contains a behavior change in the csi-translation library TranslateCSIToInTree for gce-pd where we remove the CSI topology and inserts the kubernetes topology. This surfaces two purpose: 1. Fix a potential rollback issue. 2. Prepare for the fully deprecation of PV label admission-controller.
Which issue(s) this PR fixes:
Fixes #
Partially fix #92237
Special notes for your reviewer:
Does this PR introduce a user-facing change?:
Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:
/sig storage
/assign @msau42
/cc @wongma7 @mattcary @verult
The follow-up for this PR includes: