Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Preparation for Topology migration to GA for CSI migration #97823

Merged
merged 2 commits into from Feb 2, 2021

Conversation

Jiawei0227
Copy link
Contributor

@Jiawei0227 Jiawei0227 commented Jan 8, 2021

This also includes a change on CSI migration TranslateCSIToInTree
where we remove the CSI topology and add Kubernetes Topology to
the NodeAffinity

What type of PR is this?
/kind bug
/kind cleanup

What this PR does / why we need it:
This PR is a subset work from #97746 with some comments addressed and test.
Specifically, this PR prepares for the migration from failure-domain.beta.kubernetes.io/zone topology to GA version which is topology.kubernetes.io/zone.

Besides, this PR also contains a behavior change in the csi-translation library TranslateCSIToInTree for gce-pd where we remove the CSI topology and inserts the kubernetes topology. This surfaces two purpose: 1. Fix a potential rollback issue. 2. Prepare for the fully deprecation of PV label admission-controller.

Which issue(s) this PR fixes:

Fixes #

Partially fix #92237

Special notes for your reviewer:

Does this PR introduce a user-facing change?:

Remove CSI topology from migrated in-tree gcepd volume.

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:


/sig storage
/assign @msau42
/cc @wongma7 @mattcary @verult

The follow-up for this PR includes:

  1. Upgrade csi-translation-lib to the newer version on csi-provisioner
  2. Upgrade in-tree volume plugins to use the GA label

@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. kind/bug Categorizes issue or PR as related to a bug. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Jan 8, 2021
@k8s-ci-robot k8s-ci-robot added kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. sig/storage Categorizes an issue or PR as relevant to SIG Storage. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Jan 8, 2021
@k8s-ci-robot
Copy link
Contributor

@Jiawei0227: This issue is currently awaiting triage.

If a SIG or subproject determines this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. area/cloudprovider sig/cloud-provider Categorizes an issue or PR as relevant to SIG Cloud Provider. labels Jan 8, 2021
Copy link
Contributor Author

@Jiawei0227 Jiawei0227 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @msau42 , thanks for your review! Sorry I have to close the old one and open this new one.

I think to make it cleaner, I am focusing this PR on the csi-translation-lib fix first and I also include some preparation work for topology label GA promotion. I will work on the actual label migration for gcepd as soon as this PR merges.

I have copied your comment and replied below. Please take a look. Thanks!

@@ -286,7 +291,10 @@ func (g *gcePersistentDiskCSITranslator) TranslateCSIPVToInTree(pv *v1.Persisten
gceSource.Partition = int32(partInt)
}

// TODO: Take the zone/regional information and stick it into the label.
// translate CSI topology to In-tree topology for rollback compatibility
if err := translateTopologyFromCSIToInTree(pv, GCEPDTopologyKey); err != nil {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's not just removing the old key. We need to add in the kubernetes topology keys.
The PV label admission controller is going to be removed in the future, so we cannot depend on the admission controller adding it for us.

Done.

@@ -135,7 +138,7 @@ func SelectZonesForVolume(zoneParameterPresent, zonesParameterPresent bool, zone
}

if (len(allowedTopologies) > 0) && (allowedZones.Len() == 0) {
return nil, fmt.Errorf("no matchLabelExpressions with %s key found in allowedTopologies. Please specify matchLabelExpressions with %s key", v1.LabelFailureDomainBetaZone, v1.LabelFailureDomainBetaZone)
return nil, fmt.Errorf("no matchLabelExpressions with %s key found in allowedTopologies. Please specify matchLabelExpressions with %s key", v1.LabelTopologyZone, v1.LabelTopologyZone)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this logic shared by other clouds too? Will this cause other clouds' Provision() to use GA labels?

This function SelectZoneForVolume is shared by all in-tree plugins and is being called in in-tree code. This function only picks a zone value from the labels. So it does not matter if the value comes from GA or beta label. For example:

If a node has failure-domain.beta.kubernetes.io/zone=us-west-2 and topology.kubernetes.io/zone=us-west-2. This function will just return the us-west-2 so it should not matter here.

@Jiawei0227 Jiawei0227 changed the title Preparation for Topology migration to GA from CSI migration Preparation for Topology migration to GA for CSI migration Jan 8, 2021
@@ -286,7 +291,10 @@ func (g *gcePersistentDiskCSITranslator) TranslateCSIPVToInTree(pv *v1.Persisten
gceSource.Partition = int32(partInt)
}

// TODO: Take the zone/regional information and stick it into the label.
// translate CSI topology to In-tree topology for rollback compatibility
if err := translateTopologyFromCSIToInTree(pv, GCEPDTopologyKey); err != nil {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we may want to add the labels too in case people have tooling on the kubernetes labels.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The labels of migrated PV already exist in the current case even before it goes to label admission controller.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The PV.Labels are set?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops, it is not... Sorry I had a misunderstanding. Will add it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I add the zone labels in if it does not exist. But I dont know if we can derive region info from Zone values for all the plugins. So I did not add the region label.

An alternative solution is to add a generateRegionFromZone func as a parameter to let each plugin pass a region parser.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We only have 5 plugins to support. I think it's not that critical to have a common util function and each plugin can parse region their own ways.

@Jiawei0227 Jiawei0227 force-pushed the translation-lib branch 2 times, most recently from 555f064 to aed3975 Compare January 8, 2021 19:03
// In-tree topology has precedence over zone labels.
func translateTopology(pv *v1.PersistentVolume, topologyKey string) error {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

btw do you know come this function only translated zone and not region? it looks like azuredd and cinder provisioners do set region, so the info gets lost on translation, potentially it is a bug for them.

OTOH, your new translateTopologyFromCSIToInTree does handle region makes more sense to me.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, I am not familiar with those two so I could not tell if this is going to be a problem. But I guess their driver has its own CSI migration test and if those tests are working fine. Then it should be good?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah let's leave it as is, I may open a separate issue if necessary but I was just curious since now there is some asymmetry between translateTopologyFromCSIToInTree and translateTopologyFromInTreeToCSI.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cc @andyzhangx @gnufied for region questions on azure and openstack.

At least in aws and gcp cases, you can derive the region from the zone.

@@ -286,7 +291,10 @@ func (g *gcePersistentDiskCSITranslator) TranslateCSIPVToInTree(pv *v1.Persisten
gceSource.Partition = int32(partInt)
}

// TODO: Take the zone/regional information and stick it into the label.
// translate CSI topology to In-tree topology for rollback compatibility
if err := translateTopologyFromCSIToInTree(pv, GCEPDTopologyKey); err != nil {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The PV.Labels are set?

// In-tree topology has precedence over zone labels.
func translateTopology(pv *v1.PersistentVolume, topologyKey string) error {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cc @andyzhangx @gnufied for region questions on azure and openstack.

At least in aws and gcp cases, you can derive the region from the zone.

}
}
} else {
// nothing is in the label, we try our best to add the CSI topology zone in
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should prioritize CSI first before the PV label. Someone would have to manually create their own PV or have their own CSI driver to get into the scenario where a CSI PV object would have a intree label on it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See comments above.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry which comment are you referring to?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry wrong reply, lol. I thought I was replying to another comment. NVM

So here, you mean if there is a CSI topology and also PV labels. We should use the CSI topology to generate Kubernetes topology first?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, we should prefer using what's in CSI topology over PV labels because in the normal case we expect only CSI topology to exist. If PV label exists, something unusual is going on.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

}
}
} else {
// nothing is in the label, we try our best to add the CSI topology zone in
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry which comment are you referring to?

@Jiawei0227
Copy link
Contributor Author

/retest

for i, nodeSelectorRequirement := range nsrequirements {
if nodeSelectorRequirement.Key == topologyKey {
index = i
break
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This assumes that the key is only in one element. Is always going to be the case?

Especially consider RegionalPD because the topology can be expressed 2 different ways. cc @mattcary @verult

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

okay, I changed the logic to remove the key from:

  1. Multiple terms(This will be the regional PD case I think)
  2. Single term with duplicate key match expression(although I doubt this will happen)

Also add removeTopology test

// Check the NodeAffinity first for the topology version
zoneGA := TopologyKeyExist(v1.LabelTopologyZone, pv.Spec.NodeAffinity)
regionGA := TopologyKeyExist(v1.LabelTopologyRegion, pv.Spec.NodeAffinity)
if zoneGA || regionGA {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would be very unexpected, but what if the zone label was beta and the region label was GA?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, firstly it will be very unexpected... But it could happen, my criteria here is that if it has a GA version topology, then it should be treated as GA supported.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess what if zone label is beta, but region label is GA. Then we will fail to fine the zone because we're looking for GA zone label?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, so do you suggest we return zone label version and region label version separately? Like we can return GA version region label and beta version zone label in the case you described?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a good idea. I think it will simplify the logic too

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As we discussed offline, it would be good to not overcomplicate this, so I will keep the old logic which will assume Zone and Region labels are in the same version.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry coming back to this again. Seeing zone || region makes me asks questions like "what if zone was false?". Since we care about zone most importantly, what if we only check zone, and then return the corresponding region version based off of what we found? (ie, if we found beta zone, then return beta region without checking)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

topologyRegionValue := getTopologyValues(pv, regionLabel)
if len(topologyRegionValue) == 0 {
// 2.1 Let's try to use CSI topology to recover the region topology first
if len(csiTopologyZoneValues) > 0 && regionParser != nil {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need csi zone values to be set? Or just regionParser?

@k8s-ci-robot k8s-ci-robot added size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. and removed size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Jan 13, 2021
@Jiawei0227
Copy link
Contributor Author

@msau42 I thought about this more offline. And I thought you are right, it would be easier if we use replaceTopology instead of remove/add. I changed all the corresponding part and write completely new function in the new commit. To summarize the behavior in translateTopologyFromCSIToInTree:

  1. We will replace all the CSITopology to Kubernetes Zone Topology(And combine existing Kubernetes Zone Topology if any exists already)
  2. We will add region topology if a regionTopologyHandler function is passed by the plugin. So each plugin can handle their region logic separately. Specifically for gce-pd, I prepared a gceRegionTopologyHandler which basically goes over each NodeAffinity term, and generate region val for each term. So this also handles the case where we have multiple region in different terms.
  3. Add zone and region labels to the PV if it does not exist.

Please let me know if this model is easier to understand and follow.

continue
}
// If no regionLabel found, generate region label from the zoneLabel we collect from this term
regionVal, err := getRegionFromZones(zoneVals)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I actually think a lot of other cloud-provider like aws/azure an reuse this function if we allow getRegionFromZones function to be a parameter. The other logic should be the same for other cloud-providers. But I guess it is fine we keep it this way. We can let them to decide if they want to use this or not.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That sounds like a good idea.

continue
}
// If no regionLabel found, generate region label from the zoneLabel we collect from this term
regionVal, err := getRegionFromZones(zoneVals)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That sounds like a good idea.

if zoneBeta := TopologyKeyExist(v1.LabelFailureDomainBetaZone, pv.Spec.NodeAffinity); zoneBeta {
return v1.LabelFailureDomainBetaZone, v1.LabelFailureDomainBetaRegion
}
if regionGA := TopologyKeyExist(v1.LabelTopologyRegion, pv.Spec.NodeAffinity); regionGA {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking we don't need to check for region at all. Because if zone is missing, then we're in trouble.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good. I can remove the region related checks and only check zone

@andyzhangx
Copy link
Member

/test pull-kubernetes-e2e-azure-disk-vmss
/test pull-kubernetes-e2e-azure-disk

@msau42
Copy link
Member

msau42 commented Jan 20, 2021

/lgtm
/approve

Thanks!

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jan 20, 2021
@msau42
Copy link
Member

msau42 commented Jan 20, 2021

/assign @andrewsykim

This also includes a change on CSI migration TranslateCSIToInTree
where we remove the CSI topology and add Kubernetes Topology to
the NodeAffinity
- Instead of using a remove/add way to deal with the
topology label, go directly with the replace topology
way so that we dont have to care all the corner case.
Also, since the translateTopologyFromCSIToInTree only
get called from csi-provisioner, there should not be
very odd PV object spec.

- Change the beheavior translateTopologyFromInTreeToCSI
so that it will replace zone label to CSI Topology label
and also replace any beta region label to GA label
@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jan 20, 2021
@Jiawei0227
Copy link
Contributor Author

Jiawei0227 commented Jan 20, 2021

I squashed commits to make it cleaner. No new changes are introduced.

@andyzhangx
Copy link
Member

/test pull-kubernetes-e2e-azure-disk-vmss
/test pull-kubernetes-e2e-azure-disk

@msau42
Copy link
Member

msau42 commented Jan 23, 2021

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jan 23, 2021
@Jiawei0227
Copy link
Contributor Author

@andrewsykim Gentle ping. Hey Andrew, can you help to take a look? This is blocking the following steps of updating the topology label to GA. Thanks a lot

Copy link
Member

@andrewsykim andrewsykim left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/approve

for k8s.io/cloud-provider, thanks!

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: andrewsykim, Jiawei0227, msau42

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/cloudprovider cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/bug Categorizes issue or PR as related to a bug. kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. lgtm "Looks good to me", indicates that a PR is ready to be merged. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. release-note Denotes a PR that will be considered when it comes time to generate release notes. sig/cloud-provider Categorizes an issue or PR as relevant to SIG Cloud Provider. sig/storage Categorizes an issue or PR as relevant to SIG Storage. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

PD in-tree plugin: migrate to stable zone/region labels
7 participants