
Speed up PV provisioning for vsphere driver #100054

Merged

Conversation

gnufied
Member

@gnufied gnufied commented Mar 10, 2021

vSphere PV provisioning gets progressively slower as more nodes and hosts are added to the cluster. I think this is mainly because of the number of repeated API calls we make to vCenter during volume provisioning.

Typically we make the following calls for immediate-binding, non-zone-based provisioning:

  1. We get the list of nodes (let's call it N) from the node manager (which itself could trigger API calls).
  2. For each node, we fetch its host information from the node itself. This requires an additional API call for each node in the cluster (N API calls).
  3. Then, from the host ref obtained in the previous call, we fetch the full host information, including its datastores. This is done separately for each node.
  4. For all of the datastore refs obtained in the previous calls, we fetch the full datastore information.
  5. For each datastore obtained in the previous calls, we make a separate API call to construct the full Datastore object. This is especially problematic because any number of datastores can be attached to a host, so fetching them individually can be expensive.

So this results in quite a few calls to the vCenter API.

This PR simplifies that logic:

  1. We get the list of nodes from the node manager (which itself could trigger API calls).
  2. For all known nodes, we fetch the host information in a single one-shot API call (and hence no per-node calls).
  3. For all of the host refs obtained from the previous list, we make a single API call to fetch the full host information.
  4. Rather than checking which datastores are shared with these hosts, we check which hosts are attached to the selected "candidate" datastores (this saves a lot of API calls).
  5. In the last step, we just check whether the set of hosts to which a "candidate" datastore is attached is a superset of the host set we built in step 3.

This PR reduces the number of API calls to vCenter from potentially hundreds to 5 or 6.
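For context, here is a minimal sketch of the bulk-fetch idea, not the PR's actual code: govmomi's PropertyCollector can retrieve a property for a whole slice of managed object references in one round trip, which is what replaces the per-node and per-datastore calls. The package, function, and variable names below are illustrative.

package vspheresketch

import (
	"context"
	"fmt"

	"github.com/vmware/govmomi/property"
	"github.com/vmware/govmomi/vim25"
	"github.com/vmware/govmomi/vim25/mo"
	"github.com/vmware/govmomi/vim25/types"
)

// fetchHostSummaries retrieves the "summary" property of every host ref in a
// single PropertyCollector round trip instead of one API call per host.
func fetchHostSummaries(ctx context.Context, c *vim25.Client, hostRefs []types.ManagedObjectReference) ([]mo.HostSystem, error) {
	pc := property.DefaultCollector(c)
	var hosts []mo.HostSystem
	if err := pc.Retrieve(ctx, hostRefs, []string{"summary"}, &hosts); err != nil {
		return nil, fmt.Errorf("bulk host fetch failed: %w", err)
	}
	return hosts, nil
}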

Caveats:

  1. I have not fixed the code path that uses SPBM policies. That path still results in the same number of API calls (but we can fix it progressively).
  2. We rely on the host's UUID to check whether the set of node hosts is a subset of a datastore's "attached hosts". I would ask reviewers to review that part of the code carefully.

Fixes #99372

/sig storage
/kind bug

Improve speed of vSphere PV provisioning and reduce number of API calls

@k8s-ci-robot k8s-ci-robot added sig/storage Categorizes an issue or PR as relevant to SIG Storage. do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. kind/bug Categorizes issue or PR as related to a bug. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. labels Mar 10, 2021
@k8s-ci-robot k8s-ci-robot added area/cloudprovider sig/cloud-provider Categorizes an issue or PR as relevant to SIG Cloud Provider. labels Mar 10, 2021
@gnufied
Member Author

gnufied commented Mar 10, 2021

/assign @divyenpatel

@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. and removed do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. labels Mar 10, 2021
@gnufied
Member Author

gnufied commented Mar 10, 2021

/triage accepted
/priority important-soon

@k8s-ci-robot k8s-ci-robot added triage/accepted Indicates an issue or PR is ready to be actively worked on. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. labels Mar 10, 2021
@gnufied
Member Author

gnufied commented Mar 10, 2021

cc @yuga711

@yangjunmyfm192085
Contributor

/retest

@gnufied
Member Author

gnufied commented Mar 10, 2021

/retest

@divyenpatel
Member

@gnufied Can you share the testing results for this PR? Has this change been validated by creating a few volumes?

}
for _, host := range hostInfos {
	hostDCName := fmt.Sprintf("%s/%s", host.datacenter, host.hostMOID)
	nodeHosts[hostDCName] = host
Member

I am not getting why we are using hostDCName as the key for nodeHosts. I see that hostDCName is created using fmt.Sprintf("%s/%s", host.datacenter, host.hostMOID).

Member Author

@gnufied gnufied Mar 10, 2021

I am doing this to eliminate duplicate hosts, which are likely because many nodes will run on the same host. I am also using a combination of host.datacenter and host.hostMOID because, if I remember correctly, the hostMOID (which will look like HostSystem:host-213) is only guaranteed to be unique within a single vCenter; so if more than one host somehow has the same hostMOID, they will still be treated as distinct hosts.

Member

Can you please add this knowledge as a comment, e.g. in the hostInfo struct?
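For illustration, a sketch of the kind of comment being asked for, attached to a hypothetical hostInfo struct; the field names are assumed from the snippet above, not copied from the PR.

package vspheresketch

import "fmt"

// hostInfo is a hypothetical stand-in for the PR's struct.
type hostInfo struct {
	hostUUID   string
	hostMOID   string // e.g. "HostSystem:host-213"; only guaranteed unique within one vCenter
	datacenter string
}

// dedupeNodeHosts keys hosts by "<datacenter>/<hostMOID>" so that duplicates
// (many nodes running on the same host) collapse into one entry, while hosts
// in different datacenters that happen to share an MOID remain distinct.
func dedupeNodeHosts(hostInfos []hostInfo) map[string]hostInfo {
	nodeHosts := make(map[string]hostInfo)
	for _, host := range hostInfos {
		hostDCName := fmt.Sprintf("%s/%s", host.datacenter, host.hostMOID)
		nodeHosts[hostDCName] = host
	}
	return nodeHosts
}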

	dcNodes[vcDC] = append(dcNodes[vcDC], nodeInfo)
}

for dcVC, nodes := range dcNodes {
Contributor

nit: dcNodes is populated with the key variable vcDC, but here the key is named dcVC. Can the naming be consistent?

Member Author

fixed

result := true
for _, host := range nodeHosts {
	hostFound := false
	for _, targetHost := range dataStoreHosts {
Contributor

Will it be very costly here to iterate through all dataStoreHosts for all nodeHosts for all candidateDatastores? Could host.hostUUID and host.hostMOID be stored in nodeHosts instead, for a simple lookup?

Member Author

What do you mean by "all candidateDatastores"? We just need the first one we find.
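As a side note, a sketch of the map-based lookup suggested above, continuing the hypothetical hostInfo sketch from earlier: build a set of the datastore's attached host UUIDs once, then check each node host against it instead of nesting two loops.

// hostsAreSubset reports whether every node host (by UUID) appears among the
// hosts attached to the candidate datastore.
func hostsAreSubset(nodeHosts map[string]hostInfo, dataStoreHosts []hostInfo) bool {
	attached := make(map[string]struct{}, len(dataStoreHosts))
	for _, h := range dataStoreHosts {
		attached[h.hostUUID] = struct{}{}
	}
	for _, h := range nodeHosts {
		if _, ok := attached[h.hostUUID]; !ok {
			return false // at least one node host cannot see this datastore
		}
	}
	return true
}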

@gnufied
Member Author

gnufied commented Mar 11, 2021

@divyenpatel Yes, I have tested both the success case (i.e. where the datastore was shared with the hosts) and the failure case (where the datastore was not shared with all hosts), and both work after this change.

For example:

Failed case

E0311 04:10:45.175143 3096210 vsphere.go:1397] Failed to get shared datastore: unable to find any shared datastores
I0311 04:10:45.175195 3096210 pv_controller.go:1580] failed to provision volume for claim "default/myclaim" with StorageClass "thin": unable to find any shared datastores
E0311 04:10:45.175240 3096210 goroutinemap.go:150] Operation for "provision-default/myclaim[e371dae5-89d9-45ae-9d39-f8fa91d10dc6]" failed. No retries permitted until 2021-03-11 04:10:46.175223777 +0000 UTC m=+65.627356168 (durationBeforeRetry 1s). Error: "unable to find any shared datastores"
I0311 04:10:45.175305 3096210 event.go:291] "Event occurred" object="default/myclaim" kind="PersistentVolumeClaim" apiVersion="v1" type="Warning" reason="ProvisioningFailed" message="Failed to provision volume with StorageClass \"thin\": unable to find any shared datastores"
^C

Success Case

I0311 04:15:56.836150 3105091 vsphere_volume.go:375] Provision with selectedNode:  and allowedTopologies : []
I0311 04:15:56.836191 3105091 vsphere.go:1249] Starting to create a vSphere volume with volumeOptions: &{CapacityKB:1048576 Tags:map[kubernetes.io/created-for/pv/name:pvc-233cc8d4-3a30-4d81-9aea-902ece03869e kubernetes.io/created-for/pvc/name:myclaim kubernetes.io/created-for/pvc/namespace:default] Name:kubernetes-dynamic-pvc-233cc8d4-3a30-4d81-9aea-902ece03869e DiskFormat:thin Datastore: VSANStorageProfileData: StoragePolicyName: StoragePolicyID: SCSIControllerType: Zone:[] SelectedNode:nil}
I0311 04:15:56.842483 3105091 deployment_controller.go:490] "Error syncing deployment" deployment="default/sandbox" err="Operation cannot be fulfilled on deployments.apps \"sandbox\": the object has been modified; please apply your changes to the latest version and try again"
I0311 04:15:56.855739 3105091 vsphere.go:1301] Volume topology : []
W0311 04:15:57.037626 3105091 datacenter.go:269] QueryVirtualDiskUuid failed for diskPath: "[nfsdata] kubevols/kube-dummyDisk.vmdk". err: ServerFaultCode: File [nfsdata] kubevols/kube-dummyDisk.vmdk was not found
I0311 04:15:57.037680 3105091 vsphere_volume_util.go:171] Successfully created vsphere volume kubernetes-dynamic-pvc-233cc8d4-3a30-4d81-9aea-902ece03869e
I0311 04:15:57.037708 3105091 pv_controller.go:1585] volume "pvc-233cc8d4-3a30-4d81-9aea-902ece03869e" for claim "default/myclaim" created
I0311 04:15:57.043546 3105091 pv_controller.go:1610] volume "pvc-233cc8d4-3a30-4d81-9aea-902ece03869e" for claim "default/myclaim" saved
I0311 04:15:57.043607 3105091 pv_controller.go:1663] volume "pvc-233cc8d4-3a30-4d81-9aea-902ece03869e" provisioned for claim "default/myclaim"

@gnufied
Member Author

gnufied commented Mar 12, 2021

@divyenpatel @RaunakShah Are there any outstanding comments here? Can you please check? I am trying to get this in as a bugfix for 1.21.

@RaunakShah
Contributor

/approve

@divyenpatel
Member

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Mar 12, 2021
@gnufied
Member Author

gnufied commented Mar 15, 2021

/assign @andrewsykim

Speeds up PV provisioning for vSphere driver by using bulk fetching of hosts and reversing logic that fetches datastores
Add warning about fetching hosts individually
@gnufied gnufied force-pushed the make-pv-provisioning-faster-vsphere branch from 3c7637e to c3cc5a4 on March 16, 2021 16:21
@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Mar 16, 2021
@jsafrane
Member

This is a bugfix for #99372
/milestone 1.21
/lgtm

@k8s-ci-robot
Contributor

@jsafrane: The provided milestone is not valid for this repository. Milestones in this repository: [next-candidate, v1.16, v1.17, v1.18, v1.19, v1.20, v1.21, v1.22]

Use /milestone clear to clear the milestone.

In response to this:

This is a bugfix for #99372
/milestone 1.21
/lgtm

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Mar 16, 2021
@jsafrane
Member

/milestone v1.21

@k8s-ci-robot k8s-ci-robot added this to the v1.21 milestone Mar 16, 2021
@jsafrane
Member

/retest

@jsafrane
Member

/approve
from sig-storage perspective

/assign @andrewsykim
for cloud provider approval

@andrewsykim
Member

/assign @cheftako

@cheftako I think this one is yours since it's vsphere-related

var vmoList []mo.VirtualMachine
err := pc.Retrieve(ctx, vmRefs, []string{nameProperty, runtimeHost}, &vmoList)
if err != nil {
	klog.Errorf("SharedHost.getNodeHosts: unable to fetch vms from datacenter %s: %w", nodeInfo.dataCenter.String(), err)
Member

Was this supposed to be %v for the error?

Member

NM unwrap, cool.

Member

this breaks go test:

go test k8s.io/legacy-cloud-providers/vsphere -mod=readonly
# k8s.io/legacy-cloud-providers/vsphere
staging/src/k8s.io/legacy-cloud-providers/vsphere/shared_datastore.go:156:3: Errorf call has error-wrapping directive %w
staging/src/k8s.io/legacy-cloud-providers/vsphere/shared_datastore.go:172:3: Errorf call has error-wrapping directive %w
FAIL	k8s.io/legacy-cloud-providers/vsphere [build failed]
FAIL

pc = property.DefaultCollector(nodeInfo.dataCenter.Client())
err = pc.Retrieve(ctx, hostRefs, []string{summary}, &hostMoList)
if err != nil {
	klog.Errorf("SharedHost.getNodeHosts: unable to fetch hosts from datacenter %s: %w", nodeInfo.dataCenter.String(), err)
Member

Same question here.

Member

this also breaks go test
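For reference, a minimal sketch of the distinction being discussed (a hypothetical helper, not the PR's code): klog.Errorf is not an error-wrapping function, so the printf vet check that go test runs rejects %w there; %v prints the error text, while %w belongs in fmt.Errorf when the error is returned to the caller.

package vspheresketch

import (
	"fmt"

	"k8s.io/klog/v2"
)

// logAndWrap logs with %v (accepted by the printf vet check) and returns a
// wrapped error built with %w so callers can still use errors.Is / errors.As.
func logAndWrap(dcName string, err error) error {
	klog.Errorf("unable to fetch hosts from datacenter %s: %v", dcName, err)
	return fmt.Errorf("unable to fetch hosts from datacenter %s: %w", dcName, err)
}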

@cheftako
Member

/lgtm
/approve

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: cheftako, divyenpatel, gnufied, jsafrane, RaunakShah

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Mar 22, 2021
@gnufied
Member Author

gnufied commented Mar 22, 2021

/retest

Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files.
area/cloudprovider
cncf-cla: yes Indicates the PR's author has signed the CNCF CLA.
kind/bug Categorizes issue or PR as related to a bug.
lgtm "Looks good to me", indicates that a PR is ready to be merged.
priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release.
release-note Denotes a PR that will be considered when it comes time to generate release notes.
sig/cloud-provider Categorizes an issue or PR as relevant to SIG Cloud Provider.
sig/storage Categorizes an issue or PR as relevant to SIG Storage.
size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
triage/accepted Indicates an issue or PR is ready to be actively worked on.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

vsphere disk provisioning can be slow in large clusters
10 participants