Speed up PV provisioning for vsphere driver #100054
Conversation
/assign @divyenpatel
/triage accepted
cc @yuga711
/retest
1 similar comment
/retest
@gnufied Can you share the testing result on the PR? Is this change validated by creating a few volumes?
}
for _, host := range hostInfos {
    hostDCName := fmt.Sprintf("%s/%s", host.datacenter, host.hostMOID)
    nodeHosts[hostDCName] = host
I am not getting why we are using hostDCName as the key for nodeHosts. I see hostDCName is created using fmt.Sprintf("%s/%s", host.datacenter, host.hostMOID).
I am doing this to eliminate duplicate hosts, which will likely happen because many nodes will run on the same host. I am also using a combination of host.datacenter and host.hostMOID because, if I remember correctly, the hostMOID (which looks like HostSystem:host-213) is only guaranteed to be unique within a single vCenter, so if more than one host somehow has the same hostMOID they will still be treated as distinct hosts.
Can you please add this knowledge as a comment, e.g. in the hostInfo struct?
    dcNodes[vcDC] = append(dcNodes[vcDC], nodeInfo)
}

for dcVC, nodes := range dcNodes {
nit: dcNodes is populated with keys vcDC, but here the keys are dcVC. Can it be consistent?
fixed
result := true
for _, host := range nodeHosts {
    hostFound := false
    for _, targetHost := range dataStoreHosts {
Will it be very costly here to iterate through all dataStoreHosts for all nodeHosts for all candidateDatastores? Can host.hostUUID and host.hostMOID be stored in nodeHosts instead, for a simple lookup?
What do you mean by "all candidateDatastores"? We just need the first one we find.
@divyenpatel Yes, I have tested both the successful case (where the datastore was shared with all hosts) and the failure case (where the datastore was not shared with all hosts), and both work after this change. For example: Failed case
Success Case
@divyenpatel @RaunakShah are there any outstanding comments here? Can you please check? I am trying to get this in as a bugfix in 1.21.
/approve
/lgtm
/assign @andrewsykim
Speeds up PV provisioning for the vSphere driver by using bulk fetching of hosts and reversing the logic that fetches datastores. Adds a warning about fetching hosts individually.
Force-pushed from 3c7637e to c3cc5a4.
This is a bugfix for #99372
@jsafrane: The provided milestone is not valid for this repository.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
/milestone v1.21
/retest
/approve
/assign @andrewsykim
/assign @cheftako @cheftalp I think this one is yours since it's vsphere-related
var vmoList []mo.VirtualMachine
err := pc.Retrieve(ctx, vmRefs, []string{nameProperty, runtimeHost}, &vmoList)
if err != nil {
    klog.Errorf("SharedHost.getNodeHosts: unable to fetch vms from datacenter %s: %w", nodeInfo.dataCenter.String(), err)
Was this supposed to be %v for the error?
NM unwrap, cool.
This breaks go test:
go test k8s.io/legacy-cloud-providers/vsphere -mod=readonly
# k8s.io/legacy-cloud-providers/vsphere
staging/src/k8s.io/legacy-cloud-providers/vsphere/shared_datastore.go:156:3: Errorf call has error-wrapping directive %w
staging/src/k8s.io/legacy-cloud-providers/vsphere/shared_datastore.go:172:3: Errorf call has error-wrapping directive %w
FAIL k8s.io/legacy-cloud-providers/vsphere [build failed]
FAIL
pc = property.DefaultCollector(nodeInfo.dataCenter.Client())
err = pc.Retrieve(ctx, hostRefs, []string{summary}, &hostMoList)
if err != nil {
    klog.Errorf("SharedHost.getNodeHosts: unable to fetch hosts from datacenter %s: %w", nodeInfo.dataCenter.String(), err)
Same question here.
This also breaks go test.
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: cheftako, divyenpatel, gnufied, jsafrane, RaunakShah

The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing
/retest
vSphere PV provisioning progressively gets slower as more nodes and hosts are added to the cluster. I think this is mainly because of the number of repeated API calls to vCenter we make during volume provisioning.
Typically we make the following calls for immediate-binding, non-zone-based provisioning:
So this results in quite a few calls to the vCenter API.
This PR simplifies that logic:
This PR reduces the number of API calls to vCenter from potentially hundreds to 5 or 6.
Caveats:
Fixes #99372
/sig storage
/kind bug