apiserver add lease object count metric #97480
Conversation
Hi @lingsamuel. Thanks for your PR. I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test. Once the patch is verified, the new status will be reflected by the ok-to-test label. I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
Force-pushed from b73e3e9 to 61f3ca8
"The time in seconds that each lease is reused. A lower value could avoid large number of objects reusing the same lease. Notice that a too small value may cause performance problems at storage layer.") | ||
|
||
fs.Int64Var(&s.StorageConfig.LeaseManagerConfig.MaxObjectSize, "lease-max-object-size", s.StorageConfig.LeaseManagerConfig.MaxObjectSize, | ||
"The max object size that each lease can attach. This option is under tuning, the default value is not a recommended value, just a placeholder currently. A lower value could avoid large number of objects reusing the same lease. Notice that a too small value may cause performance problems at storage layer.") |
Once we make a flag, we shouldn't change the default if possible. I think we can come up with a reasonable default at this point.
I've never tested etcd lease performance before. What's your recommended value here?
I think maybe 5k? @mborsz - WDYT?
no idea :D
I suggest splitting this PR into two:
- no behavioral change, only metrics + a log to understand the current values (they should be easily accessible in our 5k tests)
- actually implementing lease limiting, with a value based on the above
Done. Once this gets merged and we've found a reasonable default value, I'll update #96836 and submit a PR with --lease-max-object-count
Resolved review thread (outdated): staging/src/k8s.io/apiserver/pkg/storage/etcd3/metrics/metrics.go
Force-pushed from ddaf690 to 9e8e0bd
Force-pushed from 567a53e to 05d0f85
Resolved review thread (outdated): staging/src/k8s.io/apiserver/pkg/storage/etcd3/lease_manager.go
		Buckets:        []float64{10, 50, 100, 500, 1000, 2500, 5000},
		StabilityLevel: compbasemetrics.ALPHA,
	},
	[]string{"ttl"},
I'm wondering about this one - if at some point we decide to use TTL in a smarter way, we may end up with a huge histogram.
Do we really need this?
Suppose we have many small-TTL lease managers and one large-TTL lease manager. The small-TTL leases would dominate buckets like 10 or 50, so we couldn't tell how much the large-TTL lease manager contributes to those buckets.
But I think if we ever have many lease managers with different TTLs, they may need different names, and the name would be a better label at that point. Currently we don't have a better field to use.
We could leave this problem open until we actually need multiple TTL lease managers.
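To make the cardinality concern concrete, here is a sketch of how an observation with the "ttl" label could be recorded, building on the metric definition sketched above; the helper name is hypothetical:

// UpdateLeaseObjectCount records how many objects were attached to a lease
// of the given TTL. Every distinct TTL becomes a separate "ttl" label value,
// so a smarter per-object TTL scheme would multiply the number of histogram
// series - the concern raised in the thread above.
import "strconv"

func UpdateLeaseObjectCount(ttlSeconds, count int64) {
	etcdLeaseObjectCounts.
		WithLabelValues(strconv.FormatInt(ttlSeconds, 10)).
		Observe(float64(count))
}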
/ok-to-test
Signed-off-by: Ling Samuel <lingsamuelgrace@gmail.com>
Force-pushed from 05d0f85 to 7e9fe39
/lgtm Thanks!
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: lingsamuel, wojtek-t
The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing /approve in a comment. Approvers can cancel approval by writing /approve cancel in a comment.
I took a look at one of the recent 5k runs (https://prow.k8s.io/view/gcs/kubernetes-jenkins/logs/ci-kubernetes-e2e-gce-scale-performance/1351213015185231872 to be exact).
So overall, the first two leases are quite large (50k-70k objects) and the others are around 10k-15k. The clusterloader (the workload-creating process on the machine) started at 17:15:53.168001, which means the first two large leases were created by the cluster initialization logic, not the tested workload. In my opinion we should first try playing with '--lease-reuse-duration-seconds' to spread leases within that one-minute window (e.g. assuming an even distribution of events, setting 10s should split the first large lease 6x, down to ~10k objects, which should be reasonable).
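A quick back-of-the-envelope check of that suggestion; the 60s figure assumes the current default reuse window, the ~60k object count is taken from the run numbers above, and an even distribution of object creations is assumed:

package main

import "fmt"

func main() {
	const (
		objectsInLargestLease = 60000.0 // observed: ~50k-70k objects
		defaultReuseSeconds   = 60.0    // assumed current default
		proposedReuseSeconds  = 10.0    // suggested value
	)
	split := defaultReuseSeconds / proposedReuseSeconds
	fmt.Printf("one lease becomes ~%.0f leases of ~%.0f objects each\n",
		split, objectsInLargestLease/split)
	// Output: one lease becomes ~6 leases of ~10000 objects each
}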
Signed-off-by: Ling Samuel <lingsamuelgrace@gmail.com>
What type of PR is this?
/kind feature
What this PR does / why we need it:
Details: #96836
Which issue(s) this PR fixes:
Part of #96836
Special notes for your reviewer:
Does this PR introduce a user-facing change?:
Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.: