Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[volumeScheduling/metrics] Fix buckets initialization #100720

Merged

Conversation

dntosas
Copy link
Contributor

@dntosas dntosas commented Mar 31, 2021

What type of PR is this?

/kind bug

What this PR does / why we need it:

This metric is measured in seconds so it makes no sense starting from
1000 as init value. This breaks also the scheduler e2e metric thus make
users unable to compute, for example, their SLO for the scheduler.
Even if this metric is deprecated, it should behave correctly until it is
completely removed to avoid user confusion.

For example, for each volume created, the minimum value exposed
as a metric is 16.6min (1000sec/60) which is obviously wrong as logic.

In this commit, we migrate bucket creation to start from reasonable
numbers, copying the incrementation from the conventions that the
scheduler follows itself.

Special notes for your reviewer:

Check this example of trying to define some SLO for scheduling, but
exposed data is kinda unusable:
Screenshot from 2021-03-31 20-37-59

Does this PR introduce a user-facing change?

- Metrics changes: Fix exposed buckets of `scheduler_volume_scheduling_duration_seconds_bucket` metric

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:


@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. kind/bug Categorizes issue or PR as related to a bug. size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Mar 31, 2021
@k8s-ci-robot
Copy link
Contributor

Welcome @dntosas!

It looks like this is your first PR to kubernetes/kubernetes 🎉. Please refer to our pull request process documentation to help your PR have a smooth ride to approval.

You will be prompted by a bot to use commands during the review process. Do not be afraid to follow the prompts! It is okay to experiment. Here is the bot commands documentation.

You can also check if kubernetes/kubernetes has its own contribution guidelines.

You may want to refer to our testing guide if you run into trouble with your tests not passing.

If you are having difficulty getting your pull request seen, please follow the recommended escalation practices. Also, for tips and tricks in the contribution process you may want to read the Kubernetes contributor cheat sheet. We want to make sure your contribution gets all the attention it needs!

Thank you, and welcome to Kubernetes. 😃

@k8s-ci-robot
Copy link
Contributor

Hi @dntosas. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. sig/apps Categorizes an issue or PR as relevant to SIG Apps. sig/instrumentation Categorizes an issue or PR as relevant to SIG Instrumentation. and removed do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Mar 31, 2021
@dntosas
Copy link
Contributor Author

dntosas commented Apr 2, 2021

hola @lichuqiang @msau42 ! any change you take a look in here? pretty small and quick :)

@msau42
Copy link
Member

msau42 commented Apr 2, 2021

/ok-to-test

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Apr 2, 2021
@msau42
Copy link
Member

msau42 commented Apr 2, 2021

/lgtm
/approve

@k8s-ci-robot k8s-ci-robot added lgtm "Looks good to me", indicates that a PR is ready to be merged. approved Indicates a PR has been approved by an approver from all required OWNERS files. labels Apr 2, 2021
@fejta-bot
Copy link

/retest
This bot automatically retries jobs that failed/flaked on approved PRs (send feedback to fejta).

Review the full test history for this PR.

Silence the bot with an /lgtm cancel or /hold comment for consistent failures.

2 similar comments
@fejta-bot
Copy link

/retest
This bot automatically retries jobs that failed/flaked on approved PRs (send feedback to fejta).

Review the full test history for this PR.

Silence the bot with an /lgtm cancel or /hold comment for consistent failures.

@fejta-bot
Copy link

/retest
This bot automatically retries jobs that failed/flaked on approved PRs (send feedback to fejta).

Review the full test history for this PR.

Silence the bot with an /lgtm cancel or /hold comment for consistent failures.

@ehashman
Copy link
Member

ehashman commented Apr 7, 2021

/hold

This metric is deprecated as of 1.19. Presumably it should be getting removed entirely in the upcoming release?

/triage accepted
/priority backlog

@k8s-ci-robot k8s-ci-robot added do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. triage/accepted Indicates an issue or PR is ready to be actively worked on. priority/backlog Higher priority than priority/awaiting-more-evidence. labels Apr 7, 2021
@k8s-ci-robot k8s-ci-robot added sig/scheduling Categorizes an issue or PR as relevant to SIG Scheduling. sig/storage Categorizes an issue or PR as relevant to SIG Storage. labels Aug 17, 2021
@dntosas
Copy link
Contributor Author

dntosas commented Aug 17, 2021

@dims @msau42 also rebased and this is ready

@alculquicondor
Copy link
Member

Given that this metric is deprecated, what are your plans for this PR? Do you want to backport it?

Ideally we would just remove it in the 1.23 release.

@dntosas
Copy link
Contributor Author

dntosas commented Aug 19, 2021

Given that this metric is deprecated, what are your plans for this PR? Do you want to backport it?

Ideally we would just remove it in the 1.23 release.

yeap, I was thinking of backporting to existing releases and then create a new PR that removes the metric from the master branch. WDYT ?

@alculquicondor
Copy link
Member

@logicalhan is this plan above acceptable? Arguably, the existing metric is broken, but changing the buckets is probably a breaking change. Right?

@logicalhan
Copy link
Member

@logicalhan is this plan above acceptable? Arguably, the existing metric is broken, but changing the buckets is probably a breaking change. Right?

Changing buckets is actually fine, it's adding removing labels that breaks people.

Copy link
Member

@logicalhan logicalhan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm
/approve

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Aug 19, 2021
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: dntosas, logicalhan, msau42

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@alculquicondor
Copy link
Member

/hold cancel

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Aug 19, 2021
@dntosas
Copy link
Contributor Author

dntosas commented Aug 19, 2021

/retest

@alculquicondor
Copy link
Member

/retest

The failure seems unrelated. Opened #104463

@k8s-triage-robot
Copy link

The Kubernetes project has merge-blocking tests that are currently too flaky to consistently pass.

This bot retests PRs for certain kubernetes repos according to the following rules:

  • The PR does have any do-not-merge/* labels
  • The PR does not have the needs-ok-to-test label
  • The PR is mergeable (does not have a needs-rebase label)
  • The PR is approved (has cncf-cla: yes, lgtm, approved labels)
  • The PR is failing tests required for merge

You can:

/retest

@k8s-ci-robot k8s-ci-robot merged commit df51e4e into kubernetes:master Aug 20, 2021
@k8s-ci-robot k8s-ci-robot added this to the v1.23 milestone Aug 20, 2021
dntosas added a commit to dntosas/kubernetes that referenced this pull request Aug 23, 2021
As part of kubernetes#100720 we
backported fix on existing releases and in this commit we completely
remove the deprecated metric from master branch.

Signed-off-by: dntosas <ntosas@gmail.com>
k8s-ci-robot added a commit that referenced this pull request Sep 10, 2021
…720-upstream-release-1.20

Automated cherry pick of #100720: Fix buckets initialization
k8s-ci-robot added a commit that referenced this pull request Sep 10, 2021
…720-upstream-release-1.21

Automated cherry pick of #100720: Fix buckets initialization
k8s-ci-robot added a commit that referenced this pull request Sep 10, 2021
…720-upstream-release-1.19

Automated cherry pick of #100720: Fix buckets initialization
ibabou pushed a commit to ibabou/kubernetes that referenced this pull request Sep 28, 2021
As part of kubernetes#100720 we
backported fix on existing releases and in this commit we completely
remove the deprecated metric from master branch.

Signed-off-by: dntosas <ntosas@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/bug Categorizes issue or PR as related to a bug. lgtm "Looks good to me", indicates that a PR is ready to be merged. lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. priority/backlog Higher priority than priority/awaiting-more-evidence. release-note Denotes a PR that will be considered when it comes time to generate release notes. sig/apps Categorizes an issue or PR as relevant to SIG Apps. sig/instrumentation Categorizes an issue or PR as relevant to SIG Instrumentation. sig/scheduling Categorizes an issue or PR as relevant to SIG Scheduling. sig/storage Categorizes an issue or PR as relevant to SIG Storage. size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. triage/accepted Indicates an issue or PR is ready to be actively worked on.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

9 participants