Add AcceleratorStats to cri_stats_provider #96873

Merged
merged 1 commit into kubernetes:master on Dec 9, 2020

Conversation

ruiwen-zhao
Contributor

What type of PR is this?
/kind bug

What this PR does / why we need it:
Currently, cadvisor_stats_provider provides AcceleratorStats, but cri_stats_provider does not. As a result, when cri_stats_provider is used, the kubelet's Summary API does not include accelerator metrics. This PR fixes the discrepancy between the two providers.

Which issue(s) this PR fixes:

Fixes #

Special notes for your reviewer:
Note that this PR does not add any new metric-collection logic; it merely passes the AcceleratorStats already collected by cadvisor through to cri_stats_provider. I am planning to cherry-pick this change to release branches after merging it to master.
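As a rough illustration of the pass-through described above, here is a minimal, self-contained sketch. The types and function names (`cadvisorAccelerator`, `acceleratorStats`, `containerStats`, `toAcceleratorStats`, `addAccelerators`) are hypothetical stand-ins written for this example, not the kubelet's or cAdvisor's actual definitions; the only point is that already-collected samples are copied across, with no new collection.

```go
package main

// cadvisorAccelerator is a simplified, hypothetical stand-in for one
// accelerator sample that cAdvisor has already collected for a container.
type cadvisorAccelerator struct {
	Make        string
	Model       string
	ID          string
	MemoryTotal uint64
	MemoryUsed  uint64
	DutyCycle   uint64
}

// acceleratorStats is a simplified, hypothetical stand-in for the
// accelerator entry reported through the Summary API.
type acceleratorStats struct {
	Make        string
	Model       string
	ID          string
	MemoryTotal uint64
	MemoryUsed  uint64
	DutyCycle   uint64
}

// containerStats is a hypothetical, trimmed-down container stats record.
type containerStats struct {
	Name         string
	Accelerators []acceleratorStats
}

// toAcceleratorStats copies existing samples field by field.
// It returns nil when there is nothing to report.
func toAcceleratorStats(samples []cadvisorAccelerator) []acceleratorStats {
	if len(samples) == 0 {
		return nil
	}
	out := make([]acceleratorStats, 0, len(samples))
	for _, s := range samples {
		out = append(out, acceleratorStats{
			Make:        s.Make,
			Model:       s.Model,
			ID:          s.ID,
			MemoryTotal: s.MemoryTotal,
			MemoryUsed:  s.MemoryUsed,
			DutyCycle:   s.DutyCycle,
		})
	}
	return out
}

// addAccelerators attaches the converted samples to the container record.
func addAccelerators(cs *containerStats, samples []cadvisorAccelerator) {
	cs.Accelerators = toAcceleratorStats(samples)
}
```

With this shape, a container that has no accelerator samples simply ends up with a nil `Accelerators` slice.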

Does this PR introduce a user-facing change?:

AcceleratorStats will be available in the Summary API of kubelet when cri_stats_provider is used.

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:


@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. kind/bug Categorizes issue or PR as related to a bug. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. labels Nov 25, 2020
@k8s-ci-robot k8s-ci-robot added area/kubelet sig/node Categorizes an issue or PR as relevant to SIG Node. and removed do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Nov 25, 2020
@ruiwen-zhao
Contributor Author

@k8s-ci-robot
Contributor

@ruiwen-zhao: GitHub didn't allow me to request PR reviews from the following users: pradvenkat.

Note that only kubernetes members and repo collaborators can review this PR, and authors cannot review their own PRs.

In response to this:

/cc @SergeyKanzhelev @bobbypage @dashpole @dchen1107 @Random-Liu @pradvenkat

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@ruiwen-zhao
Contributor Author

/triage accepted

@k8s-ci-robot k8s-ci-robot added triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Nov 25, 2020
@@ -780,6 +780,25 @@ func removeTerminatedContainers(containers []*runtimeapi.Container) []*runtimeap
func (p *criStatsProvider) addCadvisorContainerStats(
	cs *statsapi.ContainerStats,
	caPodStats *cadvisorapiv2.ContainerInfo,
) {
	if caPodStats.Spec.HasCustomMetrics {
		cs.UserDefinedMetrics = cadvisorInfoToUserDefinedMetrics(caPodStats)
Contributor

Just curious, why are we also adding un custom cAdvisor metrics here? Just for completeness?

Member

Did you mean this question to addCadvisorContainerCPUAndMemoryStats or here?

Contributor Author

why are we also adding un custom cAdvisor metrics here?

Yeah, I am basically keeping this the same as the existing addCadvisorContainerStats.

@dashpole
Contributor

pkg/kubelet/stats/cri_stats_provider_test.go:702:9: should omit 2nd value from range; this loop is equivalent to for i := range ...
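For context on the lint message above, here is a small sketch of the two equivalent loop forms. The function names `indexesVerbose` and `indexesIdiomatic` are made up for this illustration and are not taken from the PR's test file.

```go
package main

// indexesVerbose uses the form the linter flags: the blank second
// range value is redundant because only the index is used.
func indexesVerbose(names []string) []int {
	var idx []int
	for i, _ := range names {
		idx = append(idx, i)
	}
	return idx
}

// indexesIdiomatic is the equivalent loop with the second value omitted.
func indexesIdiomatic(names []string) []int {
	var idx []int
	for i := range names {
		idx = append(idx, i)
	}
	return idx
}
```

Both functions visit the same indexes in the same order; the second form is just the spelling the linter prefers.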

	cs *statsapi.ContainerStats,
	caPodStats *cadvisorapiv2.ContainerInfo,
) {
	if caPodStats.Spec.HasCustomMetrics {
Contributor

We should definitely omit custom metrics if we only want CPU and Memory metrics

Contributor

Actually, it looks like this is just maintaining the existing behavior

@dashpole
Contributor

/lgtm
/approve
/priority important-soon

@k8s-ci-robot k8s-ci-robot added priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. and removed needs-priority Indicates a PR lacks a `priority/foo` label and requires one. labels Nov 30, 2020
@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Nov 30, 2020
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: dashpole, ruiwen-zhao

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Nov 30, 2020
@ruiwen-zhao
Contributor Author

/milestone v1.20

@k8s-ci-robot
Contributor

@ruiwen-zhao: You must be a member of the kubernetes/milestone-maintainers GitHub team to set the milestone. If you believe you should be able to issue the /milestone command, please contact your and have them propose you as an additional delegate for this responsibility.

In response to this:

/milestone v1.20

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@dchen1107
Member

/milestone v1.20

@k8s-ci-robot k8s-ci-robot added this to the v1.20 milestone Dec 1, 2020
@fejta-bot

/retest
This bot automatically retries jobs that failed/flaked on approved PRs (send feedback to fejta).

Review the full test history for this PR.

Silence the bot with an /lgtm cancel or /hold comment for consistent failures.

@hasheddan
Contributor

/hold

This should be reviewed by @kubernetes/release-team-leads as we are currently in code / test freeze for v1.20.

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Dec 1, 2020
@ruiwen-zhao
Contributor Author

/retest

@SergeyKanzhelev
Member

/test pull-kubernetes-e2e-gce-ubuntu-containerd

@ruiwen-zhao
Contributor Author

/test pull-kubernetes-e2e-gce-ubuntu-containerd

@ruiwen-zhao
Contributor Author

/retest

@ruiwen-zhao
Contributor Author

/retest

@ruiwen-zhao
Contributor Author

/retest

@liggitt liggitt changed the title Add AcceleratorStats to cri_stats_provider [1.20.1] Add AcceleratorStats to cri_stats_provider Dec 7, 2020
@liggitt
Member

liggitt commented Dec 9, 2020

Can the hold be dropped now that master is open for 1.21?

@liggitt liggitt changed the title [1.20.1] Add AcceleratorStats to cri_stats_provider Add AcceleratorStats to cri_stats_provider Dec 9, 2020
@liggitt liggitt modified the milestones: v1.20, v1.21 Dec 9, 2020
@jeremyrickard
Contributor

/hold cancel

Thanks for waiting while we got 1.20 out. Hold dropped @liggitt

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Dec 9, 2020
@k8s-ci-robot k8s-ci-robot merged commit a20aeb8 into kubernetes:master Dec 9, 2020
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/kubelet cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/bug Categorizes issue or PR as related to a bug. lgtm "Looks good to me", indicates that a PR is ready to be merged. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. release-note Denotes a PR that will be considered when it comes time to generate release notes. sig/node Categorizes an issue or PR as relevant to SIG Node. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. triage/accepted Indicates an issue or PR is ready to be actively worked on.

9 participants