New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Indexed job implementation #98812
Indexed job implementation #98812
Conversation
83f7f75
to
4d9f565
Compare
This PR may require API review. If so, when the changes are ready, complete the pre-review checklist and request an API review. Status of requested reviews is tracked in the API Review project. |
pkg/apis/batch/types.go
Outdated
// The Job is considered complete when there is one successfully completed Pod | ||
// for each index. | ||
// When value is `Indexed`, .spec.completions must be specified and | ||
// `.spec.parallelism` must be less than or equal to 10^5. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It helps to explain why there's a limit
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a design decision. I don't think it's good to make it part of user documentation (user docs are build from this file).
pkg/apis/batch/types.go
Outdated
// Job get an associated completion index from 0 to (.spec.completions - 1), | ||
// available in the annotation batch.alpha.kubernetes.io/job-completion-index. | ||
// The Job is considered complete when there is one successfully completed Pod | ||
// for each index. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is it possible for a pod to successfully complete more than once? if so, should job authors design their pods around this (idempotency)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is a separate PR for API changes #98441
Idempotency requirement is already documented for regular Jobs.
pkg/apis/batch/types.go
Outdated
// This field is alpha-level and is only honored by servers that enable the | ||
// IndexedJob feature gate. More completion modes can be added in the future. | ||
// If the Job controller observes a mode that it doesn't recognize, the | ||
// controller skips updates for the Job. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmm, not sure if silently failing is the right thing to do.
edit: the validation code seems to raise an error? please update the doc to reflect that
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is about the controller, not the apiserver. The apiserver would reject the unknown value.
This is a safeguard to add more completion modes in the future. The idea is that if the controller is one version behind during an upgrade, it has some way of handling a value it doesn't know.
phase v1.PodPhase | ||
} | ||
|
||
func hasFailingPods(status []indexPhase) bool { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
unused
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's used in job_controller_test.go
moved
func capIndexesList(indexes string, softLimit int) string { | ||
ix := softLimit | ||
for ; ix < len(indexes); ix++ { | ||
if indexes[ix] == ',' { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this will panic if len(indexes) < softLimit
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
line 2192 doesn't let that happen
if ix >= len(indexes) { | ||
return indexes | ||
} | ||
return indexes[:ix+1] + "..." |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we expect indexes to be completed in order? i.e. is index N likely to complete before index N+K for large K?
if so, since we're cutting the string off anyway, the last 50 chars might be more informational to know how much is complete (although I'll concede that it'll be a little weird to read at first)
WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if parallelism is smaller than completions, yes, smaller indexes will complete earlier.
What should happen is that the first interval should contain most of the values, like 0-123456
. The rest of the values would be more sporadic. I think the first interval actually gives you the most information: all indexes up to 123456 have completed, and there are some other completions here and there.
/retest |
c81a31e
to
6aed72a
Compare
/retest |
1 similar comment
/retest |
/assign just so I can see it on my backlog |
6aed72a
to
690b2a5
Compare
90f2524
to
e850ccb
Compare
/retest |
/retest |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/lgtm
/approve
/triage accepted |
/priority backlog |
ed2821c
to
282646c
Compare
/retest |
282646c
to
74a235b
Compare
74a235b
to
08338cb
Compare
When .spec.completionMode="Indexed"
08338cb
to
8812531
Compare
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: alculquicondor, soltysh The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
/lgtm |
What type of PR is this?
/kind feature
What this PR does / why we need it:
/sig apps
/area batch
/area workload-api/job
Implements support for Indexed Job.
For an indexed Job, the job controller creates a Pod with an associated index from 0 to (.spec.completions-1). The index is added as an annotation.
Other implementation details:
Which issue(s) this PR fixes:
Fixes #97169 #14188
Ref kubernetes/enhancements#2214
Special notes for your reviewer:
Builds on top of #98441
Integration test to follow
Does this PR introduce a user-facing change?:
Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.: