Automated cherry pick of #93044: Initialize scheduler's podInformer in sharedInformerFactory #105015
Conversation
Scheduler's specific podInformer is now initialized inside the sharedInformerFactory.
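For readers skimming the diff, here is a minimal sketch, assuming client-go, of what initializing the pod informer inside the shared factory looks like; `addSchedulerPodHandlers` is an illustrative name, not the actual scheduler symbol:

```go
package main

import (
	"k8s.io/client-go/informers"
	"k8s.io/client-go/tools/cache"
)

// addSchedulerPodHandlers is a hypothetical helper: instead of building a
// standalone pod informer, the component asks the shared factory for the
// pod informer and attaches its event handlers to it, so every consumer
// shares one watch connection and one cache.
func addSchedulerPodHandlers(factory informers.SharedInformerFactory) {
	podInformer := factory.Core().V1().Pods().Informer()
	podInformer.AddEventHandler(cache.ResourceEventHandlerFuncs{
		AddFunc:    func(obj interface{}) { /* enqueue the new Pod */ },
		UpdateFunc: func(oldObj, newObj interface{}) { /* re-enqueue */ },
		DeleteFunc: func(obj interface{}) { /* drop from queue/cache */ },
	})
}
```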
@alculquicondor: This issue is currently awaiting triage. If a SIG or subproject determines this is a relevant issue, they will accept it by applying the `triage/accepted` label and provide further guidance. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
/remove-sig apps
/assign @liggitt
/priority critical-urgent
/retest
@@ -311,9 +306,6 @@ func initPolicyFromConfigMap(client clientset.Interface, policyRef *schedulerapi

// Run begins watching and scheduling. It waits for cache to be synced, then starts scheduling and blocked until the context is done.
func (sched *Scheduler) Run(ctx context.Context) {
	if !cache.WaitForCacheSync(ctx.Done(), sched.scheduledPodsHasSynced) {
we block for shared informers to sync before this point, right?
yes, that's in cmd/kube-scheduler/app/server.go#Run
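For readers following along, a minimal sketch of that ordering, assuming client-go's informers package (`run` is a stand-in for the logic in cmd/kube-scheduler/app/server.go#Run, not the verbatim source):

```go
package main

import (
	"context"

	"k8s.io/client-go/informers"
	"k8s.io/client-go/kubernetes"
)

// run sketches the startup ordering referenced above: the shared informers
// are started and their caches synced before the scheduler runs, which is
// why a separate WaitForCacheSync inside Scheduler.Run became redundant.
func run(ctx context.Context, client kubernetes.Interface) {
	factory := informers.NewSharedInformerFactory(client, 0)

	// Request the pod informer so the factory knows to start it.
	factory.Core().V1().Pods().Informer()

	factory.Start(ctx.Done())            // start all requested informers
	factory.WaitForCacheSync(ctx.Done()) // block until every cache has synced

	// Only now would sched.Run(ctx) be invoked.
}
```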
/lgtm
/kind bug
does this code now match the state of 1.20, or were there follow-up changes/fixes? ideally, this would take advantage of the signal we have in 1.20 (especially since this is the last 1.19 release), and #93044 (comment) sounds like #93566 also touched what shipped in 1.20 for this
FYI @kubernetes/release-engineering
/hold until discussion is resolved
I agree with @Huang-Wei that #93566 was just supposed to be a refactoring. But it's safer to cherry-pick the change too, as the intermediate state hasn't made it to any release and thus has no production signal.
I read that #93566's key idea is to use a more well-encapsulated informer. But I'm not going to oppose cherry-picking #93566 if we're more comfortable including it - it won't have side effects.
/retest
/retest |
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: alculquicondor, liggitt

The full list of commands accepted by this bot can be found here. The pull request process is described here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing `/approve` in a comment. Approvers can cancel approval by writing `/approve cancel`.
I believe this should be merged in this patch cycle (1.19.15). |
I agree. This is a regression from 1.18 that was accidentally resolved by this change when it merged into 1.20. The regression needs to be resolved, and this change has had soak time since 1.20.0.
Approving this post-deadline cherry pick based on the comments from @alculquicondor and @liggitt above.
/hold cancel |
Cherry pick of #93044 and #93566 on release-1.19.
#93044: Initialize scheduler's podInformer in sharedInformerFactory
#93566: Optimize the use of informer for scheduler
/sig scheduling
/kind bug
Because two separate informers were used, the scheduler might try to schedule a Pod, fail, and never retry, because it thought the Pod didn't exist. In reality, the Pod simply had not yet been added to the second informer's cache.
This is solved by using a single informer. The fix was originally introduced in 1.20 as a memory optimization, but it turns out to also guarantee correct behavior.
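As an illustration, a hedged sketch of the single-informer pattern (not the actual scheduler code; `listPodsFromSharedCache` is an invented name): every lister derived from one factory reads the same store, so two consumers can never disagree about whether a Pod exists.

```go
package main

import (
	"fmt"
	"time"

	"k8s.io/apimachinery/pkg/labels"
	"k8s.io/client-go/informers"
	"k8s.io/client-go/kubernetes"
)

// listPodsFromSharedCache demonstrates the fix's core idea: one factory,
// one cache. The "already scheduled" view and the "to be scheduled" view
// both read from this cache, so a Pod visible to one is visible to both.
func listPodsFromSharedCache(client kubernetes.Interface, stopCh <-chan struct{}) error {
	factory := informers.NewSharedInformerFactory(client, 30*time.Second)
	podLister := factory.Core().V1().Pods().Lister()

	factory.Start(stopCh)
	factory.WaitForCacheSync(stopCh)

	pods, err := podLister.List(labels.Everything())
	if err != nil {
		return err
	}
	fmt.Printf("shared cache holds %d pods\n", len(pods))
	return nil
}
```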
#93566 is supposed to just be a refactoring. However, since this would be the last 1.19 release, it's better to include it, as we have signal that both changes work well in production.
Fixes #104521