New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Port deviceManager to windows container manager to enable GPU access #93285
Port deviceManager to windows container manager to enable GPU access #93285
Conversation
Hi @aarnaud. Thanks for your PR. I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with Once the patch is verified, the new status will be reflected by the I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
@aarnaud: Cannot trigger testing until a trusted user reviews the PR and leaves an In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
2682613
to
a8419d1
Compare
/ok-to-test |
/approve |
a8419d1
to
ae35620
Compare
/test pull-kubernetes-typecheck |
/test pull-kubernetes-e2e-gce-ubuntu-containerd |
I add again the constant file for windows to avoid this crash, and test it on windows server 2019 and all test passed. |
/assign @derekwaynecarr |
@aarnaud: You must be a member of the kubernetes/milestone-maintainers GitHub team to set the milestone. If you believe you should be able to issue the /milestone command, please contact your and have them propose you as an additional delegate for this responsibility. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
// DevicePluginPath is the folder the Device Plugin is expecting sockets to be on | ||
// Only privileged pods have access to this path | ||
// Note: Placeholder until we find a "standard path" | ||
DevicePluginPath = "c:\\var\\lib\\kubelet\\device-plugins\\" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OS-specific constants are pretty unusual ... it means code that references this constant but is not running on the target OS will get the wrong value
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I understand but how can I change that. by using goruntime.GOOS != "windows"
in
if socketPath == "" || !filepath.IsAbs(socketPath) { |
if filepath.IsAbs(criSocket) && goruntime.GOOS != "windows" { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it looks like DevicePluginPath is only referenced by KubeletSocket, and KubeletSocket is only referenced by NewManagerImpl in-tree. If we needed a different default path per-platform, something like this would be clearer to me:
diff --git a/pkg/kubelet/cm/devicemanager/manager.go b/pkg/kubelet/cm/devicemanager/manager.go
index 5d1925f9458..14dc3be1347 100644
--- a/pkg/kubelet/cm/devicemanager/manager.go
+++ b/pkg/kubelet/cm/devicemanager/manager.go
@@ -22,6 +22,7 @@ import (
"net"
"os"
"path/filepath"
+ "runtime"
"sort"
"sync"
"time"
@@ -125,7 +126,11 @@ func (s *sourcesReadyStub) AllReady() bool { return true }
// NewManagerImpl creates a new manager.
func NewManagerImpl(numaNodeInfo cputopology.NUMANodeInfo, topologyAffinityStore topologymanager.Store) (*ManagerImpl, error) {
- return newManagerImpl(pluginapi.KubeletSocket, numaNodeInfo, topologyAffinityStore)
+ socketPath := pluginapi.KubeletSocket
+ if runtime.GOOS == "windows" {
+ socketPath = pluginapi.KubeletSocketWindows
+ }
+ return newManagerImpl(socketPath, numaNodeInfo, topologyAffinityStore)
}
func newManagerImpl(socketPath string, numaNodeInfo cputopology.NUMANodeInfo, topologyAffinityStore topologymanager.Store) (*ManagerImpl, error) {
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @liggitt, I changed my PR to add your feedback
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OS-specific constants are pretty unusual ... it means code that references this constant but is not running on the target OS will get the wrong value
My understanding here was: only the builds targeting Windows will reference the constants in constants_windows.go
while Linux builds will reference the corresponding constants in constants_linux.go
/ constants_unix.go
based on: https://github.com/kubernetes/kubernetes/blob/master/cmd/kubeadm/app/constants/constants_windows.go and https://github.com/kubernetes/kubernetes/blob/master/cmd/kubeadm/app/apis/kubeadm/v1beta1/defaults_windows.go as well as in Go libraries: https://github.com/golang/go/blob/master/src/os/path_windows.go#L8
@liggitt can you explain a bit why/how this does not work especially in the context of something node-specific like kubelet. I assume the compiler will fail the build if a targeted platform does not have all the constants defined that the platform agnostic code is referring to. Is the concern that a controller may potentially use these incorrectly across a cluster?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OS-specific constants are pretty unusual ... it means code that references this constant but is not running on the target OS will get the wrong value
@ddebroy I think @liggitt's concern here is roughly that because we use a single name across platforms for this constant, it may not be obvious when we wind up with code that depends on properties of a specific platform's constant value.
As opposed to having distinct constants for each platform.
If we already have platform specific implementations referencing the constant, why not reference a platform specific constant instead ...?
it looks like DevicePluginPath is only referenced by KubeletSocket, and KubeletSocket is only referenced by NewManagerImpl in-tree. If we needed a different default path per-platform, something like this would be clearer to me:
This comment outlines this before showing a specific proposed implementation ...
Hi @derekwaynecarr, Can you approve this PR for allowing GPU access under Windows? This PR is the same as #80917, which was already fully approved and merged, except for the following change: |
/hold cancel |
/approve |
/assign @derekwaynecarr |
No problem! Everything is fine. Was more meant as a light-hearted comment 😀. PS: Does nobody out there know Futurama? What a miss! 😉 |
/lgtm |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: aarnaud, dchen1107, michmike, thomacos The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
@dchen1107 thanks for approving! |
thanks to all. Merry Christmas |
Great work @aarnaud |
If this makes device plugins work on Windows (which is my understanding) at least with dockershim, that seems like a user-visible change that should be in the v1.21 changelog, but is not. Or this being deliberately sat-upon until #97739 is resolved so that non-deprecated container runtimes can support this? |
Looks like there was no release note added. We should probably add one but I don't know the process for updating after the fact |
I think @aarnaud can just edit the initial PR comment to add a release note. |
I just added ;-) |
What type of PR is this?
/kind feature
What this PR does / why we need it:
Port the devicemanager to Windows node to allow device plugins.
An important device plugin (not part of this PR) is the directx device plugin for GPU access: https://github.com/aarnaud/k8s-directx-device-plugin
This is required for machine learning, high-performance computing and computer graphics under Windows using the GPU.
References:
#80917
#93263
#93262
#88554
Fix issue:
#88157
Does this PR introduce a user-facing change?: