Can't get nvidia-smi to work in a pod #4408
Comments
I managed to get it working. I also ran into this exact same issue, with a clean install of […]. There are a few things I had to do. Firstly, it turns out I was missing the […]. According to the NVIDIA gpu-operator docs (the operator gets deployed when you run […]): https://docs.nvidia.com/datacenter/cloud-native/gpu-operator/latest/getting-started.html
Because I wanted to use the host drivers and runtime, I also had to update the containerd toml with the correct […].
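Not part of the original comment, but for context: a minimal sketch of the kind of check and change involved, assuming a MicroK8s snap layout and the host's nvidia-container-runtime (the paths and restart step below are assumptions, not the commenter's exact edit):

```bash
# Hedged sketch (assumed MicroK8s snap paths): verify that the "nvidia" runtime
# entry in containerd's config points at the host nvidia-container-runtime.
CONFIG=/var/snap/microk8s/current/args/containerd-template.toml

grep -A 4 'runtimes.nvidia' "$CONFIG"
# A working entry typically ends up looking like:
#   [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.nvidia.options]
#     BinaryName = "/usr/bin/nvidia-container-runtime"

# Restart MicroK8s so containerd regenerates its config from the template.
microk8s stop && microk8s start
```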
@Tiaanjw I used your template:

apiVersion: v1
kind: Pod
metadata:
  name: cuda-vector-add
  namespace: default
spec:
  runtimeClassName: nvidia
  restartPolicy: OnFailure
  containers:
    - name: cuda-vector-add
      image: "k8s.gcr.io/cuda-vector-add:v0.1"
      resources:
        limits:
          nvidia.com/gpu: 1
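Not from the thread, but as a usage note: assuming the manifest above is saved as cuda-vector-add.yaml (the filename is an assumption), the usual way to run and verify it is:

```bash
# Apply the test pod and check that the CUDA sample ran on the GPU.
# Use "microk8s kubectl ..." if kubectl is not aliased on your install.
kubectl apply -f cuda-vector-add.yaml
kubectl get pod cuda-vector-add     # should eventually show Completed
kubectl logs cuda-vector-add        # the sample prints "Test PASSED" on success
```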
@pishangujeniya you are right! Thanks, I have updated my solution!
@abstract-entity Did the proposed solution work? If so, can you close this issue?
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
Summary
Hello,
I'm trying to run MicroK8s with the GPU operator on a Dell G7 7790 laptop with an RTX 2060, running a fresh install of Ubuntu 22.04. I'm unable to access my GPU, and I don't see any error.
I've tried many reinstalls of Ubuntu / NVIDIA drivers, containers, CUDA / MicroK8s (with both the operator driver and auto) / the GPU operator, without success.
What Should Happen Instead?
I expect to get nvidia-smi output inside the pod; maybe I'm missing something.
Reproduction Steps
I've installed a fresh Ubuntu 22.04 with NVIDIA driver 545, the NVIDIA container toolkit, and NVIDIA CUDA.
Then I installed MicroK8s following this guide.
After that I installed the GPU operator:
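The exact install command isn't included in this excerpt; a typical GPU operator install following NVIDIA's getting-started guide looks roughly like the following (the MicroK8s addon noted at the end is an alternative path, not necessarily what was used here):

```bash
# Hedged sketch of a standard Helm-based GPU operator install.
helm repo add nvidia https://helm.ngc.nvidia.com/nvidia && helm repo update
helm install --wait --generate-name \
  -n gpu-operator --create-namespace \
  nvidia/gpu-operator
# On MicroK8s, the bundled addon ("microk8s enable gpu") deploys the same operator.
```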
I try to run nvidia-smi with this pod:
And I get this result:
The path in the pod is:
The directories in the path don't exist in the pod.
State of my GPU operator pods:
When I run nvidia-smi on the host I get this information:
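The pod listing and nvidia-smi output referred to above aren't reproduced in this excerpt; the checks typically used to gather that state are shown below (the namespace may be gpu-operator-resources on older operator versions):

```bash
# Hedged diagnostic checklist for a GPU operator deployment.
kubectl get pods -n gpu-operator                    # operator, toolkit and validator pods
kubectl describe node | grep -i 'nvidia.com/gpu'    # is the GPU advertised as a node resource?
nvidia-smi                                          # on the host, outside Kubernetes
```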
Introspection Report
Can you suggest a fix?
Nope.
Are you interested in contributing with a fix?
Ask me anything, I'll be glad to help.