Description
When the driver version on the host (and therefore the version of the driver libraries mounted from the host) is lower than the version of the driver libraries included in the image, the ldconfig run by libnvidia-container during container creation re-points the library symlinks in the container to the newer library files shipped in the image. As a result, the affected libraries become unusable. For example, executing nvidia-smi fails with the error:
Failed to initialize NVML: Driver/library version mismatch
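The symlink ends up on the image's copy because, roughly speaking, when several files share one SONAME (here libnvidia-ml.so.1), ldconfig points the link at the one with the highest version. The function below is a hypothetical, simplified illustration of that selection: it only version-sorts file names, whereas real ldconfig reads the SONAME from each library's ELF headers. The name pick_newest is made up for this example.

```shell
#!/usr/bin/env bash
# Hypothetical illustration (not ldconfig's code): among regular files named
# <base>.<version> in DIR, print the one with the highest version, which is
# roughly the file ldconfig would point the <base>.1 SONAME link at.
pick_newest() {
  local dir=$1 base=$2
  # -type f skips the existing .so.1 symlink; sort -V orders by version.
  find "$dir" -maxdepth 1 -type f -name "${base}.*" -printf '%f\n' | sort -V | tail -n 1
}

# Demo with empty placeholder files mirroring the listing in this issue.
tmp=$(mktemp -d)
touch "$tmp/libnvidia-ml.so.470.129.06" "$tmp/libnvidia-ml.so.525.105.17"
pick_newest "$tmp" libnvidia-ml.so   # prints libnvidia-ml.so.525.105.17
rm -rf "$tmp"
```

The host's 470.129.06 library loses to the image's 525.105.17 copy, which is exactly the mismatch nvidia-smi reports.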
Reproduce
Use a host whose driver version is lower than 525.105.17.
$ docker pull nsblink/ubuntu:test_nvc
$ docker run --rm -e NVIDIA_VISIBLE_DEVICES=1 -e NVIDIA_DRIVER_CAPABILITIES=compute,utility -it --entrypoint /bin/bash nsblink/ubuntu:test_nvc
$ nvidia-smi
$ cd /lib/x86_64-linux-gnu; ls -lah | grep libnvidia-ml
root@da0fd684b11a:/lib/x86_64-linux-gnu# ls -lah | grep libnvidia-ml
lrwxrwxrwx 1 root root 26 Jul 29 07:59 libnvidia-ml.so.1 -> libnvidia-ml.so.525.105.17
-rw-r--r-- 1 root root 1.8M May 12 2022 libnvidia-ml.so.470.129.06
-rw-r--r-- 1 root root 1.8M Jul 25 08:32 libnvidia-ml.so.525.105.17
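To spot the problem without relying on nvidia-smi inside the container, one option is to compare each libnvidia-*.so.1 symlink target against the host driver version (obtainable on the host with nvidia-smi --query-gpu=driver_version --format=csv,noheader). The sketch below is a hypothetical helper, check_driver_links, and it assumes every libnvidia-*.so.1 in the directory is a driver-versioned library.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report libnvidia-*.so.1 symlinks in DIR whose target
# does not end in ".so.<DRIVER_VERSION>". Returns 1 if any mismatch is found.
# Assumption: every libnvidia-*.so.1 in DIR belongs to the driver.
check_driver_links() {
  local dir=$1 driver=$2 link target status=0
  for link in "$dir"/libnvidia-*.so.1; do
    [ -L "$link" ] || continue          # skip non-symlinks and unmatched globs
    target=$(readlink "$link")
    case "$target" in
      *".so.$driver") ;;                # points at the host driver's library
      *) echo "MISMATCH: ${link##*/} -> $target"; status=1 ;;
    esac
  done
  return $status
}

# Demo reproducing the listing above with a 470.129.06 host driver.
tmp=$(mktemp -d)
touch "$tmp/libnvidia-ml.so.470.129.06" "$tmp/libnvidia-ml.so.525.105.17"
ln -s libnvidia-ml.so.525.105.17 "$tmp/libnvidia-ml.so.1"
check_driver_links "$tmp" 470.129.06   # prints: MISMATCH: libnvidia-ml.so.1 -> libnvidia-ml.so.525.105.17
rm -rf "$tmp"
```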
Patch
I have submitted patch !225, which solves this problem by recreating the symlinks for the driver-versioned libraries after ldconfig has run, so that they point to the host driver's libraries again.
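The idea behind the fix can be illustrated with a small post-ldconfig step. This is a hypothetical sketch, not the code in !225: the function name relink_driver_libs is made up, and it assumes the host driver version is known and that each libnvidia-*.so.1 link should point at the file named libnvidia-*.so.<driver version> when that file exists.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the fix idea (not the actual patch): after ldconfig,
# re-point each libnvidia-*.so.1 symlink in DIR at <base>.<DRIVER_VERSION>
# if that file is present. Must run after ldconfig, which would otherwise
# undo it on its next run.
relink_driver_libs() {
  local dir=$1 driver=$2 link base
  for link in "$dir"/libnvidia-*.so.1; do
    [ -L "$link" ] || continue          # only touch existing symlinks
    base=${link%.1}                     # .../libnvidia-ml.so.1 -> .../libnvidia-ml.so
    if [ -e "$base.$driver" ]; then
      # Relative target, same form ldconfig uses; -f replaces the old link.
      ln -sfn "${base##*/}.$driver" "$link"
    fi
  done
}

# Demo: undo the bad link from the listing above for a 470.129.06 host driver.
tmp=$(mktemp -d)
touch "$tmp/libnvidia-ml.so.470.129.06" "$tmp/libnvidia-ml.so.525.105.17"
ln -s libnvidia-ml.so.525.105.17 "$tmp/libnvidia-ml.so.1"
relink_driver_libs "$tmp" 470.129.06
readlink "$tmp/libnvidia-ml.so.1"      # prints libnvidia-ml.so.470.129.06
rm -rf "$tmp"
```

Skipping links whose host-version file is missing keeps the step harmless for libraries the host driver does not provide.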
Migrated from https://gitlab.com/nvidia/container-toolkit/libnvidia-container/-/issues/3