The template below is mostly useful for bug reports and support questions. Feel free to remove anything which doesn't apply to you and add more information where it makes sense.
Also, before reporting a new issue, please make sure that:
- You read carefully the documentation and frequently asked questions.
- You searched for a similar issue and this is not a duplicate of an existing one.
- This issue is not related to NGC, otherwise, please use the devtalk forums instead.
- You went through the troubleshooting steps.
1. Issue or feature description
I'm running a Debian VM on Proxmox with GPU passthrough.
Every time I start a container, I receive an "out of memory" error message. I first tried a Quadro P400 and have now switched to an RTX 2060, but the error persists.
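To cross-check the memory the driver reports against the "out of memory" message, something like the following could be run in the VM. This is only a sketch: it assumes Python 3 and `nvidia-smi` are on the PATH, and the names `QUERY_FIELDS`, `parse_memory_csv` and `query_gpu_memory` are made up for illustration. It relies on the `--query-gpu` and `--format=csv,noheader,nounits` options of `nvidia-smi`.

```python
import subprocess

# Fields passed to nvidia-smi's --query-gpu option (values are reported in MiB).
QUERY_FIELDS = ("memory.total", "memory.used", "memory.free")


def parse_memory_csv(text):
    """Parse `--format=csv,noheader,nounits` output into one dict per GPU."""
    gpus = []
    for line in text.strip().splitlines():
        parts = [part.strip() for part in line.split(",")]
        # Non-numeric values such as "[N/A]" are kept as None.
        values = [int(part) if part.isdigit() else None for part in parts]
        gpus.append(dict(zip(QUERY_FIELDS, values)))
    return gpus


def query_gpu_memory(prefix=()):
    """Run nvidia-smi, optionally behind a command prefix (hypothetical helper)."""
    cmd = [
        *prefix,
        "nvidia-smi",
        "--query-gpu=" + ",".join(QUERY_FIELDS),
        "--format=csv,noheader,nounits",
    ]
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return parse_memory_csv(result.stdout)
```

Calling `query_gpu_memory()` in the VM returns one dict per GPU. Passing a prefix such as `("docker", "exec", "<container>")` would run the same query inside a running container, assuming `nvidia-smi` is available there, which makes it possible to compare what the VM and the container each see.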
2. Steps to reproduce the issue
Start a deepstack or a compreface container
3. Information to attach (optional if deemed irrelevant)
- Some nvidia-container information: `nvidia-container-cli -k -d /dev/tty info`
I0117 15:18:51.313599 1426 nvc.c:376] initializing library context (version=1.11.0, build=c8f267be0bac1c654d59ad4ea5df907141149977)
I0117 15:18:51.313634 1426 nvc.c:350] using root /
I0117 15:18:51.313641 1426 nvc.c:351] using ldcache /etc/ld.so.cache
I0117 15:18:51.313646 1426 nvc.c:352] using unprivileged user 1000:1000
I0117 15:18:51.313662 1426 nvc.c:393] attempting to load dxcore to see if we are running under Windows Subsystem for Linux (WSL)
I0117 15:18:51.313721 1426 nvc.c:395] dxcore initialization failed, continuing assuming a non-WSL environment
W0117 15:18:51.314926 1427 nvc.c:273] failed to set inheritable capabilities
W0117 15:18:51.314958 1427 nvc.c:274] skipping kernel modules load due to failure
I0117 15:18:51.315149 1428 rpc.c:71] starting driver rpc service
I0117 15:18:51.318126 1429 rpc.c:71] starting nvcgo rpc service
I0117 15:18:51.318885 1426 nvc_info.c:766] requesting driver information with ''
I0117 15:18:51.320030 1426 nvc_info.c:173] selecting /usr/lib/x86_64-linux-gnu/vdpau/libvdpau_nvidia.so.525.78.01
I0117 15:18:51.320122 1426 nvc_info.c:173] selecting /usr/lib/x86_64-linux-gnu/libnvoptix.so.525.78.01
I0117 15:18:51.320164 1426 nvc_info.c:173] selecting /usr/lib/x86_64-linux-gnu/libnvidia-tls.so.525.78.01
I0117 15:18:51.320190 1426 nvc_info.c:173] selecting /usr/lib/x86_64-linux-gnu/libnvidia-rtcore.so.525.78.01
I0117 15:18:51.320216 1426 nvc_info.c:173] selecting /usr/lib/x86_64-linux-gnu/libnvidia-ptxjitcompiler.so.525.78.01
I0117 15:18:51.320250 1426 nvc_info.c:173] selecting /usr/lib/x86_64-linux-gnu/libnvidia-opticalflow.so.525.78.01
I0117 15:18:51.320282 1426 nvc_info.c:173] selecting /usr/lib/x86_64-linux-gnu/libnvidia-opencl.so.525.78.01
I0117 15:18:51.320308 1426 nvc_info.c:173] selecting /usr/lib/x86_64-linux-gnu/libnvidia-ngx.so.525.78.01
I0117 15:18:51.320334 1426 nvc_info.c:173] selecting /usr/lib/x86_64-linux-gnu/libnvidia-ml.so.525.78.01
I0117 15:18:51.320372 1426 nvc_info.c:173] selecting /usr/lib/x86_64-linux-gnu/libnvidia-glvkspirv.so.525.78.01
I0117 15:18:51.320400 1426 nvc_info.c:173] selecting /usr/lib/x86_64-linux-gnu/libnvidia-glsi.so.525.78.01
I0117 15:18:51.320430 1426 nvc_info.c:173] selecting /usr/lib/x86_64-linux-gnu/libnvidia-glcore.so.525.78.01
I0117 15:18:51.320457 1426 nvc_info.c:173] selecting /usr/lib/x86_64-linux-gnu/libnvidia-fbc.so.525.78.01
I0117 15:18:51.320480 1426 nvc_info.c:173] selecting /usr/lib/x86_64-linux-gnu/libnvidia-encode.so.525.78.01
I0117 15:18:51.320503 1426 nvc_info.c:173] selecting /usr/lib/x86_64-linux-gnu/libnvidia-eglcore.so.525.78.01
I0117 15:18:51.320521 1426 nvc_info.c:173] selecting /usr/lib/x86_64-linux-gnu/libnvidia-compiler.so.525.78.01
I0117 15:18:51.320537 1426 nvc_info.c:173] selecting /usr/lib/x86_64-linux-gnu/libnvidia-cfg.so.525.78.01
I0117 15:18:51.320562 1426 nvc_info.c:173] selecting /usr/lib/x86_64-linux-gnu/libnvidia-allocator.so.525.78.01
I0117 15:18:51.320605 1426 nvc_info.c:173] selecting /usr/lib/x86_64-linux-gnu/libnvcuvid.so.525.78.01
I0117 15:18:51.320720 1426 nvc_info.c:173] selecting /usr/lib/x86_64-linux-gnu/libcudadebugger.so.525.78.01
I0117 15:18:51.320770 1426 nvc_info.c:173] selecting /usr/lib/x86_64-linux-gnu/libcuda.so.525.78.01
I0117 15:18:51.320831 1426 nvc_info.c:173] selecting /usr/lib/x86_64-linux-gnu/libGLX_nvidia.so.525.78.01
I0117 15:18:51.320864 1426 nvc_info.c:173] selecting /usr/lib/x86_64-linux-gnu/libGLESv2_nvidia.so.525.78.01
I0117 15:18:51.320893 1426 nvc_info.c:173] selecting /usr/lib/x86_64-linux-gnu/libGLESv1_CM_nvidia.so.525.78.01
I0117 15:18:51.320922 1426 nvc_info.c:173] selecting /usr/lib/x86_64-linux-gnu/libEGL_nvidia.so.525.78.01
W0117 15:18:51.320940 1426 nvc_info.c:399] missing library libnvidia-nscq.so
W0117 15:18:51.320947 1426 nvc_info.c:399] missing library libnvidia-fatbinaryloader.so
W0117 15:18:51.320951 1426 nvc_info.c:399] missing library libnvidia-pkcs11.so
W0117 15:18:51.320955 1426 nvc_info.c:399] missing library libnvidia-ifr.so
W0117 15:18:51.320958 1426 nvc_info.c:399] missing library libnvidia-cbl.so
W0117 15:18:51.320963 1426 nvc_info.c:403] missing compat32 library libnvidia-ml.so
W0117 15:18:51.320966 1426 nvc_info.c:403] missing compat32 library libnvidia-cfg.so
W0117 15:18:51.320974 1426 nvc_info.c:403] missing compat32 library libnvidia-nscq.so
W0117 15:18:51.320977 1426 nvc_info.c:403] missing compat32 library libcuda.so
W0117 15:18:51.320980 1426 nvc_info.c:403] missing compat32 library libcudadebugger.so
W0117 15:18:51.320984 1426 nvc_info.c:403] missing compat32 library libnvidia-opencl.so
W0117 15:18:51.320989 1426 nvc_info.c:403] missing compat32 library libnvidia-ptxjitcompiler.so
W0117 15:18:51.320992 1426 nvc_info.c:403] missing compat32 library libnvidia-fatbinaryloader.so
W0117 15:18:51.320995 1426 nvc_info.c:403] missing compat32 library libnvidia-allocator.so
W0117 15:18:51.320999 1426 nvc_info.c:403] missing compat32 library libnvidia-compiler.so
W0117 15:18:51.321003 1426 nvc_info.c:403] missing compat32 library libnvidia-pkcs11.so
W0117 15:18:51.321010 1426 nvc_info.c:403] missing compat32 library libnvidia-ngx.so
W0117 15:18:51.321014 1426 nvc_info.c:403] missing compat32 library libvdpau_nvidia.so
W0117 15:18:51.321018 1426 nvc_info.c:403] missing compat32 library libnvidia-encode.so
W0117 15:18:51.321022 1426 nvc_info.c:403] missing compat32 library libnvidia-opticalflow.so
W0117 15:18:51.321027 1426 nvc_info.c:403] missing compat32 library libnvcuvid.so
W0117 15:18:51.321033 1426 nvc_info.c:403] missing compat32 library libnvidia-eglcore.so
W0117 15:18:51.321037 1426 nvc_info.c:403] missing compat32 library libnvidia-glcore.so
W0117 15:18:51.321044 1426 nvc_info.c:403] missing compat32 library libnvidia-tls.so
W0117 15:18:51.321046 1426 nvc_info.c:403] missing compat32 library libnvidia-glsi.so
W0117 15:18:51.321055 1426 nvc_info.c:403] missing compat32 library libnvidia-fbc.so
W0117 15:18:51.321058 1426 nvc_info.c:403] missing compat32 library libnvidia-ifr.so
W0117 15:18:51.321062 1426 nvc_info.c:403] missing compat32 library libnvidia-rtcore.so
W0117 15:18:51.321067 1426 nvc_info.c:403] missing compat32 library libnvoptix.so
W0117 15:18:51.321068 1426 nvc_info.c:403] missing compat32 library libGLX_nvidia.so
W0117 15:18:51.321077 1426 nvc_info.c:403] missing compat32 library libEGL_nvidia.so
W0117 15:18:51.321083 1426 nvc_info.c:403] missing compat32 library libGLESv2_nvidia.so
W0117 15:18:51.321089 1426 nvc_info.c:403] missing compat32 library libGLESv1_CM_nvidia.so
W0117 15:18:51.321093 1426 nvc_info.c:403] missing compat32 library libnvidia-glvkspirv.so
W0117 15:18:51.321098 1426 nvc_info.c:403] missing compat32 library libnvidia-cbl.so
I0117 15:18:51.321234 1426 nvc_info.c:299] selecting /usr/bin/nvidia-smi
I0117 15:18:51.321254 1426 nvc_info.c:299] selecting /usr/bin/nvidia-debugdump
I0117 15:18:51.321273 1426 nvc_info.c:299] selecting /usr/bin/nvidia-persistenced
I0117 15:18:51.321297 1426 nvc_info.c:299] selecting /usr/bin/nvidia-cuda-mps-control
I0117 15:18:51.321315 1426 nvc_info.c:299] selecting /usr/bin/nvidia-cuda-mps-server
W0117 15:18:51.324656 1426 nvc_info.c:425] missing binary nv-fabricmanager
W0117 15:18:51.324749 1426 nvc_info.c:349] missing firmware path /lib/firmware/nvidia/525.78.01/gsp.bin
I0117 15:18:51.324801 1426 nvc_info.c:529] listing device /dev/nvidiactl
I0117 15:18:51.324807 1426 nvc_info.c:529] listing device /dev/nvidia-uvm
I0117 15:18:51.324810 1426 nvc_info.c:529] listing device /dev/nvidia-uvm-tools
I0117 15:18:51.324813 1426 nvc_info.c:529] listing device /dev/nvidia-modeset
W0117 15:18:51.324841 1426 nvc_info.c:349] missing ipc path /var/run/nvidia-persistenced/socket
W0117 15:18:51.324862 1426 nvc_info.c:349] missing ipc path /var/run/nvidia-fabricmanager/socket
W0117 15:18:51.324879 1426 nvc_info.c:349] missing ipc path /tmp/nvidia-mps
I0117 15:18:51.324886 1426 nvc_info.c:822] requesting device information with ''
I0117 15:18:51.331250 1426 nvc_info.c:713] listing device /dev/nvidia0 (GPU-d0b2b12b-a223-ad5a-6661-e65ffde8f84b at 00000000:00:10.0)
NVRM version: 525.78.01
CUDA version: 12.0
Device Index: 0
Device Minor: 0
Model: NVIDIA GeForce RTX 2060
Brand: GeForce
GPU UUID: GPU-d0b2b12b-a223-ad5a-6661-e65ffde8f84b
Bus Location: 00000000:00:10.0
Architecture: 7.5
I0117 15:18:51.331286 1426 nvc.c:434] shutting down library context
I0117 15:18:51.331341 1429 rpc.c:95] terminating nvcgo rpc service
I0117 15:18:51.331804 1426 rpc.c:135] nvcgo rpc service terminated successfully
I0117 15:18:51.333713 1428 rpc.c:95] terminating driver rpc service
I0117 15:18:51.333969 1426 rpc.c:135] driver rpc service terminated successfully
- Kernel version from `uname -a`
```
Linux debian 5.10.0-20-amd64 #1 SMP Debian 5.10.158-2 (2022-12-13) x86_64 GNU/Linux
```
- [ ] Any relevant kernel output lines from `dmesg`
- [ ] Driver information from `nvidia-smi -a`
```Timestamp : Tue Jan 17 12:21:08 2023
Driver Version : 525.78.01
CUDA Version : 12.0
Attached GPUs : 1
GPU 00000000:00:10.0
Product Name : NVIDIA GeForce RTX 2060
Product Brand : GeForce
Product Architecture : Turing
Display Mode : Disabled
Display Active : Disabled
Persistence Mode : Disabled
MIG Mode
Current : N/A
Pending : N/A
Accounting Mode : Disabled
Accounting Mode Buffer Size : 4000
Driver Model
Current : N/A
Pending : N/A
Serial Number : N/A
GPU UUID : GPU-d0b2b12b-a223-ad5a-6661-e65ffde8f84b
Minor Number : 0
VBIOS Version : 90.04.63.40.55
MultiGPU Board : No
Board ID : 0x10
Board Part Number : N/A
GPU Part Number : 1E89-150-A1
Module ID : 1
Inforom Version
Image Version : G001.0000.02.04
OEM Object : 1.1
ECC Object : N/A
Power Management Object : N/A
GPU Operation Mode
Current : N/A
Pending : N/A
GSP Firmware Version : N/A
GPU Virtualization Mode
Virtualization Mode : Pass-Through
Host VGPU Mode : N/A
IBMNPU
Relaxed Ordering Mode : N/A
PCI
Bus : 0x00
Device : 0x10
Domain : 0x0000
Device Id : 0x1E8910DE
Bus Id : 00000000:00:10.0
Sub System Id : 0x20683842
GPU Link Info
PCIe Generation
Max : 3
Current : 1
Device Current : 1
Device Max : 3
Host Max : N/A
Link Width
Max : 16x
Current : 16x
Bridge Chip
Type : N/A
Firmware : N/A
Replays Since Reset : 0
Replay Number Rollovers : 0
Tx Throughput : 0 KB/s
Rx Throughput : 0 KB/s
Atomic Caps Inbound : N/A
Atomic Caps Outbound : N/A
Fan Speed : 0 %
Performance State : P8
Clocks Throttle Reasons
Idle : Active
Applications Clocks Setting : Not Active
SW Power Cap : Not Active
HW Slowdown : Not Active
HW Thermal Slowdown : Not Active
HW Power Brake Slowdown : Not Active
Sync Boost : Not Active
SW Thermal Slowdown : Not Active
Display Clock Setting : Not Active
FB Memory Usage
Total : 6144 MiB
Reserved : 217 MiB
Used : 0 MiB
Free : 5926 MiB
BAR1 Memory Usage
Total : 256 MiB
Used : 2 MiB
Free : 254 MiB
Compute Mode : Default
Utilization
Gpu : 0 %
Memory : 0 %
Encoder : 0 %
Decoder : 0 %
Encoder Stats
Active Sessions : 0
Average FPS : 0
Average Latency : 0
FBC Stats
Active Sessions : 0
Average FPS : 0
Average Latency : 0
Ecc Mode
Current : N/A
Pending : N/A
ECC Errors
Volatile
SRAM Correctable : N/A
SRAM Uncorrectable : N/A
DRAM Correctable : N/A
DRAM Uncorrectable : N/A
Aggregate
SRAM Correctable : N/A
SRAM Uncorrectable : N/A
DRAM Correctable : N/A
DRAM Uncorrectable : N/A
Retired Pages
Single Bit ECC : N/A
Double Bit ECC : N/A
Pending Page Blacklist : N/A
Remapped Rows : N/A
Temperature
GPU Current Temp : 37 C
GPU Shutdown Temp : 93 C
GPU Slowdown Temp : 90 C
GPU Max Operating Temp : 88 C
GPU Target Temperature : 83 C
Memory Current Temp : N/A
Memory Max Operating Temp : N/A
Power Readings
Power Management : Supported
Power Draw : 9.02 W
Power Limit : 170.00 W
Default Power Limit : 170.00 W
Enforced Power Limit : 170.00 W
Min Power Limit : 125.00 W
Max Power Limit : 170.00 W
Clocks
Graphics : 300 MHz
SM : 300 MHz
Memory : 405 MHz
Video : 540 MHz
Applications Clocks
Graphics : N/A
Memory : N/A
Default Applications Clocks
Graphics : N/A
Memory : N/A
Deferred Clocks
Memory : N/A
Max Clocks
Graphics : 2145 MHz
SM : 2145 MHz
Memory : 7001 MHz
Video : 1950 MHz
Max Customer Boost Clocks
Graphics : N/A
Clock Policy
Auto Boost : N/A
Auto Boost Default : N/A
Voltage
Graphics : N/A
Fabric
State : N/A
Status : N/A
Processes : None
```
- Docker version from `docker version`
Client: Docker Engine - Community
Version: 20.10.22
API version: 1.41
Go version: go1.18.9
Git commit: 3a2c30b
Built: Thu Dec 15 22:28:22 2022
OS/Arch: linux/amd64
Context: default
Experimental: true
Server: Docker Engine - Community
Engine:
Version: 20.10.22
API version: 1.41 (minimum version 1.12)
Go version: go1.18.9
Git commit: 42c8b31
Built: Thu Dec 15 22:26:14 2022
OS/Arch: linux/amd64
Experimental: false
containerd:
Version: 1.6.15
GitCommit: 5b842e528e99d4d4c1686467debf2bd4b88ecd86
runc:
Version: 1.1.4
GitCommit: v1.1.4-0-g5fd4c4d
docker-init:
Version: 0.19.0
GitCommit: de40ad0
- NVIDIA packages version from `dpkg -l '*nvidia*'` or `rpm -qa '*nvidia*'`
Desired=Unknown/Install/Remove/Purge/Hold
| Status=Not/Inst/Conf-files/Unpacked/halF-conf/Half-inst/trig-aWait/Trig-pend
|/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad)
||/ Name Version Architecture Description
+++-=============================-============-============-=====================================================
ii libnvidia-container-tools 1.11.0-1 amd64 NVIDIA container runtime library (command-line tools)
ii libnvidia-container1:amd64 1.11.0-1 amd64 NVIDIA container runtime library
ii nvidia-container-runtime 3.11.0-1 all NVIDIA container runtime
un nvidia-container-runtime-hook <none> <none> (no description available)
ii nvidia-container-toolkit 1.11.0-1 amd64 NVIDIA Container toolkit
ii nvidia-container-toolkit-base 1.11.0-1 amd64 NVIDIA Container Toolkit Base
un nvidia-docker <none> <none> (no description available)
ii nvidia-docker2 2.11.0-1 all nvidia-docker CLI wrapper```
- [ ] NVIDIA container library version from `nvidia-container-cli -V`
```cli-version: 1.11.0
lib-version: 1.11.0
build date: 2022-09-06T09:21+00:00
build revision: c8f267be0bac1c654d59ad4ea5df907141149977
build compiler: x86_64-linux-gnu-gcc-8 8.3.0
build platform: x86_64
build flags: -D_GNU_SOURCE -D_FORTIFY_SOURCE=2 -DNDEBUG -std=gnu11 -O2 -g -fdata-sections -ffunction-sections -fplan9-extensions -fstack-protector -fno-strict-aliasing -fvisibility=hidden -Wall -Wextra -Wcast-align -Wpointer-arith -Wmissing-prototypes -Wnonnull -Wwrite-strings -Wlogical-op -Wformat=2 -Wmissing-format-attribute -Winit-self -Wshadow -Wstrict-prototypes -Wunreachable-code -Wconversion -Wsign-conversion -Wno-unknown-warning-option -Wno-format-extra-args -Wno-gnu-alignof-expression -Wl,-zrelro -Wl,-znow -Wl,-zdefs -Wl,--gc-sections```
- [ ] NVIDIA container library logs (see [troubleshooting](https://github.com/NVIDIA/nvidia-docker/wiki/Troubleshooting))
```{"level":"info","msg":"Using low-level runtime /usr/bin/runc","time":"2023-01-17T12:38:16-03:00"}
{"level":"info","msg":"Using low-level runtime /usr/bin/runc","time":"2023-01-17T12:38:26-03:00"}
{"level":"info","msg":"Using low-level runtime /usr/bin/runc","time":"2023-01-17T12:38:26-03:00"}
{"level":"info","msg":"Using low-level runtime /usr/bin/runc","time":"2023-01-17T12:38:26-03:00"}
{"level":"info","msg":"Using low-level runtime /usr/bin/runc","time":"2023-01-17T12:38:32-03:00"}
{"level":"info","msg":"Using OCI specification file path: /run/containerd/io.containerd.runtime.v2.task/moby/d564955d89f09cdacbb0d1e731acc41f2e859d15ff83a6d52257b7ef58212b86/config.json",>
{"level":"info","msg":"Auto-detected mode as 'legacy'","time":"2023-01-17T12:38:32-03:00"}
{"level":"info","msg":"Using prestart hook path: /usr/bin/nvidia-container-runtime-hook","time":"2023-01-17T12:38:32-03:00"}
{"level":"info","msg":"Applied required modification to OCI specification","time":"2023-01-17T12:38:32-03:00"}
{"level":"info","msg":"Forwarding command to runtime","time":"2023-01-17T12:38:32-03:00"}
{"level":"info","msg":"Using low-level runtime /usr/bin/runc","time":"2023-01-17T12:38:33-03:00"}
{"level":"info","msg":"Using low-level runtime /usr/bin/runc","time":"2023-01-17T12:38:59-03:00"}
{"level":"info","msg":"Using low-level runtime /usr/bin/runc","time":"2023-01-17T12:39:09-03:00"}
{"level":"info","msg":"Using low-level runtime /usr/bin/runc","time":"2023-01-17T12:39:09-03:00"}
{"level":"info","msg":"Using low-level runtime /usr/bin/runc","time":"2023-01-17T12:39:09-03:00"}
```
- Docker command, image and tag used
Docker compose
version: '3.7'
services:
  deepstack:
    container_name: deepstack
    restart: unless-stopped
    image: deepquestai/deepstack:gpu
    runtime: nvidia
    ports:
      - '5005:5000'
    environment:
      - VISION-FACE=True
      # - VISION-DETECTION=True
      # - MODE=High
      #API-KEY= ''
    volumes:
      - /home/deepstack/models:/modelstore/detection
--
version: '3.7'
volumes:
  double-take:
  deepstack:
services:
  double-take:
    container_name: double-take
    image: jakowenko/double-take:latest
    restart: unless-stopped
    volumes:
      - /home/dtake/double-take:/.storage
    ports:
      - '3000:3000'
  deepstack:
    container_name: deepstack_dtake
    restart: unless-stopped
    image: deepquestai/deepstack:gpu
    runtime: nvidia
    ports:
      - '5000:5000'
    environment:
      - VISION-FACE=True
      # - VISION-DETECTION=True
      - MODE=Low
      #API-KEY= ''
    volumes:
      - /home/dtake/deepstack/models:/modelstore/detection