Description
Describe the bug
I am using the official Docker image laiyer/llm-guard-api:latest-cuda. When I send requests to /scan/prompt, the first request succeeds and every subsequent request fails with a 500 Internal Server Error.
To Reproduce
docker run -it --gpus=all --rm -p 8000:8000 \
  -e APP_WORKERS=4 \
  -e AUTH_TOKEN=xxxxx \
  -e LOG_LEVEL='DEBUG' \
  -v ./entrypoint.sh:/home/user/app/entrypoint.sh \
  -v ./config/scanners.yml:/home/user/app/config/scanners.yml \
  -v ./guard/models:/home/user/app/models \
  laiyer/llm-guard-api:latest-cuda
Expected behavior
The required models are loaded when the container starts, and the service then waits for requests and processes each one successfully, not just the first.
Error
INFO: 223.70.229.57:62151 - "POST /scan/prompt HTTP/1.1" 500 Internal Server Error
ERROR: Exception in ASGI application
Traceback (most recent call last):
  File "/home/user/.local/lib/python3.10/site-packages/uvicorn/protocols/http/httptools_impl.py", line 401, in run_asgi
    result = await app( # type: ignore[func-returns-value]
  File "/home/user/.local/lib/python3.10/site-packages/uvicorn/middleware/proxy_headers.py", line 70, in __call__
    return await self.app(scope, receive, send)
  File "/home/user/.local/lib/python3.10/site-packages/fastapi/applications.py", line 1054, in __call__
    await super().__call__(scope, receive, send)
  File "/home/user/.local/lib/python3.10/site-packages/starlette/applications.py", line 123, in __call__
    await self.middleware_stack(scope, receive, send)
  File "/home/user/.local/lib/python3.10/site-packages/starlette/middleware/errors.py", line 186, in __call__
    raise exc
  File "/home/user/.local/lib/python3.10/site-packages/starlette/middleware/errors.py", line 164, in __call__
    await self.app(scope, receive, _send)
  File "/home/user/.local/lib/python3.10/site-packages/opentelemetry/instrumentation/asgi/__init__.py", line 731, in __call__
    await self.app(scope, otel_receive, otel_send)
  File "/home/user/.local/lib/python3.10/site-packages/starlette/middleware/cors.py", line 85, in __call__
    await self.app(scope, receive, send)
  File "/home/user/.local/lib/python3.10/site-packages/starlette/middleware/exceptions.py", line 65, in __call__
    await wrap_app_handling_exceptions(self.app, conn)(scope, receive, send)
  File "/home/user/.local/lib/python3.10/site-packages/starlette/_exception_handler.py", line 64, in wrapped_app
    raise exc
  File "/home/user/.local/lib/python3.10/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
    await app(scope, receive, sender)
  File "/home/user/.local/lib/python3.10/site-packages/starlette/routing.py", line 754, in __call__
    await self.middleware_stack(scope, receive, send)
  File "/home/user/.local/lib/python3.10/site-packages/starlette/routing.py", line 774, in app
    await route.handle(scope, receive, send)
  File "/home/user/.local/lib/python3.10/site-packages/starlette/routing.py", line 295, in handle
    await self.app(scope, receive, send)
  File "/home/user/.local/lib/python3.10/site-packages/starlette/routing.py", line 77, in app
    await wrap_app_handling_exceptions(app, request)(scope, receive, send)
  File "/home/user/.local/lib/python3.10/site-packages/starlette/_exception_handler.py", line 64, in wrapped_app
    raise exc
  File "/home/user/.local/lib/python3.10/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
    await app(scope, receive, sender)
  File "/home/user/.local/lib/python3.10/site-packages/starlette/routing.py", line 74, in app
    response = await f(request)
  File "/home/user/.local/lib/python3.10/site-packages/fastapi/routing.py", line 278, in app
    raw_response = await run_endpoint_function(
  File "/home/user/.local/lib/python3.10/site-packages/fastapi/routing.py", line 191, in run_endpoint_function
    return await dependant.call(**values)
  File "/home/user/app/app/app.py", line 435, in submit_scan_prompt
    scanner_name, risk_score = result
TypeError: cannot unpack non-iterable torch.OutOfMemoryError object
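The final frame suggests that `result` was a torch.OutOfMemoryError instance rather than a (scanner_name, risk_score) tuple, so the TypeError appears to be masking a GPU out-of-memory failure inside one of the scanners. I have not checked how app.py collects scanner results, so the following is only a minimal sketch under the assumption that results are gathered with something like `asyncio.gather(..., return_exceptions=True)`. The names `FakeOutOfMemoryError`, `ok_scanner`, `oom_scanner`, and `scan` are hypothetical and exist only for illustration; the stand-in exception lets the sketch run without torch or a GPU.

```python
import asyncio


class FakeOutOfMemoryError(RuntimeError):
    """Hypothetical stand-in for torch.OutOfMemoryError (no torch needed)."""


async def ok_scanner(prompt: str) -> tuple[str, float]:
    # Hypothetical scanner that succeeds and returns (name, risk score).
    return ("Toxicity", 0.1)


async def oom_scanner(prompt: str) -> tuple[str, float]:
    # Hypothetical scanner that fails the way a GPU scanner might.
    raise FakeOutOfMemoryError("CUDA out of memory")


async def scan(prompt: str) -> tuple[dict[str, float], list[BaseException]]:
    # With return_exceptions=True, a failed task contributes the exception
    # object itself to the results list instead of raising it here.
    results = await asyncio.gather(
        ok_scanner(prompt), oom_scanner(prompt), return_exceptions=True
    )
    scores: dict[str, float] = {}
    errors: list[BaseException] = []
    for result in results:
        # Unpacking an exception object would raise
        # "TypeError: cannot unpack non-iterable ... object",
        # so check for it before unpacking.
        if isinstance(result, BaseException):
            errors.append(result)
            continue
        scanner_name, risk_score = result
        scores[scanner_name] = risk_score
    return scores, errors


scores, errors = asyncio.run(scan("hello"))
print(scores)
print([type(e).__name__ for e in errors])
```

If app.py does something along these lines, a guard like this would surface the real out-of-memory error instead of the misleading TypeError. It would not fix the underlying memory pressure on the GPU, which still needs to be investigated separately.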