CUDA memory is not released, and each new request allocates memory to load the model again, resulting in OOM #243

Open
@DongZhaoXiong

Describe the bug
I am using the official Docker image laiyer/llm-guard-api:latest-cuda. When I send scan/prompt requests, the first request succeeds and all subsequent requests fail.

To Reproduce
docker run -it --gpus=all --rm -p 8000:8000 \
  -e APP_WORKERS=4 \
  -e AUTH_TOKEN=xxxxx \
  -e LOG_LEVEL='DEBUG' \
  -v ./entrypoint.sh:/home/user/app/entrypoint.sh \
  -v ./config/scanners.yml:/home/user/app/config/scanners.yml \
  -v ./guard/models:/home/user/app/models \
  laiyer/llm-guard-api:latest-cuda
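
To make the failure easier to reproduce, here is a minimal client sketch that sends several requests in a row and collects the status codes. The JSON body shape ({"prompt": ...}) and the Bearer authorization header are assumptions about the API, not something stated in this report; the helper names build_request and send_n are hypothetical.

```python
import json
import urllib.error
import urllib.request

BASE_URL = "http://localhost:8000"  # port published by the docker run command
AUTH_TOKEN = "xxxxx"                # placeholder, same value as AUTH_TOKEN above


def build_request(prompt: str) -> urllib.request.Request:
    """Build a POST /scan/prompt request (body shape is an assumption)."""
    body = json.dumps({"prompt": prompt}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/scan/prompt",
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {AUTH_TOKEN}",  # assumed auth scheme
        },
    )


def send_n(n: int) -> list[int]:
    """Send n identical requests in sequence and return the HTTP status codes."""
    statuses = []
    for _ in range(n):
        try:
            with urllib.request.urlopen(build_request("hello"), timeout=60) as resp:
                statuses.append(resp.status)
        except urllib.error.HTTPError as exc:
            # 4xx/5xx responses are raised as HTTPError; record the code.
            statuses.append(exc.code)
    return statuses
```

Calling send_n(5) against the running container should show whether only the first request succeeds and the following ones return 500, as described in this report.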

Expected behavior
The required models are loaded once when the container starts and are then reused for every incoming request, rather than being loaded again per request.
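
The load-once behavior I expect can be sketched as follows. This is only an illustration of the pattern, not the project's actual code: FakeModel, get_model, and handle_request are hypothetical names, and the scanner name is used purely as a cache key.

```python
from functools import lru_cache


class FakeModel:
    """Hypothetical stand-in for a scanner model; counts how often it is built."""

    instances = 0

    def __init__(self, name: str) -> None:
        FakeModel.instances += 1
        self.name = name

    def scan(self, prompt: str) -> tuple[str, float]:
        # A real scanner would run inference here; this returns a fixed score.
        return self.name, 0.0


@lru_cache(maxsize=None)
def get_model(name: str) -> FakeModel:
    # Built on the first call only; later calls return the cached instance.
    return FakeModel(name)


def handle_request(prompt: str) -> tuple[str, float]:
    # Every request reuses the same model object instead of loading a new one.
    return get_model("PromptInjection").scan(prompt)
```

Note that a cache like this is per process. If APP_WORKERS=4 starts four separate worker processes, each one would still hold its own copy of the model on the GPU, which may matter for the memory budget.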

Error
INFO: 223.70.229.57:62151 - "POST /scan/prompt HTTP/1.1" 500 Internal Server Error
ERROR: Exception in ASGI application
Traceback (most recent call last):
  File "/home/user/.local/lib/python3.10/site-packages/uvicorn/protocols/http/httptools_impl.py", line 401, in run_asgi
    result = await app( # type: ignore[func-returns-value]
  File "/home/user/.local/lib/python3.10/site-packages/uvicorn/middleware/proxy_headers.py", line 70, in __call__
    return await self.app(scope, receive, send)
  File "/home/user/.local/lib/python3.10/site-packages/fastapi/applications.py", line 1054, in __call__
    await super().__call__(scope, receive, send)
  File "/home/user/.local/lib/python3.10/site-packages/starlette/applications.py", line 123, in __call__
    await self.middleware_stack(scope, receive, send)
  File "/home/user/.local/lib/python3.10/site-packages/starlette/middleware/errors.py", line 186, in __call__
    raise exc
  File "/home/user/.local/lib/python3.10/site-packages/starlette/middleware/errors.py", line 164, in __call__
    await self.app(scope, receive, _send)
  File "/home/user/.local/lib/python3.10/site-packages/opentelemetry/instrumentation/asgi/__init__.py", line 731, in __call__
    await self.app(scope, otel_receive, otel_send)
  File "/home/user/.local/lib/python3.10/site-packages/starlette/middleware/cors.py", line 85, in __call__
    await self.app(scope, receive, send)
  File "/home/user/.local/lib/python3.10/site-packages/starlette/middleware/exceptions.py", line 65, in __call__
    await wrap_app_handling_exceptions(self.app, conn)(scope, receive, send)
  File "/home/user/.local/lib/python3.10/site-packages/starlette/_exception_handler.py", line 64, in wrapped_app
    raise exc
  File "/home/user/.local/lib/python3.10/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
    await app(scope, receive, sender)
  File "/home/user/.local/lib/python3.10/site-packages/starlette/routing.py", line 754, in __call__
    await self.middleware_stack(scope, receive, send)
  File "/home/user/.local/lib/python3.10/site-packages/starlette/routing.py", line 774, in app
    await route.handle(scope, receive, send)
  File "/home/user/.local/lib/python3.10/site-packages/starlette/routing.py", line 295, in handle
    await self.app(scope, receive, send)
  File "/home/user/.local/lib/python3.10/site-packages/starlette/routing.py", line 77, in app
    await wrap_app_handling_exceptions(app, request)(scope, receive, send)
  File "/home/user/.local/lib/python3.10/site-packages/starlette/_exception_handler.py", line 64, in wrapped_app
    raise exc
  File "/home/user/.local/lib/python3.10/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
    await app(scope, receive, sender)
  File "/home/user/.local/lib/python3.10/site-packages/starlette/routing.py", line 74, in app
    response = await f(request)
  File "/home/user/.local/lib/python3.10/site-packages/fastapi/routing.py", line 278, in app
    raw_response = await run_endpoint_function(
  File "/home/user/.local/lib/python3.10/site-packages/fastapi/routing.py", line 191, in run_endpoint_function
    return await dependant.call(**values)
  File "/home/user/app/app/app.py", line 435, in submit_scan_prompt
    scanner_name, risk_score = result
TypeError: cannot unpack non-iterable torch.OutOfMemoryError object
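
The last frame shows an exception object being unpacked as if it were a (scanner_name, risk_score) tuple, so the real OOM error is hidden behind a TypeError. I have not read the code around app.py line 435, but this would happen if scanner results are gathered with something like asyncio.gather(..., return_exceptions=True) and then unpacked without a check. A minimal sketch of that pattern with the check added, where ok_scanner, failing_scanner, and scan_all are hypothetical names and MemoryError stands in for torch.OutOfMemoryError:

```python
import asyncio


async def ok_scanner(prompt: str) -> tuple[str, float]:
    # Hypothetical scanner that succeeds.
    return "OkScanner", 0.0


async def failing_scanner(prompt: str) -> tuple[str, float]:
    # Hypothetical scanner that fails; stands in for a CUDA out-of-memory error.
    raise MemoryError("simulated out of memory")


async def scan_all(prompt: str) -> tuple[dict[str, float], list[BaseException]]:
    # With return_exceptions=True, failures come back as exception objects
    # in the result list instead of being raised.
    results = await asyncio.gather(
        ok_scanner(prompt), failing_scanner(prompt), return_exceptions=True
    )
    scores: dict[str, float] = {}
    errors: list[BaseException] = []
    for result in results:
        if isinstance(result, BaseException):
            # Keep the original error instead of trying to unpack it.
            errors.append(result)
            continue
        scanner_name, risk_score = result
        scores[scanner_name] = risk_score
    return scores, errors
```

With a check like this the API could report or log the underlying out-of-memory error directly. It would not fix the memory growth itself, only make the failure readable.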
