Description
Hello! I was just trying out llgtrt yesterday and was blown away by how easy it was to get LoRA adapters up and running. I had been struggling for over a week to get Phi4 + LoRA working with TensorRT-LLM + Triton Server and finally gave up. The performance of this server is phenomenal as well, and I am so happy not to have to mess with those gigantic config.pbtxt files that change every release. Wow, I am so impressed with this project! Thank you!
One thing I noticed, though, is that the Docker image is quite large: 35 GB as measured by dive. It includes things like the Rust toolchain, a pile of Python libraries, etc. that are not required at runtime. Much of the fault lies with the NVIDIA TensorRT container, which is a total mess.
I was able to get the image size down to 7.7 GB (78% smaller) by having this as the final stage:
```dockerfile
FROM nvcr.io/nvidia/cuda:12.8.1-runtime-ubuntu24.04 AS llgtrt_prod

RUN DEBIAN_FRONTEND=noninteractive apt-get update \
    && apt-get upgrade -y \
    && apt-get install -y --no-install-recommends \
        # These are runtime dependencies of tensorrt_llm
        libpython3.12-dev \
        libopenmpi-dev \
    && rm -rf /var/lib/apt/lists/*

COPY --from=llgtrt_builder /usr/local/lib/python3.12/dist-packages/tensorrt_llm/libs /usr/local/lib
COPY --from=llgtrt_builder /usr/lib/x86_64-linux-gnu/libnvinfer.so.10 /usr/local/lib/libnvinfer.so.10
COPY --from=llgtrt_builder /workspaces/llgtrt/target/release/llgtrt /usr/local/bin/llgtrt
```
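For anyone who wants to reproduce this, something along these lines should build just the trimmed stage and let you check the size (the llgtrt_prod stage name comes from the snippet above; the image tag is only an example):

```bash
# Build only the trimmed final stage (stage name from the snippet above,
# tag is illustrative)
docker build --target llgtrt_prod -t llgtrt:runtime-test .

# Inspect the resulting size and layers
docker image ls llgtrt
dive llgtrt:runtime-test
```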
I haven't tested the full range of llgtrt capabilities, but Phi4 with five LoRA adapters works great with this as the final image.
I recommend producing two Docker images: one for model building (which should also include the LoRA export script, btw) and one for runtime. To avoid breaking changes, the runtime stage could be published as llgtrt:<version>-runtime; a rough sketch of the publishing flow is below.
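Something like this could produce and publish the two variants (just a sketch: the stage names match the snippet above, and the repository and version are placeholders):

```bash
# Hypothetical release flow; repo name and version are placeholders.
VERSION=x.y.z

# Full build image (Rust toolchain, TensorRT-LLM build deps, LoRA export script)
docker build --target llgtrt_builder -t llgtrt:"$VERSION" .

# Slim runtime-only image
docker build --target llgtrt_prod -t llgtrt:"$VERSION"-runtime .

docker push llgtrt:"$VERSION"
docker push llgtrt:"$VERSION"-runtime
```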
Having a smaller image also helps with security (less attack surface) and auto-scale time (the image doesn't take as long to pull).