
Runtime stage for Dockerfile #28

Open
@mtaron

Description

Hello! I was just trying out llgtrt yesterday and was blown away by how easy it was to get LoRA adapters up and running. I had been struggling for over a week to get Phi4 + LoRA working with TensorRT-LLM + Triton Server before finally giving up. The performance of this server is phenomenal as well, and I am so happy not to have to mess with gigantic config.pbtxt files that change with every release. Wow, I am so impressed with this project! Thank you!

One thing I noticed, though, is that the Docker image is quite large: 35 GB as measured by dive. It includes things that are not needed at runtime, such as the Rust toolchain and many Python libraries. Much of the fault lies with the NVIDIA TensorRT container, which is a total mess.

I was able to get the image size down to 7.7 GB (78% smaller) by having this as the final stage:

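```dockerfile
FROM nvcr.io/nvidia/cuda:12.8.1-runtime-ubuntu24.04 AS llgtrt_prod

# ARG is visible to every command in the RUN below; a "VAR=value cmd1 && cmd2"
# prefix would only reach the first command (apt-get update)
ARG DEBIAN_FRONTEND=noninteractive
RUN apt-get update \
    && apt-get upgrade -y \
    && apt-get install -y --no-install-recommends \
        # These are runtime dependencies of tensorrt_llm
        libpython3.12-dev \
        libopenmpi-dev \
    && rm -rf /var/lib/apt/lists/*

# TensorRT-LLM shared libraries, TensorRT (libnvinfer) and the llgtrt binary from the build stage
COPY --from=llgtrt_builder /usr/local/lib/python3.12/dist-packages/tensorrt_llm/libs /usr/local/lib
COPY --from=llgtrt_builder /usr/lib/x86_64-linux-gnu/libnvinfer.so.10 /usr/local/lib/libnvinfer.so.10
COPY --from=llgtrt_builder /workspaces/llgtrt/target/release/llgtrt /usr/local/bin/llgtrt

# Refresh the dynamic linker cache so the libraries copied into /usr/local/lib are found
RUN ldconfig
```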

I haven't tested the full range of llgtrt's capabilities (and don't know them all), but Phi4 with five LoRA adapters works great with this as the final image.

I recommend producing two Docker images: one for model building (which should also include the LoRA export script) and one for runtime. To avoid breaking changes for existing users, the runtime stage could be published under a separate tag such as llgtrt:<version>-runtime.
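
A rough sketch of how both images could be built from one multi-stage Dockerfile, assuming the stage names `llgtrt_builder` and `llgtrt_prod` from the Dockerfile in this issue and assuming the currently published image corresponds to `llgtrt_builder`. The image name `llgtrt`, the function `build_images`, and the `DOCKER` override are hypothetical, for illustration only:

```shell
#!/bin/sh
# Sketch only: stage names come from the Dockerfile in this issue; the image
# name "llgtrt", build_images and the DOCKER override are hypothetical.
set -eu

# build_images VERSION
#   Builds the full build image under the existing tag and the slim runtime
#   stage under a separate "-runtime" tag. Set DOCKER=echo to print the
#   commands instead of running them.
build_images() {
    version="$1"
    ${DOCKER:-docker} build --target llgtrt_builder -t "llgtrt:${version}" .
    ${DOCKER:-docker} build --target llgtrt_prod -t "llgtrt:${version}-runtime" .
}

# Example (placeholder version): build_images 0.1.0
```

Because `docker build` builds the last stage by default, appending `llgtrt_prod` at the end of the Dockerfile would otherwise silently change what the existing tag contains; pinning `--target` on both builds keeps the current tag unchanged.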

A smaller image also helps with security (a smaller attack surface) and with autoscaling time (the image takes less time to pull).
