Releases: runpod-workers/worker-vllm

0.2.3

10 Feb 04:14
2941db0

Worker vLLM 0.2.3 - What's Changed

Various bug fixes

0.2.2

31 Jan 05:35

Worker vLLM 0.2.2 - What's New

  • Custom Chat Templates: you can now specify a Jinja chat template via an environment variable.
  • Custom Tokenizer support.
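As a sketch of how the chat-template feature might be used (the variable name below is an assumption; check the worker-vllm README for the exact name), the template can be supplied at deploy time as an environment variable:

```shell
# Hypothetical: supplies a Jinja chat template through an environment variable.
# CUSTOM_CHAT_TEMPLATE is an assumed name; consult the worker-vllm README.
export CUSTOM_CHAT_TEMPLATE='{% for m in messages %}{{ m.role }}: {{ m.content }}
{% endfor %}'
```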

Fixes:

  • Tensor parallel/multi-GPU deployment.
  • Baking a model into the image: previously, the worker downloaded the model on every start, ignoring the baked-in model.
  • Crashes caused by MAX_PARALLEL_LOADING_WORKERS.

0.2.1

26 Jan 04:35

Worker vLLM 0.2.1 - What's New

  • Added OpenAI Chat Completions-formatted output for non-streaming use (previously supported only for streaming).
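For reference, a minimal sketch of the OpenAI Chat Completions output shape that non-streaming responses now follow (all values here are illustrative placeholders, not actual worker output):

```python
# Minimal illustration of the OpenAI Chat Completions output format.
# Values are placeholders; the worker fills in real IDs, model names, and text.
completion = {
    "id": "chatcmpl-placeholder",
    "object": "chat.completion",
    "created": 1700000000,
    "model": "my-model",
    "choices": [
        {
            "index": 0,
            "message": {"role": "assistant", "content": "Hello!"},
            "finish_reason": "stop",
        }
    ],
    "usage": {"prompt_tokens": 5, "completion_tokens": 2, "total_tokens": 7},
}

# The generated text lives at choices[i].message.content:
text = completion["choices"][0]["message"]["content"]
```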

0.2.0

26 Jan 04:26

Worker vLLM 0.2.0 - What's New

  • You no longer need a Linux-based machine or NVIDIA GPUs to build the worker.
  • Over 3× smaller Docker image.
  • OpenAI Chat Completion output format (optional).
  • Fast image build times.
  • Docker Secrets-protected Hugging Face token support, so you can build an image with a model baked in without exposing your token.
  • Support for the n and best_of sampling parameters, which let you generate multiple responses from a single prompt.
  • New environment variables for various configuration options.
  • vLLM version: 0.2.7
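As a sketch of how n and best_of might appear in a request (the "input"/"sampling_params" payload structure is an assumption; verify it against the worker-vllm README), a request body could look like:

```python
# Hypothetical request payload for a RunPod serverless endpoint running worker-vllm.
# The "input"/"sampling_params" nesting is assumed; check the README for the real schema.
payload = {
    "input": {
        "prompt": "Write a haiku about GPUs.",
        "sampling_params": {
            "n": 3,        # return 3 candidate completions
            "best_of": 5,  # sample 5 candidates, return the best 3
            "temperature": 0.8,
        },
    }
}
```

In vLLM's sampling semantics, best_of controls how many sequences are generated per prompt, and n controls how many of those are returned.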

0.1.0

17 Jan 00:51
ed48093

What's Changed

Full Changelog: https://github.com/runpod-workers/worker-vllm/commits/0.1.0