diff --git a/README.md b/README.md index 231e4e500b..cbf3e6044a 100644 --- a/README.md +++ b/README.md @@ -47,6 +47,7 @@ potential of cutting-edge AI models. - Support SGLang backend: [#1161](https://github.com/xorbitsai/inference/pull/1161) - Support LoRA for LLM and image models: [#1080](https://github.com/xorbitsai/inference/pull/1080) ### New Models +- Built-in support for [DeepSeek-R1-0528](https://huggingface.co/deepseek-ai/DeepSeek-R1-0528): [#3539](https://github.com/xorbitsai/inference/pull/3539) - Built-in support for [Qwen3](https://qwenlm.github.io/blog/qwen3/): [#3347](https://github.com/xorbitsai/inference/pull/3347) - Built-in support for [Qwen2.5-Omni](https://github.com/QwenLM/Qwen2.5-Omni): [#3279](https://github.com/xorbitsai/inference/pull/3279) - Built-in support for [Skywork-OR1](https://github.com/SkyworkAI/Skywork-OR1): [#3274](https://github.com/xorbitsai/inference/pull/3274) @@ -54,7 +55,6 @@ potential of cutting-edge AI models. - Built-in support for [SeaLLMs-v3](https://github.com/DAMO-NLP-SG/DAMO-SeaLLMs): [#3248](https://github.com/xorbitsai/inference/pull/3248) - Built-in support for [paraformer-zh](https://huggingface.co/funasr/paraformer-zh): [#3236](https://github.com/xorbitsai/inference/pull/3236) - Built-in support for [InternVL3](https://internvl.github.io/blog/2025-04-11-InternVL-3.0/): [#3235](https://github.com/xorbitsai/inference/pull/3235) -- Built-in support for [MegaTTS3](https://github.com/bytedance/MegaTTS3): [#3224](https://github.com/xorbitsai/inference/pull/3224) ### Integrations - [Dify](https://docs.dify.ai/advanced/model-configuration/xinference): an LLMOps platform that enables developers (and even non-developers) to quickly build useful applications based on large language models, ensuring they are visual, operable, and improvable. - [FastGPT](https://github.com/labring/FastGPT): a knowledge-based platform built on the LLM, offers out-of-the-box data processing and model invocation capabilities, allows for workflow orchestration through Flow visualization.
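A minimal sketch of trying the newly supported model, reusing the launch command from its documentation page added later in this patch (it assumes a running Xinference server, hardware sufficient for the 671B checkpoint, and that `${engine}` and `${quantization}` are replaced with values the spec supports, e.g. `vllm` and `none`):

```bash
# Launch the newly added DeepSeek-R1-0528 (671B, pytorch format)
xinference launch --model-engine ${engine} --model-name deepseek-r1-0528 \
  --size-in-billions 671 --model-format pytorch --quantization ${quantization}
```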
diff --git a/README_zh_CN.md b/README_zh_CN.md index cb969a7ac9..02c6f98475 100644 --- a/README_zh_CN.md +++ b/README_zh_CN.md @@ -43,6 +43,7 @@ Xorbits Inference(Xinference)是一个性能强大且功能全面的分布 - 支持 SGLang 后端: [#1161](https://github.com/xorbitsai/inference/pull/1161) - 支持LLM和图像模型的LoRA: [#1080](https://github.com/xorbitsai/inference/pull/1080) ### 新模型 +- 内置 [DeepSeek-R1-0528](https://huggingface.co/deepseek-ai/DeepSeek-R1-0528): [#3539](https://github.com/xorbitsai/inference/pull/3539) - 内置 [Qwen3](https://qwenlm.github.io/blog/qwen3/): [#3347](https://github.com/xorbitsai/inference/pull/3347) - 内置 [Qwen2.5-Omni](https://github.com/QwenLM/Qwen2.5-Omni): [#3279](https://github.com/xorbitsai/inference/pull/3279) - 内置 [Skywork-OR1](https://github.com/SkyworkAI/Skywork-OR1): [#3274](https://github.com/xorbitsai/inference/pull/3274) @@ -50,7 +51,6 @@ Xorbits Inference(Xinference)是一个性能强大且功能全面的分布 - 内置 [SeaLLMs-v3](https://github.com/DAMO-NLP-SG/DAMO-SeaLLMs): [#3248](https://github.com/xorbitsai/inference/pull/3248) - 内置 [paraformer-zh](https://huggingface.co/funasr/paraformer-zh): [#3236](https://github.com/xorbitsai/inference/pull/3236) - 内置 [InternVL3](https://internvl.github.io/blog/2025-04-11-InternVL-3.0/): [#3235](https://github.com/xorbitsai/inference/pull/3235) -- 内置 [MegaTTS3](https://github.com/bytedance/MegaTTS3): [#3224](https://github.com/xorbitsai/inference/pull/3224) ### 集成 - [FastGPT](https://doc.fastai.site/docs/development/custom-models/xinference/):一个基于 LLM 大模型的开源 AI 知识库构建平台。提供了开箱即用的数据处理、模型调用、RAG 检索、可视化 AI 工作流编排等能力,帮助您轻松实现复杂的问答场景。 - [Dify](https://docs.dify.ai/advanced/model-configuration/xinference): 一个涵盖了大型语言模型开发、部署、维护和优化的 LLMOps 平台。 diff --git a/doc/source/getting_started/installation.rst b/doc/source/getting_started/installation.rst index da83a31bdf..0c3455fb47 100644 --- a/doc/source/getting_started/installation.rst +++ b/doc/source/getting_started/installation.rst @@ -60,7 +60,7 @@ Currently, supported models include: - ``codestral-v0.1`` - ``Yi``, ``Yi-1.5``, ``Yi-chat``, ``Yi-1.5-chat``, ``Yi-1.5-chat-16k`` - ``code-llama``, ``code-llama-python``, ``code-llama-instruct`` -- ``deepseek``, ``deepseek-coder``, ``deepseek-chat``, ``deepseek-coder-instruct``, ``deepseek-r1-distill-qwen``, ``deepseek-v2-chat``, ``deepseek-v2-chat-0628``, ``deepseek-v2.5``, ``deepseek-v3``, ``deepseek-r1``, ``deepseek-r1-distill-llama`` +- ``deepseek``, ``deepseek-coder``, ``deepseek-chat``, ``deepseek-coder-instruct``, ``deepseek-r1-distill-qwen``, ``deepseek-v2-chat``, ``deepseek-v2-chat-0628``, ``deepseek-v2.5``, ``deepseek-v3``, ``deepseek-v3-0324``, ``deepseek-r1``, ``deepseek-r1-0528``, ``deepseek-prover-v2``, ``deepseek-r1-distill-llama`` - ``yi-coder``, ``yi-coder-chat`` - ``codeqwen1.5``, ``codeqwen1.5-chat`` - ``qwen2.5``, ``qwen2.5-coder``, ``qwen2.5-instruct``, ``qwen2.5-coder-instruct``, ``qwen2.5-instruct-1m`` @@ -74,11 +74,14 @@ Currently, supported models include: - ``codegeex4`` - ``qwen1.5-chat``, ``qwen1.5-moe-chat`` - ``qwen2-instruct``, ``qwen2-moe-instruct`` +- ``XiYanSQL-QwenCoder-2504`` - ``QwQ-32B-Preview``, ``QwQ-32B`` - ``marco-o1`` - ``fin-r1`` - ``seallms-v3`` -- ``skywork-or1-preview`` +- ``skywork-or1-preview``, ``skywork-or1`` +- ``HuatuoGPT-o1-Qwen2.5``, ``HuatuoGPT-o1-LLaMA-3.1`` +- ``DianJin-R1`` - ``gemma-it``, ``gemma-2-it``, ``gemma-3-1b-it`` - ``orion-chat``, ``orion-chat-rag`` - ``c4ai-command-r-v01`` diff --git a/doc/source/locale/zh_CN/LC_MESSAGES/models/model_abilities/audio.po
b/doc/source/locale/zh_CN/LC_MESSAGES/models/model_abilities/audio.po index 6cb76f6311..abf06d6d2e 100644 --- a/doc/source/locale/zh_CN/LC_MESSAGES/models/model_abilities/audio.po +++ b/doc/source/locale/zh_CN/LC_MESSAGES/models/model_abilities/audio.po @@ -21,7 +21,7 @@ msgstr "" #: ../../source/models/model_abilities/audio.rst:5 msgid "Audio" -msgstr "" +msgstr "音频" #: ../../source/models/model_abilities/audio.rst:7 msgid "Learn how to turn audio into text or text into audio with Xinference." @@ -358,7 +358,7 @@ msgstr "基本使用,加载模型 ``CosyVoice-300M-SFT``。" msgid "" "Please note that the latest CosyVoice 2.0 requires `use_flow_cache=True` " "for stream generation." -msgstr "" +msgstr "请注意,最新版本的 CosyVoice 2.0 在进行流式生成时需要设置 `use_flow_cache=True`。" #: ../../source/models/model_abilities/audio.rst:422 msgid "" diff --git a/doc/source/models/builtin/audio/index.rst b/doc/source/models/builtin/audio/index.rst index cc1d7ebad4..ceefae0dc2 100644 --- a/doc/source/models/builtin/audio/index.rst +++ b/doc/source/models/builtin/audio/index.rst @@ -55,6 +55,12 @@ The following is a list of built-in audio models in Xinference: paraformer-zh + paraformer-zh-hotword + + paraformer-zh-long + + paraformer-zh-spk + sensevoicesmall whisper-base diff --git a/doc/source/models/builtin/audio/paraformer-zh-hotword.rst b/doc/source/models/builtin/audio/paraformer-zh-hotword.rst new file mode 100644 index 0000000000..a15668bbee --- /dev/null +++ b/doc/source/models/builtin/audio/paraformer-zh-hotword.rst @@ -0,0 +1,19 @@ +.. _models_builtin_paraformer-zh-hotword: + +===================== +paraformer-zh-hotword +===================== + +- **Model Name:** paraformer-zh-hotword +- **Model Family:** funasr +- **Abilities:** ['audio2text'] +- **Multilingual:** False + +Specifications +^^^^^^^^^^^^^^ + +- **Model ID:** JunHowie/speech_paraformer-large-contextual_asr_nat-zh-cn-16k-common-vocab8404 + +Execute the following command to launch the model:: + + xinference launch --model-name paraformer-zh-hotword --model-type audio \ No newline at end of file diff --git a/doc/source/models/builtin/audio/paraformer-zh-long.rst b/doc/source/models/builtin/audio/paraformer-zh-long.rst new file mode 100644 index 0000000000..ec1d89969b --- /dev/null +++ b/doc/source/models/builtin/audio/paraformer-zh-long.rst @@ -0,0 +1,19 @@ +.. _models_builtin_paraformer-zh-long: + +================== +paraformer-zh-long +================== + +- **Model Name:** paraformer-zh-long +- **Model Family:** funasr +- **Abilities:** ['audio2text'] +- **Multilingual:** False + +Specifications +^^^^^^^^^^^^^^ + +- **Model ID:** JunHowie/speech_paraformer-large-vad-punc_asr_nat-zh-cn-16k-common-vocab8404-pytorch + +Execute the following command to launch the model:: + + xinference launch --model-name paraformer-zh-long --model-type audio \ No newline at end of file diff --git a/doc/source/models/builtin/audio/paraformer-zh-spk.rst b/doc/source/models/builtin/audio/paraformer-zh-spk.rst new file mode 100644 index 0000000000..bec3917e3b --- /dev/null +++ b/doc/source/models/builtin/audio/paraformer-zh-spk.rst @@ -0,0 +1,19 @@ +.. 
_models_builtin_paraformer-zh-spk: + +================= +paraformer-zh-spk +================= + +- **Model Name:** paraformer-zh-spk +- **Model Family:** funasr +- **Abilities:** ['audio2text'] +- **Multilingual:** False + +Specifications +^^^^^^^^^^^^^^ + +- **Model ID:** JunHowie/speech_paraformer-large-vad-punc-spk_asr_nat-zh-cn + +Execute the following command to launch the model:: + + xinference launch --model-name paraformer-zh-spk --model-type audio \ No newline at end of file diff --git a/doc/source/models/builtin/llm/cogvlm2-video-llama3-chat.rst b/doc/source/models/builtin/llm/cogvlm2-video-llama3-chat.rst deleted file mode 100644 index dc80b20085..0000000000 --- a/doc/source/models/builtin/llm/cogvlm2-video-llama3-chat.rst +++ /dev/null @@ -1,31 +0,0 @@ -.. _models_llm_cogvlm2-video-llama3-chat: - -======================================== -cogvlm2-video-llama3-chat -======================================== - -- **Context Length:** 8192 -- **Model Name:** cogvlm2-video-llama3-chat -- **Languages:** en, zh -- **Abilities:** chat, vision -- **Description:** CogVLM2-Video achieves state-of-the-art performance on multiple video question answering tasks. - -Specifications -^^^^^^^^^^^^^^ - - -Model Spec 1 (pytorch, 12 Billion) -++++++++++++++++++++++++++++++++++++++++ - -- **Model Format:** pytorch -- **Model Size (in billions):** 12 -- **Quantizations:** none -- **Engines**: Transformers -- **Model ID:** THUDM/cogvlm2-video-llama3-chat -- **Model Hubs**: `Hugging Face `__, `ModelScope `__ - -Execute the following command to launch the model, remember to replace ``${quantization}`` with your -chosen quantization method from the options listed above:: - - xinference launch --model-engine ${engine} --model-name cogvlm2-video-llama3-chat --size-in-billions 12 --model-format pytorch --quantization ${quantization} - diff --git a/doc/source/models/builtin/llm/cogvlm2.rst b/doc/source/models/builtin/llm/cogvlm2.rst deleted file mode 100644 index ff10d229b6..0000000000 --- a/doc/source/models/builtin/llm/cogvlm2.rst +++ /dev/null @@ -1,47 +0,0 @@ -.. _models_llm_cogvlm2: - -======================================== -cogvlm2 -======================================== - -- **Context Length:** 8192 -- **Model Name:** cogvlm2 -- **Languages:** en, zh -- **Abilities:** chat, vision -- **Description:** CogVLM2 have achieved good results in many lists compared to the previous generation of CogVLM open source models. Its excellent performance can compete with some non-open source models. 
- -Specifications -^^^^^^^^^^^^^^ - - -Model Spec 1 (pytorch, 20 Billion) -++++++++++++++++++++++++++++++++++++++++ - -- **Model Format:** pytorch -- **Model Size (in billions):** 20 -- **Quantizations:** none -- **Engines**: Transformers -- **Model ID:** THUDM/cogvlm2-llama3-chinese-chat-19B -- **Model Hubs**: `Hugging Face `__, `ModelScope `__ - -Execute the following command to launch the model, remember to replace ``${quantization}`` with your -chosen quantization method from the options listed above:: - - xinference launch --model-engine ${engine} --model-name cogvlm2 --size-in-billions 20 --model-format pytorch --quantization ${quantization} - - -Model Spec 2 (pytorch, 20 Billion) -++++++++++++++++++++++++++++++++++++++++ - -- **Model Format:** pytorch -- **Model Size (in billions):** 20 -- **Quantizations:** none -- **Engines**: Transformers -- **Model ID:** THUDM/cogvlm2-llama3-chinese-chat-19B-{quantization} -- **Model Hubs**: `Hugging Face `__, `ModelScope `__ - -Execute the following command to launch the model, remember to replace ``${quantization}`` with your -chosen quantization method from the options listed above:: - - xinference launch --model-engine ${engine} --model-name cogvlm2 --size-in-billions 20 --model-format pytorch --quantization ${quantization} - diff --git a/doc/source/models/builtin/llm/deepseek-prover-v2.rst b/doc/source/models/builtin/llm/deepseek-prover-v2.rst new file mode 100644 index 0000000000..fb4a301ca4 --- /dev/null +++ b/doc/source/models/builtin/llm/deepseek-prover-v2.rst @@ -0,0 +1,63 @@ +.. _models_llm_deepseek-prover-v2: + +======================================== +deepseek-prover-v2 +======================================== + +- **Context Length:** 163840 +- **Model Name:** deepseek-prover-v2 +- **Languages:** en, zh +- **Abilities:** chat, reasoning +- **Description:** We introduce DeepSeek-Prover-V2, an open-source large language model designed for formal theorem proving in Lean 4, with initialization data collected through a recursive theorem proving pipeline powered by DeepSeek-V3. The cold-start training procedure begins by prompting DeepSeek-V3 to decompose complex problems into a series of subgoals. The proofs of resolved subgoals are synthesized into a chain-of-thought process, combined with DeepSeek-V3's step-by-step reasoning, to create an initial cold start for reinforcement learning. 
This process enables us to integrate both informal and formal mathematical reasoning into a unified model + +Specifications +^^^^^^^^^^^^^^ + + +Model Spec 1 (pytorch, 671 Billion) +++++++++++++++++++++++++++++++++++++++++ + +- **Model Format:** pytorch +- **Model Size (in billions):** 671 +- **Quantizations:** none +- **Engines**: vLLM, Transformers +- **Model ID:** deepseek-ai/DeepSeek-Prover-V2-671B +- **Model Hubs**: `Hugging Face `__, `ModelScope `__ + +Execute the following command to launch the model, remember to replace ``${quantization}`` with your +chosen quantization method from the options listed above:: + + xinference launch --model-engine ${engine} --model-name deepseek-prover-v2 --size-in-billions 671 --model-format pytorch --quantization ${quantization} + + +Model Spec 2 (pytorch, 7 Billion) +++++++++++++++++++++++++++++++++++++++++ + +- **Model Format:** pytorch +- **Model Size (in billions):** 7 +- **Quantizations:** none +- **Engines**: vLLM, Transformers +- **Model ID:** deepseek-ai/DeepSeek-Prover-V2-7B +- **Model Hubs**: `Hugging Face `__, `ModelScope `__ + +Execute the following command to launch the model, remember to replace ``${quantization}`` with your +chosen quantization method from the options listed above:: + + xinference launch --model-engine ${engine} --model-name deepseek-prover-v2 --size-in-billions 7 --model-format pytorch --quantization ${quantization} + + +Model Spec 3 (mlx, 7 Billion) +++++++++++++++++++++++++++++++++++++++++ + +- **Model Format:** mlx +- **Model Size (in billions):** 7 +- **Quantizations:** 4bit +- **Engines**: +- **Model ID:** mlx-community/DeepSeek-Prover-V2-7B-4bit +- **Model Hubs**: `Hugging Face `__, `ModelScope `__ + +Execute the following command to launch the model, remember to replace ``${quantization}`` with your +chosen quantization method from the options listed above:: + + xinference launch --model-engine ${engine} --model-name deepseek-prover-v2 --size-in-billions 7 --model-format mlx --quantization ${quantization} + diff --git a/doc/source/models/builtin/llm/deepseek-r1-0528.rst b/doc/source/models/builtin/llm/deepseek-r1-0528.rst new file mode 100644 index 0000000000..af1ee45c15 --- /dev/null +++ b/doc/source/models/builtin/llm/deepseek-r1-0528.rst @@ -0,0 +1,31 @@ +.. _models_llm_deepseek-r1-0528: + +======================================== +deepseek-r1-0528 +======================================== + +- **Context Length:** 163840 +- **Model Name:** deepseek-r1-0528 +- **Languages:** en, zh +- **Abilities:** chat, reasoning +- **Description:** DeepSeek-R1, which incorporates cold-start data before RL. DeepSeek-R1 achieves performance comparable to OpenAI-o1 across math, code, and reasoning tasks. 
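+
+Once launched, Xinference serves this model through its OpenAI-compatible API.
+A minimal sketch of a chat request (assuming the default endpoint
+``http://127.0.0.1:9997`` and that the model was launched with a model UID
+equal to the model name; adjust both to your deployment)::
+
+    curl http://127.0.0.1:9997/v1/chat/completions \
+      -H "Content-Type: application/json" \
+      -d '{"model": "deepseek-r1-0528", "messages": [{"role": "user", "content": "Why is the sky blue?"}]}'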
+ +Specifications +^^^^^^^^^^^^^^ + + +Model Spec 1 (pytorch, 671 Billion) +++++++++++++++++++++++++++++++++++++++++ + +- **Model Format:** pytorch +- **Model Size (in billions):** 671 +- **Quantizations:** none +- **Engines**: vLLM, Transformers +- **Model ID:** deepseek-ai/DeepSeek-R1-0528 +- **Model Hubs**: `Hugging Face `__, `ModelScope `__ + +Execute the following command to launch the model, remember to replace ``${quantization}`` with your +chosen quantization method from the options listed above:: + + xinference launch --model-engine ${engine} --model-name deepseek-r1-0528 --size-in-billions 671 --model-format pytorch --quantization ${quantization} + diff --git a/doc/source/models/builtin/llm/deepseek-v2.rst b/doc/source/models/builtin/llm/deepseek-v2.rst deleted file mode 100644 index fb63b10b2e..0000000000 --- a/doc/source/models/builtin/llm/deepseek-v2.rst +++ /dev/null @@ -1,47 +0,0 @@ -.. _models_llm_deepseek-v2: - -======================================== -deepseek-v2 -======================================== - -- **Context Length:** 128000 -- **Model Name:** deepseek-v2 -- **Languages:** en, zh -- **Abilities:** generate -- **Description:** DeepSeek-V2, a strong Mixture-of-Experts (MoE) language model characterized by economical training and efficient inference. - -Specifications -^^^^^^^^^^^^^^ - - -Model Spec 1 (pytorch, 16 Billion) -++++++++++++++++++++++++++++++++++++++++ - -- **Model Format:** pytorch -- **Model Size (in billions):** 16 -- **Quantizations:** none -- **Engines**: Transformers -- **Model ID:** deepseek-ai/DeepSeek-V2-Lite -- **Model Hubs**: `Hugging Face `__, `ModelScope `__ - -Execute the following command to launch the model, remember to replace ``${quantization}`` with your -chosen quantization method from the options listed above:: - - xinference launch --model-engine ${engine} --model-name deepseek-v2 --size-in-billions 16 --model-format pytorch --quantization ${quantization} - - -Model Spec 2 (pytorch, 236 Billion) -++++++++++++++++++++++++++++++++++++++++ - -- **Model Format:** pytorch -- **Model Size (in billions):** 236 -- **Quantizations:** none -- **Engines**: Transformers -- **Model ID:** deepseek-ai/DeepSeek-V2 -- **Model Hubs**: `Hugging Face `__, `ModelScope `__ - -Execute the following command to launch the model, remember to replace ``${quantization}`` with your -chosen quantization method from the options listed above:: - - xinference launch --model-engine ${engine} --model-name deepseek-v2 --size-in-billions 236 --model-format pytorch --quantization ${quantization} - diff --git a/doc/source/models/builtin/llm/deepseek-v3-0324.rst b/doc/source/models/builtin/llm/deepseek-v3-0324.rst new file mode 100644 index 0000000000..e6aecb76b1 --- /dev/null +++ b/doc/source/models/builtin/llm/deepseek-v3-0324.rst @@ -0,0 +1,47 @@ +.. _models_llm_deepseek-v3-0324: + +======================================== +deepseek-v3-0324 +======================================== + +- **Context Length:** 163840 +- **Model Name:** deepseek-v3-0324 +- **Languages:** en, zh +- **Abilities:** chat +- **Description:** DeepSeek-V3, a strong Mixture-of-Experts (MoE) language model with 671B total parameters with 37B activated for each token. 
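+
+As a concrete sketch of the launch commands listed under the specs below,
+here picking the AWQ ``Int4`` spec served by vLLM (substitute the engine and
+quantization values that match whichever spec you choose)::
+
+    xinference launch --model-engine vllm --model-name deepseek-v3-0324 \
+      --size-in-billions 671 --model-format awq --quantization Int4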
+ +Specifications +^^^^^^^^^^^^^^ + + +Model Spec 1 (pytorch, 671 Billion) +++++++++++++++++++++++++++++++++++++++++ + +- **Model Format:** pytorch +- **Model Size (in billions):** 671 +- **Quantizations:** none +- **Engines**: vLLM, Transformers +- **Model ID:** deepseek-ai/DeepSeek-V3-0324 +- **Model Hubs**: `Hugging Face `__, `ModelScope `__ + +Execute the following command to launch the model, remember to replace ``${quantization}`` with your +chosen quantization method from the options listed above:: + + xinference launch --model-engine ${engine} --model-name deepseek-v3-0324 --size-in-billions 671 --model-format pytorch --quantization ${quantization} + + +Model Spec 2 (awq, 671 Billion) +++++++++++++++++++++++++++++++++++++++++ + +- **Model Format:** awq +- **Model Size (in billions):** 671 +- **Quantizations:** Int4 +- **Engines**: vLLM, Transformers +- **Model ID:** cognitivecomputations/DeepSeek-V3-0324-AWQ +- **Model Hubs**: `Hugging Face `__, `ModelScope `__ + +Execute the following command to launch the model, remember to replace ``${quantization}`` with your +chosen quantization method from the options listed above:: + + xinference launch --model-engine ${engine} --model-name deepseek-v3-0324 --size-in-billions 671 --model-format awq --quantization ${quantization} + diff --git a/doc/source/models/builtin/llm/deepseek-vl-chat.rst b/doc/source/models/builtin/llm/deepseek-vl-chat.rst deleted file mode 100644 index ae00c384eb..0000000000 --- a/doc/source/models/builtin/llm/deepseek-vl-chat.rst +++ /dev/null @@ -1,47 +0,0 @@ -.. _models_llm_deepseek-vl-chat: - -======================================== -deepseek-vl-chat -======================================== - -- **Context Length:** 4096 -- **Model Name:** deepseek-vl-chat -- **Languages:** en, zh -- **Abilities:** chat, vision -- **Description:** DeepSeek-VL possesses general multimodal understanding capabilities, capable of processing logical diagrams, web pages, formula recognition, scientific literature, natural images, and embodied intelligence in complex scenarios. 
- -Specifications -^^^^^^^^^^^^^^ - - -Model Spec 1 (pytorch, 1_3 Billion) -++++++++++++++++++++++++++++++++++++++++ - -- **Model Format:** pytorch -- **Model Size (in billions):** 1_3 -- **Quantizations:** none -- **Engines**: Transformers -- **Model ID:** deepseek-ai/deepseek-vl-1.3b-chat -- **Model Hubs**: `Hugging Face `__, `ModelScope `__ - -Execute the following command to launch the model, remember to replace ``${quantization}`` with your -chosen quantization method from the options listed above:: - - xinference launch --model-engine ${engine} --model-name deepseek-vl-chat --size-in-billions 1_3 --model-format pytorch --quantization ${quantization} - - -Model Spec 2 (pytorch, 7 Billion) -++++++++++++++++++++++++++++++++++++++++ - -- **Model Format:** pytorch -- **Model Size (in billions):** 7 -- **Quantizations:** none -- **Engines**: Transformers -- **Model ID:** deepseek-ai/deepseek-vl-7b-chat -- **Model Hubs**: `Hugging Face `__, `ModelScope `__ - -Execute the following command to launch the model, remember to replace ``${quantization}`` with your -chosen quantization method from the options listed above:: - - xinference launch --model-engine ${engine} --model-name deepseek-vl-chat --size-in-billions 7 --model-format pytorch --quantization ${quantization} - diff --git a/doc/source/models/builtin/llm/glm-edge-v.rst b/doc/source/models/builtin/llm/glm-edge-v.rst deleted file mode 100644 index 6aef562071..0000000000 --- a/doc/source/models/builtin/llm/glm-edge-v.rst +++ /dev/null @@ -1,143 +0,0 @@ -.. _models_llm_glm-edge-v: - -======================================== -glm-edge-v -======================================== - -- **Context Length:** 8192 -- **Model Name:** glm-edge-v -- **Languages:** en, zh -- **Abilities:** chat, vision -- **Description:** The GLM-Edge series is our attempt to face the end-side real-life scenarios, which consists of two sizes of large-language dialogue models and multimodal comprehension models (GLM-Edge-1.5B-Chat, GLM-Edge-4B-Chat, GLM-Edge-V-2B, GLM-Edge-V-5B). Among them, the 1.5B / 2B model is mainly for platforms such as mobile phones and cars, and the 4B / 5B model is mainly for platforms such as PCs. 
- -Specifications -^^^^^^^^^^^^^^ - - -Model Spec 1 (pytorch, 2 Billion) -++++++++++++++++++++++++++++++++++++++++ - -- **Model Format:** pytorch -- **Model Size (in billions):** 2 -- **Quantizations:** none -- **Engines**: Transformers -- **Model ID:** THUDM/glm-edge-v-2b -- **Model Hubs**: `Hugging Face `__, `ModelScope `__ - -Execute the following command to launch the model, remember to replace ``${quantization}`` with your -chosen quantization method from the options listed above:: - - xinference launch --model-engine ${engine} --model-name glm-edge-v --size-in-billions 2 --model-format pytorch --quantization ${quantization} - - -Model Spec 2 (pytorch, 5 Billion) -++++++++++++++++++++++++++++++++++++++++ - -- **Model Format:** pytorch -- **Model Size (in billions):** 5 -- **Quantizations:** none -- **Engines**: Transformers -- **Model ID:** THUDM/glm-edge-v-5b -- **Model Hubs**: `Hugging Face `__, `ModelScope `__ - -Execute the following command to launch the model, remember to replace ``${quantization}`` with your -chosen quantization method from the options listed above:: - - xinference launch --model-engine ${engine} --model-name glm-edge-v --size-in-billions 5 --model-format pytorch --quantization ${quantization} - - -Model Spec 3 (ggufv2, 2 Billion) -++++++++++++++++++++++++++++++++++++++++ - -- **Model Format:** ggufv2 -- **Model Size (in billions):** 2 -- **Quantizations:** Q4_0, Q4_1, Q4_K, Q4_K_M, Q4_K_S, Q5_0, Q5_1, Q5_K, Q5_K_M, Q5_K_S, Q6_K, Q8_0 -- **Engines**: llama.cpp -- **Model ID:** THUDM/glm-edge-v-2b-gguf -- **Model Hubs**: `Hugging Face `__, `ModelScope `__ - -Execute the following command to launch the model, remember to replace ``${quantization}`` with your -chosen quantization method from the options listed above:: - - xinference launch --model-engine ${engine} --model-name glm-edge-v --size-in-billions 2 --model-format ggufv2 --quantization ${quantization} - - -Model Spec 4 (ggufv2, 2 Billion) -++++++++++++++++++++++++++++++++++++++++ - -- **Model Format:** ggufv2 -- **Model Size (in billions):** 2 -- **Quantizations:** F16 -- **Engines**: llama.cpp -- **Model ID:** THUDM/glm-edge-v-2b-gguf -- **Model Hubs**: `Hugging Face `__, `ModelScope `__ - -Execute the following command to launch the model, remember to replace ``${quantization}`` with your -chosen quantization method from the options listed above:: - - xinference launch --model-engine ${engine} --model-name glm-edge-v --size-in-billions 2 --model-format ggufv2 --quantization ${quantization} - - -Model Spec 5 (ggufv2, 2 Billion) -++++++++++++++++++++++++++++++++++++++++ - -- **Model Format:** ggufv2 -- **Model Size (in billions):** 2 -- **Quantizations:** f16 -- **Engines**: llama.cpp -- **Model ID:** THUDM/glm-edge-v-2b-gguf -- **Model Hubs**: `Hugging Face `__, `ModelScope `__ - -Execute the following command to launch the model, remember to replace ``${quantization}`` with your -chosen quantization method from the options listed above:: - - xinference launch --model-engine ${engine} --model-name glm-edge-v --size-in-billions 2 --model-format ggufv2 --quantization ${quantization} - - -Model Spec 6 (ggufv2, 5 Billion) -++++++++++++++++++++++++++++++++++++++++ - -- **Model Format:** ggufv2 -- **Model Size (in billions):** 5 -- **Quantizations:** Q4_0, Q4_1, Q4_K, Q4_K_M, Q4_K_S, Q5_0, Q5_1, Q5_K, Q5_K_M, Q5_K_S, Q6_K, Q8_0 -- **Engines**: llama.cpp -- **Model ID:** THUDM/glm-edge-v-5b-gguf -- **Model Hubs**: `Hugging Face `__, `ModelScope `__ - -Execute the following command to launch the model, remember 
to replace ``${quantization}`` with your -chosen quantization method from the options listed above:: - - xinference launch --model-engine ${engine} --model-name glm-edge-v --size-in-billions 5 --model-format ggufv2 --quantization ${quantization} - - -Model Spec 7 (ggufv2, 5 Billion) -++++++++++++++++++++++++++++++++++++++++ - -- **Model Format:** ggufv2 -- **Model Size (in billions):** 5 -- **Quantizations:** F16 -- **Engines**: llama.cpp -- **Model ID:** THUDM/glm-edge-v-5b-gguf -- **Model Hubs**: `Hugging Face `__, `ModelScope `__ - -Execute the following command to launch the model, remember to replace ``${quantization}`` with your -chosen quantization method from the options listed above:: - - xinference launch --model-engine ${engine} --model-name glm-edge-v --size-in-billions 5 --model-format ggufv2 --quantization ${quantization} - - -Model Spec 8 (ggufv2, 5 Billion) -++++++++++++++++++++++++++++++++++++++++ - -- **Model Format:** ggufv2 -- **Model Size (in billions):** 5 -- **Quantizations:** f16 -- **Engines**: llama.cpp -- **Model ID:** THUDM/glm-edge-v-5b-gguf -- **Model Hubs**: `Hugging Face `__, `ModelScope `__ - -Execute the following command to launch the model, remember to replace ``${quantization}`` with your -chosen quantization method from the options listed above:: - - xinference launch --model-engine ${engine} --model-name glm-edge-v --size-in-billions 5 --model-format ggufv2 --quantization ${quantization} - diff --git a/doc/source/models/builtin/llm/index.rst b/doc/source/models/builtin/llm/index.rst index af4c71acb7..0d487ec38e 100644 --- a/doc/source/models/builtin/llm/index.rst +++ b/doc/source/models/builtin/llm/index.rst @@ -76,16 +76,6 @@ The following is a list of built-in LLM in Xinference: - 4096 - The CogAgent-9B-20241220 model is based on GLM-4V-9B, a bilingual open-source VLM base model. Through data collection and optimization, multi-stage training, and strategy improvements, CogAgent-9B-20241220 achieves significant advancements in GUI perception, inference prediction accuracy, action space completeness, and task generalizability. - * - :ref:`cogvlm2 ` - - chat, vision - - 8192 - - CogVLM2 have achieved good results in many lists compared to the previous generation of CogVLM open source models. Its excellent performance can compete with some non-open source models. - - * - :ref:`cogvlm2-video-llama3-chat ` - - chat, vision - - 8192 - - CogVLM2-Video achieves state-of-the-art performance on multiple video question answering tasks. - * - :ref:`deepseek ` - generate - 4096 @@ -106,11 +96,21 @@ The following is a list of built-in LLM in Xinference: - 16384 - deepseek-coder-instruct is a model initialized from deepseek-coder-base and fine-tuned on 2B tokens of instruction data. + * - :ref:`deepseek-prover-v2 ` + - chat, reasoning + - 163840 + - We introduce DeepSeek-Prover-V2, an open-source large language model designed for formal theorem proving in Lean 4, with initialization data collected through a recursive theorem proving pipeline powered by DeepSeek-V3. The cold-start training procedure begins by prompting DeepSeek-V3 to decompose complex problems into a series of subgoals. The proofs of resolved subgoals are synthesized into a chain-of-thought process, combined with DeepSeek-V3's step-by-step reasoning, to create an initial cold start for reinforcement learning. 
This process enables us to integrate both informal and formal mathematical reasoning into a unified model + * - :ref:`deepseek-r1 ` - chat, reasoning - 163840 - DeepSeek-R1, which incorporates cold-start data before RL. DeepSeek-R1 achieves performance comparable to OpenAI-o1 across math, code, and reasoning tasks. + * - :ref:`deepseek-r1-0528 ` + - chat, reasoning + - 163840 + - DeepSeek-R1, which incorporates cold-start data before RL. DeepSeek-R1 achieves performance comparable to OpenAI-o1 across math, code, and reasoning tasks. + * - :ref:`deepseek-r1-distill-llama ` - chat, reasoning - 131072 @@ -121,11 +121,6 @@ The following is a list of built-in LLM in Xinference: - 131072 - deepseek-r1-distill-qwen is distilled from DeepSeek-R1 based on Qwen - * - :ref:`deepseek-v2 ` - - generate - - 128000 - - DeepSeek-V2, a strong Mixture-of-Experts (MoE) language model characterized by economical training and efficient inference. - * - :ref:`deepseek-v2-chat ` - chat - 128000 @@ -146,16 +141,21 @@ The following is a list of built-in LLM in Xinference: - 163840 - DeepSeek-V3, a strong Mixture-of-Experts (MoE) language model with 671B total parameters with 37B activated for each token. - * - :ref:`deepseek-vl-chat ` - - chat, vision - - 4096 - - DeepSeek-VL possesses general multimodal understanding capabilities, capable of processing logical diagrams, web pages, formula recognition, scientific literature, natural images, and embodied intelligence in complex scenarios. + * - :ref:`deepseek-v3-0324 ` + - chat + - 163840 + - DeepSeek-V3, a strong Mixture-of-Experts (MoE) language model with 671B total parameters with 37B activated for each token. * - :ref:`deepseek-vl2 ` - chat, vision - 4096 - DeepSeek-VL2, an advanced series of large Mixture-of-Experts (MoE) Vision-Language Models that significantly improves upon its predecessor, DeepSeek-VL. DeepSeek-VL2 demonstrates superior capabilities across various tasks, including but not limited to visual question answering, optical character recognition, document/table/chart understanding, and visual grounding. + * - :ref:`dianjin-r1 ` + - chat, tools + - 32768 + - Tongyi DianJin is a financial intelligence solution platform built by Alibaba Cloud, dedicated to providing financial business developers with a convenient artificial intelligence application development environment. + * - :ref:`fin-r1 ` - chat - 131072 @@ -181,11 +181,6 @@ The following is a list of built-in LLM in Xinference: - 8192 - The GLM-Edge series is our attempt to face the end-side real-life scenarios, which consists of two sizes of large-language dialogue models and multimodal comprehension models (GLM-Edge-1.5B-Chat, GLM-Edge-4B-Chat, GLM-Edge-V-2B, GLM-Edge-V-5B). Among them, the 1.5B / 2B model is mainly for platforms such as mobile phones and cars, and the 4B / 5B model is mainly for platforms such as PCs. - * - :ref:`glm-edge-v ` - - chat, vision - - 8192 - - The GLM-Edge series is our attempt to face the end-side real-life scenarios, which consists of two sizes of large-language dialogue models and multimodal comprehension models (GLM-Edge-1.5B-Chat, GLM-Edge-4B-Chat, GLM-Edge-V-2B, GLM-Edge-V-5B). Among them, the 1.5B / 2B model is mainly for platforms such as mobile phones and cars, and the 4B / 5B model is mainly for platforms such as PCs. 
- * - :ref:`glm4-0414 ` - chat, tools - 32768 @@ -211,6 +206,16 @@ The following is a list of built-in LLM in Xinference: - 1024 - GPT-2 is a Transformer-based LLM that is trained on WebTest, a 40 GB dataset of Reddit posts with 3+ upvotes. + * - :ref:`huatuogpt-o1-llama-3.1 ` + - chat, tools + - 131072 + - HuatuoGPT-o1 is a medical LLM designed for advanced medical reasoning. It generates a complex thought process, reflecting and refining its reasoning, before providing a final response. + + * - :ref:`huatuogpt-o1-qwen2.5 ` + - chat, tools + - 32768 + - HuatuoGPT-o1 is a medical LLM designed for advanced medical reasoning. It generates a complex thought process, reflecting and refining its reasoning, before providing a final response. + * - :ref:`internlm3-instruct ` - chat, tools - 32768 @@ -296,11 +301,6 @@ The following is a list of built-in LLM in Xinference: - 4096 - MiniCPM is an End-Size LLM developed by ModelBest Inc. and TsinghuaNLP, with only 2.4B parameters excluding embeddings. - * - :ref:`minicpm-llama3-v-2_5 ` - - chat, vision - - 8192 - - MiniCPM-Llama3-V 2.5 is the latest model in the MiniCPM-V series. The model is built on SigLip-400M and Llama3-8B-Instruct with a total of 8B parameters. - * - :ref:`minicpm-v-2.6 ` - chat, vision - 32768 @@ -361,11 +361,6 @@ The following is a list of built-in LLM in Xinference: - 8192 - Kimi Muon is Scalable for LLM Training - * - :ref:`omnilmm ` - - chat, vision - - 2048 - - OmniLMM is a family of open-source large multimodal models (LMMs) adept at vision & language modeling. - * - :ref:`openhermes-2.5 ` - chat - 8192 @@ -411,11 +406,6 @@ The following is a list of built-in LLM in Xinference: - 32768 - Qwen-chat is a fine-tuned version of the Qwen LLM trained with alignment techniques, specializing in chatting. - * - :ref:`qwen-vl-chat ` - - chat, vision - - 4096 - - Qwen-VL-Chat supports more flexible interaction, such as multiple image inputs, multi-round question answering, and creative capabilities. - * - :ref:`qwen1.5-chat ` - chat, tools - 32768 @@ -487,7 +477,7 @@ The following is a list of built-in LLM in Xinference: - Qwen2.5-VL: Qwen2.5-VL is the latest version of the vision language models in the Qwen model familities. * - :ref:`qwen3 ` - - chat, reasoning, tools + - chat, reasoning, hybrid, tools - 40960 - Qwen3 is the latest generation of large language models in Qwen series, offering a comprehensive suite of dense and mixture-of-experts (MoE) models. Built upon extensive training, Qwen3 delivers groundbreaking advancements in reasoning, instruction-following, agent capabilities, and multilingual support @@ -526,6 +516,11 @@ The following is a list of built-in LLM in Xinference: - 4096 - Skywork is a series of large models developed by the Kunlun Group · Skywork team. + * - :ref:`skywork-or1 ` + - chat + - 131072 + - We release the final version of Skywork-OR1 (Open Reasoner 1) series of models, including + * - :ref:`skywork-or1-preview ` - chat - 32768 @@ -551,6 +546,11 @@ The following is a list of built-in LLM in Xinference: - 2048 - WizardMath is an open-source LLM trained by fine-tuning Llama2 with Evol-Instruct, specializing in math. + * - :ref:`xiyansql-qwencoder-2504 ` + - chat, tools + - 32768 + - The XiYanSQL-QwenCoder models, as multi-dialect SQL base models, demonstrating robust SQL generation capabilities. 
+ * - :ref:`xverse ` - generate - 2048 @@ -620,10 +620,6 @@ The following is a list of built-in LLM in Xinference: cogagent - cogvlm2 - - cogvlm2-video-llama3-chat - deepseek deepseek-chat @@ -632,14 +628,16 @@ The following is a list of built-in LLM in Xinference: deepseek-coder-instruct + deepseek-prover-v2 + deepseek-r1 + deepseek-r1-0528 + deepseek-r1-distill-llama deepseek-r1-distill-qwen - deepseek-v2 - deepseek-v2-chat deepseek-v2-chat-0628 @@ -648,10 +646,12 @@ The following is a list of built-in LLM in Xinference: deepseek-v3 - deepseek-vl-chat + deepseek-v3-0324 deepseek-vl2 + dianjin-r1 + fin-r1 gemma-3-1b-it @@ -662,8 +662,6 @@ The following is a list of built-in LLM in Xinference: glm-edge-chat - glm-edge-v - glm4-0414 glm4-chat @@ -674,6 +672,10 @@ The following is a list of built-in LLM in Xinference: gpt-2 + huatuogpt-o1-llama-3.1 + + huatuogpt-o1-qwen2.5 + internlm3-instruct internvl3 @@ -708,8 +710,6 @@ The following is a list of built-in LLM in Xinference: minicpm-2b-sft-fp32 - minicpm-llama3-v-2_5 - minicpm-v-2.6 minicpm3-4b @@ -734,8 +734,6 @@ The following is a list of built-in LLM in Xinference: moonlight-16b-a3b-instruct - omnilmm - openhermes-2.5 opt @@ -754,8 +752,6 @@ The following is a list of built-in LLM in Xinference: qwen-chat - qwen-vl-chat - qwen1.5-chat qwen1.5-moe-chat @@ -800,6 +796,8 @@ The following is a list of built-in LLM in Xinference: skywork-math + skywork-or1 + skywork-or1-preview telechat @@ -810,6 +808,8 @@ The following is a list of built-in LLM in Xinference: wizardmath-v1.0 + xiyansql-qwencoder-2504 + xverse xverse-chat diff --git a/doc/source/models/builtin/llm/minicpm-llama3-v-2_5.rst b/doc/source/models/builtin/llm/minicpm-llama3-v-2_5.rst deleted file mode 100644 index ed2330ad74..0000000000 --- a/doc/source/models/builtin/llm/minicpm-llama3-v-2_5.rst +++ /dev/null @@ -1,47 +0,0 @@ -.. _models_llm_minicpm-llama3-v-2_5: - -======================================== -MiniCPM-Llama3-V-2_5 -======================================== - -- **Context Length:** 8192 -- **Model Name:** MiniCPM-Llama3-V-2_5 -- **Languages:** en, zh -- **Abilities:** chat, vision -- **Description:** MiniCPM-Llama3-V 2.5 is the latest model in the MiniCPM-V series. The model is built on SigLip-400M and Llama3-8B-Instruct with a total of 8B parameters. 
- -Specifications -^^^^^^^^^^^^^^ - - -Model Spec 1 (pytorch, 8 Billion) -++++++++++++++++++++++++++++++++++++++++ - -- **Model Format:** pytorch -- **Model Size (in billions):** 8 -- **Quantizations:** none -- **Engines**: Transformers -- **Model ID:** openbmb/MiniCPM-Llama3-V-2_5 -- **Model Hubs**: `Hugging Face `__, `ModelScope `__ - -Execute the following command to launch the model, remember to replace ``${quantization}`` with your -chosen quantization method from the options listed above:: - - xinference launch --model-engine ${engine} --model-name MiniCPM-Llama3-V-2_5 --size-in-billions 8 --model-format pytorch --quantization ${quantization} - - -Model Spec 2 (pytorch, 8 Billion) -++++++++++++++++++++++++++++++++++++++++ - -- **Model Format:** pytorch -- **Model Size (in billions):** 8 -- **Quantizations:** none -- **Engines**: Transformers -- **Model ID:** openbmb/MiniCPM-Llama3-V-2_5-{quantization} -- **Model Hubs**: `Hugging Face `__, `ModelScope `__ - -Execute the following command to launch the model, remember to replace ``${quantization}`` with your -chosen quantization method from the options listed above:: - - xinference launch --model-engine ${engine} --model-name MiniCPM-Llama3-V-2_5 --size-in-billions 8 --model-format pytorch --quantization ${quantization} - diff --git a/doc/source/models/builtin/llm/omnilmm.rst b/doc/source/models/builtin/llm/omnilmm.rst deleted file mode 100644 index c8a0a32226..0000000000 --- a/doc/source/models/builtin/llm/omnilmm.rst +++ /dev/null @@ -1,47 +0,0 @@ -.. _models_llm_omnilmm: - -======================================== -OmniLMM -======================================== - -- **Context Length:** 2048 -- **Model Name:** OmniLMM -- **Languages:** en, zh -- **Abilities:** chat, vision -- **Description:** OmniLMM is a family of open-source large multimodal models (LMMs) adept at vision & language modeling. 
- -Specifications -^^^^^^^^^^^^^^ - - -Model Spec 1 (pytorch, 3 Billion) -++++++++++++++++++++++++++++++++++++++++ - -- **Model Format:** pytorch -- **Model Size (in billions):** 3 -- **Quantizations:** none -- **Engines**: Transformers -- **Model ID:** openbmb/MiniCPM-V -- **Model Hubs**: `Hugging Face `__, `ModelScope `__ - -Execute the following command to launch the model, remember to replace ``${quantization}`` with your -chosen quantization method from the options listed above:: - - xinference launch --model-engine ${engine} --model-name OmniLMM --size-in-billions 3 --model-format pytorch --quantization ${quantization} - - -Model Spec 2 (pytorch, 12 Billion) -++++++++++++++++++++++++++++++++++++++++ - -- **Model Format:** pytorch -- **Model Size (in billions):** 12 -- **Quantizations:** none -- **Engines**: Transformers -- **Model ID:** openbmb/OmniLMM-12B -- **Model Hubs**: `Hugging Face `__, `ModelScope `__ - -Execute the following command to launch the model, remember to replace ``${quantization}`` with your -chosen quantization method from the options listed above:: - - xinference launch --model-engine ${engine} --model-name OmniLMM --size-in-billions 12 --model-format pytorch --quantization ${quantization} - diff --git a/doc/source/user_guide/backends.rst b/doc/source/user_guide/backends.rst index e16816db7e..c87782c76c 100644 --- a/doc/source/user_guide/backends.rst +++ b/doc/source/user_guide/backends.rst @@ -99,7 +99,7 @@ Currently, supported model includes: - ``codestral-v0.1`` - ``Yi``, ``Yi-1.5``, ``Yi-chat``, ``Yi-1.5-chat``, ``Yi-1.5-chat-16k`` - ``code-llama``, ``code-llama-python``, ``code-llama-instruct`` -- ``deepseek``, ``deepseek-coder``, ``deepseek-chat``, ``deepseek-coder-instruct``, ``deepseek-r1-distill-qwen``, ``deepseek-v2-chat``, ``deepseek-v2-chat-0628``, ``deepseek-v2.5``, ``deepseek-v3``, ``deepseek-r1``, ``deepseek-r1-distill-llama`` +- ``deepseek``, ``deepseek-coder``, ``deepseek-chat``, ``deepseek-coder-instruct``, ``deepseek-r1-distill-qwen``, ``deepseek-v2-chat``, ``deepseek-v2-chat-0628``, ``deepseek-v2.5``, ``deepseek-v3``, ``deepseek-v3-0324``, ``deepseek-r1``, ``deepseek-r1-0528``, ``deepseek-prover-v2``, ``deepseek-r1-distill-llama`` - ``yi-coder``, ``yi-coder-chat`` - ``codeqwen1.5``, ``codeqwen1.5-chat`` - ``qwen2.5``, ``qwen2.5-coder``, ``qwen2.5-instruct``, ``qwen2.5-coder-instruct``, ``qwen2.5-instruct-1m`` @@ -113,11 +113,14 @@ Currently, supported model includes: - ``codegeex4`` - ``qwen1.5-chat``, ``qwen1.5-moe-chat`` - ``qwen2-instruct``, ``qwen2-moe-instruct`` +- ``XiYanSQL-QwenCoder-2504`` - ``QwQ-32B-Preview``, ``QwQ-32B`` - ``marco-o1`` - ``fin-r1`` - ``seallms-v3`` -- ``skywork-or1-preview`` +- ``skywork-or1-preview``, ``skywork-or1`` +- ``HuatuoGPT-o1-Qwen2.5``, ``HuatuoGPT-o1-LLaMA-3.1`` +- ``DianJin-R1`` - ``gemma-it``, ``gemma-2-it``, ``gemma-3-1b-it`` - ``orion-chat``, ``orion-chat-rag`` - ``c4ai-command-r-v01`` @@ -125,7 +128,6 @@ Currently, supported model includes: - ``internlm3-instruct`` - ``moonlight-16b-a3b-instruct`` - ``qwen3`` - .. vllm_end .. 
_sglang_backend: diff --git a/xinference/model/llm/llm_family.json b/xinference/model/llm/llm_family.json index 7a4a57a7e2..4b406a50fa 100644 --- a/xinference/model/llm/llm_family.json +++ b/xinference/model/llm/llm_family.json @@ -6462,6 +6462,44 @@ "<|end▁of▁sentence|>" ] }, + { + "version": 1, + "context_length": 163840, + "model_name": "deepseek-v3-0324", + "model_lang": [ + "en", + "zh" + ], + "model_ability": [ + "chat" + ], + "model_description": "DeepSeek-V3, a strong Mixture-of-Experts (MoE) language model with 671B total parameters with 37B activated for each token. ", + "model_specs": [ + { + "model_format": "pytorch", + "model_size_in_billions": 671, + "quantizations": [ + "none" + ], + "model_id": "deepseek-ai/DeepSeek-V3-0324" + }, + { + "model_format": "awq", + "model_size_in_billions": 671, + "quantizations": [ + "Int4" + ], + "model_id": "cognitivecomputations/DeepSeek-V3-0324-AWQ" + } + ], + "chat_template": "{% if not add_generation_prompt is defined %}{% set add_generation_prompt = false %}{% endif %}{% set ns = namespace(is_first=false, is_tool=false, is_output_first=true, system_prompt='', is_first_sp=true, is_last_user=false) %}{%- for message in messages %}{%- if message['role'] == 'system' %}{%- if ns.is_first_sp %}{% set ns.system_prompt = ns.system_prompt + message['content'] %}{% set ns.is_first_sp = false %}{%- else %}{% set ns.system_prompt = ns.system_prompt + '\n\n' + message['content'] %}{%- endif %}{%- endif %}{%- endfor %}{{ bos_token }}{{ ns.system_prompt }}{%- for message in messages %}{%- if message['role'] == 'user' %}{%- set ns.is_tool = false -%}{%- set ns.is_first = false -%}{%- set ns.is_last_user = true -%}{{'<|User|>' + message['content'] + '<|Assistant|>'}}{%- endif %}{%- if message['role'] == 'assistant' and message['tool_calls'] is defined and message['tool_calls'] is not none %}{%- set ns.is_last_user = false -%}{%- if ns.is_tool %}{{'<|tool▁outputs▁end|>'}}{%- endif %}{%- set ns.is_first = false %}{%- set ns.is_tool = false -%}{%- set ns.is_output_first = true %}{%- for tool in message['tool_calls'] %}{%- if not ns.is_first %}{%- if message['content'] is none %}{{'<|tool▁calls▁begin|><|tool▁call▁begin|>' + tool['type'] + '<|tool▁sep|>' + tool['function']['name'] + '\n' + '```json' + '\n' + tool['function']['arguments'] + '\n' + '```' + '<|tool▁call▁end|>'}}{%- else %}{{message['content'] + '<|tool▁calls▁begin|><|tool▁call▁begin|>' + tool['type'] + '<|tool▁sep|>' + tool['function']['name'] + '\n' + '```json' + '\n' + tool['function']['arguments'] + '\n' + '```' + '<|tool▁call▁end|>'}}{%- endif %}{%- set ns.is_first = true -%}{%- else %}{{'\n' + '<|tool▁call▁begin|>' + tool['type'] + '<|tool▁sep|>' + tool['function']['name'] + '\n' + '```json' + '\n' + tool['function']['arguments'] + '\n' + '```' + '<|tool▁call▁end|>'}}{%- endif %}{%- endfor %}{{'<|tool▁calls▁end|><|end▁of▁sentence|>'}}{%- endif %}{%- if message['role'] == 'assistant' and (message['tool_calls'] is not defined or message['tool_calls'] is none)%}{%- set ns.is_last_user = false -%}{%- if ns.is_tool %}{{'<|tool▁outputs▁end|>' + message['content'] + '<|end▁of▁sentence|>'}}{%- set ns.is_tool = false -%}{%- else %}{% set content = message['content'] %}{{content + '<|end▁of▁sentence|>'}}{%- endif %}{%- endif %}{%- if message['role'] == 'tool' %}{%- set ns.is_last_user = false -%}{%- set ns.is_tool = true -%}{%- if ns.is_output_first %}{{'<|tool▁outputs▁begin|><|tool▁output▁begin|>' + message['content'] + '<|tool▁output▁end|>'}}{%- set ns.is_output_first = false %}{%- else 
%}{{'\n<|tool▁output▁begin|>' + message['content'] + '<|tool▁output▁end|>'}}{%- endif %}{%- endif %}{%- endfor -%}{% if ns.is_tool %}{{'<|tool▁outputs▁end|>'}}{% endif %}{% if add_generation_prompt and not ns.is_last_user and not ns.is_tool %}{{'<|Assistant|>'}}{% endif %}", + "stop_token_ids": [ + 1 + ], + "stop": [ + "<|end▁of▁sentence|>" + ] + }, { "version": 1, "context_length": 163840, @@ -6678,6 +6716,88 @@ "reasoning_start_tag": "", "reasoning_end_tag": "" }, + { + "version": 1, + "context_length": 163840, + "model_name": "deepseek-r1-0528", + "model_lang": [ + "en", + "zh" + ], + "model_ability": [ + "chat", + "reasoning" + ], + "model_description": "DeepSeek-R1, which incorporates cold-start data before RL. DeepSeek-R1 achieves performance comparable to OpenAI-o1 across math, code, and reasoning tasks.", + "model_specs": [ + { + "model_format": "pytorch", + "model_size_in_billions": 671, + "quantizations": [ + "none" + ], + "model_id": "deepseek-ai/DeepSeek-R1-0528" + } + ], + "chat_template": "{% if not add_generation_prompt is defined %}{% set add_generation_prompt = false %}{% endif %}{% set ns = namespace(is_first=false, is_tool=false, is_output_first=true, system_prompt='', is_first_sp=true) %}{%- for message in messages %}{%- if message['role'] == 'system' %}{%- if ns.is_first_sp %}{% set ns.system_prompt = ns.system_prompt + message['content'] %}{% set ns.is_first_sp = false %}{%- else %}{% set ns.system_prompt = ns.system_prompt + '\\n\\n' + message['content'] %}{%- endif %}{%- endif %}{%- endfor %}{{ bos_token }}{{ ns.system_prompt }}{%- for message in messages %}{%- if message['role'] == 'user' %}{%- set ns.is_tool = false -%}{{'<|User|>' + message['content']}}{%- endif %}{%- if message['role'] == 'assistant' and 'tool_calls' in message %}{%- set ns.is_tool = false -%}{%- for tool in message['tool_calls'] %}{%- if not ns.is_first %}{%- if message['content'] is none %}{{'<|Assistant|><|tool▁calls▁begin|><|tool▁call▁begin|>' + tool['type'] + '<|tool▁sep|>' + tool['function']['name'] + '\\n' + '```json' + '\\n' + tool['function']['arguments'] + '\\n' + '```' + '<|tool▁call▁end|>'}}{%- else %}{{'<|Assistant|>' + message['content'] + '<|tool▁calls▁begin|><|tool▁call▁begin|>' + tool['type'] + '<|tool▁sep|>' + tool['function']['name'] + '\\n' + '```json' + '\\n' + tool['function']['arguments'] + '\\n' + '```' + '<|tool▁call▁end|>'}}{%- endif %}{%- set ns.is_first = true -%}{%- else %}{{'\\n' + '<|tool▁call▁begin|>' + tool['type'] + '<|tool▁sep|>' + tool['function']['name'] + '\\n' + '```json' + '\\n' + tool['function']['arguments'] + '\\n' + '```' + '<|tool▁call▁end|>'}}{%- endif %}{%- endfor %}{{'<|tool▁calls▁end|><|end▁of▁sentence|>'}}{%- endif %}{%- if message['role'] == 'assistant' and 'tool_calls' not in message %}{%- if ns.is_tool %}{{'<|tool▁outputs▁end|>' + message['content'] + '<|end▁of▁sentence|>'}}{%- set ns.is_tool = false -%}{%- else %}{% set content = message['content'] %}{% if '' in content %}{% set content = content.split('')[-1] %}{% endif %}{{'<|Assistant|>' + content + '<|end▁of▁sentence|>'}}{%- endif %}{%- endif %}{%- if message['role'] == 'tool' %}{%- set ns.is_tool = true -%}{%- if ns.is_output_first %}{{'<|tool▁outputs▁begin|><|tool▁output▁begin|>' + message['content'] + '<|tool▁output▁end|>'}}{%- set ns.is_output_first = false %}{%- else %}{{'<|tool▁output▁begin|>' + message['content'] + '<|tool▁output▁end|>'}}{%- endif %}{%- endif %}{%- endfor -%}{% if ns.is_tool %}{{'<|tool▁outputs▁end|>'}}{% endif %}{% if add_generation_prompt and not ns.is_tool 
%}{{'<|Assistant|>'}}{% endif %}", + "stop_token_ids": [ + 1 + ], + "stop": [ + "<|end▁of▁sentence|>" + ], + "reasoning_start_tag": "", + "reasoning_end_tag": "" + }, + { + "version": 1, + "context_length": 163840, + "model_name": "deepseek-prover-v2", + "model_lang": [ + "en", + "zh" + ], + "model_ability": [ + "chat", + "reasoning" + ], + "model_description": "We introduce DeepSeek-Prover-V2, an open-source large language model designed for formal theorem proving in Lean 4, with initialization data collected through a recursive theorem proving pipeline powered by DeepSeek-V3. The cold-start training procedure begins by prompting DeepSeek-V3 to decompose complex problems into a series of subgoals. The proofs of resolved subgoals are synthesized into a chain-of-thought process, combined with DeepSeek-V3's step-by-step reasoning, to create an initial cold start for reinforcement learning. This process enables us to integrate both informal and formal mathematical reasoning into a unified model", + "model_specs": [ + { + "model_format": "pytorch", + "model_size_in_billions": 671, + "quantizations": [ + "none" + ], + "model_id": "deepseek-ai/DeepSeek-Prover-V2-671B" + }, + { + "model_format": "pytorch", + "model_size_in_billions": 7, + "quantizations": [ + "none" + ], + "model_id": "deepseek-ai/DeepSeek-Prover-V2-7B" + }, + { + "model_format": "mlx", + "model_size_in_billions": 7, + "quantizations": [ + "4bit" + ], + "model_id": "mlx-community/DeepSeek-Prover-V2-7B-4bit" + } + ], + "chat_template": "{% if not add_generation_prompt is defined %}{% set add_generation_prompt = false %}{% endif %}{% set ns = namespace(is_first=false, is_tool=false, is_output_first=true, system_prompt='', is_first_sp=true, is_last_user=false) %}{%- for message in messages %}{%- if message['role'] == 'system' %}{%- if ns.is_first_sp %}{% set ns.system_prompt = ns.system_prompt + message['content'] %}{% set ns.is_first_sp = false %}{%- else %}{% set ns.system_prompt = ns.system_prompt + '\n\n' + message['content'] %}{%- endif %}{%- endif %}{%- endfor %}{{ bos_token }}{{ ns.system_prompt }}{%- for message in messages %}{%- if message['role'] == 'user' %}{%- set ns.is_tool = false -%}{%- set ns.is_first = false -%}{%- set ns.is_last_user = true -%}{{'<|User|>' + message['content'] + '<|Assistant|>'}}{%- endif %}{%- if message['role'] == 'assistant' and message['tool_calls'] is defined and message['tool_calls'] is not none %}{%- set ns.is_last_user = false -%}{%- if ns.is_tool %}{{'<|tool▁outputs▁end|>'}}{%- endif %}{%- set ns.is_first = false %}{%- set ns.is_tool = false -%}{%- set ns.is_output_first = true %}{%- for tool in message['tool_calls'] %}{%- if not ns.is_first %}{%- if message['content'] is none %}{{'<|tool▁calls▁begin|><|tool▁call▁begin|>' + tool['type'] + '<|tool▁sep|>' + tool['function']['name'] + '\n' + '```json' + '\n' + tool['function']['arguments'] + '\n' + '```' + '<|tool▁call▁end|>'}}{%- else %}{{message['content'] + '<|tool▁calls▁begin|><|tool▁call▁begin|>' + tool['type'] + '<|tool▁sep|>' + tool['function']['name'] + '\n' + '```json' + '\n' + tool['function']['arguments'] + '\n' + '```' + '<|tool▁call▁end|>'}}{%- endif %}{%- set ns.is_first = true -%}{%- else %}{{'\n' + '<|tool▁call▁begin|>' + tool['type'] + '<|tool▁sep|>' + tool['function']['name'] + '\n' + '```json' + '\n' + tool['function']['arguments'] + '\n' + '```' + '<|tool▁call▁end|>'}}{%- endif %}{%- endfor %}{{'<|tool▁calls▁end|><|end▁of▁sentence|>'}}{%- endif %}{%- if message['role'] == 'assistant' and (message['tool_calls'] is not 
defined or message['tool_calls'] is none)%}{%- set ns.is_last_user = false -%}{%- if ns.is_tool %}{{'<|tool▁outputs▁end|>' + message['content'] + '<|end▁of▁sentence|>'}}{%- set ns.is_tool = false -%}{%- else %}{% set content = message['content'] %}{{content + '<|end▁of▁sentence|>'}}{%- endif %}{%- endif %}{%- if message['role'] == 'tool' %}{%- set ns.is_last_user = false -%}{%- set ns.is_tool = true -%}{%- if ns.is_output_first %}{{'<|tool▁outputs▁begin|><|tool▁output▁begin|>' + message['content'] + '<|tool▁output▁end|>'}}{%- set ns.is_output_first = false %}{%- else %}{{'\n<|tool▁output▁begin|>' + message['content'] + '<|tool▁output▁end|>'}}{%- endif %}{%- endif %}{%- endfor -%}{% if ns.is_tool %}{{'<|tool▁outputs▁end|>'}}{% endif %}{% if add_generation_prompt and not ns.is_last_user and not ns.is_tool %}{{'<|Assistant|>'}}{% endif %}", + "stop_token_ids": [ + 1 + ], + "stop": [ + "<|end▁of▁sentence|>" + ], + "reasoning_start_tag": "", + "reasoning_end_tag": "" + }, { "version": 1, "context_length": 32768, diff --git a/xinference/model/llm/llm_family_modelscope.json b/xinference/model/llm/llm_family_modelscope.json index 0d2482f329..51c104f26c 100644 --- a/xinference/model/llm/llm_family_modelscope.json +++ b/xinference/model/llm/llm_family_modelscope.json @@ -4600,6 +4600,46 @@ "<|end▁of▁sentence|>" ] }, + { + "version": 1, + "context_length": 163840, + "model_name": "deepseek-v3-0324", + "model_lang": [ + "en", + "zh" + ], + "model_ability": [ + "chat" + ], + "model_description": "DeepSeek-V3, a strong Mixture-of-Experts (MoE) language model with 671B total parameters with 37B activated for each token. ", + "model_specs": [ + { + "model_format": "pytorch", + "model_size_in_billions": 671, + "quantizations": [ + "none" + ], + "model_id": "deepseek-ai/DeepSeek-V3-0324", + "model_hub": "modelscope" + }, + { + "model_format": "awq", + "model_size_in_billions": 671, + "quantizations": [ + "Int4" + ], + "model_id": "cognitivecomputations/DeepSeek-V3-0324-AWQ", + "model_hub": "modelscope" + } + ], + "chat_template": "{% if not add_generation_prompt is defined %}{% set add_generation_prompt = false %}{% endif %}{% set ns = namespace(is_first=false, is_tool=false, is_output_first=true, system_prompt='', is_first_sp=true, is_last_user=false) %}{%- for message in messages %}{%- if message['role'] == 'system' %}{%- if ns.is_first_sp %}{% set ns.system_prompt = ns.system_prompt + message['content'] %}{% set ns.is_first_sp = false %}{%- else %}{% set ns.system_prompt = ns.system_prompt + '\n\n' + message['content'] %}{%- endif %}{%- endif %}{%- endfor %}{{ bos_token }}{{ ns.system_prompt }}{%- for message in messages %}{%- if message['role'] == 'user' %}{%- set ns.is_tool = false -%}{%- set ns.is_first = false -%}{%- set ns.is_last_user = true -%}{{'<|User|>' + message['content'] + '<|Assistant|>'}}{%- endif %}{%- if message['role'] == 'assistant' and message['tool_calls'] is defined and message['tool_calls'] is not none %}{%- set ns.is_last_user = false -%}{%- if ns.is_tool %}{{'<|tool▁outputs▁end|>'}}{%- endif %}{%- set ns.is_first = false %}{%- set ns.is_tool = false -%}{%- set ns.is_output_first = true %}{%- for tool in message['tool_calls'] %}{%- if not ns.is_first %}{%- if message['content'] is none %}{{'<|tool▁calls▁begin|><|tool▁call▁begin|>' + tool['type'] + '<|tool▁sep|>' + tool['function']['name'] + '\n' + '```json' + '\n' + tool['function']['arguments'] + '\n' + '```' + '<|tool▁call▁end|>'}}{%- else %}{{message['content'] + '<|tool▁calls▁begin|><|tool▁call▁begin|>' + tool['type'] + 
'<|tool▁sep|>' + tool['function']['name'] + '\n' + '```json' + '\n' + tool['function']['arguments'] + '\n' + '```' + '<|tool▁call▁end|>'}}{%- endif %}{%- set ns.is_first = true -%}{%- else %}{{'\n' + '<|tool▁call▁begin|>' + tool['type'] + '<|tool▁sep|>' + tool['function']['name'] + '\n' + '```json' + '\n' + tool['function']['arguments'] + '\n' + '```' + '<|tool▁call▁end|>'}}{%- endif %}{%- endfor %}{{'<|tool▁calls▁end|><|end▁of▁sentence|>'}}{%- endif %}{%- if message['role'] == 'assistant' and (message['tool_calls'] is not defined or message['tool_calls'] is none)%}{%- set ns.is_last_user = false -%}{%- if ns.is_tool %}{{'<|tool▁outputs▁end|>' + message['content'] + '<|end▁of▁sentence|>'}}{%- set ns.is_tool = false -%}{%- else %}{% set content = message['content'] %}{{content + '<|end▁of▁sentence|>'}}{%- endif %}{%- endif %}{%- if message['role'] == 'tool' %}{%- set ns.is_last_user = false -%}{%- set ns.is_tool = true -%}{%- if ns.is_output_first %}{{'<|tool▁outputs▁begin|><|tool▁output▁begin|>' + message['content'] + '<|tool▁output▁end|>'}}{%- set ns.is_output_first = false %}{%- else %}{{'\n<|tool▁output▁begin|>' + message['content'] + '<|tool▁output▁end|>'}}{%- endif %}{%- endif %}{%- endfor -%}{% if ns.is_tool %}{{'<|tool▁outputs▁end|>'}}{% endif %}{% if add_generation_prompt and not ns.is_last_user and not ns.is_tool %}{{'<|Assistant|>'}}{% endif %}", + "stop_token_ids": [ + 1 + ], + "stop": [ + "<|end▁of▁sentence|>" + ] + }, { "version": 1, "context_length": 163840, @@ -4821,6 +4861,92 @@ "reasoning_start_tag": "<think>", "reasoning_end_tag": "</think>" }, + { + "version": 1, + "context_length": 163840, + "model_name": "deepseek-r1-0528", + "model_lang": [ + "en", + "zh" + ], + "model_ability": [ + "chat", + "reasoning" + ], + "model_description": "DeepSeek-R1 incorporates cold-start data before RL.
DeepSeek-R1 achieves performance comparable to OpenAI-o1 across math, code, and reasoning tasks.", + "model_specs": [ + { + "model_format": "pytorch", + "model_size_in_billions": 671, + "quantizations": [ + "none" + ], + "model_id": "deepseek-ai/DeepSeek-R1-0528", + "model_hub": "modelscope" + } + ], + "chat_template": "{% if not add_generation_prompt is defined %}{% set add_generation_prompt = false %}{% endif %}{% set ns = namespace(is_first=false, is_tool=false, is_output_first=true, system_prompt='', is_first_sp=true) %}{%- for message in messages %}{%- if message['role'] == 'system' %}{%- if ns.is_first_sp %}{% set ns.system_prompt = ns.system_prompt + message['content'] %}{% set ns.is_first_sp = false %}{%- else %}{% set ns.system_prompt = ns.system_prompt + '\\n\\n' + message['content'] %}{%- endif %}{%- endif %}{%- endfor %}{{ bos_token }}{{ ns.system_prompt }}{%- for message in messages %}{%- if message['role'] == 'user' %}{%- set ns.is_tool = false -%}{{'<|User|>' + message['content']}}{%- endif %}{%- if message['role'] == 'assistant' and 'tool_calls' in message %}{%- set ns.is_tool = false -%}{%- for tool in message['tool_calls'] %}{%- if not ns.is_first %}{%- if message['content'] is none %}{{'<|Assistant|><|tool▁calls▁begin|><|tool▁call▁begin|>' + tool['type'] + '<|tool▁sep|>' + tool['function']['name'] + '\\n' + '```json' + '\\n' + tool['function']['arguments'] + '\\n' + '```' + '<|tool▁call▁end|>'}}{%- else %}{{'<|Assistant|>' + message['content'] + '<|tool▁calls▁begin|><|tool▁call▁begin|>' + tool['type'] + '<|tool▁sep|>' + tool['function']['name'] + '\\n' + '```json' + '\\n' + tool['function']['arguments'] + '\\n' + '```' + '<|tool▁call▁end|>'}}{%- endif %}{%- set ns.is_first = true -%}{%- else %}{{'\\n' + '<|tool▁call▁begin|>' + tool['type'] + '<|tool▁sep|>' + tool['function']['name'] + '\\n' + '```json' + '\\n' + tool['function']['arguments'] + '\\n' + '```' + '<|tool▁call▁end|>'}}{%- endif %}{%- endfor %}{{'<|tool▁calls▁end|><|end▁of▁sentence|>'}}{%- endif %}{%- if message['role'] == 'assistant' and 'tool_calls' not in message %}{%- if ns.is_tool %}{{'<|tool▁outputs▁end|>' + message['content'] + '<|end▁of▁sentence|>'}}{%- set ns.is_tool = false -%}{%- else %}{% set content = message['content'] %}{% if '</think>' in content %}{% set content = content.split('</think>')[-1] %}{% endif %}{{'<|Assistant|>' + content + '<|end▁of▁sentence|>'}}{%- endif %}{%- endif %}{%- if message['role'] == 'tool' %}{%- set ns.is_tool = true -%}{%- if ns.is_output_first %}{{'<|tool▁outputs▁begin|><|tool▁output▁begin|>' + message['content'] + '<|tool▁output▁end|>'}}{%- set ns.is_output_first = false %}{%- else %}{{'<|tool▁output▁begin|>' + message['content'] + '<|tool▁output▁end|>'}}{%- endif %}{%- endif %}{%- endfor -%}{% if ns.is_tool %}{{'<|tool▁outputs▁end|>'}}{% endif %}{% if add_generation_prompt and not ns.is_tool %}{{'<|Assistant|>'}}{% endif %}", + "stop_token_ids": [ + 1 + ], + "stop": [ + "<|end▁of▁sentence|>" + ], + "reasoning_start_tag": "<think>", + "reasoning_end_tag": "</think>" + }, + { + "version": 1, + "context_length": 163840, + "model_name": "deepseek-prover-v2", + "model_lang": [ + "en", + "zh" + ], + "model_ability": [ + "chat", + "reasoning" + ], + "model_description": "We introduce DeepSeek-Prover-V2, an open-source large language model designed for formal theorem proving in Lean 4, with initialization data collected through a recursive theorem proving pipeline powered by DeepSeek-V3.
The cold-start training procedure begins by prompting DeepSeek-V3 to decompose complex problems into a series of subgoals. The proofs of resolved subgoals are synthesized into a chain-of-thought process, combined with DeepSeek-V3's step-by-step reasoning, to create an initial cold start for reinforcement learning. This process enables us to integrate both informal and formal mathematical reasoning into a unified model.", + "model_specs": [ + { + "model_format": "pytorch", + "model_size_in_billions": 671, + "quantizations": [ + "none" + ], + "model_id": "deepseek-ai/DeepSeek-Prover-V2-671B", + "model_hub": "modelscope" + }, + { + "model_format": "pytorch", + "model_size_in_billions": 7, + "quantizations": [ + "none" + ], + "model_id": "deepseek-ai/DeepSeek-Prover-V2-7B", + "model_hub": "modelscope" + }, + { + "model_format": "mlx", + "model_size_in_billions": 7, + "quantizations": [ + "4bit" + ], + "model_id": "mlx-community/DeepSeek-Prover-V2-7B-4bit", + "model_hub": "modelscope" + } + ], + "chat_template": "{% if not add_generation_prompt is defined %}{% set add_generation_prompt = false %}{% endif %}{% set ns = namespace(is_first=false, is_tool=false, is_output_first=true, system_prompt='', is_first_sp=true, is_last_user=false) %}{%- for message in messages %}{%- if message['role'] == 'system' %}{%- if ns.is_first_sp %}{% set ns.system_prompt = ns.system_prompt + message['content'] %}{% set ns.is_first_sp = false %}{%- else %}{% set ns.system_prompt = ns.system_prompt + '\n\n' + message['content'] %}{%- endif %}{%- endif %}{%- endfor %}{{ bos_token }}{{ ns.system_prompt }}{%- for message in messages %}{%- if message['role'] == 'user' %}{%- set ns.is_tool = false -%}{%- set ns.is_first = false -%}{%- set ns.is_last_user = true -%}{{'<|User|>' + message['content'] + '<|Assistant|>'}}{%- endif %}{%- if message['role'] == 'assistant' and message['tool_calls'] is defined and message['tool_calls'] is not none %}{%- set ns.is_last_user = false -%}{%- if ns.is_tool %}{{'<|tool▁outputs▁end|>'}}{%- endif %}{%- set ns.is_first = false %}{%- set ns.is_tool = false -%}{%- set ns.is_output_first = true %}{%- for tool in message['tool_calls'] %}{%- if not ns.is_first %}{%- if message['content'] is none %}{{'<|tool▁calls▁begin|><|tool▁call▁begin|>' + tool['type'] + '<|tool▁sep|>' + tool['function']['name'] + '\n' + '```json' + '\n' + tool['function']['arguments'] + '\n' + '```' + '<|tool▁call▁end|>'}}{%- else %}{{message['content'] + '<|tool▁calls▁begin|><|tool▁call▁begin|>' + tool['type'] + '<|tool▁sep|>' + tool['function']['name'] + '\n' + '```json' + '\n' + tool['function']['arguments'] + '\n' + '```' + '<|tool▁call▁end|>'}}{%- endif %}{%- set ns.is_first = true -%}{%- else %}{{'\n' + '<|tool▁call▁begin|>' + tool['type'] + '<|tool▁sep|>' + tool['function']['name'] + '\n' + '```json' + '\n' + tool['function']['arguments'] + '\n' + '```' + '<|tool▁call▁end|>'}}{%- endif %}{%- endfor %}{{'<|tool▁calls▁end|><|end▁of▁sentence|>'}}{%- endif %}{%- if message['role'] == 'assistant' and (message['tool_calls'] is not defined or message['tool_calls'] is none)%}{%- set ns.is_last_user = false -%}{%- if ns.is_tool %}{{'<|tool▁outputs▁end|>' + message['content'] + '<|end▁of▁sentence|>'}}{%- set ns.is_tool = false -%}{%- else %}{% set content = message['content'] %}{{content + '<|end▁of▁sentence|>'}}{%- endif %}{%- endif %}{%- if message['role'] == 'tool' %}{%- set ns.is_last_user = false -%}{%- set ns.is_tool = true -%}{%- if ns.is_output_first %}{{'<|tool▁outputs▁begin|><|tool▁output▁begin|>' + message['content']
+ '<|tool▁output▁end|>'}}{%- set ns.is_output_first = false %}{%- else %}{{'\n<|tool▁output▁begin|>' + message['content'] + '<|tool▁output▁end|>'}}{%- endif %}{%- endif %}{%- endfor -%}{% if ns.is_tool %}{{'<|tool▁outputs▁end|>'}}{% endif %}{% if add_generation_prompt and not ns.is_last_user and not ns.is_tool %}{{'<|Assistant|>'}}{% endif %}", + "stop_token_ids": [ + 1 + ], + "stop": [ + "<|end▁of▁sentence|>" + ], + "reasoning_start_tag": "<think>", + "reasoning_end_tag": "</think>" + }, { "version": 1, "context_length": 32768, diff --git a/xinference/model/llm/sglang/core.py b/xinference/model/llm/sglang/core.py index 8955e90185..43711bd2db 100644 --- a/xinference/model/llm/sglang/core.py +++ b/xinference/model/llm/sglang/core.py @@ -107,7 +107,10 @@ class SGLANGGenerateConfig(TypedDict, total=False): "deepseek-r1-distill-qwen", "deepseek-r1-distill-llama", "deepseek-v3", + "deepseek-v3-0324", "deepseek-r1", + "deepseek-r1-0528", + "deepseek-prover-v2", "DianJin-R1", "qwen3", "HuatuoGPT-o1-Qwen2.5", diff --git a/xinference/model/llm/vllm/core.py b/xinference/model/llm/vllm/core.py index 6762fea8d8..ebb6071d1f 100644 --- a/xinference/model/llm/vllm/core.py +++ b/xinference/model/llm/vllm/core.py @@ -199,7 +199,10 @@ class VLLMGenerateConfig(TypedDict, total=False): VLLM_SUPPORTED_CHAT_MODELS.append("deepseek-v2-chat-0628") VLLM_SUPPORTED_CHAT_MODELS.append("deepseek-v2.5") VLLM_SUPPORTED_CHAT_MODELS.append("deepseek-v3") + VLLM_SUPPORTED_CHAT_MODELS.append("deepseek-v3-0324") VLLM_SUPPORTED_CHAT_MODELS.append("deepseek-r1") + VLLM_SUPPORTED_CHAT_MODELS.append("deepseek-r1-0528") + VLLM_SUPPORTED_CHAT_MODELS.append("deepseek-prover-v2") if VLLM_INSTALLED and vllm.__version__ >= "0.5.3": VLLM_SUPPORTED_CHAT_MODELS.append("gemma-2-it") diff --git a/xinference/web/ui/src/scenes/launch_model/data/data.js b/xinference/web/ui/src/scenes/launch_model/data/data.js index 324f7804e1..e630e329f4 100644 --- a/xinference/web/ui/src/scenes/launch_model/data/data.js +++ b/xinference/web/ui/src/scenes/launch_model/data/data.js @@ -79,8 +79,8 @@ export const featureModels = [ type: 'llm', feature_models: [ 'qwen3', - 'deepseek-v3', - 'deepseek-r1', + 'deepseek-v3-0324', + 'deepseek-r1-0528', 'deepseek-r1-distill-qwen', 'deepseek-r1-distill-llama', 'qwen2.5-instruct',
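For orientation, the `chat_template` fields registered above are Jinja2 strings that Xinference renders into the raw prompt the model receives. A minimal sketch of that mechanics, assuming only stock `jinja2` and using a hand-trimmed fragment of the DeepSeek-style template (the full registered string additionally handles system prompts, tool calls, and reasoning tags):

```python
# Render a trimmed-down chat template to see the prompt format.
# The fragment below is illustrative, not the full template from this diff.
from jinja2 import Template

trimmed_template = (
    "{{ bos_token }}"
    "{%- for message in messages %}"
    "{%- if message['role'] == 'user' %}"
    "{{ '<|User|>' + message['content'] }}"
    "{%- endif %}"
    "{%- endfor %}"
    "{% if add_generation_prompt %}{{ '<|Assistant|>' }}{% endif %}"
)

prompt = Template(trimmed_template).render(
    bos_token="<|begin▁of▁sentence|>",  # supplied by the tokenizer at runtime
    messages=[{"role": "user", "content": "What is 2 + 2?"}],
    add_generation_prompt=True,
)
print(prompt)
# <|begin▁of▁sentence|><|User|>What is 2 + 2?<|Assistant|>
```

Rendering the full registered template follows the same pattern; only `messages`, `bos_token`, and `add_generation_prompt` need to be supplied.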
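Once a family is registered in the JSON catalogs and appended to the vLLM/SGLang model lists, it can be launched like any other built-in model. A hypothetical end-to-end sketch, assuming a running Xinference supervisor at the default endpoint and hardware able to host the chosen spec; the endpoint, prompt, and engine choice are illustrative, not taken from this diff:

```python
# Hypothetical usage sketch for one of the newly registered families.
from xinference.client import Client

client = Client("http://localhost:9997")  # assumed local supervisor

# model_name/model_format/quantization must match one of the
# "model_specs" entries added above; "vllm" is available because the
# name was appended to VLLM_SUPPORTED_CHAT_MODELS in this change.
model_uid = client.launch_model(
    model_name="deepseek-r1-0528",
    model_engine="vllm",
    model_format="pytorch",
    model_size_in_billions=671,
    quantization="none",
)

model = client.get_model(model_uid)
completion = model.chat(
    messages=[{"role": "user", "content": "Why is the sky blue?"}]
)
# Families declaring the "reasoning" ability with <think>/</think> tags
# let the server separate chain-of-thought from the final answer.
print(completion["choices"][0]["message"]["content"])
```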