
FEAT: support Deepseek-R1-0528 #3539


Merged: 6 commits, May 29, 2025
2 changes: 1 addition & 1 deletion README.md
@@ -47,14 +47,14 @@ potential of cutting-edge AI models.
- Support SGLang backend: [#1161](https://github.com/xorbitsai/inference/pull/1161)
- Support LoRA for LLM and image models: [#1080](https://github.com/xorbitsai/inference/pull/1080)
### New Models
- Built-in support for [Deepseek-R1-0528](https://huggingface.co/deepseek-ai/DeepSeek-R1-0528): [#3539](https://github.com/xorbitsai/inference/pull/3539)
- Built-in support for [Qwen3](https://qwenlm.github.io/blog/qwen3/): [#3347](https://github.com/xorbitsai/inference/pull/3347)
- Built-in support for [Qwen2.5-Omni](https://github.com/QwenLM/Qwen2.5-Omni): [#3279](https://github.com/xorbitsai/inference/pull/3279)
- Built-in support for [Skywork-OR1](https://github.com/SkyworkAI/Skywork-OR1): [#3274](https://github.com/xorbitsai/inference/pull/3274)
- Built-in support for [GLM-4-0414](https://github.com/THUDM/GLM-4): [#3251](https://github.com/xorbitsai/inference/pull/3251)
- Built-in support for [SeaLLMs-v3](https://github.com/DAMO-NLP-SG/DAMO-SeaLLMs): [#3248](https://github.com/xorbitsai/inference/pull/3248)
- Built-in support for [paraformer-zh](https://huggingface.co/funasr/paraformer-zh): [#3236](https://github.com/xorbitsai/inference/pull/3236)
- Built-in support for [InternVL3](https://internvl.github.io/blog/2025-04-11-InternVL-3.0/): [#3235](https://github.com/xorbitsai/inference/pull/3235)
- Built-in support for [MegaTTS3](https://github.com/bytedance/MegaTTS3): [#3224](https://github.com/xorbitsai/inference/pull/3224)
### Integrations
- [Dify](https://docs.dify.ai/advanced/model-configuration/xinference): an LLMOps platform that enables developers (and even non-developers) to quickly build useful applications based on large language models, ensuring they are visual, operable, and improvable.
- [FastGPT](https://github.com/labring/FastGPT): a knowledge-based platform built on LLMs that offers out-of-the-box data processing and model invocation capabilities and allows for workflow orchestration through Flow visualization.
2 changes: 1 addition & 1 deletion README_zh_CN.md
@@ -43,14 +43,14 @@ Xorbits Inference (Xinference) is a powerful and feature-rich distributed
- Support SGLang backend: [#1161](https://github.com/xorbitsai/inference/pull/1161)
- Support LoRA for LLM and image models: [#1080](https://github.com/xorbitsai/inference/pull/1080)
### New Models
- Built-in support for [Deepseek-R1-0528](https://huggingface.co/deepseek-ai/DeepSeek-R1-0528): [#3539](https://github.com/xorbitsai/inference/pull/3539)
- Built-in support for [Qwen3](https://qwenlm.github.io/blog/qwen3/): [#3347](https://github.com/xorbitsai/inference/pull/3347)
- Built-in support for [Qwen2.5-Omni](https://github.com/QwenLM/Qwen2.5-Omni): [#3279](https://github.com/xorbitsai/inference/pull/3279)
- Built-in support for [Skywork-OR1](https://github.com/SkyworkAI/Skywork-OR1): [#3274](https://github.com/xorbitsai/inference/pull/3274)
- Built-in support for [GLM-4-0414](https://github.com/THUDM/GLM-4): [#3251](https://github.com/xorbitsai/inference/pull/3251)
- Built-in support for [SeaLLMs-v3](https://github.com/DAMO-NLP-SG/DAMO-SeaLLMs): [#3248](https://github.com/xorbitsai/inference/pull/3248)
- Built-in support for [paraformer-zh](https://huggingface.co/funasr/paraformer-zh): [#3236](https://github.com/xorbitsai/inference/pull/3236)
- Built-in support for [InternVL3](https://internvl.github.io/blog/2025-04-11-InternVL-3.0/): [#3235](https://github.com/xorbitsai/inference/pull/3235)
- Built-in support for [MegaTTS3](https://github.com/bytedance/MegaTTS3): [#3224](https://github.com/xorbitsai/inference/pull/3224)
### Integrations
- [FastGPT](https://doc.fastai.site/docs/development/custom-models/xinference/): an open-source AI knowledge-base platform built on large language models. It provides out-of-the-box data processing, model invocation, RAG retrieval, and visual AI workflow orchestration, helping you easily build complex question-answering scenarios.
- [Dify](https://docs.dify.ai/advanced/model-configuration/xinference): an LLMOps platform covering the development, deployment, maintenance, and optimization of large language models.
7 changes: 5 additions & 2 deletions doc/source/getting_started/installation.rst
@@ -60,7 +60,7 @@ Currently, supported models include:
- ``codestral-v0.1``
- ``Yi``, ``Yi-1.5``, ``Yi-chat``, ``Yi-1.5-chat``, ``Yi-1.5-chat-16k``
- ``code-llama``, ``code-llama-python``, ``code-llama-instruct``
- ``deepseek``, ``deepseek-coder``, ``deepseek-chat``, ``deepseek-coder-instruct``, ``deepseek-r1-distill-qwen``, ``deepseek-v2-chat``, ``deepseek-v2-chat-0628``, ``deepseek-v2.5``, ``deepseek-v3``, ``deepseek-r1``, ``deepseek-r1-distill-llama``
- ``deepseek``, ``deepseek-coder``, ``deepseek-chat``, ``deepseek-coder-instruct``, ``deepseek-r1-distill-qwen``, ``deepseek-v2-chat``, ``deepseek-v2-chat-0628``, ``deepseek-v2.5``, ``deepseek-v3``, ``deepseek-v3-0324``, ``deepseek-r1``, ``deepseek-r1-0528``, ``deepseek-prover-v2``, ``deepseek-r1-distill-llama``
- ``yi-coder``, ``yi-coder-chat``
- ``codeqwen1.5``, ``codeqwen1.5-chat``
- ``qwen2.5``, ``qwen2.5-coder``, ``qwen2.5-instruct``, ``qwen2.5-coder-instruct``, ``qwen2.5-instruct-1m``
@@ -74,11 +74,14 @@ Currently, supported models include:
- ``codegeex4``
- ``qwen1.5-chat``, ``qwen1.5-moe-chat``
- ``qwen2-instruct``, ``qwen2-moe-instruct``
- ``XiYanSQL-QwenCoder-2504``
- ``QwQ-32B-Preview``, ``QwQ-32B``
- ``marco-o1``
- ``fin-r1``
- ``seallms-v3``
- ``skywork-or1-preview``
- ``skywork-or1-preview``, ``skywork-or1``
- ``HuatuoGPT-o1-Qwen2.5``, ``HuatuoGPT-o1-LLaMA-3.1``
- ``DianJin-R1``
- ``gemma-it``, ``gemma-2-it``, ``gemma-3-1b-it``
- ``orion-chat``, ``orion-chat-rag``
- ``c4ai-command-r-v01``
@@ -21,7 +21,7 @@ msgstr ""

#: ../../source/models/model_abilities/audio.rst:5
msgid "Audio"
msgstr ""
msgstr "音频"

#: ../../source/models/model_abilities/audio.rst:7
msgid "Learn how to turn audio into text or text into audio with Xinference."
@@ -358,7 +358,7 @@ msgstr "基本使用,加载模型 ``CosyVoice-300M-SFT``。"
msgid ""
"Please note that the latest CosyVoice 2.0 requires `use_flow_cache=True` "
"for stream generation."
msgstr ""
msgstr "请注意,最新版本的 CosyVoice 2.0 在进行流式生成时需要设置 `use_flow_cache=True`。"

#: ../../source/models/model_abilities/audio.rst:422
msgid ""
6 changes: 6 additions & 0 deletions doc/source/models/builtin/audio/index.rst
@@ -55,6 +55,12 @@ The following is a list of built-in audio models in Xinference:

paraformer-zh

paraformer-zh-hotword

paraformer-zh-long

paraformer-zh-spk

sensevoicesmall

whisper-base
19 changes: 19 additions & 0 deletions doc/source/models/builtin/audio/paraformer-zh-hotword.rst
@@ -0,0 +1,19 @@
.. _models_builtin_paraformer-zh-hotword:

=====================
paraformer-zh-hotword
=====================

- **Model Name:** paraformer-zh-hotword
- **Model Family:** funasr
- **Abilities:** ['audio2text']
- **Multilingual:** False

Specifications
^^^^^^^^^^^^^^

- **Model ID:** JunHowie/speech_paraformer-large-contextual_asr_nat-zh-cn-16k-common-vocab8404

Execute the following command to launch the model::

xinference launch --model-name paraformer-zh-hotword --model-type audio
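For context beyond the diff, here is a minimal usage sketch (not part of this PR) of calling one of the new paraformer models from the Xinference Python client. It assumes a local server at the default endpoint ``http://127.0.0.1:9997``, that the client's audio handle exposes a ``transcriptions()`` method returning an OpenAI-style result with a ``text`` field, and a hypothetical input file ``meeting.wav``; the same pattern should apply to ``paraformer-zh-long`` and ``paraformer-zh-spk``, only the model name changes::

    from xinference.client import Client

    client = Client("http://127.0.0.1:9997")

    # Launch the audio model programmatically (equivalent to the CLI command above).
    model_uid = client.launch_model(
        model_name="paraformer-zh-hotword",
        model_type="audio",
    )
    model = client.get_model(model_uid)

    # Audio is sent as raw bytes; the returned dict is assumed to follow the
    # OpenAI transcription schema with a "text" field.
    with open("meeting.wav", "rb") as f:
        result = model.transcriptions(f.read())

    print(result["text"])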
19 changes: 19 additions & 0 deletions doc/source/models/builtin/audio/paraformer-zh-long.rst
@@ -0,0 +1,19 @@
.. _models_builtin_paraformer-zh-long:

==================
paraformer-zh-long
==================

- **Model Name:** paraformer-zh-long
- **Model Family:** funasr
- **Abilities:** ['audio2text']
- **Multilingual:** False

Specifications
^^^^^^^^^^^^^^

- **Model ID:** JunHowie/speech_paraformer-large-vad-punc_asr_nat-zh-cn-16k-common-vocab8404-pytorch

Execute the following command to launch the model::

xinference launch --model-name paraformer-zh-long --model-type audio
19 changes: 19 additions & 0 deletions doc/source/models/builtin/audio/paraformer-zh-spk.rst
@@ -0,0 +1,19 @@
.. _models_builtin_paraformer-zh-spk:

=================
paraformer-zh-spk
=================

- **Model Name:** paraformer-zh-spk
- **Model Family:** funasr
- **Abilities:** ['audio2text']
- **Multilingual:** False

Specifications
^^^^^^^^^^^^^^

- **Model ID:** JunHowie/speech_paraformer-large-vad-punc-spk_asr_nat-zh-cn

Execute the following command to launch the model::

xinference launch --model-name paraformer-zh-spk --model-type audio
31 changes: 0 additions & 31 deletions doc/source/models/builtin/llm/cogvlm2-video-llama3-chat.rst

This file was deleted.

47 changes: 0 additions & 47 deletions doc/source/models/builtin/llm/cogvlm2.rst

This file was deleted.

63 changes: 63 additions & 0 deletions doc/source/models/builtin/llm/deepseek-prover-v2.rst
@@ -0,0 +1,63 @@
.. _models_llm_deepseek-prover-v2:

========================================
deepseek-prover-v2
========================================

- **Context Length:** 163840
- **Model Name:** deepseek-prover-v2
- **Languages:** en, zh
- **Abilities:** chat, reasoning
- **Description:** We introduce DeepSeek-Prover-V2, an open-source large language model designed for formal theorem proving in Lean 4, with initialization data collected through a recursive theorem proving pipeline powered by DeepSeek-V3. The cold-start training procedure begins by prompting DeepSeek-V3 to decompose complex problems into a series of subgoals. The proofs of resolved subgoals are synthesized into a chain-of-thought process, combined with DeepSeek-V3's step-by-step reasoning, to create an initial cold start for reinforcement learning. This process enables us to integrate both informal and formal mathematical reasoning into a unified model.

Specifications
^^^^^^^^^^^^^^


Model Spec 1 (pytorch, 671 Billion)
++++++++++++++++++++++++++++++++++++++++

- **Model Format:** pytorch
- **Model Size (in billions):** 671
- **Quantizations:** none
- **Engines**: vLLM, Transformers
- **Model ID:** deepseek-ai/DeepSeek-Prover-V2-671B
- **Model Hubs**: `Hugging Face <https://huggingface.co/deepseek-ai/DeepSeek-Prover-V2-671B>`__, `ModelScope <https://modelscope.cn/models/deepseek-ai/DeepSeek-Prover-V2-671B>`__

Execute the following command to launch the model; remember to replace ``${engine}`` and
``${quantization}`` with your chosen engine and quantization method from the options listed above::

xinference launch --model-engine ${engine} --model-name deepseek-prover-v2 --size-in-billions 671 --model-format pytorch --quantization ${quantization}


Model Spec 2 (pytorch, 7 Billion)
++++++++++++++++++++++++++++++++++++++++

- **Model Format:** pytorch
- **Model Size (in billions):** 7
- **Quantizations:** none
- **Engines**: vLLM, Transformers
- **Model ID:** deepseek-ai/DeepSeek-Prover-V2-7B
- **Model Hubs**: `Hugging Face <https://huggingface.co/deepseek-ai/DeepSeek-Prover-V2-7B>`__, `ModelScope <https://modelscope.cn/models/deepseek-ai/DeepSeek-Prover-V2-7B>`__

Execute the following command to launch the model; remember to replace ``${engine}`` and
``${quantization}`` with your chosen engine and quantization method from the options listed above::

xinference launch --model-engine ${engine} --model-name deepseek-prover-v2 --size-in-billions 7 --model-format pytorch --quantization ${quantization}


Model Spec 3 (mlx, 7 Billion)
++++++++++++++++++++++++++++++++++++++++

- **Model Format:** mlx
- **Model Size (in billions):** 7
- **Quantizations:** 4bit
- **Engines**:
- **Model ID:** mlx-community/DeepSeek-Prover-V2-7B-4bit
- **Model Hubs**: `Hugging Face <https://huggingface.co/mlx-community/DeepSeek-Prover-V2-7B-4bit>`__, `ModelScope <https://modelscope.cn/models/mlx-community/DeepSeek-Prover-V2-7B-4bit>`__

Execute the following command to launch the model; remember to replace ``${engine}`` and
``${quantization}`` with your chosen engine and quantization method from the options listed above::

xinference launch --model-engine ${engine} --model-name deepseek-prover-v2 --size-in-billions 7 --model-format mlx --quantization ${quantization}
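As a supplement (not part of this PR), a minimal sketch of launching the 7 billion pytorch spec programmatically instead of via the CLI. The keyword arguments are assumed to mirror the CLI flags above as exposed by the Xinference Python client and may differ slightly across versions::

    from xinference.client import Client

    client = Client("http://127.0.0.1:9997")

    # Mirrors: xinference launch --model-engine transformers --model-name deepseek-prover-v2
    #          --size-in-billions 7 --model-format pytorch --quantization none
    model_uid = client.launch_model(
        model_name="deepseek-prover-v2",
        model_engine="transformers",   # or "vllm", per the Engines listed above
        model_size_in_billions=7,
        model_format="pytorch",
        quantization="none",
    )
    print(f"deepseek-prover-v2 launched, uid: {model_uid}")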

31 changes: 31 additions & 0 deletions doc/source/models/builtin/llm/deepseek-r1-0528.rst
@@ -0,0 +1,31 @@
.. _models_llm_deepseek-r1-0528:

========================================
deepseek-r1-0528
========================================

- **Context Length:** 163840
- **Model Name:** deepseek-r1-0528
- **Languages:** en, zh
- **Abilities:** chat, reasoning
- **Description:** DeepSeek-R1 incorporates cold-start data before RL and achieves performance comparable to OpenAI-o1 across math, code, and reasoning tasks.

Specifications
^^^^^^^^^^^^^^


Model Spec 1 (pytorch, 671 Billion)
++++++++++++++++++++++++++++++++++++++++

- **Model Format:** pytorch
- **Model Size (in billions):** 671
- **Quantizations:** none
- **Engines**: vLLM, Transformers
- **Model ID:** deepseek-ai/DeepSeek-R1-0528
- **Model Hubs**: `Hugging Face <https://huggingface.co/deepseek-ai/DeepSeek-R1-0528>`__, `ModelScope <https://modelscope.cn/models/deepseek-ai/DeepSeek-R1-0528>`__

Execute the following command to launch the model; remember to replace ``${engine}`` and
``${quantization}`` with your chosen engine and quantization method from the options listed above::

xinference launch --model-engine ${engine} --model-name deepseek-r1-0528 --size-in-billions 671 --model-format pytorch --quantization ${quantization}
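For illustration only (not part of this PR), a minimal sketch of querying the launched model through Xinference's OpenAI-compatible endpoint. The endpoint URL, the placeholder API key, and using the model name as the model id are assumptions about a default local deployment::

    from openai import OpenAI

    client = OpenAI(base_url="http://127.0.0.1:9997/v1", api_key="not-used")

    response = client.chat.completions.create(
        model="deepseek-r1-0528",
        messages=[
            {"role": "user", "content": "Why is the sky blue? Answer briefly."}
        ],
        max_tokens=1024,
    )
    # Reasoning models may interleave their chain of thought with the final
    # answer; here we simply print whatever the server returns.
    print(response.choices[0].message.content)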

47 changes: 0 additions & 47 deletions doc/source/models/builtin/llm/deepseek-v2.rst

This file was deleted.
