diff --git a/README.md b/README.md index 231e4e500b..cbf3e6044a 100644 --- a/README.md +++ b/README.md @@ -47,6 +47,7 @@ potential of cutting-edge AI models. - Support SGLang backend: [#1161](https://github.com/xorbitsai/inference/pull/1161) - Support LoRA for LLM and image models: [#1080](https://github.com/xorbitsai/inference/pull/1080) ### New Models +- Built-in support for [DeepSeek-R1-0528](https://huggingface.co/deepseek-ai/DeepSeek-R1-0528): [#3539](https://github.com/xorbitsai/inference/pull/3539) - Built-in support for [Qwen3](https://qwenlm.github.io/blog/qwen3/): [#3347](https://github.com/xorbitsai/inference/pull/3347) - Built-in support for [Qwen2.5-Omni](https://github.com/QwenLM/Qwen2.5-Omni): [#3279](https://github.com/xorbitsai/inference/pull/3279) - Built-in support for [Skywork-OR1](https://github.com/SkyworkAI/Skywork-OR1): [#3274](https://github.com/xorbitsai/inference/pull/3274) @@ -54,7 +55,6 @@ potential of cutting-edge AI models. - Built-in support for [SeaLLMs-v3](https://github.com/DAMO-NLP-SG/DAMO-SeaLLMs): [#3248](https://github.com/xorbitsai/inference/pull/3248) - Built-in support for [paraformer-zh](https://huggingface.co/funasr/paraformer-zh): [#3236](https://github.com/xorbitsai/inference/pull/3236) - Built-in support for [InternVL3](https://internvl.github.io/blog/2025-04-11-InternVL-3.0/): [#3235](https://github.com/xorbitsai/inference/pull/3235) -- Built-in support for [MegaTTS3](https://github.com/bytedance/MegaTTS3): [#3224](https://github.com/xorbitsai/inference/pull/3224) ### Integrations - [Dify](https://docs.dify.ai/advanced/model-configuration/xinference): an LLMOps platform that enables developers (and even non-developers) to quickly build useful applications based on large language models, ensuring they are visual, operable, and improvable. - [FastGPT](https://github.com/labring/FastGPT): a knowledge-based platform built on the LLM, offers out-of-the-box data processing and model invocation capabilities, allows for workflow orchestration through Flow visualization.
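A minimal sketch of trying the newly supported model, reusing the launch command from its documentation page added later in this patch (it assumes a running Xinference server, hardware sufficient for the 671B checkpoint, and that `${engine}` and `${quantization}` are replaced with values the spec supports, e.g. `vllm` and `none`):

```bash
# Launch the newly added DeepSeek-R1-0528 (671B, pytorch format)
xinference launch --model-engine ${engine} --model-name deepseek-r1-0528 \
  --size-in-billions 671 --model-format pytorch --quantization ${quantization}
```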
diff --git a/README_zh_CN.md b/README_zh_CN.md index cb969a7ac9..02c6f98475 100644 --- a/README_zh_CN.md +++ b/README_zh_CN.md @@ -43,6 +43,7 @@ Xorbits Inference(Xinference)是一个性能强大且功能全面的分布 - 支持 SGLang 后端: [#1161](https://github.com/xorbitsai/inference/pull/1161) - 支持LLM和图像模型的LoRA: [#1080](https://github.com/xorbitsai/inference/pull/1080) ### 新模型 +- 内置 [DeepSeek-R1-0528](https://huggingface.co/deepseek-ai/DeepSeek-R1-0528): [#3539](https://github.com/xorbitsai/inference/pull/3539) - 内置 [Qwen3](https://qwenlm.github.io/blog/qwen3/): [#3347](https://github.com/xorbitsai/inference/pull/3347) - 内置 [Qwen2.5-Omni](https://github.com/QwenLM/Qwen2.5-Omni): [#3279](https://github.com/xorbitsai/inference/pull/3279) - 内置 [Skywork-OR1](https://github.com/SkyworkAI/Skywork-OR1): [#3274](https://github.com/xorbitsai/inference/pull/3274) @@ -50,7 +51,6 @@ Xorbits Inference(Xinference)是一个性能强大且功能全面的分布 - 内置 [SeaLLMs-v3](https://github.com/DAMO-NLP-SG/DAMO-SeaLLMs): [#3248](https://github.com/xorbitsai/inference/pull/3248) - 内置 [paraformer-zh](https://huggingface.co/funasr/paraformer-zh): [#3236](https://github.com/xorbitsai/inference/pull/3236) - 内置 [InternVL3](https://internvl.github.io/blog/2025-04-11-InternVL-3.0/): [#3235](https://github.com/xorbitsai/inference/pull/3235) -- 内置 [MegaTTS3](https://github.com/bytedance/MegaTTS3): [#3224](https://github.com/xorbitsai/inference/pull/3224) ### 集成 - [FastGPT](https://doc.fastai.site/docs/development/custom-models/xinference/):一个基于 LLM 大模型的开源 AI 知识库构建平台。提供了开箱即用的数据处理、模型调用、RAG 检索、可视化 AI 工作流编排等能力,帮助您轻松实现复杂的问答场景。 - [Dify](https://docs.dify.ai/advanced/model-configuration/xinference): 一个涵盖了大型语言模型开发、部署、维护和优化的 LLMOps 平台。 diff --git a/doc/source/getting_started/installation.rst b/doc/source/getting_started/installation.rst index da83a31bdf..0c3455fb47 100644 --- a/doc/source/getting_started/installation.rst +++ b/doc/source/getting_started/installation.rst @@ -60,7 +60,7 @@ Currently, supported models include: - ``codestral-v0.1`` - ``Yi``, ``Yi-1.5``, ``Yi-chat``, ``Yi-1.5-chat``, ``Yi-1.5-chat-16k`` - ``code-llama``, ``code-llama-python``, ``code-llama-instruct`` -- ``deepseek``, ``deepseek-coder``, ``deepseek-chat``, ``deepseek-coder-instruct``, ``deepseek-r1-distill-qwen``, ``deepseek-v2-chat``, ``deepseek-v2-chat-0628``, ``deepseek-v2.5``, ``deepseek-v3``, ``deepseek-r1``, ``deepseek-r1-distill-llama`` +- ``deepseek``, ``deepseek-coder``, ``deepseek-chat``, ``deepseek-coder-instruct``, ``deepseek-r1-distill-qwen``, ``deepseek-v2-chat``, ``deepseek-v2-chat-0628``, ``deepseek-v2.5``, ``deepseek-v3``, ``deepseek-v3-0324``, ``deepseek-r1``, ``deepseek-r1-0528``, ``deepseek-prover-v2``, ``deepseek-r1-distill-llama`` - ``yi-coder``, ``yi-coder-chat`` - ``codeqwen1.5``, ``codeqwen1.5-chat`` - ``qwen2.5``, ``qwen2.5-coder``, ``qwen2.5-instruct``, ``qwen2.5-coder-instruct``, ``qwen2.5-instruct-1m`` @@ -74,11 +74,14 @@ Currently, supported models include: - ``codegeex4`` - ``qwen1.5-chat``, ``qwen1.5-moe-chat`` - ``qwen2-instruct``, ``qwen2-moe-instruct`` +- ``XiYanSQL-QwenCoder-2504`` - ``QwQ-32B-Preview``, ``QwQ-32B`` - ``marco-o1`` - ``fin-r1`` - ``seallms-v3`` -- ``skywork-or1-preview`` +- ``skywork-or1-preview``, ``skywork-or1`` +- ``HuatuoGPT-o1-Qwen2.5``, ``HuatuoGPT-o1-LLaMA-3.1`` +- ``DianJin-R1`` - ``gemma-it``, ``gemma-2-it``, ``gemma-3-1b-it`` - ``orion-chat``, ``orion-chat-rag`` - ``c4ai-command-r-v01`` diff --git a/doc/source/locale/zh_CN/LC_MESSAGES/models/model_abilities/audio.po
b/doc/source/locale/zh_CN/LC_MESSAGES/models/model_abilities/audio.po index 6cb76f6311..abf06d6d2e 100644 --- a/doc/source/locale/zh_CN/LC_MESSAGES/models/model_abilities/audio.po +++ b/doc/source/locale/zh_CN/LC_MESSAGES/models/model_abilities/audio.po @@ -21,7 +21,7 @@ msgstr "" #: ../../source/models/model_abilities/audio.rst:5 msgid "Audio" -msgstr "" +msgstr "音频" #: ../../source/models/model_abilities/audio.rst:7 msgid "Learn how to turn audio into text or text into audio with Xinference." @@ -358,7 +358,7 @@ msgstr "基本使用,加载模型 ``CosyVoice-300M-SFT``。" msgid "" "Please note that the latest CosyVoice 2.0 requires `use_flow_cache=True` " "for stream generation." -msgstr "" +msgstr "请注意,最新版本的 CosyVoice 2.0 在进行流式生成时需要设置 `use_flow_cache=True`。" #: ../../source/models/model_abilities/audio.rst:422 msgid "" diff --git a/doc/source/models/builtin/audio/index.rst b/doc/source/models/builtin/audio/index.rst index cc1d7ebad4..ceefae0dc2 100644 --- a/doc/source/models/builtin/audio/index.rst +++ b/doc/source/models/builtin/audio/index.rst @@ -55,6 +55,12 @@ The following is a list of built-in audio models in Xinference: paraformer-zh + paraformer-zh-hotword + + paraformer-zh-long + + paraformer-zh-spk + sensevoicesmall whisper-base diff --git a/doc/source/models/builtin/audio/paraformer-zh-hotword.rst b/doc/source/models/builtin/audio/paraformer-zh-hotword.rst new file mode 100644 index 0000000000..a15668bbee --- /dev/null +++ b/doc/source/models/builtin/audio/paraformer-zh-hotword.rst @@ -0,0 +1,19 @@ +.. _models_builtin_paraformer-zh-hotword: + +===================== +paraformer-zh-hotword +===================== + +- **Model Name:** paraformer-zh-hotword +- **Model Family:** funasr +- **Abilities:** ['audio2text'] +- **Multilingual:** False + +Specifications +^^^^^^^^^^^^^^ + +- **Model ID:** JunHowie/speech_paraformer-large-contextual_asr_nat-zh-cn-16k-common-vocab8404 + +Execute the following command to launch the model:: + + xinference launch --model-name paraformer-zh-hotword --model-type audio \ No newline at end of file diff --git a/doc/source/models/builtin/audio/paraformer-zh-long.rst b/doc/source/models/builtin/audio/paraformer-zh-long.rst new file mode 100644 index 0000000000..ec1d89969b --- /dev/null +++ b/doc/source/models/builtin/audio/paraformer-zh-long.rst @@ -0,0 +1,19 @@ +.. _models_builtin_paraformer-zh-long: + +================== +paraformer-zh-long +================== + +- **Model Name:** paraformer-zh-long +- **Model Family:** funasr +- **Abilities:** ['audio2text'] +- **Multilingual:** False + +Specifications +^^^^^^^^^^^^^^ + +- **Model ID:** JunHowie/speech_paraformer-large-vad-punc_asr_nat-zh-cn-16k-common-vocab8404-pytorch + +Execute the following command to launch the model:: + + xinference launch --model-name paraformer-zh-long --model-type audio \ No newline at end of file diff --git a/doc/source/models/builtin/audio/paraformer-zh-spk.rst b/doc/source/models/builtin/audio/paraformer-zh-spk.rst new file mode 100644 index 0000000000..bec3917e3b --- /dev/null +++ b/doc/source/models/builtin/audio/paraformer-zh-spk.rst @@ -0,0 +1,19 @@ +.. 
_models_builtin_paraformer-zh-spk: + +================= +paraformer-zh-spk +================= + +- **Model Name:** paraformer-zh-spk +- **Model Family:** funasr +- **Abilities:** ['audio2text'] +- **Multilingual:** False + +Specifications +^^^^^^^^^^^^^^ + +- **Model ID:** JunHowie/speech_paraformer-large-vad-punc-spk_asr_nat-zh-cn + +Execute the following command to launch the model:: + + xinference launch --model-name paraformer-zh-spk --model-type audio \ No newline at end of file diff --git a/doc/source/models/builtin/llm/cogvlm2-video-llama3-chat.rst b/doc/source/models/builtin/llm/cogvlm2-video-llama3-chat.rst deleted file mode 100644 index dc80b20085..0000000000 --- a/doc/source/models/builtin/llm/cogvlm2-video-llama3-chat.rst +++ /dev/null @@ -1,31 +0,0 @@ -.. _models_llm_cogvlm2-video-llama3-chat: - -======================================== -cogvlm2-video-llama3-chat -======================================== - -- **Context Length:** 8192 -- **Model Name:** cogvlm2-video-llama3-chat -- **Languages:** en, zh -- **Abilities:** chat, vision -- **Description:** CogVLM2-Video achieves state-of-the-art performance on multiple video question answering tasks. - -Specifications -^^^^^^^^^^^^^^ - - -Model Spec 1 (pytorch, 12 Billion) -++++++++++++++++++++++++++++++++++++++++ - -- **Model Format:** pytorch -- **Model Size (in billions):** 12 -- **Quantizations:** none -- **Engines**: Transformers -- **Model ID:** THUDM/cogvlm2-video-llama3-chat -- **Model Hubs**: `Hugging Face `__, `ModelScope `__ - -Execute the following command to launch the model, remember to replace ``${quantization}`` with your -chosen quantization method from the options listed above:: - - xinference launch --model-engine ${engine} --model-name cogvlm2-video-llama3-chat --size-in-billions 12 --model-format pytorch --quantization ${quantization} - diff --git a/doc/source/models/builtin/llm/cogvlm2.rst b/doc/source/models/builtin/llm/cogvlm2.rst deleted file mode 100644 index ff10d229b6..0000000000 --- a/doc/source/models/builtin/llm/cogvlm2.rst +++ /dev/null @@ -1,47 +0,0 @@ -.. _models_llm_cogvlm2: - -======================================== -cogvlm2 -======================================== - -- **Context Length:** 8192 -- **Model Name:** cogvlm2 -- **Languages:** en, zh -- **Abilities:** chat, vision -- **Description:** CogVLM2 have achieved good results in many lists compared to the previous generation of CogVLM open source models. Its excellent performance can compete with some non-open source models. 
- -Specifications -^^^^^^^^^^^^^^ - - -Model Spec 1 (pytorch, 20 Billion) -++++++++++++++++++++++++++++++++++++++++ - -- **Model Format:** pytorch -- **Model Size (in billions):** 20 -- **Quantizations:** none -- **Engines**: Transformers -- **Model ID:** THUDM/cogvlm2-llama3-chinese-chat-19B -- **Model Hubs**: `Hugging Face `__, `ModelScope `__ - -Execute the following command to launch the model, remember to replace ``${quantization}`` with your -chosen quantization method from the options listed above:: - - xinference launch --model-engine ${engine} --model-name cogvlm2 --size-in-billions 20 --model-format pytorch --quantization ${quantization} - - -Model Spec 2 (pytorch, 20 Billion) -++++++++++++++++++++++++++++++++++++++++ - -- **Model Format:** pytorch -- **Model Size (in billions):** 20 -- **Quantizations:** none -- **Engines**: Transformers -- **Model ID:** THUDM/cogvlm2-llama3-chinese-chat-19B-{quantization} -- **Model Hubs**: `Hugging Face `__, `ModelScope `__ - -Execute the following command to launch the model, remember to replace ``${quantization}`` with your -chosen quantization method from the options listed above:: - - xinference launch --model-engine ${engine} --model-name cogvlm2 --size-in-billions 20 --model-format pytorch --quantization ${quantization} - diff --git a/doc/source/models/builtin/llm/deepseek-prover-v2.rst b/doc/source/models/builtin/llm/deepseek-prover-v2.rst new file mode 100644 index 0000000000..fb4a301ca4 --- /dev/null +++ b/doc/source/models/builtin/llm/deepseek-prover-v2.rst @@ -0,0 +1,63 @@ +.. _models_llm_deepseek-prover-v2: + +======================================== +deepseek-prover-v2 +======================================== + +- **Context Length:** 163840 +- **Model Name:** deepseek-prover-v2 +- **Languages:** en, zh +- **Abilities:** chat, reasoning +- **Description:** We introduce DeepSeek-Prover-V2, an open-source large language model designed for formal theorem proving in Lean 4, with initialization data collected through a recursive theorem proving pipeline powered by DeepSeek-V3. The cold-start training procedure begins by prompting DeepSeek-V3 to decompose complex problems into a series of subgoals. The proofs of resolved subgoals are synthesized into a chain-of-thought process, combined with DeepSeek-V3's step-by-step reasoning, to create an initial cold start for reinforcement learning. 
This process enables us to integrate both informal and formal mathematical reasoning into a unified model + +Specifications +^^^^^^^^^^^^^^ + + +Model Spec 1 (pytorch, 671 Billion) +++++++++++++++++++++++++++++++++++++++++ + +- **Model Format:** pytorch +- **Model Size (in billions):** 671 +- **Quantizations:** none +- **Engines**: vLLM, Transformers +- **Model ID:** deepseek-ai/DeepSeek-Prover-V2-671B +- **Model Hubs**: `Hugging Face `__, `ModelScope `__ + +Execute the following command to launch the model, remember to replace ``${quantization}`` with your +chosen quantization method from the options listed above:: + + xinference launch --model-engine ${engine} --model-name deepseek-prover-v2 --size-in-billions 671 --model-format pytorch --quantization ${quantization} + + +Model Spec 2 (pytorch, 7 Billion) +++++++++++++++++++++++++++++++++++++++++ + +- **Model Format:** pytorch +- **Model Size (in billions):** 7 +- **Quantizations:** none +- **Engines**: vLLM, Transformers +- **Model ID:** deepseek-ai/DeepSeek-Prover-V2-7B +- **Model Hubs**: `Hugging Face `__, `ModelScope `__ + +Execute the following command to launch the model, remember to replace ``${quantization}`` with your +chosen quantization method from the options listed above:: + + xinference launch --model-engine ${engine} --model-name deepseek-prover-v2 --size-in-billions 7 --model-format pytorch --quantization ${quantization} + + +Model Spec 3 (mlx, 7 Billion) +++++++++++++++++++++++++++++++++++++++++ + +- **Model Format:** mlx +- **Model Size (in billions):** 7 +- **Quantizations:** 4bit +- **Engines**: +- **Model ID:** mlx-community/DeepSeek-Prover-V2-7B-4bit +- **Model Hubs**: `Hugging Face `__, `ModelScope `__ + +Execute the following command to launch the model, remember to replace ``${quantization}`` with your +chosen quantization method from the options listed above:: + + xinference launch --model-engine ${engine} --model-name deepseek-prover-v2 --size-in-billions 7 --model-format mlx --quantization ${quantization} + diff --git a/doc/source/models/builtin/llm/deepseek-r1-0528.rst b/doc/source/models/builtin/llm/deepseek-r1-0528.rst new file mode 100644 index 0000000000..af1ee45c15 --- /dev/null +++ b/doc/source/models/builtin/llm/deepseek-r1-0528.rst @@ -0,0 +1,31 @@ +.. _models_llm_deepseek-r1-0528: + +======================================== +deepseek-r1-0528 +======================================== + +- **Context Length:** 163840 +- **Model Name:** deepseek-r1-0528 +- **Languages:** en, zh +- **Abilities:** chat, reasoning +- **Description:** DeepSeek-R1, which incorporates cold-start data before RL. DeepSeek-R1 achieves performance comparable to OpenAI-o1 across math, code, and reasoning tasks. 
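+
+Once launched, Xinference serves this model through its OpenAI-compatible API.
+A minimal sketch of a chat request (assuming the default endpoint
+``http://127.0.0.1:9997`` and that the model was launched with a model UID
+equal to the model name; adjust both to your deployment)::
+
+    curl http://127.0.0.1:9997/v1/chat/completions \
+      -H "Content-Type: application/json" \
+      -d '{"model": "deepseek-r1-0528", "messages": [{"role": "user", "content": "Why is the sky blue?"}]}'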
+ +Specifications +^^^^^^^^^^^^^^ + + +Model Spec 1 (pytorch, 671 Billion) +++++++++++++++++++++++++++++++++++++++++ + +- **Model Format:** pytorch +- **Model Size (in billions):** 671 +- **Quantizations:** none +- **Engines**: vLLM, Transformers +- **Model ID:** deepseek-ai/DeepSeek-R1-0528 +- **Model Hubs**: `Hugging Face `__, `ModelScope `__ + +Execute the following command to launch the model, remember to replace ``${quantization}`` with your +chosen quantization method from the options listed above:: + + xinference launch --model-engine ${engine} --model-name deepseek-r1-0528 --size-in-billions 671 --model-format pytorch --quantization ${quantization} + diff --git a/doc/source/models/builtin/llm/deepseek-v2.rst b/doc/source/models/builtin/llm/deepseek-v2.rst deleted file mode 100644 index fb63b10b2e..0000000000 --- a/doc/source/models/builtin/llm/deepseek-v2.rst +++ /dev/null @@ -1,47 +0,0 @@ -.. _models_llm_deepseek-v2: - -======================================== -deepseek-v2 -======================================== - -- **Context Length:** 128000 -- **Model Name:** deepseek-v2 -- **Languages:** en, zh -- **Abilities:** generate -- **Description:** DeepSeek-V2, a strong Mixture-of-Experts (MoE) language model characterized by economical training and efficient inference. - -Specifications -^^^^^^^^^^^^^^ - - -Model Spec 1 (pytorch, 16 Billion) -++++++++++++++++++++++++++++++++++++++++ - -- **Model Format:** pytorch -- **Model Size (in billions):** 16 -- **Quantizations:** none -- **Engines**: Transformers -- **Model ID:** deepseek-ai/DeepSeek-V2-Lite -- **Model Hubs**: `Hugging Face `__, `ModelScope `__ - -Execute the following command to launch the model, remember to replace ``${quantization}`` with your -chosen quantization method from the options listed above:: - - xinference launch --model-engine ${engine} --model-name deepseek-v2 --size-in-billions 16 --model-format pytorch --quantization ${quantization} - - -Model Spec 2 (pytorch, 236 Billion) -++++++++++++++++++++++++++++++++++++++++ - -- **Model Format:** pytorch -- **Model Size (in billions):** 236 -- **Quantizations:** none -- **Engines**: Transformers -- **Model ID:** deepseek-ai/DeepSeek-V2 -- **Model Hubs**: `Hugging Face `__, `ModelScope `__ - -Execute the following command to launch the model, remember to replace ``${quantization}`` with your -chosen quantization method from the options listed above:: - - xinference launch --model-engine ${engine} --model-name deepseek-v2 --size-in-billions 236 --model-format pytorch --quantization ${quantization} - diff --git a/doc/source/models/builtin/llm/deepseek-v3-0324.rst b/doc/source/models/builtin/llm/deepseek-v3-0324.rst new file mode 100644 index 0000000000..e6aecb76b1 --- /dev/null +++ b/doc/source/models/builtin/llm/deepseek-v3-0324.rst @@ -0,0 +1,47 @@ +.. _models_llm_deepseek-v3-0324: + +======================================== +deepseek-v3-0324 +======================================== + +- **Context Length:** 163840 +- **Model Name:** deepseek-v3-0324 +- **Languages:** en, zh +- **Abilities:** chat +- **Description:** DeepSeek-V3, a strong Mixture-of-Experts (MoE) language model with 671B total parameters with 37B activated for each token. 
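+
+As a concrete sketch of the launch commands listed under the specs below,
+here picking the AWQ ``Int4`` spec served by vLLM (substitute the engine and
+quantization values that match whichever spec you choose)::
+
+    xinference launch --model-engine vllm --model-name deepseek-v3-0324 \
+      --size-in-billions 671 --model-format awq --quantization Int4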
+ +Specifications +^^^^^^^^^^^^^^ + + +Model Spec 1 (pytorch, 671 Billion) +++++++++++++++++++++++++++++++++++++++++ + +- **Model Format:** pytorch +- **Model Size (in billions):** 671 +- **Quantizations:** none +- **Engines**: vLLM, Transformers +- **Model ID:** deepseek-ai/DeepSeek-V3-0324 +- **Model Hubs**: `Hugging Face `__, `ModelScope `__ + +Execute the following command to launch the model, remember to replace ``${quantization}`` with your +chosen quantization method from the options listed above:: + + xinference launch --model-engine ${engine} --model-name deepseek-v3-0324 --size-in-billions 671 --model-format pytorch --quantization ${quantization} + + +Model Spec 2 (awq, 671 Billion) +++++++++++++++++++++++++++++++++++++++++ + +- **Model Format:** awq +- **Model Size (in billions):** 671 +- **Quantizations:** Int4 +- **Engines**: vLLM, Transformers +- **Model ID:** cognitivecomputations/DeepSeek-V3-0324-AWQ +- **Model Hubs**: `Hugging Face `__, `ModelScope `__ + +Execute the following command to launch the model, remember to replace ``${quantization}`` with your +chosen quantization method from the options listed above:: + + xinference launch --model-engine ${engine} --model-name deepseek-v3-0324 --size-in-billions 671 --model-format awq --quantization ${quantization} + diff --git a/doc/source/models/builtin/llm/deepseek-vl-chat.rst b/doc/source/models/builtin/llm/deepseek-vl-chat.rst deleted file mode 100644 index ae00c384eb..0000000000 --- a/doc/source/models/builtin/llm/deepseek-vl-chat.rst +++ /dev/null @@ -1,47 +0,0 @@ -.. _models_llm_deepseek-vl-chat: - -======================================== -deepseek-vl-chat -======================================== - -- **Context Length:** 4096 -- **Model Name:** deepseek-vl-chat -- **Languages:** en, zh -- **Abilities:** chat, vision -- **Description:** DeepSeek-VL possesses general multimodal understanding capabilities, capable of processing logical diagrams, web pages, formula recognition, scientific literature, natural images, and embodied intelligence in complex scenarios. 
- -Specifications -^^^^^^^^^^^^^^ - - -Model Spec 1 (pytorch, 1_3 Billion) -++++++++++++++++++++++++++++++++++++++++ - -- **Model Format:** pytorch -- **Model Size (in billions):** 1_3 -- **Quantizations:** none -- **Engines**: Transformers -- **Model ID:** deepseek-ai/deepseek-vl-1.3b-chat -- **Model Hubs**: `Hugging Face `__, `ModelScope `__ - -Execute the following command to launch the model, remember to replace ``${quantization}`` with your -chosen quantization method from the options listed above:: - - xinference launch --model-engine ${engine} --model-name deepseek-vl-chat --size-in-billions 1_3 --model-format pytorch --quantization ${quantization} - - -Model Spec 2 (pytorch, 7 Billion) -++++++++++++++++++++++++++++++++++++++++ - -- **Model Format:** pytorch -- **Model Size (in billions):** 7 -- **Quantizations:** none -- **Engines**: Transformers -- **Model ID:** deepseek-ai/deepseek-vl-7b-chat -- **Model Hubs**: `Hugging Face `__, `ModelScope `__ - -Execute the following command to launch the model, remember to replace ``${quantization}`` with your -chosen quantization method from the options listed above:: - - xinference launch --model-engine ${engine} --model-name deepseek-vl-chat --size-in-billions 7 --model-format pytorch --quantization ${quantization} - diff --git a/doc/source/models/builtin/llm/glm-edge-v.rst b/doc/source/models/builtin/llm/glm-edge-v.rst deleted file mode 100644 index 6aef562071..0000000000 --- a/doc/source/models/builtin/llm/glm-edge-v.rst +++ /dev/null @@ -1,143 +0,0 @@ -.. _models_llm_glm-edge-v: - -======================================== -glm-edge-v -======================================== - -- **Context Length:** 8192 -- **Model Name:** glm-edge-v -- **Languages:** en, zh -- **Abilities:** chat, vision -- **Description:** The GLM-Edge series is our attempt to face the end-side real-life scenarios, which consists of two sizes of large-language dialogue models and multimodal comprehension models (GLM-Edge-1.5B-Chat, GLM-Edge-4B-Chat, GLM-Edge-V-2B, GLM-Edge-V-5B). Among them, the 1.5B / 2B model is mainly for platforms such as mobile phones and cars, and the 4B / 5B model is mainly for platforms such as PCs. 
- -Specifications -^^^^^^^^^^^^^^ - - -Model Spec 1 (pytorch, 2 Billion) -++++++++++++++++++++++++++++++++++++++++ - -- **Model Format:** pytorch -- **Model Size (in billions):** 2 -- **Quantizations:** none -- **Engines**: Transformers -- **Model ID:** THUDM/glm-edge-v-2b -- **Model Hubs**: `Hugging Face `__, `ModelScope `__ - -Execute the following command to launch the model, remember to replace ``${quantization}`` with your -chosen quantization method from the options listed above:: - - xinference launch --model-engine ${engine} --model-name glm-edge-v --size-in-billions 2 --model-format pytorch --quantization ${quantization} - - -Model Spec 2 (pytorch, 5 Billion) -++++++++++++++++++++++++++++++++++++++++ - -- **Model Format:** pytorch -- **Model Size (in billions):** 5 -- **Quantizations:** none -- **Engines**: Transformers -- **Model ID:** THUDM/glm-edge-v-5b -- **Model Hubs**: `Hugging Face `__, `ModelScope `__ - -Execute the following command to launch the model, remember to replace ``${quantization}`` with your -chosen quantization method from the options listed above:: - - xinference launch --model-engine ${engine} --model-name glm-edge-v --size-in-billions 5 --model-format pytorch --quantization ${quantization} - - -Model Spec 3 (ggufv2, 2 Billion) -++++++++++++++++++++++++++++++++++++++++ - -- **Model Format:** ggufv2 -- **Model Size (in billions):** 2 -- **Quantizations:** Q4_0, Q4_1, Q4_K, Q4_K_M, Q4_K_S, Q5_0, Q5_1, Q5_K, Q5_K_M, Q5_K_S, Q6_K, Q8_0 -- **Engines**: llama.cpp -- **Model ID:** THUDM/glm-edge-v-2b-gguf -- **Model Hubs**: `Hugging Face `__, `ModelScope `__ - -Execute the following command to launch the model, remember to replace ``${quantization}`` with your -chosen quantization method from the options listed above:: - - xinference launch --model-engine ${engine} --model-name glm-edge-v --size-in-billions 2 --model-format ggufv2 --quantization ${quantization} - - -Model Spec 4 (ggufv2, 2 Billion) -++++++++++++++++++++++++++++++++++++++++ - -- **Model Format:** ggufv2 -- **Model Size (in billions):** 2 -- **Quantizations:** F16 -- **Engines**: llama.cpp -- **Model ID:** THUDM/glm-edge-v-2b-gguf -- **Model Hubs**: `Hugging Face `__, `ModelScope `__ - -Execute the following command to launch the model, remember to replace ``${quantization}`` with your -chosen quantization method from the options listed above:: - - xinference launch --model-engine ${engine} --model-name glm-edge-v --size-in-billions 2 --model-format ggufv2 --quantization ${quantization} - - -Model Spec 5 (ggufv2, 2 Billion) -++++++++++++++++++++++++++++++++++++++++ - -- **Model Format:** ggufv2 -- **Model Size (in billions):** 2 -- **Quantizations:** f16 -- **Engines**: llama.cpp -- **Model ID:** THUDM/glm-edge-v-2b-gguf -- **Model Hubs**: `Hugging Face `__, `ModelScope `__ - -Execute the following command to launch the model, remember to replace ``${quantization}`` with your -chosen quantization method from the options listed above:: - - xinference launch --model-engine ${engine} --model-name glm-edge-v --size-in-billions 2 --model-format ggufv2 --quantization ${quantization} - - -Model Spec 6 (ggufv2, 5 Billion) -++++++++++++++++++++++++++++++++++++++++ - -- **Model Format:** ggufv2 -- **Model Size (in billions):** 5 -- **Quantizations:** Q4_0, Q4_1, Q4_K, Q4_K_M, Q4_K_S, Q5_0, Q5_1, Q5_K, Q5_K_M, Q5_K_S, Q6_K, Q8_0 -- **Engines**: llama.cpp -- **Model ID:** THUDM/glm-edge-v-5b-gguf -- **Model Hubs**: `Hugging Face `__, `ModelScope `__ - -Execute the following command to launch the model, remember 
to replace ``${quantization}`` with your -chosen quantization method from the options listed above:: - - xinference launch --model-engine ${engine} --model-name glm-edge-v --size-in-billions 5 --model-format ggufv2 --quantization ${quantization} - - -Model Spec 7 (ggufv2, 5 Billion) -++++++++++++++++++++++++++++++++++++++++ - -- **Model Format:** ggufv2 -- **Model Size (in billions):** 5 -- **Quantizations:** F16 -- **Engines**: llama.cpp -- **Model ID:** THUDM/glm-edge-v-5b-gguf -- **Model Hubs**: `Hugging Face `__, `ModelScope `__ - -Execute the following command to launch the model, remember to replace ``${quantization}`` with your -chosen quantization method from the options listed above:: - - xinference launch --model-engine ${engine} --model-name glm-edge-v --size-in-billions 5 --model-format ggufv2 --quantization ${quantization} - - -Model Spec 8 (ggufv2, 5 Billion) -++++++++++++++++++++++++++++++++++++++++ - -- **Model Format:** ggufv2 -- **Model Size (in billions):** 5 -- **Quantizations:** f16 -- **Engines**: llama.cpp -- **Model ID:** THUDM/glm-edge-v-5b-gguf -- **Model Hubs**: `Hugging Face `__, `ModelScope `__ - -Execute the following command to launch the model, remember to replace ``${quantization}`` with your -chosen quantization method from the options listed above:: - - xinference launch --model-engine ${engine} --model-name glm-edge-v --size-in-billions 5 --model-format ggufv2 --quantization ${quantization} - diff --git a/doc/source/models/builtin/llm/index.rst b/doc/source/models/builtin/llm/index.rst index af4c71acb7..0d487ec38e 100644 --- a/doc/source/models/builtin/llm/index.rst +++ b/doc/source/models/builtin/llm/index.rst @@ -76,16 +76,6 @@ The following is a list of built-in LLM in Xinference: - 4096 - The CogAgent-9B-20241220 model is based on GLM-4V-9B, a bilingual open-source VLM base model. Through data collection and optimization, multi-stage training, and strategy improvements, CogAgent-9B-20241220 achieves significant advancements in GUI perception, inference prediction accuracy, action space completeness, and task generalizability. - * - :ref:`cogvlm2 ` - - chat, vision - - 8192 - - CogVLM2 have achieved good results in many lists compared to the previous generation of CogVLM open source models. Its excellent performance can compete with some non-open source models. - - * - :ref:`cogvlm2-video-llama3-chat ` - - chat, vision - - 8192 - - CogVLM2-Video achieves state-of-the-art performance on multiple video question answering tasks. - * - :ref:`deepseek ` - generate - 4096 @@ -106,11 +96,21 @@ The following is a list of built-in LLM in Xinference: - 16384 - deepseek-coder-instruct is a model initialized from deepseek-coder-base and fine-tuned on 2B tokens of instruction data. + * - :ref:`deepseek-prover-v2 ` + - chat, reasoning + - 163840 + - We introduce DeepSeek-Prover-V2, an open-source large language model designed for formal theorem proving in Lean 4, with initialization data collected through a recursive theorem proving pipeline powered by DeepSeek-V3. The cold-start training procedure begins by prompting DeepSeek-V3 to decompose complex problems into a series of subgoals. The proofs of resolved subgoals are synthesized into a chain-of-thought process, combined with DeepSeek-V3's step-by-step reasoning, to create an initial cold start for reinforcement learning. 
This process enables us to integrate both informal and formal mathematical reasoning into a unified model + * - :ref:`deepseek-r1 ` - chat, reasoning - 163840 - DeepSeek-R1, which incorporates cold-start data before RL. DeepSeek-R1 achieves performance comparable to OpenAI-o1 across math, code, and reasoning tasks. + * - :ref:`deepseek-r1-0528 ` + - chat, reasoning + - 163840 + - DeepSeek-R1, which incorporates cold-start data before RL. DeepSeek-R1 achieves performance comparable to OpenAI-o1 across math, code, and reasoning tasks. + * - :ref:`deepseek-r1-distill-llama ` - chat, reasoning - 131072 @@ -121,11 +121,6 @@ The following is a list of built-in LLM in Xinference: - 131072 - deepseek-r1-distill-qwen is distilled from DeepSeek-R1 based on Qwen - * - :ref:`deepseek-v2 ` - - generate - - 128000 - - DeepSeek-V2, a strong Mixture-of-Experts (MoE) language model characterized by economical training and efficient inference. - * - :ref:`deepseek-v2-chat ` - chat - 128000 @@ -146,16 +141,21 @@ The following is a list of built-in LLM in Xinference: - 163840 - DeepSeek-V3, a strong Mixture-of-Experts (MoE) language model with 671B total parameters with 37B activated for each token. - * - :ref:`deepseek-vl-chat ` - - chat, vision - - 4096 - - DeepSeek-VL possesses general multimodal understanding capabilities, capable of processing logical diagrams, web pages, formula recognition, scientific literature, natural images, and embodied intelligence in complex scenarios. + * - :ref:`deepseek-v3-0324 ` + - chat + - 163840 + - DeepSeek-V3, a strong Mixture-of-Experts (MoE) language model with 671B total parameters with 37B activated for each token. * - :ref:`deepseek-vl2 ` - chat, vision - 4096 - DeepSeek-VL2, an advanced series of large Mixture-of-Experts (MoE) Vision-Language Models that significantly improves upon its predecessor, DeepSeek-VL. DeepSeek-VL2 demonstrates superior capabilities across various tasks, including but not limited to visual question answering, optical character recognition, document/table/chart understanding, and visual grounding. + * - :ref:`dianjin-r1 ` + - chat, tools + - 32768 + - Tongyi DianJin is a financial intelligence solution platform built by Alibaba Cloud, dedicated to providing financial business developers with a convenient artificial intelligence application development environment. + * - :ref:`fin-r1 ` - chat - 131072 @@ -181,11 +181,6 @@ The following is a list of built-in LLM in Xinference: - 8192 - The GLM-Edge series is our attempt to face the end-side real-life scenarios, which consists of two sizes of large-language dialogue models and multimodal comprehension models (GLM-Edge-1.5B-Chat, GLM-Edge-4B-Chat, GLM-Edge-V-2B, GLM-Edge-V-5B). Among them, the 1.5B / 2B model is mainly for platforms such as mobile phones and cars, and the 4B / 5B model is mainly for platforms such as PCs. - * - :ref:`glm-edge-v ` - - chat, vision - - 8192 - - The GLM-Edge series is our attempt to face the end-side real-life scenarios, which consists of two sizes of large-language dialogue models and multimodal comprehension models (GLM-Edge-1.5B-Chat, GLM-Edge-4B-Chat, GLM-Edge-V-2B, GLM-Edge-V-5B). Among them, the 1.5B / 2B model is mainly for platforms such as mobile phones and cars, and the 4B / 5B model is mainly for platforms such as PCs. 
- * - :ref:`glm4-0414 ` - chat, tools - 32768 @@ -211,6 +206,16 @@ The following is a list of built-in LLM in Xinference: - 1024 - GPT-2 is a Transformer-based LLM that is trained on WebTest, a 40 GB dataset of Reddit posts with 3+ upvotes. + * - :ref:`huatuogpt-o1-llama-3.1 ` + - chat, tools + - 131072 + - HuatuoGPT-o1 is a medical LLM designed for advanced medical reasoning. It generates a complex thought process, reflecting and refining its reasoning, before providing a final response. + + * - :ref:`huatuogpt-o1-qwen2.5 ` + - chat, tools + - 32768 + - HuatuoGPT-o1 is a medical LLM designed for advanced medical reasoning. It generates a complex thought process, reflecting and refining its reasoning, before providing a final response. + * - :ref:`internlm3-instruct ` - chat, tools - 32768 @@ -296,11 +301,6 @@ The following is a list of built-in LLM in Xinference: - 4096 - MiniCPM is an End-Size LLM developed by ModelBest Inc. and TsinghuaNLP, with only 2.4B parameters excluding embeddings. - * - :ref:`minicpm-llama3-v-2_5 ` - - chat, vision - - 8192 - - MiniCPM-Llama3-V 2.5 is the latest model in the MiniCPM-V series. The model is built on SigLip-400M and Llama3-8B-Instruct with a total of 8B parameters. - * - :ref:`minicpm-v-2.6 ` - chat, vision - 32768 @@ -361,11 +361,6 @@ The following is a list of built-in LLM in Xinference: - 8192 - Kimi Muon is Scalable for LLM Training - * - :ref:`omnilmm ` - - chat, vision - - 2048 - - OmniLMM is a family of open-source large multimodal models (LMMs) adept at vision & language modeling. - * - :ref:`openhermes-2.5 ` - chat - 8192 @@ -411,11 +406,6 @@ The following is a list of built-in LLM in Xinference: - 32768 - Qwen-chat is a fine-tuned version of the Qwen LLM trained with alignment techniques, specializing in chatting. - * - :ref:`qwen-vl-chat ` - - chat, vision - - 4096 - - Qwen-VL-Chat supports more flexible interaction, such as multiple image inputs, multi-round question answering, and creative capabilities. - * - :ref:`qwen1.5-chat ` - chat, tools - 32768 @@ -487,7 +477,7 @@ The following is a list of built-in LLM in Xinference: - Qwen2.5-VL: Qwen2.5-VL is the latest version of the vision language models in the Qwen model familities. * - :ref:`qwen3 ` - - chat, reasoning, tools + - chat, reasoning, hybrid, tools - 40960 - Qwen3 is the latest generation of large language models in Qwen series, offering a comprehensive suite of dense and mixture-of-experts (MoE) models. Built upon extensive training, Qwen3 delivers groundbreaking advancements in reasoning, instruction-following, agent capabilities, and multilingual support @@ -526,6 +516,11 @@ The following is a list of built-in LLM in Xinference: - 4096 - Skywork is a series of large models developed by the Kunlun Group · Skywork team. + * - :ref:`skywork-or1 ` + - chat + - 131072 + - We release the final version of Skywork-OR1 (Open Reasoner 1) series of models, including + * - :ref:`skywork-or1-preview ` - chat - 32768 @@ -551,6 +546,11 @@ The following is a list of built-in LLM in Xinference: - 2048 - WizardMath is an open-source LLM trained by fine-tuning Llama2 with Evol-Instruct, specializing in math. + * - :ref:`xiyansql-qwencoder-2504 ` + - chat, tools + - 32768 + - The XiYanSQL-QwenCoder models, as multi-dialect SQL base models, demonstrating robust SQL generation capabilities. 
+ * - :ref:`xverse ` - generate - 2048 @@ -620,10 +620,6 @@ The following is a list of built-in LLM in Xinference: cogagent - cogvlm2 - - cogvlm2-video-llama3-chat - deepseek deepseek-chat @@ -632,14 +628,16 @@ The following is a list of built-in LLM in Xinference: deepseek-coder-instruct + deepseek-prover-v2 + deepseek-r1 + deepseek-r1-0528 + deepseek-r1-distill-llama deepseek-r1-distill-qwen - deepseek-v2 - deepseek-v2-chat deepseek-v2-chat-0628 @@ -648,10 +646,12 @@ The following is a list of built-in LLM in Xinference: deepseek-v3 - deepseek-vl-chat + deepseek-v3-0324 deepseek-vl2 + dianjin-r1 + fin-r1 gemma-3-1b-it @@ -662,8 +662,6 @@ The following is a list of built-in LLM in Xinference: glm-edge-chat - glm-edge-v - glm4-0414 glm4-chat @@ -674,6 +672,10 @@ The following is a list of built-in LLM in Xinference: gpt-2 + huatuogpt-o1-llama-3.1 + + huatuogpt-o1-qwen2.5 + internlm3-instruct internvl3 @@ -708,8 +710,6 @@ The following is a list of built-in LLM in Xinference: minicpm-2b-sft-fp32 - minicpm-llama3-v-2_5 - minicpm-v-2.6 minicpm3-4b @@ -734,8 +734,6 @@ The following is a list of built-in LLM in Xinference: moonlight-16b-a3b-instruct - omnilmm - openhermes-2.5 opt @@ -754,8 +752,6 @@ The following is a list of built-in LLM in Xinference: qwen-chat - qwen-vl-chat - qwen1.5-chat qwen1.5-moe-chat @@ -800,6 +796,8 @@ The following is a list of built-in LLM in Xinference: skywork-math + skywork-or1 + skywork-or1-preview telechat @@ -810,6 +808,8 @@ The following is a list of built-in LLM in Xinference: wizardmath-v1.0 + xiyansql-qwencoder-2504 + xverse xverse-chat diff --git a/doc/source/models/builtin/llm/minicpm-llama3-v-2_5.rst b/doc/source/models/builtin/llm/minicpm-llama3-v-2_5.rst deleted file mode 100644 index ed2330ad74..0000000000 --- a/doc/source/models/builtin/llm/minicpm-llama3-v-2_5.rst +++ /dev/null @@ -1,47 +0,0 @@ -.. _models_llm_minicpm-llama3-v-2_5: - -======================================== -MiniCPM-Llama3-V-2_5 -======================================== - -- **Context Length:** 8192 -- **Model Name:** MiniCPM-Llama3-V-2_5 -- **Languages:** en, zh -- **Abilities:** chat, vision -- **Description:** MiniCPM-Llama3-V 2.5 is the latest model in the MiniCPM-V series. The model is built on SigLip-400M and Llama3-8B-Instruct with a total of 8B parameters. 
- -Specifications -^^^^^^^^^^^^^^ - - -Model Spec 1 (pytorch, 8 Billion) -++++++++++++++++++++++++++++++++++++++++ - -- **Model Format:** pytorch -- **Model Size (in billions):** 8 -- **Quantizations:** none -- **Engines**: Transformers -- **Model ID:** openbmb/MiniCPM-Llama3-V-2_5 -- **Model Hubs**: `Hugging Face `__, `ModelScope `__ - -Execute the following command to launch the model, remember to replace ``${quantization}`` with your -chosen quantization method from the options listed above:: - - xinference launch --model-engine ${engine} --model-name MiniCPM-Llama3-V-2_5 --size-in-billions 8 --model-format pytorch --quantization ${quantization} - - -Model Spec 2 (pytorch, 8 Billion) -++++++++++++++++++++++++++++++++++++++++ - -- **Model Format:** pytorch -- **Model Size (in billions):** 8 -- **Quantizations:** none -- **Engines**: Transformers -- **Model ID:** openbmb/MiniCPM-Llama3-V-2_5-{quantization} -- **Model Hubs**: `Hugging Face `__, `ModelScope `__ - -Execute the following command to launch the model, remember to replace ``${quantization}`` with your -chosen quantization method from the options listed above:: - - xinference launch --model-engine ${engine} --model-name MiniCPM-Llama3-V-2_5 --size-in-billions 8 --model-format pytorch --quantization ${quantization} - diff --git a/doc/source/models/builtin/llm/omnilmm.rst b/doc/source/models/builtin/llm/omnilmm.rst deleted file mode 100644 index c8a0a32226..0000000000 --- a/doc/source/models/builtin/llm/omnilmm.rst +++ /dev/null @@ -1,47 +0,0 @@ -.. _models_llm_omnilmm: - -======================================== -OmniLMM -======================================== - -- **Context Length:** 2048 -- **Model Name:** OmniLMM -- **Languages:** en, zh -- **Abilities:** chat, vision -- **Description:** OmniLMM is a family of open-source large multimodal models (LMMs) adept at vision & language modeling. 
- -Specifications -^^^^^^^^^^^^^^ - - -Model Spec 1 (pytorch, 3 Billion) -++++++++++++++++++++++++++++++++++++++++ - -- **Model Format:** pytorch -- **Model Size (in billions):** 3 -- **Quantizations:** none -- **Engines**: Transformers -- **Model ID:** openbmb/MiniCPM-V -- **Model Hubs**: `Hugging Face `__, `ModelScope `__ - -Execute the following command to launch the model, remember to replace ``${quantization}`` with your -chosen quantization method from the options listed above:: - - xinference launch --model-engine ${engine} --model-name OmniLMM --size-in-billions 3 --model-format pytorch --quantization ${quantization} - - -Model Spec 2 (pytorch, 12 Billion) -++++++++++++++++++++++++++++++++++++++++ - -- **Model Format:** pytorch -- **Model Size (in billions):** 12 -- **Quantizations:** none -- **Engines**: Transformers -- **Model ID:** openbmb/OmniLMM-12B -- **Model Hubs**: `Hugging Face `__, `ModelScope `__ - -Execute the following command to launch the model, remember to replace ``${quantization}`` with your -chosen quantization method from the options listed above:: - - xinference launch --model-engine ${engine} --model-name OmniLMM --size-in-billions 12 --model-format pytorch --quantization ${quantization} - diff --git a/doc/source/user_guide/backends.rst b/doc/source/user_guide/backends.rst index e16816db7e..c87782c76c 100644 --- a/doc/source/user_guide/backends.rst +++ b/doc/source/user_guide/backends.rst @@ -99,7 +99,7 @@ Currently, supported model includes: - ``codestral-v0.1`` - ``Yi``, ``Yi-1.5``, ``Yi-chat``, ``Yi-1.5-chat``, ``Yi-1.5-chat-16k`` - ``code-llama``, ``code-llama-python``, ``code-llama-instruct`` -- ``deepseek``, ``deepseek-coder``, ``deepseek-chat``, ``deepseek-coder-instruct``, ``deepseek-r1-distill-qwen``, ``deepseek-v2-chat``, ``deepseek-v2-chat-0628``, ``deepseek-v2.5``, ``deepseek-v3``, ``deepseek-r1``, ``deepseek-r1-distill-llama`` +- ``deepseek``, ``deepseek-coder``, ``deepseek-chat``, ``deepseek-coder-instruct``, ``deepseek-r1-distill-qwen``, ``deepseek-v2-chat``, ``deepseek-v2-chat-0628``, ``deepseek-v2.5``, ``deepseek-v3``, ``deepseek-v3-0324``, ``deepseek-r1``, ``deepseek-r1-0528``, ``deepseek-prover-v2``, ``deepseek-r1-distill-llama`` - ``yi-coder``, ``yi-coder-chat`` - ``codeqwen1.5``, ``codeqwen1.5-chat`` - ``qwen2.5``, ``qwen2.5-coder``, ``qwen2.5-instruct``, ``qwen2.5-coder-instruct``, ``qwen2.5-instruct-1m`` @@ -113,11 +113,14 @@ Currently, supported model includes: - ``codegeex4`` - ``qwen1.5-chat``, ``qwen1.5-moe-chat`` - ``qwen2-instruct``, ``qwen2-moe-instruct`` +- ``XiYanSQL-QwenCoder-2504`` - ``QwQ-32B-Preview``, ``QwQ-32B`` - ``marco-o1`` - ``fin-r1`` - ``seallms-v3`` -- ``skywork-or1-preview`` +- ``skywork-or1-preview``, ``skywork-or1`` +- ``HuatuoGPT-o1-Qwen2.5``, ``HuatuoGPT-o1-LLaMA-3.1`` +- ``DianJin-R1`` - ``gemma-it``, ``gemma-2-it``, ``gemma-3-1b-it`` - ``orion-chat``, ``orion-chat-rag`` - ``c4ai-command-r-v01`` @@ -125,7 +128,6 @@ Currently, supported model includes: - ``internlm3-instruct`` - ``moonlight-16b-a3b-instruct`` - ``qwen3`` - .. vllm_end .. 
_sglang_backend: diff --git a/xinference/model/llm/llm_family.json b/xinference/model/llm/llm_family.json index 7a4a57a7e2..4b406a50fa 100644 --- a/xinference/model/llm/llm_family.json +++ b/xinference/model/llm/llm_family.json @@ -6462,6 +6462,44 @@ "<|end▁of▁sentence|>" ] }, + { + "version": 1, + "context_length": 163840, + "model_name": "deepseek-v3-0324", + "model_lang": [ + "en", + "zh" + ], + "model_ability": [ + "chat" + ], + "model_description": "DeepSeek-V3, a strong Mixture-of-Experts (MoE) language model with 671B total parameters with 37B activated for each token. ", + "model_specs": [ + { + "model_format": "pytorch", + "model_size_in_billions": 671, + "quantizations": [ + "none" + ], + "model_id": "deepseek-ai/DeepSeek-V3-0324" + }, + { + "model_format": "awq", + "model_size_in_billions": 671, + "quantizations": [ + "Int4" + ], + "model_id": "cognitivecomputations/DeepSeek-V3-0324-AWQ" + } + ], + "chat_template": "{% if not add_generation_prompt is defined %}{% set add_generation_prompt = false %}{% endif %}{% set ns = namespace(is_first=false, is_tool=false, is_output_first=true, system_prompt='', is_first_sp=true, is_last_user=false) %}{%- for message in messages %}{%- if message['role'] == 'system' %}{%- if ns.is_first_sp %}{% set ns.system_prompt = ns.system_prompt + message['content'] %}{% set ns.is_first_sp = false %}{%- else %}{% set ns.system_prompt = ns.system_prompt + '\n\n' + message['content'] %}{%- endif %}{%- endif %}{%- endfor %}{{ bos_token }}{{ ns.system_prompt }}{%- for message in messages %}{%- if message['role'] == 'user' %}{%- set ns.is_tool = false -%}{%- set ns.is_first = false -%}{%- set ns.is_last_user = true -%}{{'<|User|>' + message['content'] + '<|Assistant|>'}}{%- endif %}{%- if message['role'] == 'assistant' and message['tool_calls'] is defined and message['tool_calls'] is not none %}{%- set ns.is_last_user = false -%}{%- if ns.is_tool %}{{'<|tool▁outputs▁end|>'}}{%- endif %}{%- set ns.is_first = false %}{%- set ns.is_tool = false -%}{%- set ns.is_output_first = true %}{%- for tool in message['tool_calls'] %}{%- if not ns.is_first %}{%- if message['content'] is none %}{{'<|tool▁calls▁begin|><|tool▁call▁begin|>' + tool['type'] + '<|tool▁sep|>' + tool['function']['name'] + '\n' + '```json' + '\n' + tool['function']['arguments'] + '\n' + '```' + '<|tool▁call▁end|>'}}{%- else %}{{message['content'] + '<|tool▁calls▁begin|><|tool▁call▁begin|>' + tool['type'] + '<|tool▁sep|>' + tool['function']['name'] + '\n' + '```json' + '\n' + tool['function']['arguments'] + '\n' + '```' + '<|tool▁call▁end|>'}}{%- endif %}{%- set ns.is_first = true -%}{%- else %}{{'\n' + '<|tool▁call▁begin|>' + tool['type'] + '<|tool▁sep|>' + tool['function']['name'] + '\n' + '```json' + '\n' + tool['function']['arguments'] + '\n' + '```' + '<|tool▁call▁end|>'}}{%- endif %}{%- endfor %}{{'<|tool▁calls▁end|><|end▁of▁sentence|>'}}{%- endif %}{%- if message['role'] == 'assistant' and (message['tool_calls'] is not defined or message['tool_calls'] is none)%}{%- set ns.is_last_user = false -%}{%- if ns.is_tool %}{{'<|tool▁outputs▁end|>' + message['content'] + '<|end▁of▁sentence|>'}}{%- set ns.is_tool = false -%}{%- else %}{% set content = message['content'] %}{{content + '<|end▁of▁sentence|>'}}{%- endif %}{%- endif %}{%- if message['role'] == 'tool' %}{%- set ns.is_last_user = false -%}{%- set ns.is_tool = true -%}{%- if ns.is_output_first %}{{'<|tool▁outputs▁begin|><|tool▁output▁begin|>' + message['content'] + '<|tool▁output▁end|>'}}{%- set ns.is_output_first = false %}{%- else 
%}{{'\n<|tool▁output▁begin|>' + message['content'] + '<|tool▁output▁end|>'}}{%- endif %}{%- endif %}{%- endfor -%}{% if ns.is_tool %}{{'<|tool▁outputs▁end|>'}}{% endif %}{% if add_generation_prompt and not ns.is_last_user and not ns.is_tool %}{{'<|Assistant|>'}}{% endif %}", + "stop_token_ids": [ + 1 + ], + "stop": [ + "<|end▁of▁sentence|>" + ] + }, { "version": 1, "context_length": 163840, @@ -6678,6 +6716,88 @@ "reasoning_start_tag": "", "reasoning_end_tag": "" }, + { + "version": 1, + "context_length": 163840, + "model_name": "deepseek-r1-0528", + "model_lang": [ + "en", + "zh" + ], + "model_ability": [ + "chat", + "reasoning" + ], + "model_description": "DeepSeek-R1, which incorporates cold-start data before RL. DeepSeek-R1 achieves performance comparable to OpenAI-o1 across math, code, and reasoning tasks.", + "model_specs": [ + { + "model_format": "pytorch", + "model_size_in_billions": 671, + "quantizations": [ + "none" + ], + "model_id": "deepseek-ai/DeepSeek-R1-0528" + } + ], + "chat_template": "{% if not add_generation_prompt is defined %}{% set add_generation_prompt = false %}{% endif %}{% set ns = namespace(is_first=false, is_tool=false, is_output_first=true, system_prompt='', is_first_sp=true) %}{%- for message in messages %}{%- if message['role'] == 'system' %}{%- if ns.is_first_sp %}{% set ns.system_prompt = ns.system_prompt + message['content'] %}{% set ns.is_first_sp = false %}{%- else %}{% set ns.system_prompt = ns.system_prompt + '\\n\\n' + message['content'] %}{%- endif %}{%- endif %}{%- endfor %}{{ bos_token }}{{ ns.system_prompt }}{%- for message in messages %}{%- if message['role'] == 'user' %}{%- set ns.is_tool = false -%}{{'<|User|>' + message['content']}}{%- endif %}{%- if message['role'] == 'assistant' and 'tool_calls' in message %}{%- set ns.is_tool = false -%}{%- for tool in message['tool_calls'] %}{%- if not ns.is_first %}{%- if message['content'] is none %}{{'<|Assistant|><|tool▁calls▁begin|><|tool▁call▁begin|>' + tool['type'] + '<|tool▁sep|>' + tool['function']['name'] + '\\n' + '```json' + '\\n' + tool['function']['arguments'] + '\\n' + '```' + '<|tool▁call▁end|>'}}{%- else %}{{'<|Assistant|>' + message['content'] + '<|tool▁calls▁begin|><|tool▁call▁begin|>' + tool['type'] + '<|tool▁sep|>' + tool['function']['name'] + '\\n' + '```json' + '\\n' + tool['function']['arguments'] + '\\n' + '```' + '<|tool▁call▁end|>'}}{%- endif %}{%- set ns.is_first = true -%}{%- else %}{{'\\n' + '<|tool▁call▁begin|>' + tool['type'] + '<|tool▁sep|>' + tool['function']['name'] + '\\n' + '```json' + '\\n' + tool['function']['arguments'] + '\\n' + '```' + '<|tool▁call▁end|>'}}{%- endif %}{%- endfor %}{{'<|tool▁calls▁end|><|end▁of▁sentence|>'}}{%- endif %}{%- if message['role'] == 'assistant' and 'tool_calls' not in message %}{%- if ns.is_tool %}{{'<|tool▁outputs▁end|>' + message['content'] + '<|end▁of▁sentence|>'}}{%- set ns.is_tool = false -%}{%- else %}{% set content = message['content'] %}{% if '' in content %}{% set content = content.split('')[-1] %}{% endif %}{{'<|Assistant|>' + content + '<|end▁of▁sentence|>'}}{%- endif %}{%- endif %}{%- if message['role'] == 'tool' %}{%- set ns.is_tool = true -%}{%- if ns.is_output_first %}{{'<|tool▁outputs▁begin|><|tool▁output▁begin|>' + message['content'] + '<|tool▁output▁end|>'}}{%- set ns.is_output_first = false %}{%- else %}{{'<|tool▁output▁begin|>' + message['content'] + '<|tool▁output▁end|>'}}{%- endif %}{%- endif %}{%- endfor -%}{% if ns.is_tool %}{{'<|tool▁outputs▁end|>'}}{% endif %}{% if add_generation_prompt and not ns.is_tool 
%}{{'<|Assistant|>'}}{% endif %}", + "stop_token_ids": [ + 1 + ], + "stop": [ + "<|end▁of▁sentence|>" + ], + "reasoning_start_tag": "", + "reasoning_end_tag": "" + }, + { + "version": 1, + "context_length": 163840, + "model_name": "deepseek-prover-v2", + "model_lang": [ + "en", + "zh" + ], + "model_ability": [ + "chat", + "reasoning" + ], + "model_description": "We introduce DeepSeek-Prover-V2, an open-source large language model designed for formal theorem proving in Lean 4, with initialization data collected through a recursive theorem proving pipeline powered by DeepSeek-V3. The cold-start training procedure begins by prompting DeepSeek-V3 to decompose complex problems into a series of subgoals. The proofs of resolved subgoals are synthesized into a chain-of-thought process, combined with DeepSeek-V3's step-by-step reasoning, to create an initial cold start for reinforcement learning. This process enables us to integrate both informal and formal mathematical reasoning into a unified model", + "model_specs": [ + { + "model_format": "pytorch", + "model_size_in_billions": 671, + "quantizations": [ + "none" + ], + "model_id": "deepseek-ai/DeepSeek-Prover-V2-671B" + }, + { + "model_format": "pytorch", + "model_size_in_billions": 7, + "quantizations": [ + "none" + ], + "model_id": "deepseek-ai/DeepSeek-Prover-V2-7B" + }, + { + "model_format": "mlx", + "model_size_in_billions": 7, + "quantizations": [ + "4bit" + ], + "model_id": "mlx-community/DeepSeek-Prover-V2-7B-4bit" + } + ], + "chat_template": "{% if not add_generation_prompt is defined %}{% set add_generation_prompt = false %}{% endif %}{% set ns = namespace(is_first=false, is_tool=false, is_output_first=true, system_prompt='', is_first_sp=true, is_last_user=false) %}{%- for message in messages %}{%- if message['role'] == 'system' %}{%- if ns.is_first_sp %}{% set ns.system_prompt = ns.system_prompt + message['content'] %}{% set ns.is_first_sp = false %}{%- else %}{% set ns.system_prompt = ns.system_prompt + '\n\n' + message['content'] %}{%- endif %}{%- endif %}{%- endfor %}{{ bos_token }}{{ ns.system_prompt }}{%- for message in messages %}{%- if message['role'] == 'user' %}{%- set ns.is_tool = false -%}{%- set ns.is_first = false -%}{%- set ns.is_last_user = true -%}{{'<|User|>' + message['content'] + '<|Assistant|>'}}{%- endif %}{%- if message['role'] == 'assistant' and message['tool_calls'] is defined and message['tool_calls'] is not none %}{%- set ns.is_last_user = false -%}{%- if ns.is_tool %}{{'<|tool▁outputs▁end|>'}}{%- endif %}{%- set ns.is_first = false %}{%- set ns.is_tool = false -%}{%- set ns.is_output_first = true %}{%- for tool in message['tool_calls'] %}{%- if not ns.is_first %}{%- if message['content'] is none %}{{'<|tool▁calls▁begin|><|tool▁call▁begin|>' + tool['type'] + '<|tool▁sep|>' + tool['function']['name'] + '\n' + '```json' + '\n' + tool['function']['arguments'] + '\n' + '```' + '<|tool▁call▁end|>'}}{%- else %}{{message['content'] + '<|tool▁calls▁begin|><|tool▁call▁begin|>' + tool['type'] + '<|tool▁sep|>' + tool['function']['name'] + '\n' + '```json' + '\n' + tool['function']['arguments'] + '\n' + '```' + '<|tool▁call▁end|>'}}{%- endif %}{%- set ns.is_first = true -%}{%- else %}{{'\n' + '<|tool▁call▁begin|>' + tool['type'] + '<|tool▁sep|>' + tool['function']['name'] + '\n' + '```json' + '\n' + tool['function']['arguments'] + '\n' + '```' + '<|tool▁call▁end|>'}}{%- endif %}{%- endfor %}{{'<|tool▁calls▁end|><|end▁of▁sentence|>'}}{%- endif %}{%- if message['role'] == 'assistant' and (message['tool_calls'] is not 
defined or message['tool_calls'] is none)%}{%- set ns.is_last_user = false -%}{%- if ns.is_tool %}{{'<|tool▁outputs▁end|>' + message['content'] + '<|end▁of▁sentence|>'}}{%- set ns.is_tool = false -%}{%- else %}{% set content = message['content'] %}{{content + '<|end▁of▁sentence|>'}}{%- endif %}{%- endif %}{%- if message['role'] == 'tool' %}{%- set ns.is_last_user = false -%}{%- set ns.is_tool = true -%}{%- if ns.is_output_first %}{{'<|tool▁outputs▁begin|><|tool▁output▁begin|>' + message['content'] + '<|tool▁output▁end|>'}}{%- set ns.is_output_first = false %}{%- else %}{{'\n<|tool▁output▁begin|>' + message['content'] + '<|tool▁output▁end|>'}}{%- endif %}{%- endif %}{%- endfor -%}{% if ns.is_tool %}{{'<|tool▁outputs▁end|>'}}{% endif %}{% if add_generation_prompt and not ns.is_last_user and not ns.is_tool %}{{'<|Assistant|>'}}{% endif %}", + "stop_token_ids": [ + 1 + ], + "stop": [ + "<|end▁of▁sentence|>" + ], + "reasoning_start_tag": "", + "reasoning_end_tag": "" + }, { "version": 1, "context_length": 32768, diff --git a/xinference/model/llm/llm_family_modelscope.json b/xinference/model/llm/llm_family_modelscope.json index 0d2482f329..51c104f26c 100644 --- a/xinference/model/llm/llm_family_modelscope.json +++ b/xinference/model/llm/llm_family_modelscope.json @@ -4600,6 +4600,46 @@ "<|end▁of▁sentence|>" ] }, + { + "version": 1, + "context_length": 163840, + "model_name": "deepseek-v3-0324", + "model_lang": [ + "en", + "zh" + ], + "model_ability": [ + "chat" + ], + "model_description": "DeepSeek-V3, a strong Mixture-of-Experts (MoE) language model with 671B total parameters with 37B activated for each token. ", + "model_specs": [ + { + "model_format": "pytorch", + "model_size_in_billions": 671, + "quantizations": [ + "none" + ], + "model_id": "deepseek-ai/DeepSeek-V3-0324", + "model_hub": "modelscope" + }, + { + "model_format": "awq", + "model_size_in_billions": 671, + "quantizations": [ + "Int4" + ], + "model_id": "cognitivecomputations/DeepSeek-V3-0324-AWQ", + "model_hub": "modelscope" + } + ], + "chat_template": "{% if not add_generation_prompt is defined %}{% set add_generation_prompt = false %}{% endif %}{% set ns = namespace(is_first=false, is_tool=false, is_output_first=true, system_prompt='', is_first_sp=true, is_last_user=false) %}{%- for message in messages %}{%- if message['role'] == 'system' %}{%- if ns.is_first_sp %}{% set ns.system_prompt = ns.system_prompt + message['content'] %}{% set ns.is_first_sp = false %}{%- else %}{% set ns.system_prompt = ns.system_prompt + '\n\n' + message['content'] %}{%- endif %}{%- endif %}{%- endfor %}{{ bos_token }}{{ ns.system_prompt }}{%- for message in messages %}{%- if message['role'] == 'user' %}{%- set ns.is_tool = false -%}{%- set ns.is_first = false -%}{%- set ns.is_last_user = true -%}{{'<|User|>' + message['content'] + '<|Assistant|>'}}{%- endif %}{%- if message['role'] == 'assistant' and message['tool_calls'] is defined and message['tool_calls'] is not none %}{%- set ns.is_last_user = false -%}{%- if ns.is_tool %}{{'<|tool▁outputs▁end|>'}}{%- endif %}{%- set ns.is_first = false %}{%- set ns.is_tool = false -%}{%- set ns.is_output_first = true %}{%- for tool in message['tool_calls'] %}{%- if not ns.is_first %}{%- if message['content'] is none %}{{'<|tool▁calls▁begin|><|tool▁call▁begin|>' + tool['type'] + '<|tool▁sep|>' + tool['function']['name'] + '\n' + '```json' + '\n' + tool['function']['arguments'] + '\n' + '```' + '<|tool▁call▁end|>'}}{%- else %}{{message['content'] + '<|tool▁calls▁begin|><|tool▁call▁begin|>' + tool['type'] + 
'<|tool▁sep|>' + tool['function']['name'] + '\n' + '```json' + '\n' + tool['function']['arguments'] + '\n' + '```' + '<|tool▁call▁end|>'}}{%- endif %}{%- set ns.is_first = true -%}{%- else %}{{'\n' + '<|tool▁call▁begin|>' + tool['type'] + '<|tool▁sep|>' + tool['function']['name'] + '\n' + '```json' + '\n' + tool['function']['arguments'] + '\n' + '```' + '<|tool▁call▁end|>'}}{%- endif %}{%- endfor %}{{'<|tool▁calls▁end|><|end▁of▁sentence|>'}}{%- endif %}{%- if message['role'] == 'assistant' and (message['tool_calls'] is not defined or message['tool_calls'] is none)%}{%- set ns.is_last_user = false -%}{%- if ns.is_tool %}{{'<|tool▁outputs▁end|>' + message['content'] + '<|end▁of▁sentence|>'}}{%- set ns.is_tool = false -%}{%- else %}{% set content = message['content'] %}{{content + '<|end▁of▁sentence|>'}}{%- endif %}{%- endif %}{%- if message['role'] == 'tool' %}{%- set ns.is_last_user = false -%}{%- set ns.is_tool = true -%}{%- if ns.is_output_first %}{{'<|tool▁outputs▁begin|><|tool▁output▁begin|>' + message['content'] + '<|tool▁output▁end|>'}}{%- set ns.is_output_first = false %}{%- else %}{{'\n<|tool▁output▁begin|>' + message['content'] + '<|tool▁output▁end|>'}}{%- endif %}{%- endif %}{%- endfor -%}{% if ns.is_tool %}{{'<|tool▁outputs▁end|>'}}{% endif %}{% if add_generation_prompt and not ns.is_last_user and not ns.is_tool %}{{'<|Assistant|>'}}{% endif %}", + "stop_token_ids": [ + 1 + ], + "stop": [ + "<|end▁of▁sentence|>" + ] + }, { "version": 1, "context_length": 163840, @@ -4821,6 +4861,92 @@ "reasoning_start_tag": "<think>", "reasoning_end_tag": "</think>" }, + { + "version": 1, + "context_length": 163840, + "model_name": "deepseek-r1-0528", + "model_lang": [ + "en", + "zh" + ], + "model_ability": [ + "chat", + "reasoning" + ], + "model_description": "DeepSeek-R1 incorporates cold-start data before RL.
DeepSeek-R1 achieves performance comparable to OpenAI-o1 across math, code, and reasoning tasks.", + "model_specs": [ + { + "model_format": "pytorch", + "model_size_in_billions": 671, + "quantizations": [ + "none" + ], + "model_id": "deepseek-ai/DeepSeek-R1-0528", + "model_hub": "modelscope" + } + ], + "chat_template": "{% if not add_generation_prompt is defined %}{% set add_generation_prompt = false %}{% endif %}{% set ns = namespace(is_first=false, is_tool=false, is_output_first=true, system_prompt='', is_first_sp=true) %}{%- for message in messages %}{%- if message['role'] == 'system' %}{%- if ns.is_first_sp %}{% set ns.system_prompt = ns.system_prompt + message['content'] %}{% set ns.is_first_sp = false %}{%- else %}{% set ns.system_prompt = ns.system_prompt + '\\n\\n' + message['content'] %}{%- endif %}{%- endif %}{%- endfor %}{{ bos_token }}{{ ns.system_prompt }}{%- for message in messages %}{%- if message['role'] == 'user' %}{%- set ns.is_tool = false -%}{{'<|User|>' + message['content']}}{%- endif %}{%- if message['role'] == 'assistant' and 'tool_calls' in message %}{%- set ns.is_tool = false -%}{%- for tool in message['tool_calls'] %}{%- if not ns.is_first %}{%- if message['content'] is none %}{{'<|Assistant|><|tool▁calls▁begin|><|tool▁call▁begin|>' + tool['type'] + '<|tool▁sep|>' + tool['function']['name'] + '\\n' + '```json' + '\\n' + tool['function']['arguments'] + '\\n' + '```' + '<|tool▁call▁end|>'}}{%- else %}{{'<|Assistant|>' + message['content'] + '<|tool▁calls▁begin|><|tool▁call▁begin|>' + tool['type'] + '<|tool▁sep|>' + tool['function']['name'] + '\\n' + '```json' + '\\n' + tool['function']['arguments'] + '\\n' + '```' + '<|tool▁call▁end|>'}}{%- endif %}{%- set ns.is_first = true -%}{%- else %}{{'\\n' + '<|tool▁call▁begin|>' + tool['type'] + '<|tool▁sep|>' + tool['function']['name'] + '\\n' + '```json' + '\\n' + tool['function']['arguments'] + '\\n' + '```' + '<|tool▁call▁end|>'}}{%- endif %}{%- endfor %}{{'<|tool▁calls▁end|><|end▁of▁sentence|>'}}{%- endif %}{%- if message['role'] == 'assistant' and 'tool_calls' not in message %}{%- if ns.is_tool %}{{'<|tool▁outputs▁end|>' + message['content'] + '<|end▁of▁sentence|>'}}{%- set ns.is_tool = false -%}{%- else %}{% set content = message['content'] %}{% if '</think>' in content %}{% set content = content.split('</think>')[-1] %}{% endif %}{{'<|Assistant|>' + content + '<|end▁of▁sentence|>'}}{%- endif %}{%- endif %}{%- if message['role'] == 'tool' %}{%- set ns.is_tool = true -%}{%- if ns.is_output_first %}{{'<|tool▁outputs▁begin|><|tool▁output▁begin|>' + message['content'] + '<|tool▁output▁end|>'}}{%- set ns.is_output_first = false %}{%- else %}{{'<|tool▁output▁begin|>' + message['content'] + '<|tool▁output▁end|>'}}{%- endif %}{%- endif %}{%- endfor -%}{% if ns.is_tool %}{{'<|tool▁outputs▁end|>'}}{% endif %}{% if add_generation_prompt and not ns.is_tool %}{{'<|Assistant|>'}}{% endif %}", + "stop_token_ids": [ + 1 + ], + "stop": [ + "<|end▁of▁sentence|>" + ], + "reasoning_start_tag": "<think>", + "reasoning_end_tag": "</think>" + }, + { + "version": 1, + "context_length": 163840, + "model_name": "deepseek-prover-v2", + "model_lang": [ + "en", + "zh" + ], + "model_ability": [ + "chat", + "reasoning" + ], + "model_description": "We introduce DeepSeek-Prover-V2, an open-source large language model designed for formal theorem proving in Lean 4, with initialization data collected through a recursive theorem proving pipeline powered by DeepSeek-V3.
The cold-start training procedure begins by prompting DeepSeek-V3 to decompose complex problems into a series of subgoals. The proofs of resolved subgoals are synthesized into a chain-of-thought process, combined with DeepSeek-V3's step-by-step reasoning, to create an initial cold start for reinforcement learning. This process enables us to integrate both informal and formal mathematical reasoning into a unified model.", + "model_specs": [ + { + "model_format": "pytorch", + "model_size_in_billions": 671, + "quantizations": [ + "none" + ], + "model_id": "deepseek-ai/DeepSeek-Prover-V2-671B", + "model_hub": "modelscope" + }, + { + "model_format": "pytorch", + "model_size_in_billions": 7, + "quantizations": [ + "none" + ], + "model_id": "deepseek-ai/DeepSeek-Prover-V2-7B", + "model_hub": "modelscope" + }, + { + "model_format": "mlx", + "model_size_in_billions": 7, + "quantizations": [ + "4bit" + ], + "model_id": "mlx-community/DeepSeek-Prover-V2-7B-4bit", + "model_hub": "modelscope" + } + ], + "chat_template": "{% if not add_generation_prompt is defined %}{% set add_generation_prompt = false %}{% endif %}{% set ns = namespace(is_first=false, is_tool=false, is_output_first=true, system_prompt='', is_first_sp=true, is_last_user=false) %}{%- for message in messages %}{%- if message['role'] == 'system' %}{%- if ns.is_first_sp %}{% set ns.system_prompt = ns.system_prompt + message['content'] %}{% set ns.is_first_sp = false %}{%- else %}{% set ns.system_prompt = ns.system_prompt + '\n\n' + message['content'] %}{%- endif %}{%- endif %}{%- endfor %}{{ bos_token }}{{ ns.system_prompt }}{%- for message in messages %}{%- if message['role'] == 'user' %}{%- set ns.is_tool = false -%}{%- set ns.is_first = false -%}{%- set ns.is_last_user = true -%}{{'<|User|>' + message['content'] + '<|Assistant|>'}}{%- endif %}{%- if message['role'] == 'assistant' and message['tool_calls'] is defined and message['tool_calls'] is not none %}{%- set ns.is_last_user = false -%}{%- if ns.is_tool %}{{'<|tool▁outputs▁end|>'}}{%- endif %}{%- set ns.is_first = false %}{%- set ns.is_tool = false -%}{%- set ns.is_output_first = true %}{%- for tool in message['tool_calls'] %}{%- if not ns.is_first %}{%- if message['content'] is none %}{{'<|tool▁calls▁begin|><|tool▁call▁begin|>' + tool['type'] + '<|tool▁sep|>' + tool['function']['name'] + '\n' + '```json' + '\n' + tool['function']['arguments'] + '\n' + '```' + '<|tool▁call▁end|>'}}{%- else %}{{message['content'] + '<|tool▁calls▁begin|><|tool▁call▁begin|>' + tool['type'] + '<|tool▁sep|>' + tool['function']['name'] + '\n' + '```json' + '\n' + tool['function']['arguments'] + '\n' + '```' + '<|tool▁call▁end|>'}}{%- endif %}{%- set ns.is_first = true -%}{%- else %}{{'\n' + '<|tool▁call▁begin|>' + tool['type'] + '<|tool▁sep|>' + tool['function']['name'] + '\n' + '```json' + '\n' + tool['function']['arguments'] + '\n' + '```' + '<|tool▁call▁end|>'}}{%- endif %}{%- endfor %}{{'<|tool▁calls▁end|><|end▁of▁sentence|>'}}{%- endif %}{%- if message['role'] == 'assistant' and (message['tool_calls'] is not defined or message['tool_calls'] is none)%}{%- set ns.is_last_user = false -%}{%- if ns.is_tool %}{{'<|tool▁outputs▁end|>' + message['content'] + '<|end▁of▁sentence|>'}}{%- set ns.is_tool = false -%}{%- else %}{% set content = message['content'] %}{{content + '<|end▁of▁sentence|>'}}{%- endif %}{%- endif %}{%- if message['role'] == 'tool' %}{%- set ns.is_last_user = false -%}{%- set ns.is_tool = true -%}{%- if ns.is_output_first %}{{'<|tool▁outputs▁begin|><|tool▁output▁begin|>' + message['content']
+ '<|tool▁output▁end|>'}}{%- set ns.is_output_first = false %}{%- else %}{{'\n<|tool▁output▁begin|>' + message['content'] + '<|tool▁output▁end|>'}}{%- endif %}{%- endif %}{%- endfor -%}{% if ns.is_tool %}{{'<|tool▁outputs▁end|>'}}{% endif %}{% if add_generation_prompt and not ns.is_last_user and not ns.is_tool %}{{'<|Assistant|>'}}{% endif %}", + "stop_token_ids": [ + 1 + ], + "stop": [ + "<|end▁of▁sentence|>" + ], + "reasoning_start_tag": "<think>", + "reasoning_end_tag": "</think>" + }, { "version": 1, "context_length": 32768, diff --git a/xinference/model/llm/sglang/core.py b/xinference/model/llm/sglang/core.py index 8955e90185..43711bd2db 100644 --- a/xinference/model/llm/sglang/core.py +++ b/xinference/model/llm/sglang/core.py @@ -107,7 +107,10 @@ class SGLANGGenerateConfig(TypedDict, total=False): "deepseek-r1-distill-qwen", "deepseek-r1-distill-llama", "deepseek-v3", + "deepseek-v3-0324", "deepseek-r1", + "deepseek-r1-0528", + "deepseek-prover-v2", "DianJin-R1", "qwen3", "HuatuoGPT-o1-Qwen2.5", diff --git a/xinference/model/llm/vllm/core.py b/xinference/model/llm/vllm/core.py index 6762fea8d8..ebb6071d1f 100644 --- a/xinference/model/llm/vllm/core.py +++ b/xinference/model/llm/vllm/core.py @@ -199,7 +199,10 @@ class VLLMGenerateConfig(TypedDict, total=False): VLLM_SUPPORTED_CHAT_MODELS.append("deepseek-v2-chat-0628") VLLM_SUPPORTED_CHAT_MODELS.append("deepseek-v2.5") VLLM_SUPPORTED_CHAT_MODELS.append("deepseek-v3") + VLLM_SUPPORTED_CHAT_MODELS.append("deepseek-v3-0324") VLLM_SUPPORTED_CHAT_MODELS.append("deepseek-r1") + VLLM_SUPPORTED_CHAT_MODELS.append("deepseek-r1-0528") + VLLM_SUPPORTED_CHAT_MODELS.append("deepseek-prover-v2") if VLLM_INSTALLED and vllm.__version__ >= "0.5.3": VLLM_SUPPORTED_CHAT_MODELS.append("gemma-2-it") diff --git a/xinference/web/ui/src/scenes/launch_model/data/data.js b/xinference/web/ui/src/scenes/launch_model/data/data.js index 324f7804e1..e630e329f4 100644 --- a/xinference/web/ui/src/scenes/launch_model/data/data.js +++ b/xinference/web/ui/src/scenes/launch_model/data/data.js @@ -79,8 +79,8 @@ export const featureModels = [ type: 'llm', feature_models: [ 'qwen3', - 'deepseek-v3', - 'deepseek-r1', + 'deepseek-v3-0324', + 'deepseek-r1-0528', 'deepseek-r1-distill-qwen', 'deepseek-r1-distill-llama', 'qwen2.5-instruct',
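For orientation, the `chat_template` fields registered above are Jinja2 strings that Xinference renders into the raw prompt the model receives. A minimal sketch of that mechanics, assuming only stock `jinja2` and using a hand-trimmed fragment of the DeepSeek-style template (the full registered string additionally handles system prompts, tool calls, and reasoning tags):

```python
# Render a trimmed-down chat template to see the prompt format.
# The fragment below is illustrative, not the full template from this diff.
from jinja2 import Template

trimmed_template = (
    "{{ bos_token }}"
    "{%- for message in messages %}"
    "{%- if message['role'] == 'user' %}"
    "{{ '<|User|>' + message['content'] }}"
    "{%- endif %}"
    "{%- endfor %}"
    "{% if add_generation_prompt %}{{ '<|Assistant|>' }}{% endif %}"
)

prompt = Template(trimmed_template).render(
    bos_token="<|begin▁of▁sentence|>",  # supplied by the tokenizer at runtime
    messages=[{"role": "user", "content": "What is 2 + 2?"}],
    add_generation_prompt=True,
)
print(prompt)
# <|begin▁of▁sentence|><|User|>What is 2 + 2?<|Assistant|>
```

Rendering the full registered template follows the same pattern; only `messages`, `bos_token`, and `add_generation_prompt` need to be supplied.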
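Once a family is registered in the JSON catalogs and appended to the vLLM/SGLang model lists, it can be launched like any other built-in model. A hypothetical end-to-end sketch, assuming a running Xinference supervisor at the default endpoint and hardware able to host the chosen spec; the endpoint, prompt, and engine choice are illustrative, not taken from this diff:

```python
# Hypothetical usage sketch for one of the newly registered families.
from xinference.client import Client

client = Client("http://localhost:9997")  # assumed local supervisor

# model_name/model_format/quantization must match one of the
# "model_specs" entries added above; "vllm" is available because the
# name was appended to VLLM_SUPPORTED_CHAT_MODELS in this change.
model_uid = client.launch_model(
    model_name="deepseek-r1-0528",
    model_engine="vllm",
    model_format="pytorch",
    model_size_in_billions=671,
    quantization="none",
)

model = client.get_model(model_uid)
completion = model.chat(
    messages=[{"role": "user", "content": "Why is the sky blue?"}]
)
# Families declaring the "reasoning" ability with <think>/</think> tags
# let the server separate chain-of-thought from the final answer.
print(completion["choices"][0]["message"]["content"])
```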