**docs/doc/en/mllm/llm_deepseek.md** (120 additions, 1 deletion)
@@ -26,7 +26,18 @@ The 1.5B version is essentially a distilled version based on Qwen2.5. That means
## Running DeepSeek R1 on MaixPy MaixCAM
As mentioned above, since the network structure is the same as Qwen2.5, please refer to the [Qwen Documentation](./llm_qwen.md).
### Models and Download Links
By default the model is looked for in the `/root/models` directory on the system; if it is not there, you can download it manually.
* **1.5B**:
* Memory requirement: CMM memory 1.8GiB. See [Memory Usage Documentation](../pro/memory.md) for explanation.
For example, here is an excerpt of the model's chain-of-thought output from a run where it is asked to add 1,990 and 35:

> Okay, so I need to calculate 1,9990 plus 35. Wait, that doesn't seem right. 1,990 sounds like a number with a comma in it, but I'm not sure. Maybe the comma is a thousands separator? So, 1,990 would be 1,990, right? Hmm, that makes more sense. So, I'm supposed to add 1,990 and 35.
>
> Let me try that again. 1,900 plus 90 is 1,990. Yeah, okay, that's correct. So, 1,990 plus 35 would be adding 35 to 1,990. So, 1,990 plus 30 is 2,020, and then plus 5 makes 2,025. So, the answer should be 2,025. Wait, but I'm not a math expert, so maybe I should double-check that. 1,990 is the same as 1990, right? Yeah, 1,000 plus 990 is 1,990. Adding 35, so 1,990 plus 35 equals 2,025. Yeah, that makes sense.
Due to limited resources, the context length is also limited. For example, the default model supports about 512 tokens, and at least 128 free tokens must remain to continue the dialogue. For instance, if the historical tokens reach 500 (less than 512, but leaving fewer than 128 free tokens), further dialogue is not possible.
@@ -143,6 +174,66 @@ When the context is full, you currently must call `clear_context()` to clear the
Of course, this context length can be modified, but doing so requires re-quantizing the model. Also, a longer context can slow down model performance. If needed, you can convert the model yourself as described below.
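As a rough sketch of what usage looks like (assuming the `nn.Qwen` class and the `send()` / `clear_context()` calls described in the Qwen documentation; the model path below is hypothetical, use the actual `.mud` file you placed in `/root/models`):

```python
from maix import nn

# Hypothetical file name: use the DeepSeek R1 .mud model you downloaded to /root/models
model_path = "/root/models/deepseek_r1_distill_qwen_1.5b.mud"

# Same API as Qwen2.5 (the network structure is identical), per the Qwen documentation
llm = nn.Qwen(model_path)

for question in ["Please compute 1990 + 35.", "Explain the result in one sentence."]:
    resp = llm.send(question)  # send() is documented in the Qwen docs; inspect resp for the generated text
    print(resp)

# The default model keeps roughly 512 tokens of history and needs about 128 free
# tokens to keep going, so clear the context before it fills up.
llm.clear_context()
```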
## Modifying Parameters
The Qwen model allows certain parameters to be modified, which can change the model's behavior. Default values are typically set within the `model.mud` file, but you can also set these values programmatically, for example:
```python
qwen.post_config.temperature = 0.9
```
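Presumably the other `[post_config]` keys can be set the same way; the attribute names below are an assumption that simply mirrors the configuration keys shown next, so verify them against the Qwen documentation:

```python
# Assumed attribute names mirroring the [post_config] keys; only `temperature`
# is shown in this document, so treat the rest as unverified.
qwen.post_config.enable_top_k_sampling = True
qwen.post_config.top_k = 10
qwen.post_config.enable_repetition_penalty = True
qwen.post_config.repetition_penalty = 1.2
```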
The corresponding configuration in `model.mud` looks like this, for example:
```ini
[post_config]
enable_temperature = true
temperature = 0.9
enable_repetition_penalty = false
repetition_penalty = 1.2
penalty_window = 20
enable_top_p_sampling = false
top_p = 0.8
enable_top_k_sampling = true
top_k = 10
```
These parameters are used to **control the text generation behavior** of the Qwen model (or other large language models) through sampling strategies. They affect the **diversity, randomness, and repetition** of the output. Below is an explanation of each parameter:
* `enable_temperature = true`
* `temperature = 0.9`
  * **Meaning**: Enables the "temperature sampling" strategy and sets the temperature value to 0.9.
  * **Explanation**:
    * Temperature controls **randomness**. Lower values (e.g., 0.1) result in more deterministic outputs (similar to greedy search), while higher values (e.g., 1.5) increase randomness.
    * A recommended range is typically `0.7 ~ 1.0`.
    * A value of 0.9 means a moderate increase in diversity without making the output too chaotic.
* `enable_repetition_penalty = false`
* `repetition_penalty = 1.2`
* `penalty_window = 20`
  * **Meaning**:
    * Repetition penalty is disabled, so the value `repetition_penalty = 1.2` has no effect.
    * If enabled, this mechanism reduces the probability of repeating tokens from the most recent `20` tokens.
  * **Explanation**:
    * Helps prevent the model from being verbose or getting stuck in repetitive loops (e.g., “hello hello hello…”).
    * A penalty factor > 1 suppresses repetition. A common recommended range is `1.1 ~ 1.3`.
* `enable_top_p_sampling = false`
* `top_p = 0.8`
  * **Meaning**:
    * Top-p (nucleus) sampling is disabled.
    * If enabled, the model samples from **the smallest set of tokens whose cumulative probability exceeds p**, instead of from all tokens.
  * **Explanation**:
    * `top_p = 0.8` means sampling from tokens whose cumulative probability just reaches 0.8.
    * More flexible than top-k, as it adapts the candidate set dynamically based on the token distribution in each generation step.
* `enable_top_k_sampling = true`
* `top_k = 10`
  * **Meaning**: Enables top-k sampling, where the model selects output tokens from the **top 10 most probable tokens**.
  * **Explanation**:
    * This is a way to constrain the sampling space and control output diversity.
    * `top_k = 1` approximates greedy search (most deterministic), while `top_k = 10` allows for a moderate level of diversity.
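To make these effects concrete, here is a small self-contained Python sketch (plain NumPy, not the MaixPy API; purely illustrative) showing how temperature, top-k, top-p, and a repetition penalty each reshape a toy next-token distribution:

```python
import numpy as np

def shape_distribution(logits, temperature=0.9, top_k=None, top_p=None,
                       repetition_penalty=None, recent_tokens=()):
    """Apply the sampling knobs described above to a toy next-token distribution."""
    logits = np.asarray(logits, dtype=np.float64).copy()

    # Repetition penalty: make recently generated tokens less likely (factor > 1 suppresses).
    if repetition_penalty:
        for t in recent_tokens:
            logits[t] = logits[t] / repetition_penalty if logits[t] > 0 else logits[t] * repetition_penalty

    # Temperature: divide logits before softmax; < 1 sharpens the distribution, > 1 flattens it.
    probs = np.exp(logits / temperature)
    probs /= probs.sum()

    # Top-k: keep only the k most probable tokens, then renormalize.
    if top_k:
        keep = np.argsort(probs)[-top_k:]
        filtered = np.zeros_like(probs)
        filtered[keep] = probs[keep]
        probs = filtered / filtered.sum()

    # Top-p (nucleus): keep the smallest set of tokens whose cumulative probability reaches p.
    if top_p:
        order = np.argsort(probs)[::-1]
        cutoff = np.searchsorted(np.cumsum(probs[order]), top_p) + 1
        filtered = np.zeros_like(probs)
        filtered[order[:cutoff]] = probs[order[:cutoff]]
        probs = filtered / filtered.sum()

    return probs

toy_logits = [2.0, 1.5, 1.0, 0.5, 0.0]  # 5-token toy vocabulary
print(shape_distribution(toy_logits, temperature=0.9, top_k=3))      # only the 3 most probable tokens survive
print(shape_distribution(toy_logits, temperature=0.9, top_p=0.8))    # nucleus sampling instead of top-k
print(shape_distribution(toy_logits, temperature=0.9,
                         repetition_penalty=1.2, recent_tokens=[0])) # token 0 is suppressed
```

With `top_k = 1` the first call collapses to the single most probable token, which is why the documentation describes it as approximating greedy search.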
## Custom Quantized Models
The models provided above are quantized specifically for MaixCAM2. If you need to quantize your own models, refer to:
**docs/doc/en/mllm/vlm_internvl.md** (32 additions, 0 deletions)
@@ -17,9 +17,28 @@ update:
## Introduction to InternVL
A VLM (Vision-Language Model) lets the AI generate text output from both text and image input, for example describing the content of an image; in other words, the AI has learned to interpret images.
InternVL supports multiple languages, such as Chinese and English.
MaixPy has integrated [InternVL2.5](https://huggingface.co/OpenGVLab/InternVL2_5-1B), which is based on Qwen2.5 with added image support. Therefore, some basic concepts are not detailed here. It is recommended to read the [Qwen](./llm_qwen.md) introduction first.
For example, with this image, the model can respond with something like:

```
In the image, we see a vibrant street scene featuring a classic double-decker bus in red with "Things Get New Look!" written on its side. It’s parked on the street, where a woman stands smiling at the camera. Behind the bus, a row of classic buildings with large windows lines the street, contributing to the urban atmosphere. A black van is parked nearby, and there are a few people and street signs indicating traffic regulations. The overall scene captures a typical day in a historic city.
```
This is the result with a casually set prompt. You can adjust the system and user prompts according to the actual situation.
This loads an image from the system and asks the model to describe what’s in the image. Note that this model **does not support context**, meaning each call to the `send` function is a brand-new conversation and does not remember the content from previous `send` calls.
Additionally, the default model supports image input resolution of `364 x 364`. So when calling `set_image`, if the resolution doesn't match, it will automatically call `img.resize` to resize the image using the method specified by `fit`, such as `image.Fit.FIT_CONTAIN`, which resizes while maintaining the original aspect ratio and fills the surrounding space with black.
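As a rough sketch of such a call (the `nn.InternVL` class name, the `image.load` helper, the `fit` keyword, and the file paths below are assumptions for illustration; `set_image`, `send`, and `image.Fit.FIT_CONTAIN` are the calls described in this document):

```python
from maix import nn, image

# Hypothetical model path: use the actual InternVL2.5 .mud file installed on your device
vlm = nn.InternVL("/root/models/internvl2.5-1b.mud")

img = image.load("/root/test.jpg")             # any resolution; it will be resized to 364 x 364
vlm.set_image(img, fit=image.Fit.FIT_CONTAIN)  # keep aspect ratio, pad the borders with black

# Each send() is a brand-new conversation: the model keeps no context between calls
resp = vlm.send("Describe this picture")
print(resp)
```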
@@ -118,6 +146,10 @@ Additionally, the default model supports image input resolution of `364 x 364`.
Note that the length of the input text after being tokenized is limited. For example, the default 1B model supports 256 tokens, and the total tokens for input and output should not exceed 1023.
## Modifying Parameters
Refer to the [Qwen Documentation](./llm_qwen.md).
## Custom Quantized Model
The model provided above is a quantized model for MaixCAM2. If you want to quantize your own model, refer to: