**docs/doc/en/mllm/llm_deepseek.md** (120 additions, 1 deletion)
@@ -26,7 +26,18 @@ The 1.5B version is essentially a distilled version based on Qwen2.5. That means
## Running DeepSeek R1 on MaixPy MaixCAM
As mentioned above, since the network structure is the same as Qwen2.5, please refer to the [Qwen Documentation](./llm_qwen.md).
### Models and Download Links
By default the model is looked for in the `/root/models` directory on the system; if it is not there, you can download it manually.
* **1.5B**:
* Memory requirement: CMM memory 1.8GiB. See [Memory Usage Documentation](../pro/memory.md) for explanation.
For example, here is an excerpt of the model's chain-of-thought output from a run where it is asked to add 1,990 and 35:

> Okay, so I need to calculate 1,9990 plus 35. Wait, that doesn't seem right. 1,990 sounds like a number with a comma in it, but I'm not sure. Maybe the comma is a thousands separator? So, 1,990 would be 1,990, right? Hmm, that makes more sense. So, I'm supposed to add 1,990 and 35.
>
> Let me try that again. 1,900 plus 90 is 1,990. Yeah, okay, that's correct. So, 1,990 plus 35 would be adding 35 to 1,990. So, 1,990 plus 30 is 2,020, and then plus 5 makes 2,025. So, the answer should be 2,025. Wait, but I'm not a math expert, so maybe I should double-check that. 1,990 is the same as 1990, right? Yeah, 1,000 plus 990 is 1,990. Adding 35, so 1,990 plus 35 equals 2,025. Yeah, that makes sense.
Due to limited resources, the context length is also limited. For example, the default model supports about 512 tokens, and at least 128 free tokens must remain to continue the dialogue. For instance, if the historical tokens reach 500 (less than 512, but leaving fewer than 128 free tokens), further dialogue is not possible.
@@ -143,6 +174,66 @@ When the context is full, you currently must call `clear_context()` to clear the
Of course, this context length can be modified, but doing so requires re-quantizing the model. Also, a longer context can slow down model performance. If needed, you can convert the model yourself as described below.
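As a rough sketch of what usage looks like (assuming the `nn.Qwen` class and the `send()` / `clear_context()` calls described in the Qwen documentation; the model path below is hypothetical, use the actual `.mud` file you placed in `/root/models`):

```python
from maix import nn

# Hypothetical file name: use the DeepSeek R1 .mud model you downloaded to /root/models
model_path = "/root/models/deepseek_r1_distill_qwen_1.5b.mud"

# Same API as Qwen2.5 (the network structure is identical), per the Qwen documentation
llm = nn.Qwen(model_path)

for question in ["Please compute 1990 + 35.", "Explain the result in one sentence."]:
    resp = llm.send(question)  # send() is documented in the Qwen docs; inspect resp for the generated text
    print(resp)

# The default model keeps roughly 512 tokens of history and needs about 128 free
# tokens to keep going, so clear the context before it fills up.
llm.clear_context()
```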
## Modifying Parameters
The Qwen model allows certain parameters to be modified, which can change the model's behavior. Default values are typically set within the `model.mud` file, but you can also set these values programmatically, for example:
```python
qwen.post_config.temperature = 0.9
```
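Presumably the other `[post_config]` keys can be set the same way; the attribute names below are an assumption that simply mirrors the configuration keys shown next, so verify them against the Qwen documentation:

```python
# Assumed attribute names mirroring the [post_config] keys; only `temperature`
# is shown in this document, so treat the rest as unverified.
qwen.post_config.enable_top_k_sampling = True
qwen.post_config.top_k = 10
qwen.post_config.enable_repetition_penalty = True
qwen.post_config.repetition_penalty = 1.2
```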
The corresponding configuration in `model.mud` looks like this, for example:
```ini
[post_config]
enable_temperature = true
temperature = 0.9
enable_repetition_penalty = false
repetition_penalty = 1.2
penalty_window = 20
enable_top_p_sampling = false
top_p = 0.8
enable_top_k_sampling = true
top_k = 10
```
These parameters are used to **control the text generation behavior** of the Qwen model (or other large language models) through sampling strategies. They affect the **diversity, randomness, and repetition** of the output. Below is an explanation of each parameter:
* `enable_temperature = true`
* `temperature = 0.9`
  * **Meaning**: Enables the "temperature sampling" strategy and sets the temperature value to 0.9.
  * **Explanation**:
    * Temperature controls **randomness**. Lower values (e.g., 0.1) result in more deterministic outputs (similar to greedy search), while higher values (e.g., 1.5) increase randomness.
    * A recommended range is typically `0.7 ~ 1.0`.
    * A value of 0.9 means a moderate increase in diversity without making the output too chaotic.
* `enable_repetition_penalty = false`
* `repetition_penalty = 1.2`
* `penalty_window = 20`
  * **Meaning**:
    * Repetition penalty is disabled, so the value `repetition_penalty = 1.2` has no effect.
    * If enabled, this mechanism reduces the probability of repeating tokens from the most recent `20` tokens.
  * **Explanation**:
    * Helps prevent the model from being verbose or getting stuck in repetitive loops (e.g., “hello hello hello…”).
    * A penalty factor > 1 suppresses repetition. A common recommended range is `1.1 ~ 1.3`.
* `enable_top_p_sampling = false`
* `top_p = 0.8`
  * **Meaning**:
    * Top-p (nucleus) sampling is disabled.
    * If enabled, the model samples from **the smallest set of tokens whose cumulative probability exceeds p**, instead of from all tokens.
  * **Explanation**:
    * `top_p = 0.8` means sampling from tokens whose cumulative probability just reaches 0.8.
    * More flexible than top-k, as it adapts the candidate set dynamically based on the token distribution in each generation step.
* `enable_top_k_sampling = true`
* `top_k = 10`
  * **Meaning**: Enables top-k sampling, where the model selects output tokens from the **top 10 most probable tokens**.
  * **Explanation**:
    * This is a way to constrain the sampling space and control output diversity.
    * `top_k = 1` approximates greedy search (most deterministic), while `top_k = 10` allows for a moderate level of diversity.
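To make these effects concrete, here is a small self-contained Python sketch (plain NumPy, not the MaixPy API; purely illustrative) showing how temperature, top-k, top-p, and a repetition penalty each reshape a toy next-token distribution:

```python
import numpy as np

def shape_distribution(logits, temperature=0.9, top_k=None, top_p=None,
                       repetition_penalty=None, recent_tokens=()):
    """Apply the sampling knobs described above to a toy next-token distribution."""
    logits = np.asarray(logits, dtype=np.float64).copy()

    # Repetition penalty: make recently generated tokens less likely (factor > 1 suppresses).
    if repetition_penalty:
        for t in recent_tokens:
            logits[t] = logits[t] / repetition_penalty if logits[t] > 0 else logits[t] * repetition_penalty

    # Temperature: divide logits before softmax; < 1 sharpens the distribution, > 1 flattens it.
    probs = np.exp(logits / temperature)
    probs /= probs.sum()

    # Top-k: keep only the k most probable tokens, then renormalize.
    if top_k:
        keep = np.argsort(probs)[-top_k:]
        filtered = np.zeros_like(probs)
        filtered[keep] = probs[keep]
        probs = filtered / filtered.sum()

    # Top-p (nucleus): keep the smallest set of tokens whose cumulative probability reaches p.
    if top_p:
        order = np.argsort(probs)[::-1]
        cutoff = np.searchsorted(np.cumsum(probs[order]), top_p) + 1
        filtered = np.zeros_like(probs)
        filtered[order[:cutoff]] = probs[order[:cutoff]]
        probs = filtered / filtered.sum()

    return probs

toy_logits = [2.0, 1.5, 1.0, 0.5, 0.0]  # 5-token toy vocabulary
print(shape_distribution(toy_logits, temperature=0.9, top_k=3))      # only the 3 most probable tokens survive
print(shape_distribution(toy_logits, temperature=0.9, top_p=0.8))    # nucleus sampling instead of top-k
print(shape_distribution(toy_logits, temperature=0.9,
                         repetition_penalty=1.2, recent_tokens=[0])) # token 0 is suppressed
```

With `top_k = 1` the first call collapses to the single most probable token, which is why the documentation describes it as approximating greedy search.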
## Custom Quantized Models
The models provided above are quantized specifically for MaixCAM2. If you need to quantize your own models, refer to:
**docs/doc/en/mllm/vlm_internvl.md** (32 additions, 0 deletions)
@@ -17,9 +17,28 @@ update:
## Introduction to InternVL
A VLM (Vision-Language Model) lets the AI generate text output from both text and image input, for example describing the content of an image; in other words, the AI has learned to interpret images.
InternVL supports multiple languages, such as Chinese and English.
MaixPy has integrated [InternVL2.5](https://huggingface.co/OpenGVLab/InternVL2_5-1B), which is based on Qwen2.5 with added image support. Therefore, some basic concepts are not detailed here. It is recommended to read the [Qwen](./llm_qwen.md) introduction first.
For example, with this image, the model can respond with something like:

```
In the image, we see a vibrant street scene featuring a classic double-decker bus in red with "Things Get New Look!" written on its side. It’s parked on the street, where a woman stands smiling at the camera. Behind the bus, a row of classic buildings with large windows lines the street, contributing to the urban atmosphere. A black van is parked nearby, and there are a few people and street signs indicating traffic regulations. The overall scene captures a typical day in a historic city.
```
This is the result with a casually set prompt. You can adjust the system and user prompts according to the actual situation.
This loads an image from the system and asks the model to describe what’s in the image. Note that this model **does not support context**, meaning each call to the `send` function is a brand-new conversation and does not remember the content from previous `send` calls.
Additionally, the default model supports image input resolution of `364 x 364`. So when calling `set_image`, if the resolution doesn't match, it will automatically call `img.resize` to resize the image using the method specified by `fit`, such as `image.Fit.FIT_CONTAIN`, which resizes while maintaining the original aspect ratio and fills the surrounding space with black.
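As a rough sketch of such a call (the `nn.InternVL` class name, the `image.load` helper, the `fit` keyword, and the file paths below are assumptions for illustration; `set_image`, `send`, and `image.Fit.FIT_CONTAIN` are the calls described in this document):

```python
from maix import nn, image

# Hypothetical model path: use the actual InternVL2.5 .mud file installed on your device
vlm = nn.InternVL("/root/models/internvl2.5-1b.mud")

img = image.load("/root/test.jpg")             # any resolution; it will be resized to 364 x 364
vlm.set_image(img, fit=image.Fit.FIT_CONTAIN)  # keep aspect ratio, pad the borders with black

# Each send() is a brand-new conversation: the model keeps no context between calls
resp = vlm.send("Describe this picture")
print(resp)
```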
@@ -118,6 +146,10 @@ Additionally, the default model supports image input resolution of `364 x 364`.
Note that the length of the input text after being tokenized is limited. For example, the default 1B model supports 256 tokens, and the total tokens for input and output should not exceed 1023.
## Modifying Parameters
Refer to the [Qwen Documentation](./llm_qwen.md).
## Custom Quantized Model
The model provided above is a quantized model for MaixCAM2. If you want to quantize your own model, refer to: