- `Remote` select to provide remote access to the LLM
- `Port` port to run the LLM server (if `Remote` is set)
- `Num Threads` number of threads to use (default: -1 = all)
- `Num GPU Layers` number of model layers to offload to the GPU.
If set to 0 the GPU is not used. Use a large number, e.g. >30, to utilise the GPU as much as possible (see the configuration sketch after this list).
Note that higher values of context size will use more VRAM.
If the user's GPU is not supported, the LLM will fall back to the CPU
- `Debug` select to log the output of the model in the Unity Editor
- <details><summary>Advanced options</summary>
- <details><summary><code>Parallel Prompts</code> number of prompts / slots that can happen in parallel (default: -1 = number of LLMCharacter objects). Note that the context size is divided among the slots.</summary> If you want to retain as much context as possible for the LLM and don't need all the characters present at the same time, you can set this number and specify the slot for each LLMCharacter object.
e.g. Setting `Parallel Prompts` to 1 and slot 0 for all LLMCharacter objects will use the full context, but the entire prompt will need to be computed (no caching) whenever an LLMCharacter object is used for chat. </details>
- `Dont Destroy On Load` select to not destroy the LLM GameObject when loading a new Scene
</details>
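
The same setup options can also be set from code. Below is a minimal sketch, assuming the Inspector labels map to same-named fields on the `LLM` component (e.g. `numThreads`, `numGPULayers`); the exact field names may differ in your version of the package.

```csharp
using UnityEngine;
using LLMUnity;

// Hypothetical setup script: configures the LLM server options from code
// instead of the Inspector. Field names are assumed to mirror the Inspector labels.
public class LLMSetupExample : MonoBehaviour
{
    void Awake()
    {
        LLM llm = gameObject.AddComponent<LLM>();
        llm.remote = false;      // local in-process server, no network access
        llm.numThreads = -1;     // -1 = use all available CPU threads
        llm.numGPULayers = 35;   // offload as many layers as fit on the GPU; 0 = CPU only
        llm.parallelPrompts = 1; // single slot: one character keeps the full context
        llm.debug = true;        // log model output in the Unity Editor
    }
}
```
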
## Server Security Settings
- `API key` API key required to allow access from LLMCharacter objects (if `Remote` is set)
- <details><summary>Advanced options</summary>
- `Load SSL certificate` allows loading an SSL certificate for end-to-end encryption of requests (if `Remote` is set). Requires the SSL key as well.
- `Load SSL key` allows loading an SSL key for end-to-end encryption of requests (if `Remote` is set). Requires the SSL certificate as well.
- `SSL certificate path` the SSL certificate used for end-to-end encryption of requests (if `Remote` is set).
- `SSL key path` the SSL key used for end-to-end encryption of requests (if `Remote` is set).
</details>
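
As a rough sketch of how these options fit together, the server (`LLM`) and each client (`LLMCharacter`) need to agree on the API key when `Remote` is enabled. The `APIKey`, `remote` and `port` field names below are assumptions based on the Inspector labels; the SSL certificate and key are easiest to assign in the Inspector.

```csharp
using UnityEngine;
using LLMUnity;

// Hypothetical example of matching the API key between server and client.
// Field names are assumed from the Inspector labels.
public class RemoteAccessExample : MonoBehaviour
{
    public LLM llm;                   // GameObject running the LLM server
    public LLMCharacter llmCharacter; // client connecting to it

    void Awake()
    {
        const string key = "my-secret-key"; // placeholder value
        llm.remote = true;
        llm.port = 13333;
        llm.APIKey = key;          // requests without this key are rejected
        llmCharacter.remote = true;
        llmCharacter.APIKey = key; // sent with every request to the server
    }
}
```
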
## 🤗 Model Settings
- `Download model` click to download one of the default models
- `Load model` click to load your own model in .gguf format
- `Download on Start` enable to download the LLM models the first time the game starts. Otherwise the LLM models will be copied directly into the build
- <details><summary><code>Context Size</code> size of the prompt context (0 = context size of the model)</summary> This is the number of tokens the model can take as input when generating responses. Higher values use more RAM or VRAM (if using GPU). </details>
- <details><summary>Advanced options</summary>
- `Download lora` click to download a LoRA model in .gguf format
- `Load lora` click to load a LoRA model in .gguf format
- `Batch Size` batch size for prompt processing (default: 512)
- `Model` the path of the model being used (relative to the Assets/StreamingAssets folder)
- `Chat Template` the chat template being used for the LLM
- `Lora` the path of the LoRAs being used (relative to the Assets/StreamingAssets folder)
- `Lora Weights` the weights of the LoRAs being used
- `Flash Attention` click to use flash attention in the model (if `Use extras` is enabled)
</details>
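
For reference, a model placed under `Assets/StreamingAssets` can also be selected from code. The sketch below assumes a `SetModel` method and `contextSize` / `batchSize` fields matching the Inspector labels; the model filename is a placeholder.

```csharp
using UnityEngine;
using LLMUnity;

// Hypothetical model setup: point the LLM at a .gguf file and size its context.
// Method and field names are assumed from the Inspector labels.
public class ModelSetupExample : MonoBehaviour
{
    public LLM llm;

    void Awake()
    {
        llm.SetModel("my-model.gguf"); // path relative to Assets/StreamingAssets (placeholder name)
        llm.contextSize = 4096;        // 0 = use the model's own context size; larger values use more RAM/VRAM
        llm.batchSize = 512;           // prompt-processing batch size (the documented default)
    }
}
```
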
## LLMCharacter Settings
- `Show/Hide Advanced Options` toggle to show/hide the advanced options below
- `Log Level` select how verbose the log messages are
- `Use extras` select to install and allow the use of extra features (flash attention and IQ quants)
- `LLM` the LLM GameObject (if `Remote` is not set)
- `Host` IP of the LLM server (if `Remote` is set)
- `Port` port of the LLM server (if `Remote` is set)
- `Num Retries` number of HTTP request retries from the LLM server (if `Remote` is set)
- `API key` API key of the LLM server (if `Remote` is set)
- <details><summary><code>Save</code> save filename or relative path</summary> If set, the chat history and LLM state (if save cache is enabled) are automatically saved to the file specified. <br> The chat history is saved with a json suffix and the LLM state with a cache suffix. <br> Both files are saved in the [persistentDataPath folder of Unity](https://docs.unity3d.com/ScriptReference/Application-persistentDataPath.html).</details>
- `Save Cache` select to save the LLM state along with the chat history. The LLM state is typically around 100MB+.
- `Debug Prompt` select to log the constructed prompts in the Unity Editor
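
A minimal sketch of an `LLMCharacter` configured as a remote client with persistent chat history follows; the field names (`host`, `numRetries`, `save`, `saveCache`) are assumed to mirror the Inspector labels, and the IP address is a placeholder.

```csharp
using UnityEngine;
using LLMUnity;

// Hypothetical client setup: an LLMCharacter talking to a remote LLM server
// and saving its chat history. Field names are assumed from the Inspector labels.
public class CharacterConnectionExample : MonoBehaviour
{
    public LLMCharacter llmCharacter;

    void Awake()
    {
        llmCharacter.remote = true;
        llmCharacter.host = "192.168.1.10";   // placeholder IP of the machine running the LLM server
        llmCharacter.port = 13333;
        llmCharacter.numRetries = 3;          // retry failed HTTP requests a few times
        llmCharacter.save = "npc_blacksmith"; // chat history -> npc_blacksmith.json in persistentDataPath
        llmCharacter.saveCache = false;       // skip the (large) LLM state file
    }
}
```
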
## 🗨️ Chat Settings
- `Player Name` the name of the player
- `AI Name` the name of the AI
- `Prompt` description of the AI role
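
These three settings define the persona used to build the prompt. A minimal usage sketch, assuming the `playerName`, `AIName` and `prompt` fields and the `Chat` call match the public API:

```csharp
using UnityEngine;
using LLMUnity;

// Minimal sketch: give the character a role and names, then chat with it.
// The callback receives the partial reply while it is being streamed.
public class ChatSettingsExample : MonoBehaviour
{
    public LLMCharacter llmCharacter;

    void Start()
    {
        llmCharacter.playerName = "Adventurer";
        llmCharacter.AIName = "Blacksmith";
        llmCharacter.prompt = "A gruff but kind blacksmith in a fantasy village.";
        _ = llmCharacter.Chat("Hello! What are you selling today?", HandleReply);
    }

    void HandleReply(string reply)
    {
        Debug.Log(reply); // called repeatedly with the partial reply while streaming
    }
}
```
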
## 🤗 Model Settings
- `Stream` select to receive the reply from the model as it is produced (recommended!).<br>
If it is not selected, the full reply from the model is received in one go
- <details><summary><code>Num Predict</code> maximum number of tokens to predict (default: 256, -1 = infinity, -2 = until context filled)</summary>This is the maximum number of tokens the model will predict. When this limit is reached the model stops generating, which means words / sentences might get cut off if the value is too low. </details>
- <details><summary>Advanced options</summary>
- `Load grammar` click to load a grammar in .gbnf format
- `Grammar` the path of the grammar being used (relative to the Assets/StreamingAssets folder)
- <details><summary><code>Cache Prompt</code> save the ongoing prompt from the chat (default: true)</summary> Saves the prompt while it is being created by the chat to avoid reprocessing the entire prompt every time</details>
- `Slot` slot of the server to use for computation. Value can be set from 0 to `Parallel Prompts`-1 (default: -1 = new slot for each character)
- `Seed` seed for reproducibility. For random results every time use -1
- <details><summary><code>Temperature</code> LLM temperature, lower values give more deterministic answers (default: 0.2)</summary>The temperature setting adjusts how random the generated responses are. Turning it up makes the generated choices more varied and unpredictable. Turning it down makes the generated responses more predictable and focused on the most likely options.</details>
- <details><summary><code>Top K</code> top-k sampling (default: 40, 0 = disabled)</summary>The top k value limits generation to the k most probable tokens at each step. This value can help fine-tune the output and make it adhere to specific patterns or constraints.</details>
- <details><summary><code>Top P</code> top-p sampling (default: 0.9, 1.0 = disabled)</summary>The top p value controls the cumulative probability of generated tokens. The model will generate tokens until this threshold (p) is reached. By lowering this value you can shorten the output and make it less diverse.</details>
- <details><summary><code>Min P</code> minimum probability for a token to be used (default: 0.05)</summary> The probability is defined relative to the probability of the most likely token.</details>
- <details><summary><code>Repeat Penalty</code> control the repetition of token sequences in the generated text (default: 1.1)</summary>The penalty is applied to repeated tokens.</details>
- <details><summary><code>Presence Penalty</code> repeated token presence penalty (default: 0.0, 0.0 = disabled)</summary> Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.</details>
- <details><summary><code>Frequency Penalty</code> repeated token frequency penalty (default: 0.0, 0.0 = disabled)</summary> Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.</details>
- `Typical P`: enable locally typical sampling with parameter p (default: 1.0, 1.0 = disabled).
- `Repeat Last N`: last N tokens to consider for penalizing repetition (default: 64, 0 = disabled, -1 = ctx-size).
- `Penalize Nl`: penalize newline tokens when applying the repeat penalty (default: true).
- `Penalty Prompt`: prompt for the purpose of the penalty evaluation. Can be either `null`, a string or an array of numbers representing tokens (default: `null` = use original `prompt`).
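
Finally, a rough sketch of tuning the generation options from code rather than the Inspector; the field names are assumed to mirror the Inspector labels and the values shown are the documented defaults or simple illustrations.

```csharp
using UnityEngine;
using LLMUnity;

// Hypothetical sampling setup for an LLMCharacter.
// Field names are assumed from the Inspector labels.
public class SamplingSettingsExample : MonoBehaviour
{
    public LLMCharacter llmCharacter;

    void Awake()
    {
        llmCharacter.stream = true;      // receive the reply token by token
        llmCharacter.numPredict = 256;   // cap the reply length; -1 = no limit
        llmCharacter.temperature = 0.2f; // lower temperature -> more deterministic replies
        llmCharacter.topK = 40;
        llmCharacter.topP = 0.9f;
        llmCharacter.seed = 42;          // fixed seed for reproducible output; -1 = random
    }
}
```
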