forked from ggml-org/llama.cpp
[pull] master from ggerganov:master #175
Closed
Conversation
typo: `\` -> `/`. Replace the Windows-style `\` path separator with the UNIX-style `/`.
Technically the fixed-width types come only from the iostream and cstdint/stdint.h headers; the memory and vector headers should not be expected to provide them. In GCC 15 the standard library headers were cleaned up, so the proper header, cstdint, is now required:

```
src/llama-mmap.h:26:5: error: ‘uint32_t’ does not name a type
   26 |     uint32_t read_u32() const;
      |     ^~~~~~~~
```
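A minimal sketch of the fix: include cstdint explicitly rather than relying on it leaking in through other headers. The struct body shown here is illustrative, not the verbatim header.

```cpp
// src/llama-mmap.h (sketch): include <cstdint> directly instead of
// relying on it being pulled in transitively by <memory> or <vector>
#include <cstdint>  // uint32_t -- required explicitly under GCC 15
#include <memory>
#include <vector>

struct llama_file {
    // previously failed under GCC 15 with:
    //   error: 'uint32_t' does not name a type
    uint32_t read_u32() const;
};
```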
* server : (webui) introduce conversation branching + idb storage
* mark old conv as "migrated" instead of deleting them
* improve migration
* add more comments
* more clarification
* Update ggml.c
* Update arg.cpp
* Update speculative.h
* CUDA: use arch list for feature availability check

---------

Co-authored-by: Diego Devesa <[email protected]>
* server : use common_token_to_piece instead of common_detokenize

This commit replaces the call to common_detokenize with common_token_to_piece in populate_token_probs. The motivation for this change is to avoid an issue where common_detokenize would remove the word boundary character for tokens, which caused a regression in the server-generated token probabilities (see the sketch below).

Resolves: #11728

* squash! server : use common_token_to_piece instead of common_detokenize

Use common_token_to_piece for post_sampling_probs as well.
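A hedged sketch of the substitution: the helper signatures below follow llama.cpp's common library but should be treated as assumptions rather than the verbatim server code.

```cpp
#include <cstdint>
#include <string>
#include <vector>

struct llama_context;          // opaque llama.cpp context
using llama_token = int32_t;

// assumed signatures of the common helpers referenced above
std::string common_detokenize(llama_context * ctx, const std::vector<llama_token> & tokens, bool special);
std::string common_token_to_piece(llama_context * ctx, llama_token token, bool special);

// before: detokenizing a single-token vector could strip the token's
// word-boundary character (e.g. the leading space of a piece), so the
// text attached to each reported probability was wrong
std::string token_text_before(llama_context * ctx, llama_token tok) {
    return common_detokenize(ctx, {tok}, /*special=*/false);
}

// after: converting the token directly to its piece keeps the
// boundary character intact
std::string token_text_after(llama_context * ctx, llama_token tok) {
    return common_token_to_piece(ctx, tok, /*special=*/false);
}
```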
Signed-off-by: Weizhao Ouyang <[email protected]>
* Bug fix for clamp_f32

For tensors larger than 1-D the clamp operation did not work, because the kernel returned early whenever ith was not 0 (see the sketch below).

* Bug fix for clamp_f32
* Bug fix for clamp_f32
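A sketch of the pattern behind the bug, assuming ggml's usual ith/nth row-striping across threads; this is illustrative, not the verbatim ggml source.

```cpp
#include <algorithm>
#include <cstddef>

// before (buggy shape): `if (ith != 0) return;` meant only thread 0 ran,
// yet the row loop still strode by nth, so most rows of a >1-D tensor
// were never clamped

// after: every thread clamps its own stripe of rows
static void clamp_f32_rows(float * data, int nrows, int ncols,
                           float min, float max, int ith, int nth) {
    for (int r = ith; r < nrows; r += nth) {            // stripe rows over threads
        float * row = data + (std::size_t) r * ncols;
        for (int c = 0; c < ncols; ++c) {
            row[c] = std::clamp(row[c], min, max);      // elementwise clamp
        }
    }
}
```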
* All messages get the copy button
* Update index.html.gz
* ggml : x2 speed for WASM by optimizing SIMD
* fix bad merging
* rm trailing spaces
* rm redundant clamp
* better quantize_row_q8_K

Co-authored-by: camel-cdr <[email protected]>

* remove memset that causes buffer overflow

Co-authored-by: camel-cdr <[email protected]>

---------

Co-authored-by: camel-cdr <[email protected]>
* readme : add notice about new package registry
* cont : fix whitespace
* simple typo fixed
* Update examples/imatrix/README.md

---------

Co-authored-by: Tobias Bergmann <[email protected]>
Co-authored-by: Georgi Gerganov <[email protected]>
* docker : attempt fixing arm64 build on ci
* qemu v7.0.0-28
This patch fixes a typo in the command help: prefx -> prefix.

Signed-off-by: Masanari Iida <[email protected]>
* vulkan: support memset_tensor
* vulkan: support GGML_OP_SUM
* vulkan: implement GGML_OP_ARGMAX
* vulkan: implement GGML_OP_SUB
* vulkan: implement GGML_OP_COUNT_EQUAL
* vulkan: implement GGML_OP_OPT_STEP_ADAMW
* vulkan: fix check_results RWKV_WKV6 crash and memory leaks
* vulkan: implement GGML_OP_REPEAT_BACK
* tests: remove invalid test-backend-ops REPEAT_BACK tests
* vulkan: fix COUNT_EQUAL memset using a fillBuffer command
* CUDA: use async data loading for FlashAttention

---------

Co-authored-by: Diego Devesa <[email protected]>
This commit fixes an issue in the llama.cpp project where the command for testing the llama-server object contained a duplicated file extension. The original command was:

```console
./tests.sh unit/test_chat_completion.py.py -v -x
```

It has been corrected to:

```console
./tests.sh unit/test_chat_completion.py -v -x
```

This change ensures that the test script correctly locates and executes the intended test file, preventing test failures due to an incorrect file name.
Signed-off-by: MoonRide303 <[email protected]>
* server : add TEI API format for /rerank endpoint
* Apply suggestions from code review

Co-authored-by: Georgi Gerganov <[email protected]>

* fix
* also gitignore examples/server/*.gz.hpp

---------

Co-authored-by: Georgi Gerganov <[email protected]>
…1900)

* tool-call refactoring: moved common_chat_* to chat.h; common_chat_templates_init returns a unique_ptr to an opaque type (sketched below)
* addressed clang-tidy lints in [test-]chat.*
* rm minja deps from util & common & move it to common/minja/
* add name & tool_call_id to common_chat_msg
* add common_chat_tool
* added json <-> tools, msgs conversions to chat.h
* fix double bos/eos jinja avoidance hack (was preventing inner bos/eos tokens)
* fix deepseek r1 slow test (no longer <think> opening w/ new template)
* allow empty tools w/ auto + grammar
* fix & test server grammar & json_schema params w/ & w/o --jinja
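The unique_ptr-to-opaque-type change follows a standard C++ idiom. A minimal sketch, with the member layout and deleter details as assumptions rather than the actual chat.h contents:

```cpp
#include <memory>
#include <string>

// chat.h (sketch): callers see only a forward declaration
struct common_chat_templates;                       // opaque type

struct common_chat_templates_deleter {
    void operator()(common_chat_templates * t);     // defined out of line
};
using common_chat_templates_ptr =
    std::unique_ptr<common_chat_templates, common_chat_templates_deleter>;

common_chat_templates_ptr common_chat_templates_init(/* model, template overrides, ... */);

// chat.cpp (sketch): the full definition stays private to one translation unit
struct common_chat_templates {
    std::string source;                             // e.g. the raw jinja template text
};

void common_chat_templates_deleter::operator()(common_chat_templates * t) { delete t; }
```

This keeps the template machinery (e.g. the minja dependency moved to common/minja/) out of every header that merely passes the handle around.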
…n iframe) (#11940)

* Webui: Enable communication with the parent html (if the webui is in an iframe):
  - Listens for the "setText" command from the parent, with "text" and "context" fields. "text" is set in inputMsg; "context" is used as hidden context in the following requests to the llama.cpp server.
  - On pressing the Escape button, sends the command "escapePressed" to the parent.

  Example handling from the parent html side:

  - Send the "setText" command from the parent html to the webui in the iframe:

    ```js
    const iframe = document.getElementById('askAiIframe');
    if (iframe) {
        iframe.contentWindow.postMessage({ command: 'setText', text: text, context: context }, '*');
    }
    ```

  - Listen for the Escape key from the webui on the parent html:

    ```js
    // Listen for escape key event in the iframe
    window.addEventListener('keydown', (event) => {
        if (event.key === 'Escape') {
            // Process case when Escape is pressed inside webui
        }
    });
    ```

* Move the extraContext from storage to app.context.
* Fix formatting.
* add Message.extra
* format + build
* MessageExtraContext
* build
* fix display
* rm console.log

---------

Co-authored-by: igardev <[email protected]>
Co-authored-by: Xuan Son Nguyen <[email protected]>
This commit adjusts the indentation of the functions `parse_sequence` and `parse_rule` in src/llama-grammar.cpp. The motivation is consistency and improved readability.
* speculative : update default params
* speculative : do not discard the last drafted token
This commit adds a preset for llama.vim to use the default Qwen 2.5 Coder models. The motivation for this change is to make it easier to start a server suitable for use with the llama.vim plugin. For example, the server can be started with a command like the following:

```console
$ llama-server --fim-qwen-1.5b-default
```

Refs: #10932
Relates to: #11178

Added the --chat-template-file CLI option to llama-run. If specified, the file is read and its content is passed to common_chat_templates_from_model, overriding the model's chat template.

Signed-off-by: Michael Engel <[email protected]>
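A minimal sketch of the described flow, with hypothetical helper names; the call into common_chat_templates_from_model is shown as a comment because the surrounding llama-run plumbing is not reproduced here.

```cpp
#include <fstream>
#include <sstream>
#include <string>

// read the whole template file; an empty result means "no override"
static std::string read_chat_template_file(const std::string & path) {
    std::ifstream file(path);
    if (!file) {
        return "";                 // fall back to the model's built-in template
    }
    std::ostringstream ss;
    ss << file.rdbuf();            // slurp the file contents
    return ss.str();
}

// usage (sketch): a non-empty string overrides the model's chat template
//   auto tmpls = common_chat_templates_from_model(model, read_chat_template_file(opt.chat_template_file));
```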
* Added SVE Implementation for Q3_K Kernel in ggml-cpu-quants.c file
* Improved Formatting of code in ggml-cpu-quants.c file
* style : minor fixes
* style : less whitespaces
* style : ptr spacing

---------

Co-authored-by: vithulep <[email protected]>
Co-authored-by: Georgi Gerganov <[email protected]>
* ggml-cpu: Add CPU backend support for KleidiAI library
* Add environmental variable GGML_KLEIDIAI_SME
* Add support for multithread LHS conversion
* Switch kernel selection order to dotprod and i8mm
* updates for review comments
* More updates for review comments
* Reorganize and rename KleidiAI files
* Move ggml-cpu-traits.h to source file
* Update cmake for SME build and add alignment for SME
* Remove append GGML_USE_CPU_KLEIDIAI to the GGML_CDEF_PUBLIC list
* fix skip ime composing
* fix npm rebuild
* fix warn

---------

Co-authored-by: momonga <[email protected]>
Co-authored-by: Xuan Son Nguyen <[email protected]>
Labels
android
Apple Metal
⤵️ pull
devops
documentation
examples
ggml
nix
Nvidia GPU
python
script
server
SYCL
testing
Vulkan
See Commits and Changes for more details.
Created by
pull[bot] (v2.0.0-alpha.1)
Can you help keep this open source service alive? 💖 Please sponsor : )