Releases · arthw/llama.cpp
b4937
b4789
Merge pull request #8 from arthw/fix_q4_1
fix unit-test failure of Q4_1, Q5..
b4787
Merge pull request #7 from arthw/cherry_pick_20250224
Cherry pick 20250224
b4383
Merge pull request #6 from arthw/cherry-1220
Cherry 1220
b4137
Merge pull request #5 from arthw/cherry-1118
Cherry 1118
b3555
fix error
b3554
ggml-backend : fix async copy from CPU (#8897)
* ggml-backend : fix async copy from CPU
* cuda : more reliable async copy, fix stream used when the devices are the same
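The stream issue called out in this fix is a common pitfall with asynchronous host-to-device copies: the copy must be enqueued on a stream owned by the destination device, and any later use of the data must synchronize on that same stream. Below is a minimal sketch of that general pattern using only the CUDA runtime API; it is illustrative, not the actual ggml-backend code, and the device index, buffer size, and `check` helper are all assumptions made for the example.

```cpp
#include <cuda_runtime.h>
#include <cstdio>
#include <cstdlib>

// Abort on any CUDA runtime error, printing the failing call.
static void check(cudaError_t err, const char * what) {
    if (err != cudaSuccess) {
        std::fprintf(stderr, "%s: %s\n", what, cudaGetErrorString(err));
        std::exit(1);
    }
}

int main() {
    const size_t n = 1 << 20;

    // Pinned host memory: cudaMemcpyAsync from pageable memory silently
    // degrades to a synchronous copy.
    float * h_src = nullptr;
    check(cudaMallocHost((void **) &h_src, n * sizeof(float)), "cudaMallocHost");
    for (size_t i = 0; i < n; i++) h_src[i] = (float) i;

    // Select the destination device first, so the buffer and the stream
    // below both belong to it.
    check(cudaSetDevice(0), "cudaSetDevice");

    float * d_dst = nullptr;
    check(cudaMalloc((void **) &d_dst, n * sizeof(float)), "cudaMalloc");

    cudaStream_t stream;
    check(cudaStreamCreate(&stream), "cudaStreamCreate");

    // Enqueue the host-to-device copy on that device's stream, and
    // synchronize on the same stream before touching d_dst.
    check(cudaMemcpyAsync(d_dst, h_src, n * sizeof(float),
                          cudaMemcpyHostToDevice, stream), "cudaMemcpyAsync");
    check(cudaStreamSynchronize(stream), "cudaStreamSynchronize");

    check(cudaStreamDestroy(stream), "cudaStreamDestroy");
    check(cudaFree(d_dst), "cudaFree");
    check(cudaFreeHost(h_src), "cudaFreeHost");
    return 0;
}
```

When two logical backend devices resolve to the same physical GPU, mixing up whose stream carries the copy can race with or serialize against unrelated work, which is the kind of hazard the "fix stream used when the devices are the same" change addresses.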
b3517
[SYCL] Fixing wrong VDR iq4nl value (#8812)
b3482
Merge pull request #2 from arthw/refactor_dev
Refactor device management and usage API
b3475
llama : add support for llama 3.1 rope scaling factors (#8676)
* Add llama 3.1 rope scaling factors to llama conversion and inference. This commit generates the rope factors on conversion and adds them to the resulting model as a tensor. At inference time, these factors are passed to the `ggml_rope_ext` rope operation, improving results for context windows above 8192.
* Update convert_hf_to_gguf.py (Co-authored-by: compilade <[email protected]>)
* address comments
* address comments
* Update src/llama.cpp (Co-authored-by: compilade <[email protected]>)
* Update convert_hf_to_gguf.py (Co-authored-by: compilade <[email protected]>)

Co-authored-by: compilade <[email protected]>
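The mechanism described in that commit is straightforward: the converter writes the per-dimension rope frequency factors as an extra tensor in the GGUF, and the inference graph passes that tensor to `ggml_rope_ext` as its optional factors argument. Below is a minimal sketch of the inference-side call, assuming the thirteen-argument `ggml_rope_ext` signature from this era of ggml; the helper name, the `GGML_ROPE_TYPE_NEOX` constant, and all numeric defaults are illustrative assumptions, not values taken from this repository.

```cpp
#include "ggml.h"

// Sketch: how a rope-factors tensor loaded from the GGUF reaches
// ggml_rope_ext. Passing NULL for rope_factors falls back to the
// plain (unscaled) rope frequencies.
static struct ggml_tensor * apply_rope_with_factors(
        struct ggml_context * ctx,
        struct ggml_tensor  * cur,          // activations to rotate
        struct ggml_tensor  * positions,    // I32 token positions
        struct ggml_tensor  * rope_factors, // factors tensor from the GGUF, or NULL
        int                   n_rot) {      // number of rotated dimensions
    return ggml_rope_ext(
        ctx, cur, positions, rope_factors,
        n_rot,
        GGML_ROPE_TYPE_NEOX,   // rotation mode used by llama-family models
        8192,                  // n_ctx_orig: original training context (assumed)
        500000.0f,             // freq_base (Llama-3.1-style value, assumed)
        1.0f,                  // freq_scale
        0.0f,                  // ext_factor
        1.0f,                  // attn_factor
        32.0f,                 // beta_fast
        1.0f);                 // beta_slow
}
```

Shipping the factors as a model tensor rather than recomputing them at load time keeps the inference code model-agnostic: any architecture that provides such a tensor gets the long-context scaling for free.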