Releases · arthw/llama.cpp
b4937
b4789
Merge pull request #8 from arthw/fix_q4_1
fix unit-test failure of Q4_1, Q5..
b4787
Merge pull request #7 from arthw/cherry_pick_20250224
Cherry pick 20250224
b4383
Merge pull request #6 from arthw/cherry-1220
Cherry 1220
b4137
Merge pull request #5 from arthw/cherry-1118
Cherry 1118
b3555
fix error
b3554
ggml-backend : fix async copy from CPU (#8897)
* ggml-backend : fix async copy from CPU
* cuda : more reliable async copy, fix stream used when the devices are the same
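The stream issue called out in this fix is a common pitfall with asynchronous host-to-device copies: the copy must be enqueued on a stream owned by the destination device, and any later use of the data must synchronize on that same stream. Below is a minimal sketch of that general pattern using only the CUDA runtime API; it is illustrative, not the actual ggml-backend code, and the device index, buffer size, and `check` helper are all assumptions made for the example.

```cpp
#include <cuda_runtime.h>
#include <cstdio>
#include <cstdlib>

// Abort on any CUDA runtime error, printing the failing call.
static void check(cudaError_t err, const char * what) {
    if (err != cudaSuccess) {
        std::fprintf(stderr, "%s: %s\n", what, cudaGetErrorString(err));
        std::exit(1);
    }
}

int main() {
    const size_t n = 1 << 20;

    // Pinned host memory: cudaMemcpyAsync from pageable memory silently
    // degrades to a synchronous copy.
    float * h_src = nullptr;
    check(cudaMallocHost((void **) &h_src, n * sizeof(float)), "cudaMallocHost");
    for (size_t i = 0; i < n; i++) h_src[i] = (float) i;

    // Select the destination device first, so the buffer and the stream
    // below both belong to it.
    check(cudaSetDevice(0), "cudaSetDevice");

    float * d_dst = nullptr;
    check(cudaMalloc((void **) &d_dst, n * sizeof(float)), "cudaMalloc");

    cudaStream_t stream;
    check(cudaStreamCreate(&stream), "cudaStreamCreate");

    // Enqueue the host-to-device copy on that device's stream, and
    // synchronize on the same stream before touching d_dst.
    check(cudaMemcpyAsync(d_dst, h_src, n * sizeof(float),
                          cudaMemcpyHostToDevice, stream), "cudaMemcpyAsync");
    check(cudaStreamSynchronize(stream), "cudaStreamSynchronize");

    check(cudaStreamDestroy(stream), "cudaStreamDestroy");
    check(cudaFree(d_dst), "cudaFree");
    check(cudaFreeHost(h_src), "cudaFreeHost");
    return 0;
}
```

When two logical backend devices resolve to the same physical GPU, mixing up whose stream carries the copy can race with or serialize against unrelated work, which is the kind of hazard the "fix stream used when the devices are the same" change addresses.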
b3517
[SYCL] Fixing wrong VDR iq4nl value (#8812)
b3482
Merge pull request #2 from arthw/refactor_dev
Refactor device management and usage API
b3475
llama : add support for llama 3.1 rope scaling factors (#8676)
* Add llama 3.1 rope scaling factors to llama conversion and inference. This commit generates the rope factors on conversion and adds them to the resulting model as a tensor. At inference time, these factors are passed to the `ggml_rope_ext` rope operation, improving results for context windows above 8192.
* Update convert_hf_to_gguf.py (Co-authored-by: compilade <[email protected]>)
* address comments
* address comments
* Update src/llama.cpp (Co-authored-by: compilade <[email protected]>)
* Update convert_hf_to_gguf.py (Co-authored-by: compilade <[email protected]>)

Co-authored-by: compilade <[email protected]>
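The mechanism described in that commit is straightforward: the converter writes the per-dimension rope frequency factors as an extra tensor in the GGUF, and the inference graph passes that tensor to `ggml_rope_ext` as its optional factors argument. Below is a minimal sketch of the inference-side call, assuming the thirteen-argument `ggml_rope_ext` signature from this era of ggml; the helper name, the `GGML_ROPE_TYPE_NEOX` constant, and all numeric defaults are illustrative assumptions, not values taken from this repository.

```cpp
#include "ggml.h"

// Sketch: how a rope-factors tensor loaded from the GGUF reaches
// ggml_rope_ext. Passing NULL for rope_factors falls back to the
// plain (unscaled) rope frequencies.
static struct ggml_tensor * apply_rope_with_factors(
        struct ggml_context * ctx,
        struct ggml_tensor  * cur,          // activations to rotate
        struct ggml_tensor  * positions,    // I32 token positions
        struct ggml_tensor  * rope_factors, // factors tensor from the GGUF, or NULL
        int                   n_rot) {      // number of rotated dimensions
    return ggml_rope_ext(
        ctx, cur, positions, rope_factors,
        n_rot,
        GGML_ROPE_TYPE_NEOX,   // rotation mode used by llama-family models
        8192,                  // n_ctx_orig: original training context (assumed)
        500000.0f,             // freq_base (Llama-3.1-style value, assumed)
        1.0f,                  // freq_scale
        0.0f,                  // ext_factor
        1.0f,                  // attn_factor
        32.0f,                 // beta_fast
        1.0f);                 // beta_slow
}
```

Shipping the factors as a model tensor rather than recomputing them at load time keeps the inference code model-agnostic: any architecture that provides such a tensor gets the long-context scaling for free.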