TinyLlama q4 works well too. #246
cosimoiaia
started this conversation in
General
Replies: 0 comments
Hi,
First of all, thanks for the great work on putting this app together.
I tested it on my Pocophone (the original model, about 5 years old) with a personal 7B model finetuned from Mistral. It ran at q8, but at ~25 seconds per token it is obviously too slow to be usable.
Then I tested TinyLlama q4, which actually works pretty well at around 5-8 tokens/second, very usable.
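For a rough sense of the gap, here is a back-of-the-envelope conversion of the two figures above into the same unit. It is only arithmetic on the numbers quoted in this post (the function and constant names are made up for illustration), not a benchmark:

```python
# Rough comparison using only the figures quoted in this post.
# Names below are illustrative, not part of the app.
SECONDS_PER_TOKEN_7B_Q8 = 25.0            # ~25 seconds per token (7B, q8)
TINYLLAMA_Q4_TOKENS_PER_SEC = (5.0, 8.0)  # ~5-8 tokens/second (TinyLlama, q4)


def tokens_per_second(seconds_per_token: float) -> float:
    """Convert seconds-per-token into tokens-per-second."""
    return 1.0 / seconds_per_token


def speedup_range(slow_tps: float, fast_range: tuple) -> tuple:
    """Return how many times faster each end of fast_range is than slow_tps."""
    return tuple(fast / slow_tps for fast in fast_range)


slow_tps = tokens_per_second(SECONDS_PER_TOKEN_7B_Q8)
low, high = speedup_range(slow_tps, TINYLLAMA_Q4_TOKENS_PER_SEC)

print(f"7B q8: about {slow_tps:.2f} tokens/s")
print(f"TinyLlama q4: roughly {low:.0f}x to {high:.0f}x faster")
```

So the 7B q8 model manages about 0.04 tokens/second on this phone, which puts TinyLlama q4 somewhere in the range of a hundred-plus times faster in these informal tests.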
Looking forward to future improvements to the app! 👍