1 parent 66a2a0d commit 010f4d2
README.md
@@ -23,11 +23,13 @@ For example:
wget https://huggingface.co/sanjay920/Llama-3-8b-function-calling-alpha-v1.gguf/resolve/main/Llama-3-8b-function-calling-alpha-v1.gguf
```

-4. start server:
+4. Start the OpenAI-compatible server:

./llama-server -ngl 35 -m Llama-3-8b-function-calling-alpha-v1.gguf --port 1234 --host 0.0.0.0 -c 16000 --chat-template llama3

+5. That's it! Make sure you turn `stream` off when making API calls to the server, as streaming is not yet supported.
+
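The added step 5 can be sketched as a small client call, assuming Python and a `llama-server` started as in step 4 on port 1234; the endpoint path and payload shape follow the OpenAI chat-completions convention, and the `build_chat_request` helper is hypothetical:

```python
import json
import urllib.request

# Host/port taken from the ./llama-server command in step 4 (assumption: local machine).
SERVER_URL = "http://localhost:1234/v1/chat/completions"

def build_chat_request(messages, stream=False):
    """Build an OpenAI-style chat-completions payload as JSON bytes.

    stream defaults to False because streaming is not yet supported
    by this setup (see step 5).
    """
    return json.dumps({"messages": messages, "stream": stream}).encode("utf-8")

body = build_chat_request(
    [{"role": "user", "content": "What is the weather in Paris?"}]
)

# Uncomment to actually POST to a running llama-server:
# req = urllib.request.Request(
#     SERVER_URL, data=body, headers={"Content-Type": "application/json"}
# )
# with urllib.request.urlopen(req) as resp:
#     print(json.loads(resp.read()))
```

Keeping `"stream": false` in the payload is the key point: the server will reject or mishandle streamed requests until streaming support lands.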
### Recent API changes
- [2024 Apr 21] `llama_token_to_piece` can now optionally render special tokens https://github.com/ggerganov/llama.cpp/pull/6807