@@ -26,10 +26,73 @@ wget https://huggingface.co/sanjay920/Llama-3-8b-function-calling-alpha-v1.gguf/

4. Start an OpenAI-compatible server (`-ngl` sets how many layers to offload to the GPU, `-c` sets the context size):
```
- ./llama-server -ngl 35 -m Llama-3-8b-function-calling-alpha-v1.gguf --port 1234 --host 0.0.0.0 -c 16000 --chat-template llama3
+ ./llama-server -ngl 37 -m Llama-3-8b-function-calling-alpha-v1.gguf --port 1234 --host 0.0.0.0 -c 8000 --chat-template llama3
```

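+ If you're scripting these steps, you can poll the server until it's ready instead of checking by hand. A minimal sketch, assuming llama.cpp's stock `/health` endpoint and the third-party `requests` package:
+
+ ```python
+ import time
+
+ import requests
+
+ # Poll llama-server's /health endpoint (port 1234, as started above) until
+ # it answers 200 OK, or give up after ~60 seconds.
+ for _ in range(60):
+     try:
+         if requests.get("http://localhost:1234/health", timeout=1).status_code == 200:
+             print("server is ready")
+             break
+     except requests.ConnectionError:
+         pass  # server not accepting connections yet
+     time.sleep(1)
+ else:
+     raise RuntimeError("llama-server did not become ready in time")
+ ```
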
- 5. That's it! MAKE SURE you turn `stream` OFF when making api calls to the server, as the streaming feature is not supported yet. And we will support streaming too soon.
+ 5. Test that the server is available:
+ ```bash
+ curl localhost:1234/v1/chat/completions \
+   -H "Content-Type: application/json" \
+   -H "Authorization: Bearer tokenabc-123" \
+   -d '{
+     "model": "rubra-model",
+     "messages": [
+       {
+         "role": "system",
+         "content": "You are a helpful assistant."
+       },
+       {
+         "role": "user",
+         "content": "hello"
+       }
+     ]
+   }'
+ ```
+
+ 6. Try a Python function-calling example:
+ ```python
+ from openai import OpenAI
+
+ # Point the client at the local llama.cpp server; any non-empty API key works here.
+ client = OpenAI(api_key="123", base_url="http://localhost:1234/v1/")
+
+ tools = [
+     {
+         "type": "function",
+         "function": {
+             "name": "get_current_weather",
+             "description": "Get the current weather in a given location",
+             "parameters": {
+                 "type": "object",
+                 "properties": {
+                     "location": {
+                         "type": "string",
+                         "description": "The city and state, e.g. San Francisco, CA",
+                     },
+                     "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
+                 },
+                 "required": ["location"],
+             },
+         },
+     }
+ ]
+ messages = [{"role": "user", "content": "What's the weather like in Boston today?"}]
+ completion = client.chat.completions.create(
+     model="rubra-model",
+     messages=messages,
+     tools=tools,
+     tool_choice="auto",  # let the model decide whether to call a tool
+ )
+
+ print(completion)
+ ```
+
+ The output should look like this:
+ ```
+ ChatCompletion(id='chatcmpl-EmHd8kai4DVwBUOyim054GmfcyUbjiLf', choices=[Choice(finish_reason='tool_calls', index=0, logprobs=None, message=ChatCompletionMessage(content=None, role='assistant', function_call=None, tool_calls=[ChatCompletionMessageToolCall(id='e885974b', function=Function(arguments='{"location":"Boston"}', name='get_current_weather'), type='function')]))], created=1719528056, model='rubra-model', object='chat.completion', system_fingerprint=None, usage=CompletionUsage(completion_tokens=29, prompt_tokens=241, total_tokens=270))
+ ```
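+
+ To act on the returned tool call, parse its arguments, run the function, and send the result back as a `"tool"` message. A minimal sketch continuing the example above, assuming the server accepts OpenAI-style tool-result messages; `get_current_weather` below is a hypothetical stub, not part of this repo:
+
+ ```python
+ import json
+
+ # Hypothetical stub standing in for a real weather lookup.
+ def get_current_weather(location, unit="fahrenheit"):
+     return json.dumps({"location": location, "temperature": "72", "unit": unit})
+
+ # Assumes the model returned a tool call, as in the output above.
+ tool_call = completion.choices[0].message.tool_calls[0]
+ args = json.loads(tool_call.function.arguments)  # e.g. {"location": "Boston"}
+ result = get_current_weather(**args)
+
+ # Append the assistant's tool call and the tool result, then ask the model to finish.
+ messages.append(completion.choices[0].message)
+ messages.append({"role": "tool", "tool_call_id": tool_call.id, "content": result})
+ followup = client.chat.completions.create(
+     model="rubra-model",
+     messages=messages,
+     tools=tools,
+ )
+ print(followup.choices[0].message.content)
+ ```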
+
+ That's it! MAKE SURE you turn `stream` OFF when making API calls to the server; streaming is not supported yet, but support is coming soon.
+
+ For more function-calling examples, check out the `test_llamacpp.ipynb` notebook.

### Recent API changes