# Local LLM with SGLang or vLLM

:::warning
When using a Local LLM, OpenHands may have limited functionality.
It is highly recommended that you use GPUs to serve local models for an optimal experience.
:::

## News

- 2025/03/31: We released OpenHands LM v0.1 32B, an open model that achieves 37.1% on SWE-Bench Verified ([blog](https://www.all-hands.dev/blog/introducing-openhands-lm-32b----a-strong-open-coding-agent-model), [model](https://huggingface.co/all-hands/openhands-lm-32b-v0.1)).

## Download the Model from Hugging Face

For example, to download [OpenHands LM 32B v0.1](https://huggingface.co/all-hands/openhands-lm-32b-v0.1):

```bash
huggingface-cli download all-hands/openhands-lm-32b-v0.1 --local-dir my_folder/openhands-lm-32b-v0.1
```
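
To sanity-check the download, you can list the local directory. The exact contents depend on the repository, but you should see the model config, tokenizer files, and a set of weight shards:

```bash
ls my_folder/openhands-lm-32b-v0.1
# expect config.json, tokenizer files, and model-*.safetensors shards plus an index file
```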

## Create an OpenAI-Compatible Endpoint With a Model Serving Framework

### Serving with SGLang

- Install SGLang following [the official documentation](https://docs.sglang.ai/start/install.html).
- Example launch command for OpenHands LM 32B (with at least 2 GPUs):

```bash
SGLANG_ALLOW_OVERWRITE_LONGER_CONTEXT_LEN=1 python3 -m sglang.launch_server \
    --model my_folder/openhands-lm-32b-v0.1 \
    --served-model-name openhands-lm-32b-v0.1 \
    --port 8000 \
    --tp 2 --dp 1 \
    --host 0.0.0.0 \
    --api-key mykey --context-length 131072
```
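
Once the server is up, you can verify that the endpoint responds with a quick OpenAI-style request (this assumes the port, API key, and served model name from the command above):

```bash
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer mykey" \
  -d '{"model": "openhands-lm-32b-v0.1", "messages": [{"role": "user", "content": "hi"}]}'
```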

### Serving with vLLM

- Install vLLM following [the official documentation](https://docs.vllm.ai/en/latest/getting_started/installation.html).
- Example launch command for OpenHands LM 32B (with at least 2 GPUs):

```bash
vllm serve my_folder/openhands-lm-32b-v0.1 \
    --host 0.0.0.0 --port 8000 \
    --api-key mykey \
    --tensor-parallel-size 2 \
    --served-model-name openhands-lm-32b-v0.1 \
    --enable-prefix-caching
```
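
As with SGLang, you can check the endpoint before pointing OpenHands at it; listing the models confirms that the API key and served model name are in effect (assuming the port and key from the command above):

```bash
curl http://localhost:8000/v1/models \
  -H "Authorization: Bearer mykey"
```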

## Run and Configure OpenHands

### Run OpenHands

#### Using Docker

Run OpenHands using [the official `docker run` command](../installation#start-the-app).
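
If OpenHands itself runs in Docker on a Linux host, note that `host.docker.internal` may not resolve inside the container by default. A minimal sketch of one workaround (the rest of the command follows the linked instructions) is to add a host-gateway mapping when starting the container:

```bash
docker run # ... the rest of the official command ... \
    --add-host host.docker.internal:host-gateway \
    # ...
```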

#### Using Development Mode

Use the instructions in [Development.md](https://github.com/All-Hands-AI/OpenHands/blob/main/Development.md) to build OpenHands.
Ensure `config.toml` exists by running `make setup-config`, which will create one for you. In `config.toml`, enter the following:

```toml
[core]
workspace_base="./workspace"

[llm]
embedding_model="local"
ollama_base_url="http://localhost:8000"
```

Start OpenHands using `make run`.

### Configure OpenHands

Once OpenHands is running, you'll need to set the following in the OpenHands UI through the Settings:
1. Enable `Advanced` options.
2. Set the following:
   - `Custom Model` to `openai/<served-model-name>` (e.g. `openai/openhands-lm-32b-v0.1`)
   - `Base URL` to `http://host.docker.internal:8000`
   - `API key` to the same string you set when serving the model (e.g. `mykey`)
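
To confirm connectivity end to end, one option (a sketch; `[CONTAINER ID]` comes from `docker ps`, and the key matches the serving command above) is to query the endpoint from inside a running OpenHands container:

```bash
docker ps  # find the OpenHands container ID
docker exec [CONTAINER ID] curl http://host.docker.internal:8000/v1/models \
  -H "Authorization: Bearer mykey"
```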