Commit c8225b3

(llm): Support OpenHands LM (All-Hands-AI#7598)

xingyaoww and mamoodi authored
Co-authored-by: mamoodi <[email protected]>
1 parent df497de commit c8225b3

5 files changed: +79 -157 lines

docs/modules/usage/llms/llms.md (+1)
@@ -59,6 +59,7 @@ We have a few guides for running OpenHands with specific model providers:
 - [LiteLLM Proxy](llms/litellm-proxy)
 - [OpenAI](llms/openai-llms)
 - [OpenRouter](llms/openrouter)
+- [Local LLMs with SGLang or vLLM](llms/../local-llms.md)

 ### API retries and rate limits

docs/modules/usage/llms/local-llms.md (+45 -154)
@@ -1,192 +1,83 @@
-# Local LLM with Ollama
+# Local LLM with SGLang or vLLM

 :::warning
 When using a Local LLM, OpenHands may have limited functionality.
+It is highly recommended that you use GPUs to serve local models for optimal experience.
 :::

-Ensure that you have the Ollama server up and running.
-For detailed startup instructions, refer to [here](https://github.com/ollama/ollama).
+## News

-This guide assumes you've started ollama with `ollama serve`. If you're running ollama differently (e.g. inside docker), the instructions might need to be modified. Please note that if you're running WSL the default ollama configuration blocks requests from docker containers. See [here](#configuring-ollama-service-wsl-en).
+- 2025/03/31: We released an open model OpenHands LM v0.1 32B that achieves 37.1% on SWE-Bench Verified
+  ([blog](https://www.all-hands.dev/blog/introducing-openhands-lm-32b----a-strong-open-coding-agent-model), [model](https://huggingface.co/all-hands/openhands-lm-32b-v0.1)).

-## Pull Models
+## Download the Model from Huggingface

-Ollama model names can be found [here](https://ollama.com/library). For a small example, you can use
-the `codellama:7b` model. Bigger models will generally perform better.
+For example, to download [OpenHands LM 32B v0.1](https://huggingface.co/all-hands/openhands-lm-32b-v0.1):

 ```bash
-ollama pull codellama:7b
+huggingface-cli download all-hands/openhands-lm-32b-v0.1 --local-dir my_folder/openhands-lm-32b-v0.1
 ```

-you can check which models you have downloaded like this:
+## Create an OpenAI-Compatible Endpoint With a Model Serving Framework

-```bash
-~$ ollama list
-NAME                            ID              SIZE    MODIFIED
-codellama:7b                    8fdf8f752f6e    3.8 GB  6 weeks ago
-mistral:7b-instruct-v0.2-q4_K_M eb14864c7427    4.4 GB  2 weeks ago
-starcoder2:latest               f67ae0f64584    1.7 GB  19 hours ago
-```
-
-## Run OpenHands with Docker
-
-### Start OpenHands
-Use the instructions [here](../getting-started) to start OpenHands using Docker.
-But when running `docker run`, you'll need to add a few more arguments:
-
-```bash
-docker run # ...
-    --add-host host.docker.internal:host-gateway \
-    -e LLM_OLLAMA_BASE_URL="http://host.docker.internal:11434" \
-    # ...
-```
-
-LLM_OLLAMA_BASE_URL is optional. If you set it, it will be used to show
-the available installed models in the UI.
-
-
-### Configure the Web Application
-
-When running `openhands`, you'll need to set the following in the OpenHands UI through the Settings:
-- the model to "ollama/<model-name>"
-- the base url to `http://host.docker.internal:11434`
-- the API key is optional, you can use any string, such as `ollama`.
-
-
-## Run OpenHands in Development Mode
-
-### Build from Source
-
-Use the instructions in [Development.md](https://github.com/All-Hands-AI/OpenHands/blob/main/Development.md) to build OpenHands.
-Make sure `config.toml` is there by running `make setup-config` which will create one for you. In `config.toml`, enter the followings:
-
-```
-[core]
-workspace_base="./workspace"
-
-[llm]
-embedding_model="local"
-ollama_base_url="http://localhost:11434"
-
-```
-
-Done! Now you can start OpenHands by: `make run`. You now should be able to connect to `http://localhost:3000/`
-
-### Configure the Web Application
-
-In the OpenHands UI, click on the Settings wheel in the bottom-left corner.
-Then in the `Model` input, enter `ollama/codellama:7b`, or the name of the model you pulled earlier.
-If it doesn’t show up in the dropdown, enable `Advanced Settings` and type it in. Please note: you need the model name as listed by `ollama list`, with the prefix `ollama/`.
-
-In the API Key field, enter `ollama` or any value, since you don't need a particular key.
-
-In the Base URL field, enter `http://localhost:11434`.
-
-And now you're ready to go!
-
-## Configuring the ollama service (WSL) {#configuring-ollama-service-wsl-en}
+### Serving with SGLang

-The default configuration for ollama in WSL only serves localhost. This means you can't reach it from a docker container. eg. it wont work with OpenHands. First let's test that ollama is running correctly.
+- Install SGLang following [the official documentation](https://docs.sglang.ai/start/install.html).
+- Example launch command for OpenHands LM 32B (with at least 2 GPUs):

 ```bash
-ollama list # get list of installed models
-curl http://localhost:11434/api/generate -d '{"model":"[NAME]","prompt":"hi"}'
-#ex. curl http://localhost:11434/api/generate -d '{"model":"codellama:7b","prompt":"hi"}'
-#ex. curl http://localhost:11434/api/generate -d '{"model":"codellama","prompt":"hi"}' #the tag is optional if there is only one
+SGLANG_ALLOW_OVERWRITE_LONGER_CONTEXT_LEN=1 python3 -m sglang.launch_server \
+    --model my_folder/openhands-lm-32b-v0.1 \
+    --served-model-name openhands-lm-32b-v0.1 \
+    --port 8000 \
+    --tp 2 --dp 1 \
+    --host 0.0.0.0 \
+    --api-key mykey --context-length 131072
 ```

-Once that is done, test that it allows "outside" requests, like those from inside a docker container.
+### Serving with vLLM

-```bash
-docker ps # get list of running docker containers, for most accurate test choose the OpenHands sandbox container.
-docker exec [CONTAINER ID] curl http://host.docker.internal:11434/api/generate -d '{"model":"[NAME]","prompt":"hi"}'
-#ex. docker exec cd9cc82f7a11 curl http://host.docker.internal:11434/api/generate -d '{"model":"codellama","prompt":"hi"}'
-```
-
-## Fixing it
-
-Now let's make it work. Edit /etc/systemd/system/ollama.service with sudo privileges. (Path may vary depending on linux flavor)
-
-```bash
-sudo vi /etc/systemd/system/ollama.service
-```
-
-or
-
-```bash
-sudo nano /etc/systemd/system/ollama.service
-```
-
-In the [Service] bracket add these lines
-
-```
-Environment="OLLAMA_HOST=0.0.0.0:11434"
-Environment="OLLAMA_ORIGINS=*"
-```
-
-Then save, reload the configuration and restart the service.
+- Install vLLM following [the official documentation](https://docs.vllm.ai/en/latest/getting_started/installation.html).
+- Example launch command for OpenHands LM 32B (with at least 2 GPUs):

 ```bash
-sudo systemctl daemon-reload
-sudo systemctl restart ollama
+vllm serve my_folder/openhands-lm-32b-v0.1 \
+    --host 0.0.0.0 --port 8000 \
+    --api-key mykey \
+    --tensor-parallel-size 2 \
+    --served-model-name openhands-lm-32b-v0.1 \
+    --enable-prefix-caching
 ```

-Finally test that ollama is accessible from within the container
+## Run and Configure OpenHands

-```bash
-ollama list # get list of installed models
-docker ps # get list of running docker containers, for most accurate test choose the OpenHands sandbox container.
-docker exec [CONTAINER ID] curl http://host.docker.internal:11434/api/generate -d '{"model":"[NAME]","prompt":"hi"}'
-```
-
-
-# Local LLM with LM Studio
-
-Steps to set up LM Studio:
-1. Open LM Studio
-2. Go to the Local Server tab.
-3. Click the "Start Server" button.
-4. Select the model you want to use from the dropdown.
-
-
-Set the following configs:
-```bash
-LLM_MODEL="openai/lmstudio"
-LLM_BASE_URL="http://localhost:1234/v1"
-CUSTOM_LLM_PROVIDER="openai"
-```
+### Run OpenHands

-### Docker
+#### Using Docker

-```bash
-docker run # ...
-    -e LLM_MODEL="openai/lmstudio" \
-    -e LLM_BASE_URL="http://host.docker.internal:1234/v1" \
-    -e CUSTOM_LLM_PROVIDER="openai" \
-    # ...
-```
+Run OpenHands using [the official docker run command](../installation#start-the-app).

-You should now be able to connect to `http://localhost:3000/`
+#### Using Development Mode

-In the development environment, you can set the following configs in the `config.toml` file:
+Use the instructions in [Development.md](https://github.com/All-Hands-AI/OpenHands/blob/main/Development.md) to build OpenHands.
+Ensure `config.toml` exists by running `make setup-config` which will create one for you. In the `config.toml`, enter the following:

 ```
 [core]
 workspace_base="./workspace"

 [llm]
-model="openai/lmstudio"
-base_url="http://localhost:1234/v1"
-custom_llm_provider="openai"
+embedding_model="local"
+ollama_base_url="http://localhost:8000"
 ```

-Done! Now you can start OpenHands by: `make run` without Docker. You now should be able to connect to `http://localhost:3000/`
-
-# Note
+Start OpenHands using `make run`.

-For WSL, run the following commands in cmd to set up the networking mode to mirrored:
+### Configure OpenHands

-```
-python -c "print('[wsl2]\nnetworkingMode=mirrored',file=open(r'%UserProfile%\.wslconfig','w'))"
-wsl --shutdown
-```
+Once OpenHands is running, you'll need to set the following in the OpenHands UI through the Settings:
+1. Enable `Advanced` options.
+2. Set the following:
+   - `Custom Model` to `openai/<served-model-name>` (e.g. `openai/openhands-lm-32b-v0.1`)
+   - `Base URL` to `http://host.docker.internal:8000`
+   - `API key` to the same string you set when serving the model (e.g. `mykey`)
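
Before pointing OpenHands at the endpoint, it can help to confirm the server answers OpenAI-style requests from the host machine. The sketch below is not part of this commit; it assumes the `openai` Python package is installed and reuses the illustrative values from the new doc above (port 8000, API key `mykey`, served model name `openhands-lm-32b-v0.1`).

```python
# Not part of this commit: a quick smoke test of the OpenAI-compatible endpoint
# started above. Assumes the `openai` Python package is installed and that the
# server was launched with --port 8000, --api-key mykey, and
# --served-model-name openhands-lm-32b-v0.1 as in the doc's examples.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",  # SGLang and vLLM both expose an OpenAI-compatible /v1 API
    api_key="mykey",                      # must match the --api-key used at launch
)

response = client.chat.completions.create(
    model="openhands-lm-32b-v0.1",        # must match --served-model-name
    messages=[{"role": "user", "content": "Print 'hello' in Python."}],
    max_tokens=64,
)
print(response.choices[0].message.content)
```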

docs/sidebars.ts (+5)
@@ -156,6 +156,11 @@ const sidebars: SidebarsConfig = {
         label: 'OpenRouter',
         id: 'usage/llms/openrouter',
       },
+      {
+        type: 'doc',
+        label: 'Local LLMs with SGLang or vLLM',
+        id: 'usage/llms/local-llms',
+      },
     ],
   },
 ],

evaluation/benchmarks/swe_bench/run_infer.py (+15)
@@ -386,6 +386,21 @@ def complete_runtime(
     obs = runtime.run_action(action)
     logger.info(obs, extra={'msg_type': 'OBSERVATION'})

+    if obs.exit_code == -1:
+        # The previous command is still running
+        # We need to kill previous command
+        logger.info('The previous command is still running, trying to ctrl+z it...')
+        action = CmdRunAction(command='C-z')
+        obs = runtime.run_action(action)
+        logger.info(obs, extra={'msg_type': 'OBSERVATION'})
+
+        # Then run the command again
+        action = CmdRunAction(command=f'cd /workspace/{workspace_dir_name}')
+        action.set_hard_timeout(600)
+        logger.info(action, extra={'msg_type': 'ACTION'})
+        obs = runtime.run_action(action)
+        logger.info(obs, extra={'msg_type': 'OBSERVATION'})
+
     assert_and_raise(
         isinstance(obs, CmdOutputObservation) and obs.exit_code == 0,
         f'Failed to cd to /workspace/{workspace_dir_name}: {str(obs)}',
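
The hunk above recovers from a `cd` that never returns by suspending the stuck command and retrying. Below is a generalized sketch of that pattern; `CmdRunAction`, `set_hard_timeout`, and `runtime.run_action` appear in the diff, while the helper, its name, and the import path are assumptions introduced for illustration.

```python
# Illustrative only: a generalized form of the recovery logic added in this hunk.
# CmdRunAction, set_hard_timeout, and runtime.run_action(...) appear in the diff;
# the helper itself is hypothetical, and the import path is assumed from the repo layout.
from openhands.events.action import CmdRunAction


def run_with_interrupt_retry(runtime, command: str, timeout: int = 600):
    """Run `command`; if a previous command is still running (exit_code == -1),
    suspend it with ctrl+z and run `command` once more."""
    action = CmdRunAction(command=command)
    action.set_hard_timeout(timeout)
    obs = runtime.run_action(action)

    if obs.exit_code == -1:
        # A previous command is still holding the shell: suspend it first.
        runtime.run_action(CmdRunAction(command='C-z'))

        # Then retry the original command.
        action = CmdRunAction(command=command)
        action.set_hard_timeout(timeout)
        obs = runtime.run_action(action)

    return obs
```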

openhands/llm/llm.py (+13 -3)
@@ -210,7 +210,11 @@ def wrapper(*args, **kwargs):
     # if the agent or caller has defined tools, and we mock via prompting, convert the messages
     if mock_function_calling and 'tools' in kwargs:
         messages = convert_fncall_messages_to_non_fncall_messages(
-            messages, kwargs['tools']
+            messages,
+            kwargs['tools'],
+            add_in_context_learning_example=bool(
+                'openhands-lm' not in self.config.model
+            ),
         )
         kwargs['messages'] = messages

@@ -219,8 +223,14 @@ def wrapper(*args, **kwargs):
         kwargs['stop'] = STOP_WORDS

         mock_fncall_tools = kwargs.pop('tools')
-        # tool_choice should not be specified when mocking function calling
-        kwargs.pop('tool_choice', None)
+        if 'openhands-lm' in self.config.model:
+            # If we don't have this, we might run into issue when serving openhands-lm
+            # using SGLang
+            # BadRequestError: litellm.BadRequestError: OpenAIException - Error code: 400 - {'object': 'error', 'message': '400', 'type': 'Failed to parse fc related info to json format!', 'param': None, 'code': 400}
+            kwargs['tool_choice'] = 'none'
+        else:
+            # tool_choice should not be specified when mocking function calling
+            kwargs.pop('tool_choice', None)

     # if we have no messages, something went very wrong
     if not messages:
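
Taken out of the wrapper, the new `tool_choice` handling reduces to a small branch on the model name. The sketch below is an illustrative distillation rather than the code the commit adds; the standalone helper and its name are hypothetical.

```python
# Illustrative distillation of the tool_choice handling added above; the
# standalone helper and its name are hypothetical, not part of the commit.
def adjust_tool_choice_for_mocked_fncall(model: str, kwargs: dict) -> dict:
    """Decide how to pass tool_choice when function calling is mocked via prompting."""
    if 'openhands-lm' in model:
        # SGLang-served openhands-lm rejects the request with a 400
        # ("Failed to parse fc related info to json format!") unless
        # tool_choice is explicitly disabled.
        kwargs['tool_choice'] = 'none'
    else:
        # Other models: tool_choice should not be specified when mocking function calling.
        kwargs.pop('tool_choice', None)
    return kwargs


# Example:
print(adjust_tool_choice_for_mocked_fncall('openhands-lm-32b-v0.1', {'tool_choice': 'auto'}))
# -> {'tool_choice': 'none'}
```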
