(llm): Support OpenHands LM #7598

Merged: 17 commits, Mar 31, 2025
1 change: 1 addition & 0 deletions docs/modules/usage/llms/llms.md
@@ -59,6 +59,7 @@ We have a few guides for running OpenHands with specific model providers:
- [LiteLLM Proxy](llms/litellm-proxy)
- [OpenAI](llms/openai-llms)
- [OpenRouter](llms/openrouter)
- [Local LLMs with SGLang or vLLM](llms/../local-llms.md)

### API retries and rate limits

199 changes: 45 additions & 154 deletions docs/modules/usage/llms/local-llms.md
@@ -1,192 +1,83 @@
# Local LLM with Ollama
# Local LLM with SGLang or vLLM

:::warning
When using a Local LLM, OpenHands may have limited functionality.
It is highly recommended that you use GPUs to serve local models for an optimal experience.
:::

Ensure that you have the Ollama server up and running.
For detailed startup instructions, refer to [here](https://github.com/ollama/ollama).
## News

This guide assumes you've started ollama with `ollama serve`. If you're running ollama differently (e.g. inside docker), the instructions might need to be modified. Please note that if you're running WSL the default ollama configuration blocks requests from docker containers. See [here](#configuring-ollama-service-wsl-en).
- 2025/03/31: We released an open model OpenHands LM v0.1 32B that achieves 37.1% on SWE-Bench Verified
([blog](https://www.all-hands.dev/blog/introducing-openhands-lm-32b----a-strong-open-coding-agent-model), [model](https://huggingface.co/all-hands/openhands-lm-32b-v0.1)).

## Pull Models
## Download the Model from Huggingface

Ollama model names can be found [here](https://ollama.com/library). For a small example, you can use
the `codellama:7b` model. Bigger models will generally perform better.
For example, to download [OpenHands LM 32B v0.1](https://huggingface.co/all-hands/openhands-lm-32b-v0.1):

```bash
ollama pull codellama:7b
huggingface-cli download all-hands/openhands-lm-32b-v0.1 --local-dir my_folder/openhands-lm-32b-v0.1
```
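
To make sure the download completed before you point a serving framework at it, you can list the folder (the path is just the example `--local-dir` used above):

```bash
# You should see the tokenizer/config files and several *.safetensors shards
ls -lh my_folder/openhands-lm-32b-v0.1
```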

you can check which models you have downloaded like this:
## Create an OpenAI-Compatible Endpoint With a Model Serving Framework

```bash
~$ ollama list
NAME ID SIZE MODIFIED
codellama:7b 8fdf8f752f6e 3.8 GB 6 weeks ago
mistral:7b-instruct-v0.2-q4_K_M eb14864c7427 4.4 GB 2 weeks ago
starcoder2:latest f67ae0f64584 1.7 GB 19 hours ago
```

## Run OpenHands with Docker

### Start OpenHands
Use the instructions [here](../getting-started) to start OpenHands using Docker.
But when running `docker run`, you'll need to add a few more arguments:

```bash
docker run # ...
--add-host host.docker.internal:host-gateway \
-e LLM_OLLAMA_BASE_URL="http://host.docker.internal:11434" \
# ...
```

LLM_OLLAMA_BASE_URL is optional. If you set it, it will be used to show
the available installed models in the UI.


### Configure the Web Application

When running `openhands`, you'll need to set the following in the OpenHands UI through the Settings:
- the model to "ollama/<model-name>"
- the base url to `http://host.docker.internal:11434`
- the API key is optional, you can use any string, such as `ollama`.


## Run OpenHands in Development Mode

### Build from Source

Use the instructions in [Development.md](https://github.com/All-Hands-AI/OpenHands/blob/main/Development.md) to build OpenHands.
Make sure `config.toml` is there by running `make setup-config`, which will create one for you. In `config.toml`, enter the following:

```
[core]
workspace_base="./workspace"

[llm]
embedding_model="local"
ollama_base_url="http://localhost:11434"

```

Done! Now you can start OpenHands with `make run`. You should now be able to connect to `http://localhost:3000/`

### Configure the Web Application

In the OpenHands UI, click on the Settings wheel in the bottom-left corner.
Then in the `Model` input, enter `ollama/codellama:7b`, or the name of the model you pulled earlier.
If it doesn’t show up in the dropdown, enable `Advanced Settings` and type it in. Please note: you need the model name as listed by `ollama list`, with the prefix `ollama/`.

In the API Key field, enter `ollama` or any value, since you don't need a particular key.

In the Base URL field, enter `http://localhost:11434`.

And now you're ready to go!

## Configuring the ollama service (WSL) {#configuring-ollama-service-wsl-en}
### Serving with SGLang

The default configuration for ollama in WSL only serves localhost. This means you can't reach it from a docker container; e.g. it won't work with OpenHands. First, let's test that ollama is running correctly.
- Install SGLang following [the official documentation](https://docs.sglang.ai/start/install.html).
- Example launch command for OpenHands LM 32B (with at least 2 GPUs):

```bash
ollama list # get list of installed models
curl http://localhost:11434/api/generate -d '{"model":"[NAME]","prompt":"hi"}'
#ex. curl http://localhost:11434/api/generate -d '{"model":"codellama:7b","prompt":"hi"}'
#ex. curl http://localhost:11434/api/generate -d '{"model":"codellama","prompt":"hi"}' #the tag is optional if there is only one
```

```bash
SGLANG_ALLOW_OVERWRITE_LONGER_CONTEXT_LEN=1 python3 -m sglang.launch_server \
    --model my_folder/openhands-lm-32b-v0.1 \
    --served-model-name openhands-lm-32b-v0.1 \
    --port 8000 \
    --tp 2 --dp 1 \
    --host 0.0.0.0 \
    --api-key mykey --context-length 131072
```
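
Once the server is up, a minimal smoke test of the OpenAI-compatible endpoint could look like this (a sketch assuming the port `8000`, API key `mykey`, and served model name from the launch command above):

```bash
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer mykey" \
  -d '{
        "model": "openhands-lm-32b-v0.1",
        "messages": [{"role": "user", "content": "Say hello in one short sentence."}],
        "max_tokens": 32
      }'
```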

Once that is done, test that it allows "outside" requests, like those from inside a docker container.
### Serving with vLLM

```bash
docker ps # get list of running docker containers, for most accurate test choose the OpenHands sandbox container.
docker exec [CONTAINER ID] curl http://host.docker.internal:11434/api/generate -d '{"model":"[NAME]","prompt":"hi"}'
#ex. docker exec cd9cc82f7a11 curl http://host.docker.internal:11434/api/generate -d '{"model":"codellama","prompt":"hi"}'
```

## Fixing it

Now let's make it work. Edit /etc/systemd/system/ollama.service with sudo privileges. (Path may vary depending on linux flavor)

```bash
sudo vi /etc/systemd/system/ollama.service
```

or

```bash
sudo nano /etc/systemd/system/ollama.service
```

In the `[Service]` section, add these lines:

```
Environment="OLLAMA_HOST=0.0.0.0:11434"
Environment="OLLAMA_ORIGINS=*"
```

Then save, reload the configuration and restart the service.
- Install vLLM following [the official documentation](https://docs.vllm.ai/en/latest/getting_started/installation.html).
- Example launch command for OpenHands LM 32B (with at least 2 GPUs):

```bash
sudo systemctl daemon-reload
sudo systemctl restart ollama
```

```bash
vllm serve my_folder/openhands-lm-32b-v0.1 \
    --host 0.0.0.0 --port 8000 \
    --api-key mykey \
    --tensor-parallel-size 2 \
    --served-model-name openhands-lm-32b-v0.1 \
    --enable-prefix-caching
```
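
To confirm that the name you will later enter in OpenHands matches what the server exposes, you can list the available models (again assuming port `8000` and API key `mykey` from the command above):

```bash
# The "id" field in the response should be "openhands-lm-32b-v0.1"
curl http://localhost:8000/v1/models -H "Authorization: Bearer mykey"
```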

Finally test that ollama is accessible from within the container
## Run and Configure OpenHands

```bash
ollama list # get list of installed models
docker ps # get list of running docker containers, for most accurate test choose the OpenHands sandbox container.
docker exec [CONTAINER ID] curl http://host.docker.internal:11434/api/generate -d '{"model":"[NAME]","prompt":"hi"}'
```


# Local LLM with LM Studio

Steps to set up LM Studio:
1. Open LM Studio
2. Go to the Local Server tab.
3. Click the "Start Server" button.
4. Select the model you want to use from the dropdown.


Set the following configs:
```bash
LLM_MODEL="openai/lmstudio"
LLM_BASE_URL="http://localhost:1234/v1"
CUSTOM_LLM_PROVIDER="openai"
```
### Run OpenHands

### Docker
#### Using Docker

```bash
docker run # ...
-e LLM_MODEL="openai/lmstudio" \
-e LLM_BASE_URL="http://host.docker.internal:1234/v1" \
-e CUSTOM_LLM_PROVIDER="openai" \
# ...
```
Run OpenHands using [the official docker run command](../installation#start-the-app).

You should now be able to connect to `http://localhost:3000/`
#### Using Development Mode

In the development environment, you can set the following configs in the `config.toml` file:
Use the instructions in [Development.md](https://github.com/All-Hands-AI/OpenHands/blob/main/Development.md) to build OpenHands.
Ensure `config.toml` exists by running `make setup-config` which will create one for you. In the `config.toml`, enter the following:

```
[core]
workspace_base="./workspace"

[llm]
model="openai/lmstudio"
base_url="http://localhost:1234/v1"
custom_llm_provider="openai"
embedding_model="local"
ollama_base_url="http://localhost:8000"
```

Done! Now you can start OpenHands with `make run`, without Docker. You should now be able to connect to `http://localhost:3000/`

# Note
Start OpenHands using `make run`.

For WSL, run the following commands in cmd to set up the networking mode to mirrored:
### Configure OpenHands

```
python -c "print('[wsl2]\nnetworkingMode=mirrored',file=open(r'%UserProfile%\.wslconfig','w'))"
wsl --shutdown
```
Once OpenHands is running, you'll need to set the following in the OpenHands UI through the Settings (an environment-variable alternative for Docker is sketched after this list):
1. Enable `Advanced` options.
2. Set the following:
- `Custom Model` to `openai/<served-model-name>` (e.g. `openai/openhands-lm-32b-v0.1`)
- `Base URL` to `http://host.docker.internal:8000`
- `API key` to the same string you set when serving the model (e.g. `mykey`)
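
If you prefer to pass these settings when starting the container rather than through the UI, something along these lines may work; the `LLM_*` variable names are borrowed from the older instructions above and should be treated as an assumption, not a guaranteed interface:

```bash
# LLM_API_KEY is an assumed variable name; the UI settings above are the documented path
docker run # ...
    -e LLM_MODEL="openai/openhands-lm-32b-v0.1" \
    -e LLM_BASE_URL="http://host.docker.internal:8000" \
    -e LLM_API_KEY="mykey" \
    # ...
```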
5 changes: 5 additions & 0 deletions docs/sidebars.ts
@@ -156,6 +156,11 @@ const sidebars: SidebarsConfig = {
            label: 'OpenRouter',
            id: 'usage/llms/openrouter',
          },
          {
            type: 'doc',
            label: 'Local LLMs with SGLang or vLLM',
            id: 'usage/llms/local-llms',
          },
        ],
      },
    ],
15 changes: 15 additions & 0 deletions evaluation/benchmarks/swe_bench/run_infer.py
@@ -386,6 +386,21 @@ def complete_runtime(
    obs = runtime.run_action(action)
    logger.info(obs, extra={'msg_type': 'OBSERVATION'})

    if obs.exit_code == -1:
        # The previous command is still running
        # We need to kill previous command
        logger.info('The previous command is still running, trying to ctrl+z it...')
        action = CmdRunAction(command='C-z')
        obs = runtime.run_action(action)
        logger.info(obs, extra={'msg_type': 'OBSERVATION'})

        # Then run the command again
        action = CmdRunAction(command=f'cd /workspace/{workspace_dir_name}')
        action.set_hard_timeout(600)
        logger.info(action, extra={'msg_type': 'ACTION'})
        obs = runtime.run_action(action)
        logger.info(obs, extra={'msg_type': 'OBSERVATION'})

    assert_and_raise(
        isinstance(obs, CmdOutputObservation) and obs.exit_code == 0,
        f'Failed to cd to /workspace/{workspace_dir_name}: {str(obs)}',
16 changes: 13 additions & 3 deletions openhands/llm/llm.py
@@ -210,7 +210,11 @@ def wrapper(*args, **kwargs):
    # if the agent or caller has defined tools, and we mock via prompting, convert the messages
    if mock_function_calling and 'tools' in kwargs:
        messages = convert_fncall_messages_to_non_fncall_messages(
            messages,
            kwargs['tools'],
            add_in_context_learning_example=bool(
                'openhands-lm' not in self.config.model
            ),
        )
        kwargs['messages'] = messages

@@ -219,8 +223,14 @@ def wrapper(*args, **kwargs):
        kwargs['stop'] = STOP_WORDS

        mock_fncall_tools = kwargs.pop('tools')
        # tool_choice should not be specified when mocking function calling
        kwargs.pop('tool_choice', None)
        if 'openhands-lm' in self.config.model:
            # If we don't have this, we might run into issue when serving openhands-lm
            # using SGLang
            # BadRequestError: litellm.BadRequestError: OpenAIException - Error code: 400 - {'object': 'error', 'message': '400', 'type': 'Failed to parse fc related info to json format!', 'param': None, 'code': 400}
            kwargs['tool_choice'] = 'none'
        else:
            # tool_choice should not be specified when mocking function calling
            kwargs.pop('tool_choice', None)

    # if we have no messages, something went very wrong
    if not messages: