(llm): Support OpenHands LM #7598

Merged (17 commits) on Mar 31, 2025

Changes from 7 commits
1 change: 1 addition & 0 deletions docs/modules/usage/llms/llms.md
@@ -59,6 +59,7 @@ We have a few guides for running OpenHands with specific model providers:
- [LiteLLM Proxy](llms/litellm-proxy)
- [OpenAI](llms/openai-llms)
- [OpenRouter](llms/openrouter)
- [Local LLMs with SGLang or vLLM](llms/../local-llms.md)

### API retries and rate limits

182 changes: 40 additions & 142 deletions docs/modules/usage/llms/local-llms.md
@@ -1,56 +1,61 @@
# Local LLM with Ollama
# Local LLM with SGLang or vLLM

:::warning
When using a Local LLM, OpenHands may have limited functionality.
It is highly recommended that you use GPUs to serve local models for an optimal experience.
:::

Ensure that you have the Ollama server up and running.
For detailed startup instructions, refer to [here](https://github.com/ollama/ollama).

This guide assumes you've started ollama with `ollama serve`. If you're running ollama differently (e.g. inside docker), the instructions might need to be modified. Please note that if you're running WSL the default ollama configuration blocks requests from docker containers. See [here](#configuring-ollama-service-wsl-en).
## News

## Pull Models
- 2025/03/31: We release an open model OpenHands LM v0.1 32B that achieves 37.1% on SWE-Bench Verified ([blog](https://www.all-hands.dev/blog/introducing-openhands-lm-32b----a-strong-open-coding-agent-model), [model](https://huggingface.co/all-hands/openhands-lm-32b-v0.1)).

Ollama model names can be found [here](https://ollama.com/library). For a small example, you can use
the `codellama:7b` model. Bigger models will generally perform better.

```bash
ollama pull codellama:7b
```
## Download the model from Hugging Face

you can check which models you have downloaded like this:
For example, to download [OpenHands LM 32B v0.1](https://huggingface.co/all-hands/openhands-lm-32b-v0.1):

```bash
~$ ollama list
NAME ID SIZE MODIFIED
codellama:7b 8fdf8f752f6e 3.8 GB 6 weeks ago
mistral:7b-instruct-v0.2-q4_K_M eb14864c7427 4.4 GB 2 weeks ago
starcoder2:latest f67ae0f64584 1.7 GB 19 hours ago
huggingface-cli download all-hands/openhands-lm-32b-v0.1 --local-dir my_folder/openhands-lm-32b-v0.1
```
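
If you prefer a programmatic download, the same snapshot can also be fetched with the `huggingface_hub` Python library (a minimal sketch, assuming the package is installed and the same target folder as the CLI example above):

```python
from huggingface_hub import snapshot_download

# Download the full model snapshot into the folder used by the CLI example above.
snapshot_download(
    repo_id="all-hands/openhands-lm-32b-v0.1",
    local_dir="my_folder/openhands-lm-32b-v0.1",
)
```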

## Run OpenHands with Docker
## Create an OpenAI-compatible endpoint with a model serving framework

### Start OpenHands
Use the instructions [here](../getting-started) to start OpenHands using Docker.
But when running `docker run`, you'll need to add a few more arguments:
### Serving with SGLang

- Install SGLang following the official documentation: https://docs.sglang.ai/start/install.html
- Example launch command for OpenHands LM 32B (with at least 2 GPUs):

```bash
docker run # ...
--add-host host.docker.internal:host-gateway \
-e LLM_OLLAMA_BASE_URL="http://host.docker.internal:11434" \
# ...
SGLANG_ALLOW_OVERWRITE_LONGER_CONTEXT_LEN=1 python3 -m sglang.launch_server \
--model my_folder/openhands-lm-32b-v0.1 \
--served-model-name openhands-lm-32b-v0.1 \
--port 8000 \
--tp 2 --dp 1 \
--host 0.0.0.0 \
--api-key mykey --context-length 131072
```

LLM_OLLAMA_BASE_URL is optional. If you set it, it will be used to show
the available installed models in the UI.
### Serving with vLLM

- Install vLLM following the official documentation: https://docs.vllm.ai/en/latest/getting_started/installation.html
- Example launch command for OpenHands LM 32B (with at least 2 GPUs):

### Configure the Web Application
```bash
vllm serve my_folder/openhands-lm-32b-v0.1 \
--host 0.0.0.0 --port 8000 \
--api-key mykey \
--tensor-parallel-size 2 \
--served-model-name openhands-lm-32b-v0.1 \
--enable-prefix-caching
```
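
Before pointing OpenHands at the endpoint, it can help to confirm that the server answers OpenAI-style chat requests. Below is a quick sanity check, assuming the server above is reachable at `localhost:8000` with API key `mykey` and served model name `openhands-lm-32b-v0.1`:

```python
import requests

# Send a single chat completion request to the OpenAI-compatible endpoint.
response = requests.post(
    "http://localhost:8000/v1/chat/completions",
    headers={"Authorization": "Bearer mykey"},
    json={
        "model": "openhands-lm-32b-v0.1",
        "messages": [{"role": "user", "content": "Say hello in one sentence."}],
        "max_tokens": 64,
    },
    timeout=60,
)
response.raise_for_status()
print(response.json()["choices"][0]["message"]["content"])
```

If this prints a short reply, the endpoint is ready for the configuration steps below.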

### Configure OpenHands Application

When running `openhands`, you'll need to set the following in the OpenHands UI through the Settings:
- the model to "ollama/<model-name>"
- the base url to `http://host.docker.internal:11434`
- the API key is optional, you can use any string, such as `ollama`.
- the model to `openai/openhands-lm-32b-v0.1` (`openai/` followed by the `served-model-name` you set above)
- the base URL to `http://host.docker.internal:8000`
- the API key to the same string you passed via `--api-key` above (e.g., `mykey`); see the sketch below for how these settings map to an API call
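
For reference, the three settings above map directly onto the LiteLLM call that OpenHands makes under the hood. A minimal sketch (assuming the SGLang or vLLM server from the previous step is still running with API key `mykey`):

```python
import litellm

# `openai/<served-model-name>` tells LiteLLM to treat the server as an OpenAI-compatible endpoint.
response = litellm.completion(
    model="openai/openhands-lm-32b-v0.1",
    api_base="http://host.docker.internal:8000",  # use http://localhost:8000 outside Docker
    api_key="mykey",
    messages=[{"role": "user", "content": "List the files in the current directory."}],
)
print(response.choices[0].message.content)
```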

Collaborator

Does the API Key need to be set to the exact string you set above? Or can it be set to anything, regardless of what you set it to above?

Collaborator Author

No -- it has to be the exact string you set above. Will modify this.


## Run OpenHands in Development Mode
@@ -66,7 +71,7 @@ workspace_base="./workspace"

[llm]
embedding_model="local"
ollama_base_url="http://localhost:11434"
ollama_base_url="http://localhost:8000"

```

@@ -75,118 +80,11 @@ Done! Now you can start OpenHands by: `make run`. You now should be able to conn
### Configure the Web Application

In the OpenHands UI, click on the Settings wheel in the bottom-left corner.
Then in the `Model` input, enter `ollama/codellama:7b`, or the name of the model you pulled earlier.
If it doesn’t show up in the dropdown, enable `Advanced Settings` and type it in. Please note: you need the model name as listed by `ollama list`, with the prefix `ollama/`.
Then in the `Model` input, enter `openai/openhands-lm-32b-v0.1`, or `openai/` followed by the `served-model-name` you set when launching the server.
If it doesn’t show up in the dropdown, enable `Advanced Settings` and type it in.

In the API Key field, enter `ollama` or any value, since you don't need a particular key.
In the API Key field, enter `mykey`, or whatever value you passed via `--api-key` when launching the server.

Collaborator

I can fix this but to understand, this section is for running it in development mode and the one above that has the same information is for when you run OpenHands via the docker command?

Collaborator Author

yes!

In the Base URL field, enter `http://localhost:11434`.
In the Base URL field, enter `http://host.docker.internal:8000`.

And now you're ready to go!

## Configuring the ollama service (WSL) {#configuring-ollama-service-wsl-en}

The default configuration for ollama in WSL only serves localhost. This means you can't reach it from a docker container; e.g., it won't work with OpenHands. First, let's test that ollama is running correctly.

```bash
ollama list # get list of installed models
curl http://localhost:11434/api/generate -d '{"model":"[NAME]","prompt":"hi"}'
#ex. curl http://localhost:11434/api/generate -d '{"model":"codellama:7b","prompt":"hi"}'
#ex. curl http://localhost:11434/api/generate -d '{"model":"codellama","prompt":"hi"}' #the tag is optional if there is only one
```

Once that is done, test that it allows "outside" requests, like those from inside a docker container.

```bash
docker ps # get list of running docker containers, for most accurate test choose the OpenHands sandbox container.
docker exec [CONTAINER ID] curl http://host.docker.internal:11434/api/generate -d '{"model":"[NAME]","prompt":"hi"}'
#ex. docker exec cd9cc82f7a11 curl http://host.docker.internal:11434/api/generate -d '{"model":"codellama","prompt":"hi"}'
```

## Fixing it

Now let's make it work. Edit /etc/systemd/system/ollama.service with sudo privileges. (The path may vary depending on your Linux distribution.)

```bash
sudo vi /etc/systemd/system/ollama.service
```

or

```bash
sudo nano /etc/systemd/system/ollama.service
```

In the `[Service]` section, add these lines:

```
Environment="OLLAMA_HOST=0.0.0.0:11434"
Environment="OLLAMA_ORIGINS=*"
```

Then save, reload the configuration and restart the service.

```bash
sudo systemctl daemon-reload
sudo systemctl restart ollama
```

Finally test that ollama is accessible from within the container

```bash
ollama list # get list of installed models
docker ps # get list of running docker containers, for most accurate test choose the OpenHands sandbox container.
docker exec [CONTAINER ID] curl http://host.docker.internal:11434/api/generate -d '{"model":"[NAME]","prompt":"hi"}'
```


# Local LLM with LM Studio

Steps to set up LM Studio:
1. Open LM Studio
2. Go to the Local Server tab.
3. Click the "Start Server" button.
4. Select the model you want to use from the dropdown.


Set the following configs:
```bash
LLM_MODEL="openai/lmstudio"
LLM_BASE_URL="http://localhost:1234/v1"
CUSTOM_LLM_PROVIDER="openai"
```

### Docker

```bash
docker run # ...
-e LLM_MODEL="openai/lmstudio" \
-e LLM_BASE_URL="http://host.docker.internal:1234/v1" \
-e CUSTOM_LLM_PROVIDER="openai" \
# ...
```

You should now be able to connect to `http://localhost:3000/`

In the development environment, you can set the following configs in the `config.toml` file:

```
[core]
workspace_base="./workspace"

[llm]
model="openai/lmstudio"
base_url="http://localhost:1234/v1"
custom_llm_provider="openai"
```

Done! Now you can start OpenHands with `make run` (without Docker). You should now be able to connect to `http://localhost:3000/`

# Note

For WSL, run the following commands in cmd to set up the networking mode to mirrored:

```
python -c "print('[wsl2]\nnetworkingMode=mirrored',file=open(r'%UserProfile%\.wslconfig','w'))"
wsl --shutdown
```
5 changes: 5 additions & 0 deletions docs/sidebars.ts
@@ -151,6 +151,11 @@ const sidebars: SidebarsConfig = {
label: 'OpenRouter',
id: 'usage/llms/openrouter',
},
{
type: 'doc',
label: 'Local LLMs with SGLang or vLLM',
id: 'usage/llms/local-llms',
},
],
},
],
15 changes: 15 additions & 0 deletions evaluation/benchmarks/swe_bench/run_infer.py
@@ -373,6 +373,21 @@ def complete_runtime(
obs = runtime.run_action(action)
logger.info(obs, extra={'msg_type': 'OBSERVATION'})

if obs.exit_code == -1:
# The previous command is still running
# We need to kill the previous command
logger.info('The previous command is still running, trying to ctrl+z it...')
action = CmdRunAction(command='C-z')
obs = runtime.run_action(action)
logger.info(obs, extra={'msg_type': 'OBSERVATION'})

# Then run the command again
action = CmdRunAction(command=f'cd /workspace/{workspace_dir_name}')
action.set_hard_timeout(600)
logger.info(action, extra={'msg_type': 'ACTION'})
obs = runtime.run_action(action)
logger.info(obs, extra={'msg_type': 'OBSERVATION'})

assert_and_raise(
isinstance(obs, CmdOutputObservation) and obs.exit_code == 0,
f'Failed to cd to /workspace/{workspace_dir_name}: {str(obs)}',
16 changes: 13 additions & 3 deletions openhands/llm/llm.py
@@ -210,7 +210,11 @@ def wrapper(*args, **kwargs):
# if the agent or caller has defined tools, and we mock via prompting, convert the messages
if mock_function_calling and 'tools' in kwargs:
messages = convert_fncall_messages_to_non_fncall_messages(
messages, kwargs['tools']
messages,
kwargs['tools'],
add_in_context_learning_example=bool(
'openhands-lm' not in self.config.model
),
)
kwargs['messages'] = messages

@@ -219,8 +223,14 @@ def wrapper(*args, **kwargs):
kwargs['stop'] = STOP_WORDS

mock_fncall_tools = kwargs.pop('tools')
# tool_choice should not be specified when mocking function calling
kwargs.pop('tool_choice', None)
if 'openhands-lm' in self.config.model:
# If we don't have this, we might run into issues when serving openhands-lm
# using SGLang
# BadRequestError: litellm.BadRequestError: OpenAIException - Error code: 400 - {'object': 'error', 'message': '400', 'type': 'Failed to parse fc related info to json format!', 'param': None, 'code': 400}
kwargs['tool_choice'] = 'none'
else:
# tool_choice should not be specified when mocking function calling
kwargs.pop('tool_choice', None)

# if we have no messages, something went very wrong
if not messages: