(llm): Support OpenHands LM #7598

Merged: 17 commits, Mar 31, 2025
1 change: 1 addition & 0 deletions docs/modules/usage/llms/llms.md
@@ -59,6 +59,7 @@ We have a few guides for running OpenHands with specific model providers:
- [LiteLLM Proxy](llms/litellm-proxy)
- [OpenAI](llms/openai-llms)
- [OpenRouter](llms/openrouter)
- [Local LLMs with SGLang or vLLM](llms/../local-llms.md)

### API retries and rate limits

199 changes: 45 additions & 154 deletions docs/modules/usage/llms/local-llms.md
@@ -1,192 +1,83 @@
# Local LLM with Ollama
# Local LLM with SGLang or vLLM

:::warning
When using a Local LLM, OpenHands may have limited functionality.
It is highly recommended that you use GPUs to serve local models for an optimal experience.
:::

Ensure that you have the Ollama server up and running.
For detailed startup instructions, refer to [here](https://github.com/ollama/ollama).
## News

This guide assumes you've started ollama with `ollama serve`. If you're running ollama differently (e.g. inside docker), the instructions might need to be modified. Please note that if you're running WSL the default ollama configuration blocks requests from docker containers. See [here](#configuring-ollama-service-wsl-en).
- 2025/03/31: We released an open model OpenHands LM v0.1 32B that achieves 37.1% on SWE-Bench Verified
([blog](https://www.all-hands.dev/blog/introducing-openhands-lm-32b----a-strong-open-coding-agent-model), [model](https://huggingface.co/all-hands/openhands-lm-32b-v0.1)).

## Pull Models
## Download the Model from Huggingface

Ollama model names can be found [here](https://ollama.com/library). For a small example, you can use
the `codellama:7b` model. Bigger models will generally perform better.
For example, to download [OpenHands LM 32B v0.1](https://huggingface.co/all-hands/openhands-lm-32b-v0.1):

```bash
ollama pull codellama:7b
huggingface-cli download all-hands/openhands-lm-32b-v0.1 --local-dir my_folder/openhands-lm-32b-v0.1
```
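
To make sure the download completed before you point a serving framework at it, you can list the folder (the path is just the example `--local-dir` used above):

```bash
# You should see the tokenizer/config files and several *.safetensors shards
ls -lh my_folder/openhands-lm-32b-v0.1
```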

you can check which models you have downloaded like this:
## Create an OpenAI-Compatible Endpoint With a Model Serving Framework

```bash
~$ ollama list
NAME ID SIZE MODIFIED
codellama:7b 8fdf8f752f6e 3.8 GB 6 weeks ago
mistral:7b-instruct-v0.2-q4_K_M eb14864c7427 4.4 GB 2 weeks ago
starcoder2:latest f67ae0f64584 1.7 GB 19 hours ago
```

## Run OpenHands with Docker

### Start OpenHands
Use the instructions [here](../getting-started) to start OpenHands using Docker.
But when running `docker run`, you'll need to add a few more arguments:

```bash
docker run # ...
--add-host host.docker.internal:host-gateway \
-e LLM_OLLAMA_BASE_URL="http://host.docker.internal:11434" \
# ...
```

LLM_OLLAMA_BASE_URL is optional. If you set it, it will be used to show
the available installed models in the UI.


### Configure the Web Application

When running `openhands`, you'll need to set the following in the OpenHands UI through the Settings:
- the model to "ollama/<model-name>"
- the base url to `http://host.docker.internal:11434`
- the API key is optional, you can use any string, such as `ollama`.


## Run OpenHands in Development Mode

### Build from Source

Use the instructions in [Development.md](https://github.com/All-Hands-AI/OpenHands/blob/main/Development.md) to build OpenHands.
Make sure `config.toml` is there by running `make setup-config`, which will create one for you. In `config.toml`, enter the following:

```
[core]
workspace_base="./workspace"

[llm]
embedding_model="local"
ollama_base_url="http://localhost:11434"

```

Done! Now you can start OpenHands with `make run`. You should now be able to connect to `http://localhost:3000/`

### Configure the Web Application

In the OpenHands UI, click on the Settings wheel in the bottom-left corner.
Then in the `Model` input, enter `ollama/codellama:7b`, or the name of the model you pulled earlier.
If it doesn’t show up in the dropdown, enable `Advanced Settings` and type it in. Please note: you need the model name as listed by `ollama list`, with the prefix `ollama/`.

In the API Key field, enter `ollama` or any value, since you don't need a particular key.

In the Base URL field, enter `http://localhost:11434`.

And now you're ready to go!

## Configuring the ollama service (WSL) {#configuring-ollama-service-wsl-en}
### Serving with SGLang

The default configuration for ollama in WSL only serves localhost. This means you can't reach it from a docker container; e.g. it won't work with OpenHands. First, let's test that ollama is running correctly.
- Install SGLang following [the official documentation](https://docs.sglang.ai/start/install.html).
- Example launch command for OpenHands LM 32B (with at least 2 GPUs):

```bash
ollama list # get list of installed models
curl http://localhost:11434/api/generate -d '{"model":"[NAME]","prompt":"hi"}'
#ex. curl http://localhost:11434/api/generate -d '{"model":"codellama:7b","prompt":"hi"}'
#ex. curl http://localhost:11434/api/generate -d '{"model":"codellama","prompt":"hi"}' #the tag is optional if there is only one
```

```bash
SGLANG_ALLOW_OVERWRITE_LONGER_CONTEXT_LEN=1 python3 -m sglang.launch_server \
    --model my_folder/openhands-lm-32b-v0.1 \
    --served-model-name openhands-lm-32b-v0.1 \
    --port 8000 \
    --tp 2 --dp 1 \
    --host 0.0.0.0 \
    --api-key mykey --context-length 131072
```
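
Once the server is up, a minimal smoke test of the OpenAI-compatible endpoint could look like this (a sketch assuming the port `8000`, API key `mykey`, and served model name from the launch command above):

```bash
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer mykey" \
  -d '{
        "model": "openhands-lm-32b-v0.1",
        "messages": [{"role": "user", "content": "Say hello in one short sentence."}],
        "max_tokens": 32
      }'
```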

Once that is done, test that it allows "outside" requests, like those from inside a docker container.
### Serving with vLLM

```bash
docker ps # get list of running docker containers, for most accurate test choose the OpenHands sandbox container.
docker exec [CONTAINER ID] curl http://host.docker.internal:11434/api/generate -d '{"model":"[NAME]","prompt":"hi"}'
#ex. docker exec cd9cc82f7a11 curl http://host.docker.internal:11434/api/generate -d '{"model":"codellama","prompt":"hi"}'
```

## Fixing it

Now let's make it work. Edit /etc/systemd/system/ollama.service with sudo privileges. (Path may vary depending on linux flavor)

```bash
sudo vi /etc/systemd/system/ollama.service
```

or

```bash
sudo nano /etc/systemd/system/ollama.service
```

In the `[Service]` section, add these lines:

```
Environment="OLLAMA_HOST=0.0.0.0:11434"
Environment="OLLAMA_ORIGINS=*"
```

Then save, reload the configuration and restart the service.
- Install vLLM following [the official documentation](https://docs.vllm.ai/en/latest/getting_started/installation.html).
- Example launch command for OpenHands LM 32B (with at least 2 GPUs):

```bash
sudo systemctl daemon-reload
sudo systemctl restart ollama
```

```bash
vllm serve my_folder/openhands-lm-32b-v0.1 \
    --host 0.0.0.0 --port 8000 \
    --api-key mykey \
    --tensor-parallel-size 2 \
    --served-model-name openhands-lm-32b-v0.1 \
    --enable-prefix-caching
```
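
To confirm that the name you will later enter in OpenHands matches what the server exposes, you can list the available models (again assuming port `8000` and API key `mykey` from the command above):

```bash
# The "id" field in the response should be "openhands-lm-32b-v0.1"
curl http://localhost:8000/v1/models -H "Authorization: Bearer mykey"
```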

Finally test that ollama is accessible from within the container
## Run and Configure OpenHands

```bash
ollama list # get list of installed models
docker ps # get list of running docker containers, for most accurate test choose the OpenHands sandbox container.
docker exec [CONTAINER ID] curl http://host.docker.internal:11434/api/generate -d '{"model":"[NAME]","prompt":"hi"}'
```


# Local LLM with LM Studio

Steps to set up LM Studio:
1. Open LM Studio
2. Go to the Local Server tab.
3. Click the "Start Server" button.
4. Select the model you want to use from the dropdown.


Set the following configs:
```bash
LLM_MODEL="openai/lmstudio"
LLM_BASE_URL="http://localhost:1234/v1"
CUSTOM_LLM_PROVIDER="openai"
```
### Run OpenHands

### Docker
#### Using Docker

```bash
docker run # ...
-e LLM_MODEL="openai/lmstudio" \
-e LLM_BASE_URL="http://host.docker.internal:1234/v1" \
-e CUSTOM_LLM_PROVIDER="openai" \
# ...
```
Run OpenHands using [the official docker run command](../installation#start-the-app).

You should now be able to connect to `http://localhost:3000/`
#### Using Development Mode

In the development environment, you can set the following configs in the `config.toml` file:
Use the instructions in [Development.md](https://github.com/All-Hands-AI/OpenHands/blob/main/Development.md) to build OpenHands.
Ensure `config.toml` exists by running `make setup-config` which will create one for you. In the `config.toml`, enter the following:

```
[core]
workspace_base="./workspace"

[llm]
model="openai/lmstudio"
base_url="http://localhost:1234/v1"
custom_llm_provider="openai"
embedding_model="local"
ollama_base_url="http://localhost:8000"
```

Done! Now you can start OpenHands with `make run`, without Docker. You should now be able to connect to `http://localhost:3000/`

# Note
Start OpenHands using `make run`.

For WSL, run the following commands in cmd to set up the networking mode to mirrored:
### Configure OpenHands

```
python -c "print('[wsl2]\nnetworkingMode=mirrored',file=open(r'%UserProfile%\.wslconfig','w'))"
wsl --shutdown
```
Once OpenHands is running, you'll need to set the following in the OpenHands UI through the Settings (an environment-variable alternative for Docker is sketched after this list):
1. Enable `Advanced` options.
2. Set the following:
- `Custom Model` to `openai/<served-model-name>` (e.g. `openai/openhands-lm-32b-v0.1`)
- `Base URL` to `http://host.docker.internal:8000`
- `API key` to the same string you set when serving the model (e.g. `mykey`)
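
If you prefer to pass these settings when starting the container rather than through the UI, something along these lines may work; the `LLM_*` variable names are borrowed from the older instructions above and should be treated as an assumption, not a guaranteed interface:

```bash
# LLM_API_KEY is an assumed variable name; the UI settings above are the documented path
docker run # ...
    -e LLM_MODEL="openai/openhands-lm-32b-v0.1" \
    -e LLM_BASE_URL="http://host.docker.internal:8000" \
    -e LLM_API_KEY="mykey" \
    # ...
```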
5 changes: 5 additions & 0 deletions docs/sidebars.ts
@@ -156,6 +156,11 @@ const sidebars: SidebarsConfig = {
            label: 'OpenRouter',
            id: 'usage/llms/openrouter',
          },
          {
            type: 'doc',
            label: 'Local LLMs with SGLang or vLLM',
            id: 'usage/llms/local-llms',
          },
        ],
      },
    ],
15 changes: 15 additions & 0 deletions evaluation/benchmarks/swe_bench/run_infer.py
@@ -386,6 +386,21 @@ def complete_runtime(
    obs = runtime.run_action(action)
    logger.info(obs, extra={'msg_type': 'OBSERVATION'})

    if obs.exit_code == -1:
        # The previous command is still running
        # We need to kill previous command
        logger.info('The previous command is still running, trying to ctrl+z it...')
        action = CmdRunAction(command='C-z')
        obs = runtime.run_action(action)
        logger.info(obs, extra={'msg_type': 'OBSERVATION'})

        # Then run the command again
        action = CmdRunAction(command=f'cd /workspace/{workspace_dir_name}')
        action.set_hard_timeout(600)
        logger.info(action, extra={'msg_type': 'ACTION'})
        obs = runtime.run_action(action)
        logger.info(obs, extra={'msg_type': 'OBSERVATION'})

    assert_and_raise(
        isinstance(obs, CmdOutputObservation) and obs.exit_code == 0,
        f'Failed to cd to /workspace/{workspace_dir_name}: {str(obs)}',
16 changes: 13 additions & 3 deletions openhands/llm/llm.py
@@ -210,7 +210,11 @@ def wrapper(*args, **kwargs):
    # if the agent or caller has defined tools, and we mock via prompting, convert the messages
    if mock_function_calling and 'tools' in kwargs:
        messages = convert_fncall_messages_to_non_fncall_messages(
            messages,
            kwargs['tools'],
            add_in_context_learning_example=bool(
                'openhands-lm' not in self.config.model
            ),
        )
        kwargs['messages'] = messages

@@ -219,8 +223,14 @@ def wrapper(*args, **kwargs):
        kwargs['stop'] = STOP_WORDS

        mock_fncall_tools = kwargs.pop('tools')
        # tool_choice should not be specified when mocking function calling
        kwargs.pop('tool_choice', None)
        if 'openhands-lm' in self.config.model:
            # If we don't have this, we might run into issue when serving openhands-lm
            # using SGLang
            # BadRequestError: litellm.BadRequestError: OpenAIException - Error code: 400 - {'object': 'error', 'message': '400', 'type': 'Failed to parse fc related info to json format!', 'param': None, 'code': 400}
            kwargs['tool_choice'] = 'none'
        else:
            # tool_choice should not be specified when mocking function calling
            kwargs.pop('tool_choice', None)

    # if we have no messages, something went very wrong
    if not messages: