Environment examples for LLM agents, designed to integrate with VeRL.
This project strictly follows the conventions of Gymnasium (formerly OpenAI Gym) for creating and managing environments.
- Create and activate a virtual environment:

  ```bash
  python -m venv venv
  source venv/bin/activate  # On Windows, use: venv\Scripts\activate
  ```

- Install the package in editable mode with development dependencies:

  ```bash
  pip install -e ".[dev]"
  ```
This will install:
- All required dependencies
- Development tools:
  - pytest: For running tests
  - pytest-cov: For test coverage reporting
  - black: Code formatter
  - isort: Import sorter
  - mypy: Static type checker
```bash
# Run tests
pytest

# Run black formatter
black .

# Sort imports
isort .

# Run type checking
mypy src
```
- Initialize an Environment

  ```python
  import verl_agent_env.interface as interface

  env_id = interface.initialize_environment("ENV-NAME")
  observation, info = interface.reset(env_id)
  ```
- Step Through the Environment

  ```python
  action = ...  # Define your action here
  observation, reward, done, truncated, info = interface.step(env_id, action)
  ```
- Close and Clean Up the Environment

  ```python
  interface.close_environment(env_id)
  ```
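The three steps above compose into a standard Gymnasium-style agent loop. Below is a minimal sketch of that control flow, using a stub in place of `verl_agent_env.interface` so it is self-contained; the stub's return values are illustrative, not the real API's.

```python
# Sketch of the agent loop, with a stub standing in for
# verl_agent_env.interface. All values here are made up.
class StubInterface:
    """Illustrative stand-in mimicking the Gymnasium-style interface."""

    def initialize_environment(self, env_name):
        return "env-0"  # a fake environment id

    def reset(self, env_id):
        return [{"role": "user", "content": "start"}], {}

    def step(self, env_id, action):
        # Pretend the episode ends after a single step.
        observation = [{"role": "user", "content": "done"}]
        return observation, 1.0, True, False, {}

    def close_environment(self, env_id):
        pass


interface = StubInterface()

env_id = interface.initialize_environment("ENV-NAME")
observation, info = interface.reset(env_id)

total_reward = 0.0
while True:
    # A real agent would derive the action from the observation
    # (e.g. by calling an LLM); here we use a fixed assistant message.
    action = [{"role": "assistant", "content": "my move"}]
    observation, reward, done, truncated, info = interface.step(env_id, action)
    total_reward += reward
    if done or truncated:
        break

interface.close_environment(env_id)
print(total_reward)
```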
Compatibility with LLM Chat Message List: To make the environment compatible with an LLM chat message list, the observation is designed to be a list of dictionaries (messages) with the following keys:

- `role`: The role of the message.
- `content`: The content of the message.
- (Optional) `tool_call_id`: The tool call id of the message. If the `role` is `tool`, the `tool_call_id` is the id of the tool call. If the `role` is `user`, there is no `tool_call_id`.
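For example, an observation after a tool execution might look like the following. The content strings and the id are invented for illustration:

```python
# An illustrative observation: a chat message list containing a user
# message and a tool-result message. All field values are made up.
observation = [
    {
        "role": "user",
        "content": "What is 3 + 4?",
    },
    {
        "role": "tool",
        "content": "7",
        "tool_call_id": "call_abc123",  # links back to the tool call
    },
]

# Every message carries a role and content; only tool messages
# carry a tool_call_id.
for message in observation:
    assert "role" in message and "content" in message
    if message["role"] == "tool":
        assert "tool_call_id" in message
```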
The `action` is also directly compatible with the `messages` format of popular LLM APIs (e.g. OpenAI, Anthropic, etc.). More specifically, the `action` is a list of dictionaries with the following keys:

- `role`: The role of the message, which is usually `assistant`.
- `content`: The content of the message.
- `tool_calls`: The tool calls of the message. This is a list of dictionaries with the following keys:
  - `id`: The id of the tool call.
  - `type`: The type of the tool call, which is usually `function`.
  - `function`: The function of the tool call. This is a dictionary with the following keys:
    - `name`: The name of the tool.
    - `arguments`: The arguments of the tool call.
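An action that invokes a tool might therefore look like the following sketch. The tool name, id, and arguments are invented; the arguments are JSON-encoded as a string, as in the OpenAI chat completions format:

```python
import json

# An illustrative action: one assistant message that calls a
# hypothetical "add" tool. The id, name, and arguments are made up.
action = [
    {
        "role": "assistant",
        "content": "",
        "tool_calls": [
            {
                "id": "call_abc123",
                "type": "function",
                "function": {
                    "name": "add",
                    # Arguments are typically a JSON-encoded string,
                    # as in the OpenAI chat completions format.
                    "arguments": json.dumps({"a": 3, "b": 4}),
                },
            }
        ],
    }
]

call = action[0]["tool_calls"][0]
args = json.loads(call["function"]["arguments"])
print(call["function"]["name"], args)
```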
- Initialize an Environment
  - Endpoint: `POST /api/environment/initialize`
  - Description: Initializes a new environment instance.
  - Request Body: JSON object with an `env_name` field.
  - Response: JSON object with a message and the new `env_id`.
- Close and Clean Up the Environment
  - Endpoint: `POST /api/environment/{env_id}/close`
  - Description: Closes and cleans up the environment instance.
  - Path Parameter: `env_id` - The ID of the environment.
  - Response: JSON object indicating success or failure.
- Retrieve Action Space JSON Schema
  - Endpoint: `GET /api/environment/{env_id}/action-space`
  - Description: Retrieves the action space of the environment in JSON schema format.
  - Path Parameter: `env_id` - The ID of the environment.
  - Response: JSON object containing the action space schema, or an error message if the environment is not found.
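Any HTTP client can drive these endpoints. Below is a minimal sketch using only Python's standard library; it assumes the server is running at `http://127.0.0.1:8000`, and the environment name is a placeholder:

```python
import json
import urllib.request

BASE_URL = "http://127.0.0.1:8000"  # assumed local server address


def initialize_request(env_name):
    """Build the POST request for the initialize endpoint."""
    return urllib.request.Request(
        BASE_URL + "/api/environment/initialize",
        data=json.dumps({"env_name": env_name}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def close_request(env_id):
    """Build the POST request for the close endpoint."""
    return urllib.request.Request(
        BASE_URL + f"/api/environment/{env_id}/close",
        data=b"{}",
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    # Requires the FastAPI server to be running locally.
    with urllib.request.urlopen(initialize_request("ENV-NAME")) as resp:
        env_id = json.loads(resp.read())["env_id"]
    with urllib.request.urlopen(close_request(env_id)) as resp:
        print(json.loads(resp.read()))
```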
To start the FastAPI server, run the following command:

```bash
uvicorn src.verl_agent_env.app:app --reload
```

This will start the server at http://127.0.0.1:8000, and you can access the API documentation at http://127.0.0.1:8000/docs.
To serve the FastAPI application using Docker, follow these steps:
- Ensure Docker is installed and running on your machine.
- Navigate to the root directory of the project, where the `Dockerfile` is located.
- Build the Docker image using the following command:

  ```bash
  docker build -t verl-agent-env .
  ```

- Run the Docker container using the following command:

  ```bash
  docker run -p 8000:8000 verl-agent-env
  ```

This will start the FastAPI server inside a Docker container, accessible at http://localhost:8000. You can access the API documentation at http://localhost:8000/docs.
- Add serving code
- Add Docker container building logic
- High concurrency to support batch sizes >= 10K
- Multi-node hosting
- Implement Sokoban
- Implement Countdown
- Implement Frozen Lake
- RL example with VeRL + Sokoban
  - Data curation
  - VeRL dataset
  - VeRL rollout
  - VeRL PPO training
- MCP env
To run the example of using the VeRL Agent Environment with a countdown task, execute the following command in your terminal:

```bash
python examples/gpt_play_countdown.py
```

Ensure that you have set the `OPENAI_KEY` environment variable, or have an `OPENAI_KEY` file in the same directory as the script, for authentication. This example demonstrates initializing an environment, running an agent loop, and closing the environment.
This project is licensed under the Apache License 2.0 - see the LICENSE file for details.