Commit a84d19f
frankxu2004 and OpenDevinBot authored
Enable CodeAct agents with browsing, and also enable arbitrary BrowserGym action support (All-Hands-AI#1807)
* enable browsing in codeact, and arbitrary browsergym DSL support
* fix
* fix unit test case
* update frontend for the new interactive browsing action
* bump ver
* Fix integration tests

---------

Co-authored-by: OpenDevinBot <[email protected]>
1 parent 76abca3 commit a84d19f

26 files changed: +293 -67 lines changed

agenthub/codeact_agent/codeact_agent.py

Lines changed: 18 additions & 2 deletions
@@ -13,11 +13,13 @@
 from opendevin.events.action import (
     Action,
     AgentFinishAction,
+    BrowseInteractiveAction,
     CmdRunAction,
     IPythonRunCellAction,
     MessageAction,
 )
 from opendevin.events.observation import (
+    BrowserOutputObservation,
     CmdOutputObservation,
     IPythonRunCellObservation,
 )
@@ -33,7 +35,7 @@
 
 def parse_response(response) -> str:
     action = response.choices[0].message.content
-    for lang in ['bash', 'ipython']:
+    for lang in ['bash', 'ipython', 'browse']:
         if f'<execute_{lang}>' in action and f'</execute_{lang}>' not in action:
             action += f'</execute_{lang}>'
     return action
@@ -85,7 +87,7 @@ def swe_agent_edit_hack(bash_command: str) -> str:
 
 
 class CodeActAgent(Agent):
-    VERSION = '1.2'
+    VERSION = '1.3'
     """
     The Code Act Agent is a minimalist agent.
     The agent works by passing the model a list of action-observation pairs and prompting the model to take the next step.
@@ -171,6 +173,7 @@ def step(self, state: State) -> Action:
         Returns:
         - CmdRunAction(command) - bash command to run
         - IPythonRunCellAction(code) - IPython code to run
+        - BrowseInteractiveAction(browsergym_command) - BrowserGym commands to run
         - MessageAction(content) - Message action to run (e.g. ask for clarification)
         - AgentFinishAction() - end the interaction
         """
@@ -205,6 +208,9 @@ def step(self, state: State) -> Action:
                     content = '\n'.join(splitted)
                     content = truncate_observation(content)
                     self.messages.append({'role': 'user', 'content': content})
+                elif isinstance(obs, BrowserOutputObservation):
+                    content = 'OBSERVATION:\n' + truncate_observation(obs.content)
+                    self.messages.append({'role': 'user', 'content': content})
 
         latest_user_message = [m for m in self.messages if m['role'] == 'user'][-1]
         if latest_user_message:
@@ -217,6 +223,7 @@ def step(self, state: State) -> Action:
             stop=[
                 '</execute_ipython>',
                 '</execute_bash>',
+                '</execute_browse>',
             ],
             temperature=0.0,
         )
@@ -251,6 +258,15 @@ def step(self, state: State) -> Action:
             code_group = python_code.group(1).strip()
             thought = action_str.replace(python_code.group(0), '').strip()
             return IPythonRunCellAction(code=code_group, thought=thought)
+        elif browse_command := re.search(
+            r'<execute_browse>(.*)</execute_browse>', action_str, re.DOTALL
+        ):
+            # BrowserGym actions was found
+            browse_actions = browse_command.group(1).strip()
+            thought = action_str.replace(browse_command.group(0), '').strip()
+            return BrowseInteractiveAction(
+                browser_actions=browse_actions, thought=thought
+            )
         else:
             # We assume the LLM is GOOD enough that when it returns pure natural language
             # it want to talk to the user
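
Reviewer note: the parsing path added in this file can be exercised in isolation. The following is a minimal sketch, not the agent's actual code: `close_open_tag` and `extract_browse` are hypothetical helper names that mirror what `parse_response` and the new `elif` branch in `step` appear to do, namely re-appending a closing tag that the stop sequence removed, then splitting the reply into BrowserGym commands and the surrounding thought.

```python
import re


def close_open_tag(action: str, langs=('bash', 'ipython', 'browse')) -> str:
    """Re-append a closing tag that a stop sequence cut off (hypothetical helper)."""
    for lang in langs:
        if f'<execute_{lang}>' in action and f'</execute_{lang}>' not in action:
            action += f'</execute_{lang}>'
    return action


def extract_browse(action_str: str):
    """Return (browser_actions, thought), or None when no browse block is present."""
    match = re.search(
        r'<execute_browse>(.*)</execute_browse>', action_str, re.DOTALL
    )
    if match is None:
        return None
    browser_actions = match.group(1).strip()
    # Everything outside the tagged block is treated as the model's thought.
    thought = action_str.replace(match.group(0), '').strip()
    return browser_actions, thought


# A reply truncated at the '</execute_browse>' stop sequence.
raw = 'Let me open the page:\n<execute_browse>\ngoto("http://127.0.0.1:5000")\n'
result = extract_browse(close_open_tag(raw))
print(result)
```

Because the pattern is greedy and uses `re.DOTALL`, a reply containing several browse blocks would be captured from the first opening tag to the last closing tag; the stop sequence makes that unlikely in practice.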

agenthub/codeact_agent/prompt.py

Lines changed: 20 additions & 2 deletions
@@ -34,6 +34,8 @@
 </execute_ipython>
 The assistant can execute bash commands on behalf of the user by wrapping them with <execute_bash> and </execute_bash>.
 For example, you can list the files in the current directory by <execute_bash> ls </execute_bash>.
+The assistant can browse the Internet with commands on behalf of the user by wrapping them with <execute_browse> and </execute_browse>.
+For example, you can browse a given URL by <execute_browse> goto("<URL>") </execute_browse>.
 The assistant should attempt fewer things at a time instead of putting too much commands OR code in one "execute" block.
 The assistant can install Python packages through bash by <execute_bash> pip install [package needed] </execute_bash> and should always import packages and define variables before starting to use them.
 The assistant should stop <execute> and provide an answer when they have already obtained the answer from the execution result.
@@ -49,8 +51,8 @@
 If you require access to GitHub but $GITHUB_TOKEN is not set, ask the user to set it for you."""
 
 SYSTEM_SUFFIX = """The assistant's response should be concise.
-You should include <execute_ipython> or <execute_bash> in every one of your responses, unless you are finished with the task or need more input or action from the user in order to proceed.
-IMPORTANT: Whenever possible, execute the code for the user using <execute_ipython> or <execute_bash> instead of providing it.
+You should include <execute_ipython> or <execute_bash> or <execute_browse> in every one of your responses, unless you are finished with the task or need more input or action from the user in order to proceed.
+IMPORTANT: Whenever possible, execute the code for the user using <execute_ipython> or <execute_bash> or <execute_browse> instead of providing it.
 """
 
 EXAMPLES = """
@@ -154,6 +156,21 @@ def index():
 ASSISTANT:
 The server is running on port 5000 with PID 124. You can access the list of numbers by visiting http://127.0.0.1:5000. If you have any further questions, feel free to ask!
 
+USER: Now browse the newly started server's homepage and show me the content.
+
+ASSISTANT:
+Sure! Let me browse the server's homepage at http://127.0.0.1:5000:
+<execute_browse>
+goto("http://127.0.0.1:5000")
+</execute_browse>
+
+USER:
+Observation:
+[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
+
+ASSISTANT:
+The content of the server's homepage is "[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]". If you have any further questions, feel free to ask!
+
 USER: Now kill the server, make it display the numbers in a table format.
 
 ASSISTANT:
@@ -230,4 +247,5 @@ def index():
     "I don't understand your input. \n"
     'If you want to execute a bash command, please use <execute_bash> YOUR_COMMAND_HERE </execute_bash>.\n'
     'If you want to execute a block of Python code, please use <execute_ipython> YOUR_COMMAND_HERE </execute_ipython>.\n'
+    'If you want to browse the Internet, please use <execute_browse> YOUR_COMMAND_HERE </execute_browse>.\n'
 )

frontend/src/services/actions.ts

Lines changed: 5 additions & 0 deletions
@@ -17,6 +17,11 @@ const messageActions = {
     store.dispatch(setUrl(url));
     store.dispatch(setScreenshotSrc(screenshotSrc));
   },
+  [ActionType.BROWSE_INTERACTIVE]: (message: ActionMessage) => {
+    const { url, screenshotSrc } = message.args;
+    store.dispatch(setUrl(url));
+    store.dispatch(setScreenshotSrc(screenshotSrc));
+  },
   [ActionType.WRITE]: (message: ActionMessage) => {
     const { path, content } = message.args;
     store.dispatch(updatePath(path));

frontend/src/types/ActionType.tsx

Lines changed: 3 additions & 0 deletions
@@ -23,6 +23,9 @@ enum ActionType {
   // Opens a web page.
   BROWSE = "browse",
 
+  // Interact with the browser instance.
+  BROWSE_INTERACTIVE = "browse_interactive",
+
   // Searches long-term memory.
   RECALL = "recall",
 

opendevin/core/schema/action.py

Lines changed: 4 additions & 0 deletions
@@ -40,6 +40,10 @@ class ActionTypeSchema(BaseModel):
     """Opens a web page.
     """
 
+    BROWSE_INTERACTIVE: str = Field(default='browse_interactive')
+    """Interact with the browser instance.
+    """
+
     RECALL: str = Field(default='recall')
     """Searches long-term memory
     """

opendevin/events/action/__init__.py

Lines changed: 2 additions & 1 deletion
@@ -7,7 +7,7 @@
     AgentSummarizeAction,
     ChangeAgentStateAction,
 )
-from .browse import BrowseURLAction
+from .browse import BrowseInteractiveAction, BrowseURLAction
 from .commands import CmdKillAction, CmdRunAction, IPythonRunCellAction
 from .empty import NullAction
 from .files import FileReadAction, FileWriteAction
@@ -20,6 +20,7 @@
     'CmdRunAction',
     'CmdKillAction',
     'BrowseURLAction',
+    'BrowseInteractiveAction',
     'FileReadAction',
     'FileWriteAction',
     'AgentRecallAction',

opendevin/events/action/browse.py

Lines changed: 12 additions & 0 deletions
@@ -16,3 +16,15 @@ class BrowseURLAction(Action):
     @property
     def message(self) -> str:
         return f'Browsing URL: {self.url}'
+
+
+@dataclass
+class BrowseInteractiveAction(Action):
+    browser_actions: str
+    thought: str = ''
+    action: str = ActionType.BROWSE_INTERACTIVE
+    runnable: ClassVar[bool] = True
+
+    @property
+    def message(self) -> str:
+        return f'Executing browser actions: {self.browser_actions}'
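
Reviewer note: the new action is a plain dataclass, so its behaviour is easy to check outside the repository. The sketch below is self-contained under stated assumptions: `Action` here is a hypothetical, empty stand-in for `opendevin.events.action.Action`, and the literal `'browse_interactive'` replaces `ActionType.BROWSE_INTERACTIVE`.

```python
from dataclasses import dataclass
from typing import ClassVar


@dataclass
class Action:
    """Hypothetical stand-in for the real base class; it carries no fields."""

    runnable: ClassVar[bool] = False


@dataclass
class BrowseInteractiveAction(Action):
    browser_actions: str  # raw BrowserGym command string, e.g. 'goto("...")'
    thought: str = ''
    action: str = 'browse_interactive'  # assumed value of ActionType.BROWSE_INTERACTIVE
    runnable: ClassVar[bool] = True  # class-level flag, not a dataclass field

    @property
    def message(self) -> str:
        return f'Executing browser actions: {self.browser_actions}'


act = BrowseInteractiveAction(browser_actions='goto("http://127.0.0.1:5000")')
print(act.message)
```

Since `runnable` is a `ClassVar`, it is excluded from the generated `__init__`, so only `browser_actions` is required when constructing the action.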

opendevin/events/serialization/action.py

Lines changed: 2 additions & 1 deletion
@@ -7,7 +7,7 @@
     AgentRejectAction,
     ChangeAgentStateAction,
 )
-from opendevin.events.action.browse import BrowseURLAction
+from opendevin.events.action.browse import BrowseInteractiveAction, BrowseURLAction
 from opendevin.events.action.commands import (
     CmdKillAction,
     CmdRunAction,
@@ -22,6 +22,7 @@
     CmdRunAction,
     IPythonRunCellAction,
     BrowseURLAction,
+    BrowseInteractiveAction,
     FileReadAction,
     FileWriteAction,
     AgentRecallAction,

opendevin/runtime/runtime.py

Lines changed: 5 additions & 0 deletions
@@ -5,6 +5,7 @@
 from opendevin.events.action import (
     Action,
     AgentRecallAction,
+    BrowseInteractiveAction,
     BrowseURLAction,
     CmdKillAction,
     CmdRunAction,
@@ -154,6 +155,10 @@ async def write(self, action: FileWriteAction) -> Observation:
     async def browse(self, action: BrowseURLAction) -> Observation:
         pass
 
+    @abstractmethod
+    async def browse_interactive(self, action: BrowseInteractiveAction) -> Observation:
+        pass
+
     @abstractmethod
     async def recall(self, action: AgentRecallAction) -> Observation:
         pass

opendevin/runtime/server/browse.py

Lines changed: 18 additions & 7 deletions
@@ -1,15 +1,23 @@
 import os
 
+from opendevin.core.schema import ActionType
 from opendevin.events.observation import BrowserOutputObservation
 
 
 async def browse(action, browser) -> BrowserOutputObservation: # type: ignore
-    asked_url = action.url
-    if not asked_url.startswith('http'):
-        asked_url = os.path.abspath(os.curdir) + action.url
-    try:
-        # action in BrowserGym: see https://github.com/ServiceNow/BrowserGym/blob/main/core/src/browsergym/core/action/functions.py
+    if action.action == ActionType.BROWSE:
+        # legacy BrowseURLAction
+        asked_url = action.url
+        if not asked_url.startswith('http'):
+            asked_url = os.path.abspath(os.curdir) + action.url
         action_str = f'goto("{asked_url}")'
+    elif action.action == ActionType.BROWSE_INTERACTIVE:
+        # new BrowseInteractiveAction, supports full featured BrowserGym actions
+        # action in BrowserGym: see https://github.com/ServiceNow/BrowserGym/blob/main/core/src/browsergym/core/action/functions.py
+        action_str = action.browser_actions
+    else:
+        raise ValueError(f'Invalid action type: {action.action}')
+    try:
         # obs provided by BrowserGym: see https://github.com/ServiceNow/BrowserGym/blob/main/core/src/browsergym/core/env.py#L396
         obs = browser.step(action_str)
         return BrowserOutputObservation(
@@ -21,9 +29,12 @@ async def browse(action, browser) -> BrowserOutputObservation: # type: ignore
             last_browser_action=obs['last_action'], # last browser env action performed
             focused_element_bid=obs['focused_element_bid'], # focused element bid
             screenshot=obs['screenshot'], # base64-encoded screenshot, png
-            url=asked_url,
+            url=obs['url'], # URL of the page
         )
     except Exception as e:
         return BrowserOutputObservation(
-            content=str(e), screenshot='', error=True, url=asked_url
+            content=str(e),
+            screenshot='',
+            error=True,
+            url=asked_url if action.action == ActionType.BROWSE else '',
         )
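
Reviewer note: the branching that turns either action type into a BrowserGym command string can be tested without a browser. This is a rough sketch under assumptions: `to_action_str` is a hypothetical name for the first half of `browse`, the two string constants stand in for `ActionType.BROWSE` and `ActionType.BROWSE_INTERACTIVE`, and `SimpleNamespace` objects replace the real action classes. The `click("12")` string is only an illustrative command.

```python
import os
from types import SimpleNamespace

BROWSE = 'browse'  # assumed value of ActionType.BROWSE
BROWSE_INTERACTIVE = 'browse_interactive'  # assumed value of ActionType.BROWSE_INTERACTIVE


def to_action_str(action) -> str:
    """Map an action object to the string that would be passed to browser.step()."""
    if action.action == BROWSE:
        # Legacy path: wrap the URL in a goto(); non-http values are treated
        # as paths relative to the current working directory.
        url = action.url
        if not url.startswith('http'):
            url = os.path.abspath(os.curdir) + url
        return f'goto("{url}")'
    if action.action == BROWSE_INTERACTIVE:
        # New path: the command string is forwarded unchanged.
        return action.browser_actions
    raise ValueError(f'Invalid action type: {action.action}')


legacy = SimpleNamespace(action=BROWSE, url='http://127.0.0.1:5000')
interactive = SimpleNamespace(action=BROWSE_INTERACTIVE, browser_actions='click("12")')
print(to_action_str(legacy))
print(to_action_str(interactive))
```

Note that in the diff the `ValueError` is raised before the `try` block, so an unknown action type propagates to the caller instead of being converted into an error observation.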

opendevin/runtime/server/runtime.py

Lines changed: 4 additions & 0 deletions
@@ -1,5 +1,6 @@
 from opendevin.events.action import (
     AgentRecallAction,
+    BrowseInteractiveAction,
     BrowseURLAction,
     CmdKillAction,
     CmdRunAction,
@@ -58,6 +59,9 @@ async def write(self, action: FileWriteAction) -> Observation:
     async def browse(self, action: BrowseURLAction) -> Observation:
         return await browse(action, self.browser)
 
+    async def browse_interactive(self, action: BrowseInteractiveAction) -> Observation:
+        return await browse(action, self.browser)
+
     async def recall(self, action: AgentRecallAction) -> Observation:
         return NullObservation('')
 
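
Reviewer note: both runtime methods delegate to the same `browse` helper, and the new abstract method forces every runtime subclass to implement `browse_interactive`. The sketch below illustrates that contract with hypothetical stand-ins (`Runtime`, `ServerRuntime`, `FakeBrowser`, and a simplified `browse` that just forwards to `browser.step`); none of these reproduce the real classes.

```python
import asyncio
from abc import ABC, abstractmethod


class Runtime(ABC):
    """Hypothetical, trimmed-down version of the abstract runtime."""

    @abstractmethod
    async def browse(self, action): ...

    @abstractmethod
    async def browse_interactive(self, action): ...


class FakeBrowser:
    """Stand-in for the BrowserGym environment; echoes the command it receives."""

    def step(self, action_str: str) -> dict:
        return {'last_action': action_str}


async def browse(action_str: str, browser) -> dict:
    # Simplified shared helper: the real one builds a BrowserOutputObservation.
    return browser.step(action_str)


class ServerRuntime(Runtime):
    def __init__(self, browser):
        self.browser = browser

    async def browse(self, action):
        return await browse(action, self.browser)

    async def browse_interactive(self, action):
        # Same helper as browse(); the helper decides how to build the command.
        return await browse(action, self.browser)


class IncompleteRuntime(Runtime):
    async def browse(self, action):
        return None


obs = asyncio.run(
    ServerRuntime(FakeBrowser()).browse_interactive('goto("http://127.0.0.1:5000")')
)
print(obs)
```

Instantiating `IncompleteRuntime` raises `TypeError`, which is how a runtime that has not been updated for the new action would surface.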

tests/integration/mock/CodeActAgent/test_edits/prompt_001.log

Lines changed: 19 additions & 2 deletions
@@ -9,6 +9,8 @@ print("Hello World!")
 </execute_ipython>
 The assistant can execute bash commands on behalf of the user by wrapping them with <execute_bash> and </execute_bash>.
 For example, you can list the files in the current directory by <execute_bash> ls </execute_bash>.
+The assistant can browse the Internet with commands on behalf of the user by wrapping them with <execute_browse> and </execute_browse>.
+For example, you can browse a given URL by <execute_browse> goto("<URL>") </execute_browse>.
 The assistant should attempt fewer things at a time instead of putting too much commands OR code in one "execute" block.
 The assistant can install Python packages through bash by <execute_bash> pip install [package needed] </execute_bash> and should always import packages and define variables before starting to use them.
 The assistant should stop <execute> and provide an answer when they have already obtained the answer from the execution result.
@@ -38,8 +40,8 @@ end_of_edit - replaces lines <start_line> through <end_line> (inclusive) with th
 Please note that THE EDIT COMMAND REQUIRES PROPER INDENTATION. If you'd like to add the line ' print(x)' you must fully write that out, with all those spaces before the code! Indentation is important and code that is not indented correctly will fail and require fixing before it can be run.
 
 The assistant's response should be concise.
-You should include <execute_ipython> or <execute_bash> in every one of your responses, unless you are finished with the task or need more input or action from the user in order to proceed.
-IMPORTANT: Whenever possible, execute the code for the user using <execute_ipython> or <execute_bash> instead of providing it.
+You should include <execute_ipython> or <execute_bash> or <execute_browse> in every one of your responses, unless you are finished with the task or need more input or action from the user in order to proceed.
+IMPORTANT: Whenever possible, execute the code for the user using <execute_ipython> or <execute_bash> or <execute_browse> instead of providing it.
 
 
 ----------
@@ -146,6 +148,21 @@ Press CTRL+C to quit
 ASSISTANT:
 The server is running on port 5000 with PID 124. You can access the list of numbers by visiting http://127.0.0.1:5000. If you have any further questions, feel free to ask!
 
+USER: Now browse the newly started server's homepage and show me the content.
+
+ASSISTANT:
+Sure! Let me browse the server's homepage at http://127.0.0.1:5000:
+<execute_browse>
+goto("http://127.0.0.1:5000")
+</execute_browse>
+
+USER:
+Observation:
+[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
+
+ASSISTANT:
+The content of the server's homepage is "[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]". If you have any further questions, feel free to ask!
+
 USER: Now kill the server, make it display the numbers in a table format.
 
 ASSISTANT:

tests/integration/mock/CodeActAgent/test_edits/prompt_002.log

Lines changed: 25 additions & 9 deletions
@@ -9,6 +9,8 @@ print("Hello World!")
 </execute_ipython>
 The assistant can execute bash commands on behalf of the user by wrapping them with <execute_bash> and </execute_bash>.
 For example, you can list the files in the current directory by <execute_bash> ls </execute_bash>.
+The assistant can browse the Internet with commands on behalf of the user by wrapping them with <execute_browse> and </execute_browse>.
+For example, you can browse a given URL by <execute_browse> goto("<URL>") </execute_browse>.
 The assistant should attempt fewer things at a time instead of putting too much commands OR code in one "execute" block.
 The assistant can install Python packages through bash by <execute_bash> pip install [package needed] </execute_bash> and should always import packages and define variables before starting to use them.
 The assistant should stop <execute> and provide an answer when they have already obtained the answer from the execution result.
@@ -38,8 +40,8 @@ end_of_edit - replaces lines <start_line> through <end_line> (inclusive) with th
 Please note that THE EDIT COMMAND REQUIRES PROPER INDENTATION. If you'd like to add the line ' print(x)' you must fully write that out, with all those spaces before the code! Indentation is important and code that is not indented correctly will fail and require fixing before it can be run.
 
 The assistant's response should be concise.
-You should include <execute_ipython> or <execute_bash> in every one of your responses, unless you are finished with the task or need more input or action from the user in order to proceed.
-IMPORTANT: Whenever possible, execute the code for the user using <execute_ipython> or <execute_bash> instead of providing it.
+You should include <execute_ipython> or <execute_bash> or <execute_browse> in every one of your responses, unless you are finished with the task or need more input or action from the user in order to proceed.
+IMPORTANT: Whenever possible, execute the code for the user using <execute_ipython> or <execute_bash> or <execute_browse> instead of providing it.
 
 
 ----------
@@ -146,6 +148,21 @@ Press CTRL+C to quit
 ASSISTANT:
 The server is running on port 5000 with PID 124. You can access the list of numbers by visiting http://127.0.0.1:5000. If you have any further questions, feel free to ask!
 
+USER: Now browse the newly started server's homepage and show me the content.
+
+ASSISTANT:
+Sure! Let me browse the server's homepage at http://127.0.0.1:5000:
+<execute_browse>
+goto("http://127.0.0.1:5000")
+</execute_browse>
+
+USER:
+Observation:
+[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
+
+ASSISTANT:
+The content of the server's homepage is "[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]". If you have any further questions, feel free to ask!
+
 USER: Now kill the server, make it display the numbers in a table format.
 
 ASSISTANT:
@@ -228,20 +245,19 @@ ENVIRONMENT REMINDER: You have 9 turns left to complete the task.
 
 ----------
 
-Certainly! I will start by opening the file `bad.txt` to see the content and then proceed to fix the typos.
+Let's start by examining the contents of `bad.txt` to identify the typos. I'll read the file first.
 
 <execute_bash>
-open bad.txt
+cat bad.txt
 </execute_bash>
 
 ----------
 
 OBSERVATION:
-[File: /workspace/bad.txt (4 lines total)]
-1:This is a stupid typoo.
-2:Really?
-3:No mor typos!
-4:Enjoy!
+This is a stupid typoo.
+Really?
+No mor typos!
+Enjoy!
 [Command -1 finished with exit code 0]]
 
 ENVIRONMENT REMINDER: You have 8 turns left to complete the task.
