CodeActAgent: Only delegate to BrowsingAgent as last resort #2326

Closed
38 changes: 33 additions & 5 deletions agenthub/codeact_agent/codeact_agent.py
@@ -37,7 +37,7 @@

def parse_response(response) -> str:
action = response.choices[0].message.content
for lang in ['bash', 'ipython', 'browse']:
for lang in ['bash', 'ipython', 'browse', 'delegate']:
if f'<execute_{lang}>' in action and f'</execute_{lang}>' not in action:
action += f'</execute_{lang}>'
return action
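The tag-closing step above can be sketched in isolation. This is a minimal sketch that assumes the model's raw text is passed in directly rather than an LLM response object; `close_open_tags` and `ACTION_TAGS` are hypothetical names, not part of the PR. A closing tag is presumably missing because the closing tags are also passed as stop sequences to the completion call (the stop list later in this diff gains `'</execute_delegate>'`).

```python
# Hypothetical names; the loop mirrors the updated parse_response in the diff.
ACTION_TAGS = ['bash', 'ipython', 'browse', 'delegate']


def close_open_tags(action: str) -> str:
    """Append the closing tag for any <execute_*> block that was left open."""
    for lang in ACTION_TAGS:
        if f'<execute_{lang}>' in action and f'</execute_{lang}>' not in action:
            action += f'</execute_{lang}>'
    return action


# A completion cut off at the stop sequence has no closing tag.
truncated = "Let me delegate.\n<execute_delegate>\nBrowsingAgent('find the numbers')\n"
print(close_open_tags(truncated))
```

Without `'delegate'` in the list, a truncated delegate block would stay unclosed and the later `<execute_delegate>(.*)</execute_delegate>` search would not match it.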
@@ -50,6 +50,10 @@ def action_to_str(action: Action) -> str:
return f'{action.thought}\n<execute_ipython>\n{action.code}\n</execute_ipython>'
elif isinstance(action, BrowseInteractiveAction):
return f'{action.thought}\n<execute_browse>\n{action.browser_actions}\n</execute_browse>'
elif isinstance(action, AgentDelegateAction):
return (
f'{action.thought}\n<execute_delegate>\n{action.agent}\n</execute_delegate>'
)
elif isinstance(action, MessageAction):
return action.content
return ''
@@ -61,6 +65,7 @@ def get_action_message(action: Action) -> dict[str, str] | None:
or isinstance(action, CmdRunAction)
or isinstance(action, IPythonRunCellAction)
or isinstance(action, MessageAction)
or isinstance(action, AgentDelegateAction)
):
return {
'role': 'user' if action.source == 'user' else 'assistant',
@@ -205,6 +210,7 @@ def step(self, state: State) -> Action:
- CmdRunAction(command) - bash command to run
- IPythonRunCellAction(code) - IPython code to run
- AgentDelegateAction(agent, inputs) - delegate action for (sub)task
- BrowseInteractiveAction(browsergym_command) - BrowserGym commands to run
- MessageAction(content) - Message action to run (e.g. ask for clarification)
- AgentFinishAction() - end the interaction
"""
@@ -236,6 +242,7 @@ def step(self, state: State) -> Action:
'</execute_ipython>',
'</execute_bash>',
'</execute_browse>',
'</execute_delegate>',
],
temperature=0.0,
)
@@ -248,7 +255,7 @@ def step(self, state: State) -> Action:
if finish_command := re.search(r'<finish>.*</finish>', action_str, re.DOTALL):
thought = action_str.replace(finish_command.group(0), '').strip()
return AgentFinishAction(thought=thought)
if bash_command := re.search(
elif bash_command := re.search(
r'<execute_bash>(.*?)</execute_bash>', action_str, re.DOTALL
):
# remove the command from the action string to get thought
@@ -273,10 +280,31 @@ def step(self, state: State) -> Action:
elif browse_command := re.search(
r'<execute_browse>(.*)</execute_browse>', action_str, re.DOTALL
):
thought = action_str.replace(browse_command.group(0), '').strip()
# BrowserGym actions were found
browse_actions = browse_command.group(1).strip()
task = f'{thought}. I should start with: {browse_actions}'
return AgentDelegateAction(agent='BrowsingAgent', inputs={'task': task})
thought = action_str.replace(browse_command.group(0), '').strip()
return BrowseInteractiveAction(
browser_actions=browse_actions, thought=thought
)
elif delegate_command := re.search(
r'<execute_delegate>(.*)</execute_delegate>', action_str, re.DOTALL
):
# Delegate action was found
thought = action_str.replace(delegate_command.group(0), '').strip()
delegate_action = delegate_command.group(1).strip()
if '(' in delegate_action and ')' in delegate_action:
agent_match = re.search(r'(\w+)\(', delegate_action)
if agent_match:
agent = agent_match.group(1)
task_match = re.search(r"\('([^']+)'\)", delegate_action)
if task_match:
task = task_match.group(1)
else:
agent = delegate_action
task = thought
return AgentDelegateAction(
agent=agent, inputs={'task': task}, thought=thought
Collaborator

Not an issue with this PR, just picking your brain on this: do we really have to set task in inputs? Can we, if not passing an actual MessageAction, at least make delegate action and obs a little more in-line with others, and for example use a field in AgentDelegateAction for the task, and use the content field in AgentDelegateObservation, which all obs have, for its result?

Collaborator Author

> do we really have to set task in inputs?

Some micro-agents need more than just a task as input. For example, coder-agent needs both a "task" and a "summary" of the codebase, although one could argue the "summary" could be added to the "task" prompt.

> and use the content field in AgentDelegateObservation

Some micro-agents need more than just "content" as output. E.g. CommitWriterAgent can return either outputs['answer'] if it generates a commit message, or outputs['reason'] if it rejects the request.

> which all obs have, for its result

Fundamentally, AgentDelegateObservation is different from others. CmdOutputObservation, e.g., is essentially an observation of stdout/stderr - that being said, what if in the future, we need to catch other side-effects? Likely we would need to do something similar here - use outputs as a dict rather than content as a string.
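
To make the argument for dict-shaped inputs and outputs concrete, here is a minimal sketch. `DelegateAction`, `DelegateObservation`, and `summarize` are hypothetical stand-ins written for illustration only; they are not the project's `AgentDelegateAction` or `AgentDelegateObservation` classes, and the example values are made up.

```python
from dataclasses import dataclass, field


@dataclass
class DelegateAction:  # hypothetical stand-in, not the real AgentDelegateAction
    agent: str
    inputs: dict = field(default_factory=dict)
    thought: str = ''


@dataclass
class DelegateObservation:  # hypothetical stand-in, not the real observation class
    outputs: dict = field(default_factory=dict)


def summarize(obs: DelegateObservation) -> str:
    """Report whichever key the delegate chose to return."""
    if 'answer' in obs.outputs:
        return f"accepted: {obs.outputs['answer']}"
    if 'reason' in obs.outputs:
        return f"rejected: {obs.outputs['reason']}"
    return 'no output'


# A delegate that needs two inputs, which a single task string cannot carry.
action = DelegateAction(
    agent='CoderAgent',
    inputs={'task': 'add a button', 'summary': 'a small Flask app'},
)
print(sorted(action.inputs))
print(summarize(DelegateObservation(outputs={'reason': 'nothing is staged'})))
```

A single `content` string would force the caller to guess whether the text is an answer or a rejection; the dict keeps that distinction explicit.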

)
else:
# We assume the LLM is GOOD enough that when it returns pure natural language
# it want to talk to the user
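
The `<execute_delegate>` branch in the hunk above can be sketched as a standalone function. This is a sketch under stated assumptions, not the PR's code: `parse_delegate` is a hypothetical name, it returns a plain tuple instead of an `AgentDelegateAction`, and it sets defaults for `agent` and `task` up front so both are always bound, which the diff's nested `if` blocks do not guarantee when one of the regexes fails to match.

```python
import re
from typing import Optional, Tuple


def parse_delegate(action_str: str) -> Optional[Tuple[str, str, str]]:
    """Return (agent, task, thought), or None if there is no delegate block."""
    match = re.search(
        r'<execute_delegate>(.*)</execute_delegate>', action_str, re.DOTALL
    )
    if match is None:
        return None
    # Everything outside the block is treated as the thought, as in the diff.
    thought = action_str.replace(match.group(0), '').strip()
    body = match.group(1).strip()

    # Defaults (an assumption of this sketch): the whole body is the agent
    # name and the thought doubles as the task.
    agent, task = body, thought

    agent_match = re.search(r'(\w+)\(', body)
    if agent_match:
        agent = agent_match.group(1)
    # Same pattern as the diff: only a single-quoted argument is recognized.
    task_match = re.search(r"\('([^']+)'\)", body)
    if task_match:
        task = task_match.group(1)
    return agent, task, thought


reply = (
    "I should delegate.\n<execute_delegate>\n"
    "BrowsingAgent('visit http://127.0.0.1:5000 and tell me the numbers')\n"
    "</execute_delegate>"
)
print(parse_delegate(reply))
```

A body such as `BrowsingAgent("...")` with double quotes still yields the agent name here, with the thought reused as the task.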
91 changes: 73 additions & 18 deletions agenthub/codeact_agent/prompt.py
@@ -23,13 +23,20 @@
Also, if a command execution result saying like: Command: "npm start" timed out. Sending SIGINT to the process, you should also retry with running the command in the background.
"""

BROWSING_PREFIX = """The assistant can browse the Internet with <execute_browse> and </execute_browse>.
For example, <execute_browse> Tell me the usa's president using google search </execute_browse>.
Or <execute_browse> Tell me what is in http://example.com </execute_browse>.
BROWSING_PREFIX = """The assistant can browse the Internet by wrapping goto("<URL>") command with <execute_browse> and </execute_browse>.
For example, you can browse a given URL by <execute_browse> goto("<URL>") </execute_browse>.
"""
PIP_INSTALL_PREFIX = """The assistant can install Python packages using the %pip magic command in an IPython environment by using the following syntax: <execute_ipython> %pip install [package needed] </execute_ipython> and should always import packages and define variables before starting to use them."""

SYSTEM_PREFIX = MINIMAL_SYSTEM_PREFIX + BROWSING_PREFIX + PIP_INSTALL_PREFIX
DELEGATE_PREFIX = """The assistant can delegate a subtask to other specialized agents if it cannot achieve it by itself, by wrapping the agent name and task with <execute_delegate> and </execute_delegate>. The assistant only does so when it cannot use other actions to achieve the task. Available specialized agents include:
- BrowsingAgent: BrowsingAgent can do interactive browsing, including scrolling, locating specific elements, and clicking on buttons. For example, you can delegate a task to BrowsingAgent by <execute_delegate> BrowsingAgent('Use Google to find out the current president of USA') </execute_delegate>
"""

PIP_INSTALL_PREFIX = """The assistant can install Python packages using the %pip magic command in an IPython environment by using the following syntax: <execute_ipython> %pip install [package needed] </execute_ipython> and should always import packages and define variables before starting to use them.
"""

SYSTEM_PREFIX = (
MINIMAL_SYSTEM_PREFIX + BROWSING_PREFIX + DELEGATE_PREFIX + PIP_INSTALL_PREFIX
)

GITHUB_MESSAGE = """To interact with GitHub, use the $GITHUB_TOKEN environment variable.
For example, to push a branch `my_branch` to the GitHub repo `owner/repo`:
@@ -38,8 +45,8 @@

SYSTEM_SUFFIX = """Responses should be concise.
The assistant should attempt fewer things at a time instead of putting too much commands OR code in one "execute" block.
Include ONLY ONE <execute_ipython>, <execute_bash>, or <execute_browse> per response, unless the assistant is finished with the task or need more input or action from the user in order to proceed.
IMPORTANT: Execute code using <execute_ipython>, <execute_bash>, or <execute_browse> whenever possible.
Include ONLY ONE <execute_ipython>, <execute_bash>, <execute_browse>, or <execute_delegate> per response, unless the assistant is finished with the task or need more input or action from the user in order to proceed.
IMPORTANT: Execute code using <execute_ipython>, <execute_bash>, <execute_browse>, or <execute_delegate> whenever possible.
"""


@@ -169,7 +176,7 @@ def index():
ASSISTANT:
Sure! Let me browse the server's homepage at http://127.0.0.1:5000:
<execute_browse>
Get the content on "http://127.0.0.1:5000"
goto("http://127.0.0.1:5000")
</execute_browse>

USER:
@@ -179,7 +186,7 @@ def index():
ASSISTANT:
The content of the server's homepage is "[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]". If you have any further questions, feel free to ask!

USER: Now kill the server, make it display the numbers in a table format.
USER: Now kill the server, make it display the numbers only if a button is clicked.

ASSISTANT:
Sure! Let me stop the server first:
@@ -211,9 +218,20 @@ def index():
10| app.run(port=5000)

ASSISTANT:
I should edit the file to display the numbers in a table format. I should include correct indentation. Let me update the file:
I should edit the file to include a button. When the button is clicked, it will display the numbers. I should include correct indentation. Let me update the file:
<execute_ipython>
edit_file(start=7, end=7, content=" return '<table>' + ''.join([f'<tr><td>{i}</td></tr>' for i in numbers]) + '</table>'")
EDITED_CODE = \"\"\"def index():
return '''
<html>
<body>
<button onclick="window.location.href='/show_numbers'">Show Numbers</button>
</body>
</html>
'''

@app.route('/show_numbers')
def show_numbers():\"\"\"
edit_file(start=5, end=5, content=EDITED_CODE)
</execute_ipython>

USER:
@@ -224,11 +242,21 @@ def index():
3|
4|@app.route('/')
5|def index():
6| numbers = list(range(1, 11))
7| return '<table>' + ''.join([f'<tr><td>{i}</td></tr>' for i in numbers]) + '</table>'
8|
9|if __name__ == '__main__':
10| app.run(port=5000)
6| return '''
7| <html>
8| <body>
9| <button onclick="window.location.href='/show_numbers'">Show Numbers</button>
10| </body>
11| </html>
12| '''
13|
14|@app.route('/show_numbers')
15|def show_numbers():
16| numbers = list(range(1, 11))
17| return str(numbers)
18|
19|if __name__ == '__main__':
20| app.run(port=5000)
[File updated. Please review the changes and make sure they are correct (correct indentation, no duplicate lines, etc). Edit the file again if necessary.]

ASSISTANT:
@@ -242,7 +270,33 @@ def index():
[1] 126

ASSISTANT:
The server is running on port 5000 with PID 126. You can access the list of numbers in a table format by visiting http://127.0.0.1:5000. Let me know if you have any further requests!
The server is running on port 5000 with PID 126. You can access the list of numbers by visiting http://127.0.0.1:5000. Feel free to let me know if you have any further requests!

USER: Now browse the newly started server's homepage and show me the numbers.

ASSISTANT:
Sure! Let me browse the server's homepage at http://127.0.0.1:5000:
<execute_browse>
goto("http://127.0.0.1:5000")
</execute_browse>

USER:
OBSERVATION:
button 'Show Numbers', clickable

ASSISTANT:
The numbers are hidden behind the button. There is no way I can see the answers unless I try clicking on that button. Since this is an interactive browsing activity, I should delegate to BrowsingAgent.
I need to tell BrowsingAgent the task it should complete, by passing the task description as a parameter.
<execute_delegate>
BrowsingAgent('visit http://127.0.0.1:5000 and tell me the numbers')
</execute_delegate>

USER:
OBSERVATION:
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

ASSISTANT:
The numbers on the server's homepage are "[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]". If you have any further questions, feel free to ask!

--- END OF EXAMPLE ---
"""
@@ -251,5 +305,6 @@ def index():
"I don't understand your input. \n"
'For bash commands, use <execute_bash> YOUR_COMMAND </execute_bash>.\n'
'For Python code, use <execute_ipython> YOUR_CODE </execute_ipython>.\n'
'For browsing, use <execute_browse> YOUR_COMMAND </execute_browse>.\n'
'For browsing, use <execute_browse> goto(YOUR_URL) </execute_browse>.\n'
'For delegation, use <execute_delegate> YOUR_COMMAND </execute_delegate>.\n'
)
4 changes: 3 additions & 1 deletion opendevin/controller/agent_controller.py
@@ -217,7 +217,9 @@ async def start_delegate(self, action: AgentDelegateAction):
num_of_chars=self.state.num_of_chars,
delegate_level=self.state.delegate_level + 1,
)
logger.info(f'[Agent Controller {self.id}]: start delegate')
logger.info(
f'[Agent Controller {self.id}]: start delegating to agent {action.agent}'
)
self.delegate = AgentController(
sid=self.id + '-delegate',
agent=agent,
Original file line number Diff line number Diff line change
@@ -14,10 +14,12 @@ Important, however: do not run interactive commands. You do not have access to s
Also, you need to handle commands that may run indefinitely and not return a result. For such cases, you should redirect the output to a file and run the command in the background to avoid blocking the execution.
For example, to run a Python script that might run indefinitely without returning immediately, you can use the following format: <execute_bash> python3 app.py > server.log 2>&1 & </execute_bash>
Also, if a command execution result saying like: Command: "npm start" timed out. Sending SIGINT to the process, you should also retry with running the command in the background.
The assistant can browse the Internet with <execute_browse> and </execute_browse>.
For example, <execute_browse> Tell me the usa's president using google search </execute_browse>.
Or <execute_browse> Tell me what is in http://example.com </execute_browse>.
The assistant can browse the Internet by wrapping goto("<URL>") command with <execute_browse> and </execute_browse>.
For example, you can browse a given URL by <execute_browse> goto("<URL>") </execute_browse>.
The assistant can delegate a subtask to other specialized agents if it cannot achieve it by itself, by wrapping the agent name and task with <execute_delegate> and </execute_delegate>. The assistant only does so when it cannot use other actions to achieve the task. Available specialized agents include:
- BrowsingAgent: BrowsingAgent can do interactive browsing, including scrolling, locating specific elements, and clicking on buttons. For example, you can delegate a task to BrowsingAgent by <execute_delegate> BrowsingAgent('Use Google to find out the current president of USA') </execute_delegate>
The assistant can install Python packages using the %pip magic command in an IPython environment by using the following syntax: <execute_ipython> %pip install [package needed] </execute_ipython> and should always import packages and define variables before starting to use them.

To interact with GitHub, use the $GITHUB_TOKEN environment variable.
For example, to push a branch `my_branch` to the GitHub repo `owner/repo`:
<execute_bash> git push https://$GITHUB_TOKEN@github.com/owner/repo.git my_branch </execute_bash>
@@ -101,8 +103,8 @@ Please note that THE `edit_file` FUNCTION REQUIRES PROPER INDENTATION. If the as

Responses should be concise.
The assistant should attempt fewer things at a time instead of putting too much commands OR code in one "execute" block.
Include ONLY ONE <execute_ipython>, <execute_bash>, or <execute_browse> per response, unless the assistant is finished with the task or need more input or action from the user in order to proceed.
IMPORTANT: Execute code using <execute_ipython>, <execute_bash>, or <execute_browse> whenever possible.
Include ONLY ONE <execute_ipython>, <execute_bash>, <execute_browse>, or <execute_delegate> per response, unless the assistant is finished with the task or need more input or action from the user in order to proceed.
IMPORTANT: Execute code using <execute_ipython>, <execute_bash>, <execute_browse>, or <execute_delegate> whenever possible.


----------
@@ -233,7 +235,7 @@ USER: Now browse the newly started server's homepage and show me the content.
ASSISTANT:
Sure! Let me browse the server's homepage at http://127.0.0.1:5000:
<execute_browse>
Get the content on "http://127.0.0.1:5000"
goto("http://127.0.0.1:5000")
</execute_browse>

USER:
@@ -243,7 +245,7 @@ OBSERVATION:
ASSISTANT:
The content of the server's homepage is "[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]". If you have any further questions, feel free to ask!

USER: Now kill the server, make it display the numbers in a table format.
USER: Now kill the server, make it display the numbers only if a button is clicked.

ASSISTANT:
Sure! Let me stop the server first:
@@ -275,9 +277,20 @@ USER:
10| app.run(port=5000)

ASSISTANT:
I should edit the file to display the numbers in a table format. I should include correct indentation. Let me update the file:
I should edit the file to include a button. When the button is clicked, it will display the numbers. I should include correct indentation. Let me update the file:
<execute_ipython>
edit_file(start=7, end=7, content=" return '<table>' + ''.join([f'<tr><td>{i}</td></tr>' for i in numbers]) + '</table>'")
EDITED_CODE = """def index():
return '''
<html>
<body>
<button onclick="window.location.href='/show_numbers'">Show Numbers</button>
</body>
</html>
'''

@app.route('/show_numbers')
def show_numbers():"""
edit_file(start=5, end=5, content=EDITED_CODE)
</execute_ipython>

USER:
@@ -288,11 +301,21 @@ Observation:
3|
4|@app.route('/')
5|def index():
6| numbers = list(range(1, 11))
7| return '<table>' + ''.join([f'<tr><td>{i}</td></tr>' for i in numbers]) + '</table>'
8|
9|if __name__ == '__main__':
10| app.run(port=5000)
6| return '''
7| <html>
8| <body>
9| <button onclick="window.location.href='/show_numbers'">Show Numbers</button>
10| </body>
11| </html>
12| '''
13|
14|@app.route('/show_numbers')
15|def show_numbers():
16| numbers = list(range(1, 11))
17| return str(numbers)
18|
19|if __name__ == '__main__':
20| app.run(port=5000)
[File updated. Please review the changes and make sure they are correct (correct indentation, no duplicate lines, etc). Edit the file again if necessary.]

ASSISTANT:
@@ -306,7 +329,33 @@ Observation:
[1] 126

ASSISTANT:
The server is running on port 5000 with PID 126. You can access the list of numbers in a table format by visiting http://127.0.0.1:5000. Let me know if you have any further requests!
The server is running on port 5000 with PID 126. You can access the list of numbers by visiting http://127.0.0.1:5000. Feel free to let me know if you have any further requests!

USER: Now browse the newly started server's homepage and show me the numbers.

ASSISTANT:
Sure! Let me browse the server's homepage at http://127.0.0.1:5000:
<execute_browse>
goto("http://127.0.0.1:5000")
</execute_browse>

USER:
OBSERVATION:
button 'Show Numbers', clickable

ASSISTANT:
The numbers are hidden behind the button. There is no way I can see the answers unless I try clicking on that button. Since this is an interactive browsing activity, I should delegate to BrowsingAgent.
I need to tell BrowsingAgent the task it should complete, by passing the task description as a parameter.
<execute_delegate>
BrowsingAgent('visit http://127.0.0.1:5000 and tell me the numbers')
</execute_delegate>

USER:
OBSERVATION:
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

ASSISTANT:
The numbers on the server's homepage are "[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]". If you have any further questions, feel free to ask!

--- END OF EXAMPLE ---
