CodeActAgent: Only delegate to BrowsingAgent as last resort #2326
Closed: li-boxuan wants to merge 11 commits into All-Hands-AI:main from li-boxuan:boxuan/tweak-codeact-browse-prompt
Commits (11, all by li-boxuan); changes from 8 commits are shown below:
- 83ca2d4 One-shot learning for CodeActAgent to delegate
- c8632f2 Fix indentation in the prompt example
- 73494fc Tune prompts more
- b504697 Regenerate
- d497d82 Merge remote-tracking branch 'upstream/main' into boxuan/tweak-codeac…
- 0283970 Merge remote-tracking branch 'upstream/main' into boxuan/tweak-codeac…
- ccd44cd Fix typo
- 7124cf5 Merge remote-tracking branch 'upstream/main' into boxuan/tweak-codeac…
- e0cbf4a Apply suggestions from code review
- 9009896 Regenerate prompts
- c698c2a Merge remote-tracking branch 'upstream/main' into boxuan/tweak-codeac…
@@ -14,10 +14,12 @@ Important, however: do not run interactive commands. You do not have access to s
 Also, you need to handle commands that may run indefinitely and not return a result. For such cases, you should redirect the output to a file and run the command in the background to avoid blocking the execution.
 For example, to run a Python script that might run indefinitely without returning immediately, you can use the following format: <execute_bash> python3 app.py > server.log 2>&1 & </execute_bash>
 Also, if a command execution result saying like: Command: "npm start" timed out. Sending SIGINT to the process, you should also retry with running the command in the background.
-The assistant can browse the Internet with <execute_browse> and </execute_browse>.
-For example, <execute_browse> Tell me the usa's president using google search </execute_browse>.
-Or <execute_browse> Tell me what is in http://example.com </execute_browse>.
+The assistant can browse the Internet by wrapping goto("<URL>") command with <execute_browse> and </execute_browse>.
+For example, you can browse a given URL by <execute_browse> goto("<URL>") </execute_browse>.
+The assistant can delegate a subtask to other specialized agents if it cannot achieve it by itself, by wrapping the agent name and task with <execute_delegate> and </execute_delegate>. The assistant only does so when it cannot use other actions to achieve the task. Available specialized agents include:
+- BrowsingAgent: BrowsingAgent can do interactive browsing, including scrolling, locating specific elements, and clicking on buttons. For example, you can delegate a task to BrowsingAgent by <execute_delegate> BrowsingAgent('Use Google to find out the current president of USA') </execute_delegate>
 The assistant can install Python packages using the %pip magic command in an IPython environment by using the following syntax: <execute_ipython> %pip install [package needed] </execute_ipython> and should always import packages and define variables before starting to use them.
+
 To interact with GitHub, use the $GITHUB_TOKEN environment variable.
 For example, to push a branch `my_branch` to the GitHub repo `owner/repo`:
 <execute_bash> git push https://$GITHUB_TOKEN@github.com/owner/repo.git my_branch </execute_bash>
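The updated prompt asks the model to express each action as a single call inside a tag: `goto("<URL>")` inside `<execute_browse>`, and `AgentName('task')` inside `<execute_delegate>`. A minimal sketch of how such a response could be parsed follows; it is not OpenDevin's actual parser, and the name `parse_action` and the one-string-argument restriction are assumptions for illustration.

```python
from __future__ import annotations

import ast
import re


def parse_action(response: str, tag: str) -> tuple[str, str] | None:
    """Return (callee, argument) for the first <tag>...</tag> block, or None.

    Hypothetical sketch: assumes the block body is a single call with one
    string-literal argument, e.g. goto("<URL>") or BrowsingAgent('task').
    """
    pattern = re.compile(
        rf"<{re.escape(tag)}>\s*(.*?)\s*</{re.escape(tag)}>", re.DOTALL
    )
    match = pattern.search(response)
    if match is None:
        return None
    # Parse the body as a Python expression instead of splitting on quotes,
    # so a URL or task containing parentheses or commas is still handled.
    call = ast.parse(match.group(1), mode="eval").body
    if not (isinstance(call, ast.Call) and isinstance(call.func, ast.Name)):
        raise ValueError(f"expected Name('...') inside <{tag}>")
    if len(call.args) != 1 or call.keywords:
        raise ValueError("expected exactly one positional argument")
    argument = ast.literal_eval(call.args[0])
    if not isinstance(argument, str):
        raise ValueError("argument must be a string literal")
    return call.func.id, argument


print(parse_action('<execute_browse> goto("http://127.0.0.1:5000") </execute_browse>', "execute_browse"))
# ('goto', 'http://127.0.0.1:5000')
```

Parsing with `ast` rather than a second regex keeps quoting rules identical to Python's, which matches how the prompt's examples are written.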
@@ -101,8 +103,8 @@ Please note that THE `edit_file` FUNCTION REQUIRES PROPER INDENTATION. If the as
 
 Responses should be concise.
 The assistant should attempt fewer things at a time instead of putting too many commands OR code in one "execute" block.
-Include ONLY ONE <execute_ipython>, <execute_bash>, or <execute_browse> per response, unless the assistant is finished with the task or needs more input or action from the user in order to proceed.
-IMPORTANT: Execute code using <execute_ipython>, <execute_bash>, or <execute_browse> whenever possible.
+Include ONLY ONE <execute_ipython>, <execute_bash>, <execute_browse>, or <execute_delegate> per response, unless the assistant is finished with the task or needs more input or action from the user in order to proceed.
+IMPORTANT: Execute code using <execute_ipython>, <execute_bash>, <execute_browse>, or <execute_delegate> whenever possible.
 
 
 ----------
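The one-action-per-response rule can be checked mechanically. Below is a hypothetical validator (`count_execute_blocks` and `follows_one_action_rule` are illustrative names, not part of OpenDevin) that counts opening tags from the four execute tags listed in the updated prompt and accepts at most one, since a response that finishes the task or asks the user a question may contain none.

```python
import re

# The four action tags named in the updated prompt.
EXECUTE_TAGS = ("execute_ipython", "execute_bash", "execute_browse", "execute_delegate")
# Matches an opening tag such as <execute_bash>; closing tags start with "</" and do not match.
OPEN_TAG = re.compile(r"<(?:" + "|".join(EXECUTE_TAGS) + r")>")


def count_execute_blocks(response: str) -> int:
    """Count execute blocks by counting their opening tags."""
    return len(OPEN_TAG.findall(response))


def follows_one_action_rule(response: str) -> bool:
    """At most one block: zero is allowed when the task is done or user input is needed."""
    return count_execute_blocks(response) <= 1


single = "Let me check.\n<execute_bash> ls </execute_bash>"
double = single + "\n<execute_delegate> BrowsingAgent('find the docs') </execute_delegate>"
print(follows_one_action_rule(single), follows_one_action_rule(double))
```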
@@ -233,7 +235,7 @@ USER: Now browse the newly started server's homepage and show me the content.
 ASSISTANT:
 Sure! Let me browse the server's homepage at http://127.0.0.1:5000:
 <execute_browse>
-Get the content on "http://127.0.0.1:5000"
+goto("http://127.0.0.1:5000")
 </execute_browse>
 
 USER:
@@ -243,7 +245,7 @@ OBSERVATION:
 ASSISTANT:
 The content of the server's homepage is "[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]". If you have any further questions, feel free to ask!
 
-USER: Now kill the server, make it display the numbers in a table format.
+USER: Now kill the server, make it display the numbers only if a button is clicked.
 
 ASSISTANT:
 Sure! Let me stop the server first:
@@ -275,9 +277,20 @@ USER:
 10|    app.run(port=5000)
 
 ASSISTANT:
-I should edit the file to display the numbers in a table format. I should include correct indentation. Let me update the file:
+I should edit the file to include a button. When the button is clicked, it will display the numbers. I should include correct indentation. Let me update the file:
 <execute_ipython>
-edit_file(start=7, end=7, content="    return '<table>' + ''.join([f'<tr><td>{i}</td></tr>' for i in numbers]) + '</table>'")
+EDITED_CODE = """def index():
+    return '''
+<html>
+<body>
+<button onclick="window.location.href='/show_numbers'">Show Numbers</button>
+</body>
+</html>
+'''
+
+@app.route('/show_numbers')
+def show_numbers():"""
+edit_file(start=5, end=5, content=EDITED_CODE)
 </execute_ipython>
 
 USER:
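The example edit replaces line 5 with an 11-line block, which is why the observation grows from 10 to 20 lines and the old body of `index()` ends up under `show_numbers()`. The sketch below models `edit_file` as replacing the 1-indexed, inclusive line range `start..end` with `content`; that reading is inferred from the example rather than taken from the tool's source, `edit_file_lines` is a hypothetical name, and lines 1-2 of the server file are assumed because the prompt excerpt does not show them.

```python
from __future__ import annotations


def edit_file_lines(lines: list[str], start: int, end: int, content: str) -> list[str]:
    """Hypothetical model of edit_file: replace 1-indexed lines start..end (inclusive)."""
    if not 1 <= start <= end <= len(lines):
        raise ValueError("start/end out of range")
    return lines[: start - 1] + content.split("\n") + lines[end:]


# Server file before the edit. Lines 1-2 are assumed; the prompt only shows lines 3-10.
ORIGINAL = [
    "from flask import Flask",
    "app = Flask(__name__)",
    "",
    "@app.route('/')",
    "def index():",
    "    numbers = list(range(1, 11))",
    "    return str(numbers)",
    "",
    "if __name__ == '__main__':",
    "    app.run(port=5000)",
]

EDITED_CODE = """def index():
    return '''
<html>
<body>
<button onclick="window.location.href='/show_numbers'">Show Numbers</button>
</body>
</html>
'''

@app.route('/show_numbers')
def show_numbers():"""

updated = edit_file_lines(ORIGINAL, start=5, end=5, content=EDITED_CODE)
for number, line in enumerate(updated, start=1):
    print(f"{number}|{line}")
```

Replacing only `def index():` and re-emitting it at the top of `EDITED_CODE` lets the existing body (`numbers = ...` and `return str(numbers)`) slide under the new `show_numbers()` header without being retyped.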
@@ -288,11 +301,21 @@ Observation:
 3|
 4|@app.route('/')
 5|def index():
-6|    numbers = list(range(1, 11))
-7|    return '<table>' + ''.join([f'<tr><td>{i}</td></tr>' for i in numbers]) + '</table>'
-8|
-9|if __name__ == '__main__':
-10|    app.run(port=5000)
+6|    return '''
+7|<html>
+8|<body>
+9|<button onclick="window.location.href='/show_numbers'">Show Numbers</button>
+10|</body>
+11|</html>
+12|'''
+13|
+14|@app.route('/show_numbers')
+15|def show_numbers():
+16|    numbers = list(range(1, 11))
+17|    return str(numbers)
+18|
+19|if __name__ == '__main__':
+20|    app.run(port=5000)
 [File updated. Please review the changes and make sure they are correct (correct indentation, no duplicate lines, etc). Edit the file again if necessary.]
 
 ASSISTANT:
@@ -306,7 +329,33 @@ Observation:
 [1] 126
 
 ASSISTANT:
-The server is running on port 5000 with PID 126. You can access the list of numbers in a table format by visiting http://127.0.0.1:5000. Let me know if you have any further requests!
+The server is running on port 5000 with PID 126. You can access the list of numbers by visiting http://127.0.0.1:5000. Feel free to let me know if you have any further requests!
+
+USER: Now browse the newly started server's homepage and show me the numbers.
+
+ASSISTANT:
+Sure! Let me browse the server's homepage at http://127.0.0.1:5000:
+<execute_browse>
+goto("http://127.0.0.1:5000")
+</execute_browse>
+
+USER:
+OBSERVATION:
+button 'Show Numbers', clickable
+
+ASSISTANT:
+The numbers are hidden behind the button. There is no way I can see the answers unless I try clicking on that button. Since this is an interactive browsing activity, I should delegate to BrowsingAgent.
+I need to tell BrowsingAgent the task it should complete by passing the task description as a parameter.
+<execute_delegate>
+BrowsingAgent('visit http://127.0.0.1:5000 and tell me the numbers')
+</execute_delegate>
+
+USER:
+OBSERVATION:
+[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
+
+ASSISTANT:
+The numbers on the server's homepage are "[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]". If you have any further questions, feel free to ask!
 
 --- END OF EXAMPLE ---
 
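In the PR, the choice between browsing and delegating is left to the LLM through the prompt and this one-shot example. As a deterministic toy model of the same "last resort" rule, the hypothetical `next_browse_action` helper below tries a plain `goto` first and only emits `<execute_delegate>` when the observation mentions a clickable element; treating the substring "clickable" as the trigger is an assumption based on the example observation, not how the agent actually decides.

```python
from __future__ import annotations

# Hypothetical trigger: the example observation renders interactive elements
# like "button 'Show Numbers', clickable".
INTERACTIVE_MARKERS = ("clickable",)


def next_browse_action(url: str, observation: str | None, question: str) -> str:
    """Return the next action text, delegating to BrowsingAgent only as a last resort."""
    if observation is None:
        # Nothing observed yet: try the cheap, non-interactive goto first.
        return f'<execute_browse>\ngoto("{url}")\n</execute_browse>'
    if any(marker in observation for marker in INTERACTIVE_MARKERS):
        # The answer sits behind an interactive element, so hand the task off.
        task = f"visit {url} and {question}"
        return f"<execute_delegate>\nBrowsingAgent({task!r})\n</execute_delegate>"
    # The page content already answers the question: no further action needed.
    return ""


url = "http://127.0.0.1:5000"
print(next_browse_action(url, None, "tell me the numbers"))
print(next_browse_action(url, "button 'Show Numbers', clickable", "tell me the numbers"))
```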
Not an issue with this PR, just picking your brain on this: do we really have to set `task` in `inputs`? If we are not passing an actual `MessageAction`, can we at least make the delegate action and observation a little more in line with the others? For example, use a field in `AgentDelegateAction` for the task, and use the `content` field in `AgentDelegateObservation`, which all observations have, for its result.
Some micro-agents need more than just a task as input. For example, coder-agent needs both a "task" and a "summary" of the codebase, although one could argue the "summary" could be folded into the "task" prompt.
Some micro-agents also need more than just a "content" string as output. For example, CommitWriterAgent returns either `outputs['answer']` if it generates a commit message, or `outputs['reason']` if it rejects the request.
Fundamentally, AgentDelegateObservation is different from the others. CmdOutputObservation, for example, is essentially an observation of stdout/stderr. That said, what if in the future we need to capture other side effects? We would likely need to do something similar there: use `outputs` as a dict rather than `content` as a string.
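The reply's argument is that delegate inputs and outputs are keyed dicts rather than a single string. The dataclasses below are simplified, hypothetical stand-ins (not the OpenDevin `AgentDelegateAction`/`AgentDelegateObservation` definitions), and the agent name and values are illustrative. They show a coder-style delegation with both `task` and `summary` inputs, and commit-writer results that carry either `answer` or `reason`. The `content` property that flattens `outputs` into one string is one possible compromise with the reviewer's suggestion, not something proposed in the thread.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class DelegateActionSketch:
    """Hypothetical stand-in for a delegate action: target agent plus keyed inputs."""

    agent: str
    inputs: dict[str, str] = field(default_factory=dict)


@dataclass
class DelegateObservationSketch:
    """Hypothetical stand-in for a delegate observation with keyed outputs."""

    outputs: dict[str, str] = field(default_factory=dict)

    @property
    def content(self) -> str:
        # Flatten outputs into one string, sorted by key for stable output.
        return "\n".join(f"{key}: {value}" for key, value in sorted(self.outputs.items()))


# A coder-style delegate needs two inputs, not just a task string.
coder = DelegateActionSketch(
    agent="CoderAgent",
    inputs={"task": "Add a /show_numbers route", "summary": "Flask app in app.py"},
)

# A commit-writer-style delegate returns different keys depending on the outcome.
accepted = DelegateObservationSketch(outputs={"answer": "Add button to reveal numbers"})
rejected = DelegateObservationSketch(outputs={"reason": "No staged changes"})

for obs in (accepted, rejected):
    print(obs.content)
```

Keeping `outputs` as the source of truth preserves which key the delegate set (so a caller can branch on `answer` versus `reason`), while the derived string still gives the parent agent something uniform to read.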