Commit a9a2f10

Revamp AgentRejectAction and allow ManagerAgent to handle rejection (#1735)
* Fix AgentRejectAction handling
* Add ManagerAgent to integration tests
* Fix regenerate.sh
* Fix merge
* Update README for micro-agents
* Add test reject to regenerate.sh
* regenerate.sh: Add support for running a specific test and/or agent
* Refine reject schema, and allow ManagerAgent to handle reject
* Add test artifacts for test_simple_task_rejection
* Fix manager agent tests
* Fix README
* test_simple_task_rejection: check final agent state
* Integration test: exit if mock prompt not found
* Update test_simple_task_rejection tests
* Fix test_edits test artifacts after prompt update
* Fix ManagerAgent test_edits
* WIP
* Fix tests
* update test_edits for ManagerAgent
* Skip local sandbox for reject test
* Fix test comparison
1 parent c062468 commit a9a2f10

36 files changed: +675 -18 lines

agenthub/micro/README.md

+3
@@ -12,3 +12,6 @@ in the following structure:
 Note that `prompt.md` could use jinja2 template syntax. During runtime, `prompt.md`
 is loaded and rendered, and used together with `agent.yaml` to initialize a
 micro-agent.
+
+Micro-agents can be used independently. You can also use `ManagerAgent` which knows
+how to coordinate the agents and collaboratively finish a task.
@@ -1,2 +1,2 @@
 * `reject` - reject the task. Arguments:
-* `outputs` - a dictionary representing the outputs of your task, if any
+* `outputs` - a dictionary with only a `reason` attribute
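The `reject` action now takes a dictionary whose only key is `reason`. Nothing in this commit enforces that shape at runtime; the sketch below shows how a caller might check it before accepting a rejection. `validate_reject_outputs` is a hypothetical helper, not part of the codebase.

```python
from typing import Any


def validate_reject_outputs(outputs: Any) -> str:
    """Return the rejection reason, or raise ValueError if the shape is wrong.

    Hypothetical helper: checks that `outputs` is a dict whose only key is a
    non-empty string `reason`, as the updated `reject` description asks.
    """
    if not isinstance(outputs, dict):
        raise ValueError('reject outputs must be a dictionary')
    extra_keys = set(outputs) - {'reason'}
    if extra_keys:
        raise ValueError(f'unexpected keys in reject outputs: {sorted(extra_keys)}')
    reason = outputs.get('reason')
    if not isinstance(reason, str) or not reason.strip():
        raise ValueError('reject outputs must include a non-empty string `reason`')
    return reason


# A well-formed rejection passes and yields its reason.
print(validate_reject_outputs({'reason': 'no staged changes'}))
```

The old free-form shape, such as `{'answer': '...'}`, would raise `ValueError` under this check.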

agenthub/micro/commit_writer/agent.yaml

+1
@@ -3,3 +3,4 @@ description: "Write a git commit message for files in the git staging area"
 inputs: {}
 outputs:
   answer: string
+  reason: string

agenthub/micro/commit_writer/prompt.md

+1-1
@@ -14,7 +14,7 @@ changes. The commit message should include:
 You should find the diff using `git diff --cached`, compile a commit message,
 and call the `finish` action with `outputs.answer` set to the answer. If current
 repo is not a valid git repo, or there is no diff in the staging area, please call
-the `reject` action with `outputs.answer` set to the reason.
+the `reject` action.
 
 ## History
 {{ instructions.history_truncated }}
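With this change, commit_writer no longer puts the rejection reason in `outputs.answer`; given the new `reason: string` output in agent.yaml and the `reject` instruction that allows only a `reason` attribute, the reason presumably travels in `outputs.reason`. A minimal sketch of the decision the prompt describes, where `is_git_repo` and `staged_diff` are hypothetical stand-ins for what the agent learns from `git diff --cached`. The returned dicts follow the `{"action": ..., "args": {...}}` shape seen in the mock response logs.

```python
from typing import Optional


def commit_writer_action(
    is_git_repo: bool, staged_diff: Optional[str], message: str = ''
) -> dict:
    """Return the action the commit_writer prompt asks for (hypothetical sketch)."""
    if not is_git_repo:
        # Reject: the reason goes in outputs.reason, not outputs.answer.
        return {'action': 'reject', 'args': {'outputs': {'reason': 'not a valid git repo'}}}
    if not staged_diff or not staged_diff.strip():
        # Nothing staged, so there is nothing to describe.
        return {'action': 'reject', 'args': {'outputs': {'reason': 'no diff in the staging area'}}}
    # Normal path: finish with the composed message as the answer.
    return {'action': 'finish', 'args': {'outputs': {'answer': message}}}


print(commit_writer_action(True, '', 'unused'))
```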

agenthub/micro/manager/agent.yaml

+3-1
@@ -3,4 +3,6 @@ description: Delegates tasks to microagents based on their area of expertise
 generates: Action
 inputs:
   task: string
-outputs: {}
+outputs:
+  summary: string # if finished
+  reason: string # if rejected

agenthub/micro/manager/prompt.md

+13
@@ -7,6 +7,15 @@ can do the actual work. A description of each agent is provided below. You MUST
 select one of the delegates below to move towards accomplishing the task, and you MUST
 provide the correct inputs for the delegate you select.
 
+Note: the delegated agent either returns "finish" or "reject".
+- If the action is "finish", but the full task is not done yet, you should
+continue to delegate to one of the agents below to until the full task is finished.
+- If the action is "reject", it means the delegated agent is not capable of the
+task you send to. You should revisit the input you send to the delegate, and consider
+whether any other delegate would be able to solve the task. If you cannot find
+a proper delegate agent, or the delegate attempts keep failing, call the `reject`
+action.
+
 ## Agents
 {% for name, details in delegates.items() %}
 ### {{ name }}
@@ -19,9 +28,13 @@ provide the correct inputs for the delegate you select.
 {{ instructions.history_truncated }}
 {{ history_to_json(state.history[-10:]) }}
 
+If the last item in the history is an error, you should try to fix it. If you
+cannot fix it, call the `reject` action.
+
 ## Available Actions
 {{ instructions.actions.delegate }}
 {{ instructions.actions.finish }}
+{{ instructions.actions.reject }}
 
 ## Format
 {{ instructions.format.action }}
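The new manager prompt asks the model to try another delegate when one rejects, and to reject the whole task only when no delegate fits or attempts keep failing. A minimal sketch of that policy as a plain loop, assuming a hypothetical `run_delegate` callback and `DelegateResult` type (neither exists in the codebase). In ManagerAgent the choice is made by the LLM following the prompt, not by fixed code, and the "finish but the full task is not done yet" case is not modelled here.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class DelegateResult:
    # Hypothetical stand-in for what a delegate hands back: its final action
    # ("finish" or "reject") and the outputs attached to it.
    action: str
    outputs: dict = field(default_factory=dict)


def delegate_with_fallback(
    task: str,
    delegates: List[str],
    run_delegate: Callable[[str, str], DelegateResult],
    max_attempts: int = 3,
) -> dict:
    """Try delegates in order; reject only if none of them accepts the task.

    `max_attempts` models the prompt's "the delegate attempts keep failing".
    """
    reasons = []
    for name in delegates[:max_attempts]:
        result = run_delegate(name, task)
        if result.action == 'finish':
            # Pass the delegate's outputs through as the manager's result.
            return {'action': 'finish', 'args': {'outputs': result.outputs}}
        # A rejection means this delegate cannot do the task; record why
        # and move on to the next candidate.
        reasons.append(f"{name}: {result.outputs.get('reason', 'no reason given')}")
    return {'action': 'reject', 'args': {'outputs': {'reason': '; '.join(reasons)}}}


def fake_run(name: str, task: str) -> DelegateResult:
    # Toy delegates for illustration only.
    if name == 'CoderAgent':
        return DelegateResult('reject', {'reason': 'not a coding task'})
    return DelegateResult('finish', {'file': 'bad.txt'})


print(delegate_with_fallback('Fix typos in bad.txt', ['CoderAgent', 'TypoFixerAgent'], fake_run))
```

Collecting every delegate's reason gives the manager's own `reject` a useful `reason` when all attempts fail.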

opendevin/controller/agent_controller.py

+5-1
@@ -19,6 +19,7 @@
     AddTaskAction,
     AgentDelegateAction,
     AgentFinishAction,
+    AgentRejectAction,
     ChangeAgentStateAction,
     MessageAction,
     ModifyTaskAction,
@@ -164,6 +165,9 @@ async def on_event(self, event: Event):
         elif isinstance(event, AgentFinishAction):
             self.state.outputs = event.outputs # type: ignore[attr-defined]
             await self.set_agent_state_to(AgentState.FINISHED)
+        elif isinstance(event, AgentRejectAction):
+            self.state.outputs = event.outputs # type: ignore[attr-defined]
+            await self.set_agent_state_to(AgentState.REJECTED)
         elif isinstance(event, Observation):
             if self._pending_action and self._pending_action.id == event.cause:
                 await self.add_history(self._pending_action, event)
@@ -252,7 +256,7 @@ async def _step(self):
             # propagate error state until an agent or user can handle it
             await self.set_agent_state_to(AgentState.ERROR)
             return
-        delegate_done = delegate_state == AgentState.FINISHED
+        delegate_done = delegate_state in (AgentState.FINISHED, AgentState.REJECTED)
         if delegate_done:
             logger.info(
                 f'[Agent Controller {self.id}] Delegate agent has finished execution'
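The controller change has two parts: an `AgentRejectAction` now stores its outputs and moves the agent to `REJECTED`, and a delegate in either `FINISHED` or `REJECTED` counts as done, where previously only `FINISHED` did. A simplified, synchronous model of that logic; the real controller is async, and the classes below are minimal stand-ins, not the opendevin ones.

```python
from dataclasses import dataclass, field
from enum import Enum


class AgentState(str, Enum):
    # Stand-in enum with only the members this sketch needs.
    RUNNING = 'running'
    FINISHED = 'finished'
    REJECTED = 'rejected'


@dataclass
class AgentFinishAction:
    outputs: dict = field(default_factory=dict)


@dataclass
class AgentRejectAction:
    outputs: dict = field(default_factory=dict)


@dataclass
class State:
    agent_state: AgentState = AgentState.RUNNING
    outputs: dict = field(default_factory=dict)


def on_event(state: State, event: object) -> None:
    """Record a terminal action on `state`, mirroring the two elif branches."""
    if isinstance(event, AgentFinishAction):
        state.outputs = event.outputs
        state.agent_state = AgentState.FINISHED
    elif isinstance(event, AgentRejectAction):
        # New branch: a rejection also carries outputs (its `reason`),
        # but ends in REJECTED rather than FINISHED.
        state.outputs = event.outputs
        state.agent_state = AgentState.REJECTED


def delegate_done(delegate_state: AgentState) -> bool:
    # Either terminal state now hands control back to the parent agent.
    return delegate_state in (AgentState.FINISHED, AgentState.REJECTED)


child = State()
on_event(child, AgentRejectAction({'reason': 'not a git repo'}))
print(child.agent_state.value, child.outputs, delegate_done(child.agent_state))
# rejected {'reason': 'not a git repo'} True
```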

opendevin/core/main.py

+1
@@ -127,6 +127,7 @@ async def on_event(event: Event):
     event_stream.subscribe(EventStreamSubscriber.MAIN, on_event)
     while controller.get_agent_state() not in [
         AgentState.FINISHED,
+        AgentState.REJECTED,
         AgentState.ERROR,
         AgentState.PAUSED,
         AgentState.STOPPED,
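Adding `REJECTED` to this list lets the main loop exit when the top-level agent rejects its task. A sketch of the same polling pattern, assuming a stand-in enum and an injected `sleep` so it can run synchronously; the real loop in main.py is async, and only the end states visible in this hunk are listed.

```python
import time
from enum import Enum
from typing import Callable


class AgentState(str, Enum):
    # Stand-in enum; values other than 'rejected' and 'error' are assumptions.
    RUNNING = 'running'
    FINISHED = 'finished'
    REJECTED = 'rejected'
    ERROR = 'error'
    PAUSED = 'paused'
    STOPPED = 'stopped'


# End states visible in the main.py hunk, including the new REJECTED.
END_STATES = frozenset({
    AgentState.FINISHED,
    AgentState.REJECTED,
    AgentState.ERROR,
    AgentState.PAUSED,
    AgentState.STOPPED,
})


def wait_until_done(
    get_state: Callable[[], AgentState],
    sleep: Callable[[float], None] = time.sleep,
) -> AgentState:
    """Poll until the agent reaches an end state and return that state."""
    # If REJECTED were missing from END_STATES, a rejecting agent would
    # keep this loop polling forever.
    while (state := get_state()) not in END_STATES:
        sleep(1)
    return state
```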

opendevin/core/schema/agent.py

+4
@@ -30,6 +30,10 @@ class AgentState(str, Enum):
     """The agent is finished with the current task.
     """
 
+    REJECTED = 'rejected'
+    """The agent rejects the task.
+    """
+
     ERROR = 'error'
     """An error occurred during the task.
     """

tests/integration/README.md

+9
@@ -55,6 +55,15 @@ TEST_ONLY=true ./tests/integration/regenerate.sh
 
 to run all integration tests until the first failure.
 
+If you only want to run a specific test, set environment variable
+`ONLY_TEST_NAME` to the test name. If you only want to run a specific agent,
+set environment variable `ONLY_TEST_AGENT` to the agent. You could also use both,
+e.g.
+
+```bash
+TEST_ONLY=true ONLY_TEST_NAME="test_simple_task_rejection" ONLY_TEST_AGENT="ManagerAgent" ./tests/integration/regenerate.sh
+```
+
 
 ## Regenerate Integration Tests
 When you make changes to an agent's prompt, the integration tests will fail. You'll need to regenerate them
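The two new variables act as independent filters over the (test, agent) combinations that regenerate.sh iterates. A hypothetical Python rendering of that filter, not the script's actual bash logic; treating an unset or empty variable as "no filter" is an assumption.

```python
import os
from typing import Iterable, List, Mapping, Optional, Tuple

Case = Tuple[str, str]  # (test name, agent name)


def select_cases(
    cases: Iterable[Case], env: Optional[Mapping[str, str]] = None
) -> List[Case]:
    """Keep only the cases allowed by ONLY_TEST_NAME / ONLY_TEST_AGENT."""
    env = os.environ if env is None else env
    only_name = env.get('ONLY_TEST_NAME', '')
    only_agent = env.get('ONLY_TEST_AGENT', '')
    return [
        (name, agent)
        for name, agent in cases
        if (not only_name or name == only_name)
        and (not only_agent or agent == only_agent)
    ]


# Toy case list for illustration.
cases = [
    ('test_edits', 'ManagerAgent'),
    ('test_simple_task_rejection', 'ManagerAgent'),
    ('test_simple_task_rejection', 'SomeOtherAgent'),
]
print(select_cases(cases, {'ONLY_TEST_NAME': 'test_simple_task_rejection',
                           'ONLY_TEST_AGENT': 'ManagerAgent'}))
```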

tests/integration/conftest.py

+11-4
@@ -1,9 +1,9 @@
 import io
 import os
 import re
+import subprocess
 import sys
 import tempfile
-import subprocess
 from functools import partial
 from http.server import HTTPServer, SimpleHTTPRequestHandler
 from threading import Thread
@@ -18,7 +18,8 @@
 
 
 def filter_out_symbols(input):
-    return ' '.join([char for char in input if char.isalnum()])
+    input = re.sub(r'\\n|\\r\\n|\\r|\s+', '', input)
+    return input
 
 
 def get_log_id(prompt_log_name):
@@ -84,13 +85,19 @@ def get_mock_response(test_name: str, messages: str, id: int) -> str:
         print('Mismatched Prompt File path', prompt_file_path)
         print('---' * 10)
         # Create a temporary file to store messages
-        with tempfile.NamedTemporaryFile(delete=False, mode='w', encoding='utf-8') as tmp_file:
+        with tempfile.NamedTemporaryFile(
+            delete=False, mode='w', encoding='utf-8'
+        ) as tmp_file:
             tmp_file_path = tmp_file.name
             tmp_file.write(messages)
 
         try:
             # Use diff command to compare files and capture the output
-            result = subprocess.run(['diff', '-u', prompt_file_path, tmp_file_path], capture_output=True, text=True)
+            result = subprocess.run(
+                ['diff', '-u', prompt_file_path, tmp_file_path],
+                capture_output=True,
+                text=True,
+            )
             if result.returncode != 0:
                 print('Diff:')
                 print(result.stdout)
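The new `filter_out_symbols` keeps punctuation, which the old alphanumeric-only filter discarded, but strips literal `\n`, `\r\n` and `\r` escape sequences (a backslash followed by a letter, as they appear in JSON-encoded history) along with every run of real whitespace. Presumably it is applied to both the logged and the freshly rendered prompt before they are compared. The sketch reuses the same pattern; the parameter is renamed to `text` only to avoid shadowing the built-in `input`.

```python
import re


def filter_out_symbols(text: str) -> str:
    # Same pattern as the new conftest.py version: remove escaped newline
    # sequences written out as backslash + letter, and all whitespace runs.
    return re.sub(r'\\n|\\r\\n|\\r|\s+', '', text)


# Two prompts that differ only in layout and newline escaping normalize to
# the same string, while punctuation such as '.' is preserved.
logged = 'Fix typos in bad.txt.\n  Do not ask me.'
rendered = 'Fix typos in bad.txt.\\nDo not ask me.'
print(filter_out_symbols(logged))                                  # Fixtyposinbad.txt.Donotaskme.
print(filter_out_symbols(logged) == filter_out_symbols(rendered))  # True
```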

tests/integration/mock/ManagerAgent/test_edits/prompt_001.log

+15
@@ -11,6 +11,15 @@ can do the actual work. A description of each agent is provided below. You MUST
 select one of the delegates below to move towards accomplishing the task, and you MUST
 provide the correct inputs for the delegate you select.
 
+Note: the delegated agent either returns "finish" or "reject".
+- If the action is "finish", but the full task is not done yet, you should
+continue to delegate to one of the agents below to until the full task is finished.
+- If the action is "reject", it means the delegated agent is not capable of the
+task you send to. You should revisit the input you send to the delegate, and consider
+whether any other delegate would be able to solve the task. If you cannot find
+a proper delegate agent, or the delegate attempts keep failing, call the `reject`
+action.
+
 ## Agents
 
 ### CoderAgent
@@ -62,6 +71,9 @@ They are time-ordered, with your most recent action at the bottom.
 
 [[{"source": "user", "action": "message", "args": {"content": "Fix typos in bad.txt. Do not ask me for confirmation at any point.", "wait_for_response": false}}, {"observation": "null", "content": "", "extras": {}}]]
 
+If the last item in the history is an error, you should try to fix it. If you
+cannot fix it, call the `reject` action.
+
 ## Available Actions
 * `delegate` - send a task to another agent from the list provided. Arguments:
 * `agent` - the agent to which the task is delegated. MUST match a name in the list of agents provided.
@@ -70,6 +82,9 @@ They are time-ordered, with your most recent action at the bottom.
 * `finish` - if you're absolutely certain that you've completed your task, use the finish action to stop working. Arguments:
 * `outputs` - a dictionary representing the outputs of your task, if any
 
+* `reject` - reject the task. Arguments:
+* `outputs` - a dictionary with only a `reason` attribute
+
 
 ## Format
 Your response MUST be in JSON format. It must be an object, and it must contain two fields:

tests/integration/mock/ManagerAgent/test_edits/prompt_004.log

+1-1
@@ -52,7 +52,7 @@ as well as observations you've made. This only includes the MOST RECENT
 actions and observations--more may have happened before that.
 They are time-ordered, with your most recent action at the bottom.
 
-[[{"source": "agent", "action": "read", "args": {"path": "./bad.txt", "start": 0, "end": -1, "thought": ""}}, {"source": "agent", "observation": "read", "content": "This is a stupid typoo.\nReally?\nNo mor typos!\nEnjoy!\n", "extras": {"path": "./bad.txt"}}], [{"source": "agent", "action": "message", "args": {"content": "The content of 'bad.txt' has been read. The next step is to identify and fix the typos.", "wait_for_response": false}}, {"observation": "null", "content": "", "extras": {}}]]
+[[{"source": "agent", "action": "read", "args": {"path": "./bad.txt", "start": 0, "end": -1, "thought": ""}}, {"source": "agent", "observation": "read", "content": "This is a stupid typoo.\nReally?\nNo mor typos!\nEnjoy!\n", "extras": {"path": "./bad.txt"}}], [{"source": "agent", "action": "message", "args": {"content": "The content of 'bad.txt' has been read. The following typos have been identified:\n1. 'typoo' should be 'typo'\n2. 'mor' should be 'more'\nI will now proceed to fix these typos and overwrite the file with the corrected content.", "wait_for_response": false}}, {"observation": "null", "content": "", "extras": {}}]]
 
 ## Format
 Your response MUST be in JSON format. It must be an object, and it must contain two fields:

tests/integration/mock/ManagerAgent/test_edits/prompt_005.log

+1-1
@@ -52,7 +52,7 @@ as well as observations you've made. This only includes the MOST RECENT
 actions and observations--more may have happened before that.
 They are time-ordered, with your most recent action at the bottom.
 
-[[{"source": "agent", "action": "read", "args": {"path": "./bad.txt", "start": 0, "end": -1, "thought": ""}}, {"source": "agent", "observation": "read", "content": "This is a stupid typoo.\nReally?\nNo mor typos!\nEnjoy!\n", "extras": {"path": "./bad.txt"}}], [{"source": "agent", "action": "message", "args": {"content": "The content of 'bad.txt' has been read. The next step is to identify and fix the typos.", "wait_for_response": false}}, {"observation": "null", "content": "", "extras": {}}], [{"source": "agent", "action": "write", "args": {"path": "./bad.txt", "content": "This is a stupid typo.\nReally?\nNo more typos!\nEnjoy!\n", "start": 0, "end": -1, "thought": ""}}, {"source": "agent", "observation": "write", "content": "", "extras": {"path": "./bad.txt"}}]]
+[[{"source": "agent", "action": "read", "args": {"path": "./bad.txt", "start": 0, "end": -1, "thought": ""}}, {"source": "agent", "observation": "read", "content": "This is a stupid typoo.\nReally?\nNo mor typos!\nEnjoy!\n", "extras": {"path": "./bad.txt"}}], [{"source": "agent", "action": "message", "args": {"content": "The content of 'bad.txt' has been read. The following typos have been identified:\n1. 'typoo' should be 'typo'\n2. 'mor' should be 'more'\nI will now proceed to fix these typos and overwrite the file with the corrected content.", "wait_for_response": false}}, {"observation": "null", "content": "", "extras": {}}], [{"source": "agent", "action": "write", "args": {"path": "./bad.txt", "content": "This is a stupid typo.\nReally?\nNo more typos!\nEnjoy!\n", "start": 0, "end": -1, "thought": ""}}, {"source": "agent", "observation": "write", "content": "", "extras": {"path": "./bad.txt"}}]]
 
 ## Format
 Your response MUST be in JSON format. It must be an object, and it must contain two fields:

tests/integration/mock/ManagerAgent/test_edits/prompt_006.log

+16-1
@@ -11,6 +11,15 @@ can do the actual work. A description of each agent is provided below. You MUST
 select one of the delegates below to move towards accomplishing the task, and you MUST
 provide the correct inputs for the delegate you select.
 
+Note: the delegated agent either returns "finish" or "reject".
+- If the action is "finish", but the full task is not done yet, you should
+continue to delegate to one of the agents below to until the full task is finished.
+- If the action is "reject", it means the delegated agent is not capable of the
+task you send to. You should revisit the input you send to the delegate, and consider
+whether any other delegate would be able to solve the task. If you cannot find
+a proper delegate agent, or the delegate attempts keep failing, call the `reject`
+action.
+
 ## Agents
 
 ### CoderAgent
@@ -60,7 +69,10 @@ as well as observations you've made. This only includes the MOST RECENT
 actions and observations--more may have happened before that.
 They are time-ordered, with your most recent action at the bottom.
 
-[[{"source": "user", "action": "message", "args": {"content": "Fix typos in bad.txt. Do not ask me for confirmation at any point.", "wait_for_response": false}}, {"observation": "null", "content": "", "extras": {}}], [{"source": "agent", "action": "delegate", "args": {"agent": "TypoFixerAgent", "inputs": {"task": "Fix typos in bad.txt"}, "thought": ""}}, {"observation": "null", "content": "", "extras": {}}], [{"action": "null", "args": {}}, {"source": "agent", "observation": "delegate", "content": "", "extras": {"outputs": {"summary": {"file": "./bad.txt", "typos_fixed": [{"original": "typoo", "fixed": "typo"}, {"original": "mor", "fixed": "more"}]}}}}]]
+[[{"source": "user", "action": "message", "args": {"content": "Fix typos in bad.txt. Do not ask me for confirmation at any point.", "wait_for_response": false}}, {"observation": "null", "content": "", "extras": {}}], [{"source": "agent", "action": "delegate", "args": {"agent": "TypoFixerAgent", "inputs": {"task": "Fix typos in bad.txt"}, "thought": ""}}, {"observation": "null", "content": "", "extras": {}}], [{"action": "null", "args": {}}, {"source": "agent", "observation": "delegate", "content": "", "extras": {"outputs": {"summary": {"file": "bad.txt", "typos_fixed": [{"original": "typoo", "corrected": "typo"}, {"original": "mor", "corrected": "more"}]}}}}]]
+
+If the last item in the history is an error, you should try to fix it. If you
+cannot fix it, call the `reject` action.
 
 ## Available Actions
 * `delegate` - send a task to another agent from the list provided. Arguments:
@@ -70,6 +82,9 @@ They are time-ordered, with your most recent action at the bottom.
 * `finish` - if you're absolutely certain that you've completed your task, use the finish action to stop working. Arguments:
 * `outputs` - a dictionary representing the outputs of your task, if any
 
+* `reject` - reject the task. Arguments:
+* `outputs` - a dictionary with only a `reason` attribute
+
 
 ## Format
 Your response MUST be in JSON format. It must be an object, and it must contain two fields:
@@ -1,7 +1,7 @@
 {
   "action": "message",
   "args": {
-    "content": "The content of 'bad.txt' has been read. The next step is to identify and fix the typos.",
+    "content": "The content of 'bad.txt' has been read. The following typos have been identified:\n1. 'typoo' should be 'typo'\n2. 'mor' should be 'more'\nI will now proceed to fix these typos and overwrite the file with the corrected content.",
     "wait_for_response": false
   }
 }

tests/integration/mock/ManagerAgent/test_edits/response_005.log

+3-3
@@ -3,15 +3,15 @@
   "args": {
     "outputs": {
       "summary": {
-        "file": "./bad.txt",
+        "file": "bad.txt",
         "typos_fixed": [
           {
             "original": "typoo",
-            "fixed": "typo"
+            "corrected": "typo"
           },
           {
             "original": "mor",
-            "fixed": "more"
+            "corrected": "more"
           }
         ]
       }

tests/integration/mock/ManagerAgent/test_edits/response_006.log

+3-3
@@ -3,15 +3,15 @@
   "action": "finish",
   "args": {
     "outputs": {
-      "file": "./bad.txt",
+      "file": "bad.txt",
       "typos_fixed": [
         {
           "original": "typoo",
-          "fixed": "typo"
+          "corrected": "typo"
         },
         {
           "original": "mor",
-          "fixed": "more"
+          "corrected": "more"
         }
       ]
     }
