Revamp AgentRejectAction and allow ManagerAgent to handle rejection #1735

Merged on Jun 9, 2024 (37 commits).

Commits:
479a78f Fix AgentRejectAction handling (li-boxuan, May 12, 2024)
2eddf6f Add ManagerAgent to integration tests (li-boxuan, May 12, 2024)
16ddbdb Merge remote-tracking branch 'upstream/main' into reject-action (li-boxuan, May 12, 2024)
4fb56e9 Fix regenerate.sh (li-boxuan, May 12, 2024)
224069d Merge remote-tracking branch 'upstream/main' into reject-action (li-boxuan, May 12, 2024)
c612c02 Fix merge (li-boxuan, May 13, 2024)
d9b1b4b Update README for micro-agents (li-boxuan, May 13, 2024)
342033c Add test reject to regenerate.sh (li-boxuan, May 13, 2024)
77c383a regenerate.sh: Add support for running a specific test and/or agent (li-boxuan, May 13, 2024)
c3565c5 Refine reject schema, and allow ManagerAgent to handle reject (li-boxuan, May 13, 2024)
aacc7ec Add test artifacts for test_simple_task_rejection (li-boxuan, May 13, 2024)
99233cd Fix manager agent tests (li-boxuan, May 13, 2024)
9f62fca Fix README (li-boxuan, May 13, 2024)
fc63c05 test_simple_task_rejection: check final agent state (li-boxuan, May 13, 2024)
c466c1c Integration test: exit if mock prompt not found (li-boxuan, May 13, 2024)
a709edc Update test_simple_task_rejection tests (li-boxuan, May 13, 2024)
a72d37c Merge branch 'main' into reject-action (li-boxuan, May 13, 2024)
7d64a8a Fix test_edits test artifacts after prompt update (li-boxuan, May 13, 2024)
a6b1144 Merge remote-tracking branch 'origin/reject-action' into reject-action (li-boxuan, May 13, 2024)
3d6054b Merge branch 'main' into reject-action (li-boxuan, May 13, 2024)
7db9109 Merge remote-tracking branch 'upstream/main' into reject-action (li-boxuan, May 14, 2024)
c242f29 Fix ManagerAgent test_edits (li-boxuan, May 14, 2024)
bd810d8 Merge remote-tracking branch 'upstream/main' into reject-action (li-boxuan, May 15, 2024)
b4d6309 Merge remote-tracking branch 'upstream/main' into reject-action (li-boxuan, May 16, 2024)
dd56bd7 Merge remote-tracking branch 'upstream/main' into reject-action (li-boxuan, May 18, 2024)
073e440 Merge remote-tracking branch 'upstream/main' into reject-action (li-boxuan, May 31, 2024)
3dd7cc2 Merge remote-tracking branch 'upstream/main' into reject-action (li-boxuan, Jun 1, 2024)
9b99f01 Merge remote-tracking branch 'upstream/main' into reject-action (li-boxuan, Jun 1, 2024)
8566678 WIP (li-boxuan, Jun 1, 2024)
b7c1f22 Merge remote-tracking branch 'upstream/main' into reject-action (li-boxuan, Jun 1, 2024)
e1931fd Merge remote-tracking branch 'upstream/main' into reject-action (li-boxuan, Jun 1, 2024)
e9734e5 Merge remote-tracking branch 'upstream/main' into reject-action (li-boxuan, Jun 4, 2024)
9dfa624 Fix tests (li-boxuan, Jun 4, 2024)
707d133 update test_edits for ManagerAgent (li-boxuan, Jun 4, 2024)
1243d6d Skip local sandbox for reject test (li-boxuan, Jun 4, 2024)
48d81d5 Merge remote-tracking branch 'upstream/main' into reject-action (li-boxuan, Jun 9, 2024)
07a104c Fix test comparison (li-boxuan, Jun 9, 2024)

3 changes: 3 additions & 0 deletions agenthub/micro/README.md
@@ -12,3 +12,6 @@ in the following structure:
 Note that `prompt.md` could use jinja2 template syntax. During runtime, `prompt.md`
 is loaded and rendered, and used together with `agent.yaml` to initialize a
 micro-agent.
+
+Micro-agents can be used independently. You can also use `ManagerAgent` which knows
+how to coordinate the agents and collaboratively finish a task.
2 changes: 1 addition & 1 deletion agenthub/micro/_instructions/actions/reject.md
@@ -1,2 +1,2 @@
 * `reject` - reject the task. Arguments:
-  * `outputs` - a dictionary representing the outputs of your task, if any
+  * `outputs` - a dictionary with only a `reason` attribute
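The new wording pins a `reject` action's `outputs` to a single `reason` key. As a minimal sketch, a caller could check a parsed response against that shape, assuming the model replies with a JSON object holding `action` and `args` as the prompt's format section requires. `is_valid_reject` is a hypothetical helper, not OpenDevin code.

```python
import json


def is_valid_reject(raw: str) -> bool:
    """Return True if `raw` is a reject action whose outputs hold only `reason`.

    Hypothetical helper: it mirrors the prompt instruction and is not part of
    OpenDevin's own action parser.
    """
    try:
        action = json.loads(raw)
    except json.JSONDecodeError:
        return False
    if not isinstance(action, dict) or action.get("action") != "reject":
        return False
    args = action.get("args")
    if not isinstance(args, dict):
        return False
    outputs = args.get("outputs")
    # The instruction allows exactly one attribute, `reason`, holding a string.
    return (
        isinstance(outputs, dict)
        and set(outputs) == {"reason"}
        and isinstance(outputs["reason"], str)
    )


ok = '{"action": "reject", "args": {"outputs": {"reason": "No staged changes"}}}'
old_style = '{"action": "reject", "args": {"outputs": {"answer": "No staged changes"}}}'
print(is_valid_reject(ok), is_valid_reject(old_style))  # True False
```

The `old_style` payload mimics what the commit writer was previously told to send (`outputs.answer`), which the refined schema no longer allows.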
1 change: 1 addition & 0 deletions agenthub/micro/commit_writer/agent.yaml
@@ -3,3 +3,4 @@ description: "Write a git commit message for files in the git staging area"
 inputs: {}
 outputs:
   answer: string
+  reason: string
2 changes: 1 addition & 1 deletion agenthub/micro/commit_writer/prompt.md
@@ -14,7 +14,7 @@ changes. The commit message should include:
 You should find the diff using `git diff --cached`, compile a commit message,
 and call the `finish` action with `outputs.answer` set to the answer. If current
 repo is not a valid git repo, or there is no diff in the staging area, please call
-the `reject` action with `outputs.answer` set to the reason.
+the `reject` action.
 
 ## History
 {{ instructions.history_truncated }}
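The commit writer's prompt branches on two conditions: `finish` with `outputs.answer` when a staged diff exists, `reject` otherwise, with the reason now carried in the new `reason` output. A sketch of that branching as a pure function, assuming the caller has already run `git diff --cached` and knows whether the directory is a repository. `commit_writer_action` and its placeholder message are hypothetical; the real agent is an LLM that writes the message itself.

```python
def commit_writer_action(is_git_repo: bool, staged_diff: str) -> dict:
    """Return the action the commit writer prompt asks for.

    Hypothetical sketch: the message built here is a stand-in for the commit
    message an LLM would compose from the diff.
    """
    if not is_git_repo:
        return {"action": "reject",
                "args": {"outputs": {"reason": "Not a valid git repository"}}}
    if not staged_diff.strip():
        return {"action": "reject",
                "args": {"outputs": {"reason": "No changes in the staging area"}}}
    # Collect post-image paths from unified-diff headers such as "+++ b/x.py".
    changed = [line[len("+++ b/"):] for line in staged_diff.splitlines()
               if line.startswith("+++ b/")]
    message = ("Update " + ", ".join(changed)) if changed else "Update staged files"
    return {"action": "finish", "args": {"outputs": {"answer": message}}}
```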
4 changes: 3 additions & 1 deletion agenthub/micro/manager/agent.yaml
@@ -3,4 +3,6 @@ description: Delegates tasks to microagents based on their area of expertise
 generates: Action
 inputs:
   task: string
-outputs: {}
+outputs:
+  summary: string # if finished
+  reason: string # if rejected
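The manager's `agent.yaml` now declares two outputs, each tied to one outcome by its inline comment. A sketch of that contract, assuming exactly one key is reported per outcome; `manager_outputs` is hypothetical, since in practice the outputs come from the LLM's `finish` or `reject` action.

```python
from typing import Optional


def manager_outputs(finished: bool, summary: Optional[str] = None,
                    reason: Optional[str] = None) -> dict:
    """Build the outputs dict declared in the manager's agent.yaml.

    Hypothetical helper: `summary` is reported on finish and `reason` on
    reject, following the comments next to each field.
    """
    if finished:
        if summary is None:
            raise ValueError("a finished manager should report a summary")
        return {"summary": summary}
    if reason is None:
        raise ValueError("a rejecting manager should report a reason")
    return {"reason": reason}
```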
13 changes: 13 additions & 0 deletions agenthub/micro/manager/prompt.md
@@ -7,6 +7,15 @@ can do the actual work. A description of each agent is provided below. You MUST
 select one of the delegates below to move towards accomplishing the task, and you MUST
 provide the correct inputs for the delegate you select.
 
+Note: the delegated agent either returns "finish" or "reject".
+- If the action is "finish", but the full task is not done yet, you should
+continue to delegate to one of the agents below to until the full task is finished.
+- If the action is "reject", it means the delegated agent is not capable of the
+task you send to. You should revisit the input you send to the delegate, and consider
+whether any other delegate would be able to solve the task. If you cannot find
+a proper delegate agent, or the delegate attempts keep failing, call the `reject`
+action.
+
 ## Agents
 {% for name, details in delegates.items() %}
 ### {{ name }}
@@ -19,9 +28,13 @@ provide the correct inputs for the delegate you select.
 {{ instructions.history_truncated }}
 {{ history_to_json(state.history[-10:]) }}
 
+If the last item in the history is an error, you should try to fix it. If you
+cannot fix it, call the `reject` action.
+
 ## Available Actions
 {{ instructions.actions.delegate }}
 {{ instructions.actions.finish }}
+{{ instructions.actions.reject }}
 
 ## Format
 {{ instructions.format.action }}
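The added note describes a retry policy in prose. A deterministic sketch of that policy under stated simplifications: delegates are plain callables returning `("finish" | "reject", outputs)`, candidates are tried in a fixed order, and `max_attempts` caps retries. The real ManagerAgent is an LLM that chooses delegates and rewrites inputs on its own, and this sketch omits the "finished but more work remains" branch. `run_manager` and `max_attempts` are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

# A delegate takes a task string and returns ("finish" | "reject", outputs).
Delegate = Callable[[str], Tuple[str, dict]]


def run_manager(task: str, candidates: List[str],
                delegates: Dict[str, Delegate], max_attempts: int = 3) -> dict:
    """Try delegates in order until one finishes; reject if none can.

    Hypothetical sketch of the policy in the manager prompt; the candidate
    order and the attempt cap are assumptions.
    """
    attempts = 0
    for name in candidates:
        if attempts >= max_attempts:
            break  # "the delegate attempts keep failing"
        attempts += 1
        status, outputs = delegates[name](task)
        if status == "finish":
            return {"action": "finish",
                    "args": {"outputs": {"summary": f"{name}: {outputs}"}}}
        # "reject" means this delegate cannot handle the task; try another one.
    return {"action": "reject",
            "args": {"outputs": {"reason": f"no delegate could handle: {task}"}}}
```

Usage: with `{"A": lambda t: ("reject", {}), "B": lambda t: ("finish", {"answer": "ok"})}`, calling `run_manager("t", ["A", "B"], ...)` finishes via `B`, while `run_manager("t", ["A"], ...)` ends in a `reject` whose outputs hold only `reason`.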
6 changes: 5 additions & 1 deletion opendevin/controller/agent_controller.py
@@ -19,6 +19,7 @@
     AddTaskAction,
     AgentDelegateAction,
     AgentFinishAction,
+    AgentRejectAction,
     ChangeAgentStateAction,
     MessageAction,
     ModifyTaskAction,
@@ -164,6 +165,9 @@ async def on_event(self, event: Event):
         elif isinstance(event, AgentFinishAction):
             self.state.outputs = event.outputs # type: ignore[attr-defined]
             await self.set_agent_state_to(AgentState.FINISHED)
+        elif isinstance(event, AgentRejectAction):
+            self.state.outputs = event.outputs # type: ignore[attr-defined]
+            await self.set_agent_state_to(AgentState.REJECTED)
         elif isinstance(event, Observation):
             if self._pending_action and self._pending_action.id == event.cause:
                 await self.add_history(self._pending_action, event)
@@ -252,7 +256,7 @@ async def _step(self):
                 # propagate error state until an agent or user can handle it
                 await self.set_agent_state_to(AgentState.ERROR)
                 return
-            delegate_done = delegate_state == AgentState.FINISHED
+            delegate_done = delegate_state in (AgentState.FINISHED, AgentState.REJECTED)
             if delegate_done:
                 logger.info(
                     f'[Agent Controller {self.id}] Delegate agent has finished execution'
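A minimal, self-contained model of the two controller changes, built from simplified stand-ins: `FinishEvent` and `RejectEvent` play the roles of `AgentFinishAction` and `AgentRejectAction`, and `MiniController`, `delegate_done` and `demo` are hypothetical names, not OpenDevin's API.

```python
import asyncio
from dataclasses import dataclass, field
from enum import Enum


class AgentState(str, Enum):
    RUNNING = 'running'
    FINISHED = 'finished'
    REJECTED = 'rejected'
    ERROR = 'error'


@dataclass
class FinishEvent:  # stand-in for AgentFinishAction
    outputs: dict = field(default_factory=dict)


@dataclass
class RejectEvent:  # stand-in for AgentRejectAction
    outputs: dict = field(default_factory=dict)


class MiniController:
    """Toy stand-in for AgentController with only the branches this PR touches."""

    def __init__(self) -> None:
        self.state = AgentState.RUNNING
        self.outputs: dict = {}

    async def set_agent_state_to(self, new_state: AgentState) -> None:
        self.state = new_state

    async def on_event(self, event: object) -> None:
        # Both terminal actions record their outputs; only the resulting state differs.
        if isinstance(event, FinishEvent):
            self.outputs = event.outputs
            await self.set_agent_state_to(AgentState.FINISHED)
        elif isinstance(event, RejectEvent):
            self.outputs = event.outputs
            await self.set_agent_state_to(AgentState.REJECTED)


def delegate_done(delegate_state: AgentState) -> bool:
    # A rejecting delegate now also hands control back to its parent.
    return delegate_state in (AgentState.FINISHED, AgentState.REJECTED)


async def demo() -> MiniController:
    ctrl = MiniController()
    await ctrl.on_event(RejectEvent({'reason': 'no staged changes'}))
    return ctrl


ctrl = asyncio.run(demo())
print(ctrl.state.value, ctrl.outputs, delegate_done(ctrl.state))
# rejected {'reason': 'no staged changes'} True
```

Counting `REJECTED` as done is what allows the parent, such as `ManagerAgent`, to resume and react to the rejection; with the old equality check only a finished delegate handed control back.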
1 change: 1 addition & 0 deletions opendevin/core/main.py
@@ -116,6 +116,7 @@ async def on_event(event: Event):
     event_stream.subscribe(EventStreamSubscriber.MAIN, on_event)
     while controller.get_agent_state() not in [
         AgentState.FINISHED,
+        AgentState.REJECTED,
         AgentState.ERROR,
         AgentState.PAUSED,
         AgentState.STOPPED,
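A sketch of the waiting loop, assuming a synchronous `get_state` callable stands in for `controller.get_agent_state()`. The terminal set lists only the states visible in this hunk (the list in `main.py` may continue past it), and `wait_until_done` and its poll interval are hypothetical. Without `REJECTED` in the set, a run whose agent rejects would keep polling.

```python
import asyncio
from enum import Enum


class AgentState(str, Enum):
    RUNNING = 'running'
    FINISHED = 'finished'
    REJECTED = 'rejected'
    ERROR = 'error'
    PAUSED = 'paused'
    STOPPED = 'stopped'


# States that end the polling loop, as far as the hunk shows.
TERMINAL_STATES = {
    AgentState.FINISHED,
    AgentState.REJECTED,
    AgentState.ERROR,
    AgentState.PAUSED,
    AgentState.STOPPED,
}


async def wait_until_done(get_state, poll_interval: float = 0.01) -> AgentState:
    """Poll `get_state` until it reports a terminal state (hypothetical helper)."""
    while (state := get_state()) not in TERMINAL_STATES:
        await asyncio.sleep(poll_interval)
    return state


# Simulated controller that reports REJECTED on its third poll.
_states = iter([AgentState.RUNNING, AgentState.RUNNING, AgentState.REJECTED])
final = asyncio.run(wait_until_done(lambda: next(_states)))
```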
4 changes: 4 additions & 0 deletions opendevin/core/schema/agent.py
@@ -30,6 +30,10 @@ class AgentState(str, Enum):
     """The agent is finished with the current task.
     """
 
+    REJECTED = 'rejected'
+    """The agent rejects the task.
+    """
+
     ERROR = 'error'
     """An error occurred during the task.
     """
9 changes: 9 additions & 0 deletions tests/integration/README.md
@@ -55,6 +55,15 @@ TEST_ONLY=true ./tests/integration/regenerate.sh
 
 to run all integration tests until the first failure.
 
+If you only want to run a specific test, set environment variable
+`ONLY_TEST_NAME` to the test name. If you only want to run a specific agent,
+set environment variable `ONLY_TEST_AGENT` to the agent. You could also use both,
+e.g.
+
+```bash
+TEST_ONLY=true ONLY_TEST_NAME="test_simple_task_rejection" ONLY_TEST_AGENT="ManagerAgent" ./tests/integration/regenerate.sh
+```
+
 
 ## Regenerate Integration Tests
 When you make changes to an agent's prompt, the integration tests will fail. You'll need to regenerate them
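The README describes two optional filters for `regenerate.sh`. A sketch of the selection semantics in Python, assuming an unset or empty variable means "no filter" and that both filters combine with AND when set. The script itself is bash; `select_cases` and the `OtherAgent` name below are hypothetical.

```python
import os
from typing import Iterable, List, Mapping, Tuple


def select_cases(cases: Iterable[Tuple[str, str]],
                 env: Mapping[str, str] = os.environ) -> List[Tuple[str, str]]:
    """Keep (test_name, agent) pairs matching ONLY_TEST_NAME / ONLY_TEST_AGENT.

    Hypothetical helper: an unset or empty variable is treated as "no filter".
    """
    only_name = env.get("ONLY_TEST_NAME", "")
    only_agent = env.get("ONLY_TEST_AGENT", "")
    return [
        (name, agent)
        for name, agent in cases
        if (not only_name or name == only_name)
        and (not only_agent or agent == only_agent)
    ]


cases = [
    ("test_edits", "ManagerAgent"),
    ("test_edits", "OtherAgent"),  # hypothetical agent name
    ("test_simple_task_rejection", "ManagerAgent"),
]
```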
15 changes: 15 additions & 0 deletions tests/integration/mock/ManagerAgent/test_edits/prompt_001.log
@@ -11,6 +11,15 @@ can do the actual work. A description of each agent is provided below. You MUST
 select one of the delegates below to move towards accomplishing the task, and you MUST
 provide the correct inputs for the delegate you select.
 
+Note: the delegated agent either returns "finish" or "reject".
+- If the action is "finish", but the full task is not done yet, you should
+continue to delegate to one of the agents below to until the full task is finished.
+- If the action is "reject", it means the delegated agent is not capable of the
+task you send to. You should revisit the input you send to the delegate, and consider
+whether any other delegate would be able to solve the task. If you cannot find
+a proper delegate agent, or the delegate attempts keep failing, call the `reject`
+action.
+
 ## Agents
 
 ### CoderAgent
@@ -62,6 +71,9 @@ They are time-ordered, with your most recent action at the bottom.
 
 [[{"source": "user", "action": "message", "args": {"content": "Fix typos in bad.txt. Do not ask me for confirmation at any point.", "wait_for_response": false}}, {"observation": "null", "content": "", "extras": {}}]]
 
+If the last item in the history is an error, you should try to fix it. If you
+cannot fix it, call the `reject` action.
+
 ## Available Actions
 * `delegate` - send a task to another agent from the list provided. Arguments:
   * `agent` - the agent to which the task is delegated. MUST match a name in the list of agents provided.
@@ -70,6 +82,9 @@ They are time-ordered, with your most recent action at the bottom.
 * `finish` - if you're absolutely certain that you've completed your task, use the finish action to stop working. Arguments:
   * `outputs` - a dictionary representing the outputs of your task, if any
 
+* `reject` - reject the task. Arguments:
+  * `outputs` - a dictionary with only a `reason` attribute
+
 
 ## Format
 Your response MUST be in JSON format. It must be an object, and it must contain two fields:
@@ -52,7 +52,7 @@ as well as observations you've made. This only includes the MOST RECENT
 actions and observations--more may have happened before that.
 They are time-ordered, with your most recent action at the bottom.
 
-[[{"source": "agent", "action": "read", "args": {"path": "./bad.txt", "start": 0, "end": -1, "thought": ""}}, {"source": "agent", "observation": "read", "content": "This is a stupid typoo.\nReally?\nNo mor typos!\nEnjoy!\n", "extras": {"path": "./bad.txt"}}], [{"source": "agent", "action": "message", "args": {"content": "The content of 'bad.txt' has been read. The next step is to identify and fix the typos.", "wait_for_response": false}}, {"observation": "null", "content": "", "extras": {}}]]
+[[{"source": "agent", "action": "read", "args": {"path": "./bad.txt", "start": 0, "end": -1, "thought": ""}}, {"source": "agent", "observation": "read", "content": "This is a stupid typoo.\nReally?\nNo mor typos!\nEnjoy!\n", "extras": {"path": "./bad.txt"}}], [{"source": "agent", "action": "message", "args": {"content": "The content of 'bad.txt' has been read. The following typos have been identified:\n1. 'typoo' should be 'typo'\n2. 'mor' should be 'more'\nI will now proceed to fix these typos and overwrite the file with the corrected content.", "wait_for_response": false}}, {"observation": "null", "content": "", "extras": {}}]]
 
 ## Format
 Your response MUST be in JSON format. It must be an object, and it must contain two fields:
@@ -52,7 +52,7 @@ as well as observations you've made. This only includes the MOST RECENT
 actions and observations--more may have happened before that.
 They are time-ordered, with your most recent action at the bottom.
 
-[[{"source": "agent", "action": "read", "args": {"path": "./bad.txt", "start": 0, "end": -1, "thought": ""}}, {"source": "agent", "observation": "read", "content": "This is a stupid typoo.\nReally?\nNo mor typos!\nEnjoy!\n", "extras": {"path": "./bad.txt"}}], [{"source": "agent", "action": "message", "args": {"content": "The content of 'bad.txt' has been read. The next step is to identify and fix the typos.", "wait_for_response": false}}, {"observation": "null", "content": "", "extras": {}}], [{"source": "agent", "action": "write", "args": {"path": "./bad.txt", "content": "This is a stupid typo.\nReally?\nNo more typos!\nEnjoy!\n", "start": 0, "end": -1, "thought": ""}}, {"source": "agent", "observation": "write", "content": "", "extras": {"path": "./bad.txt"}}]]
+[[{"source": "agent", "action": "read", "args": {"path": "./bad.txt", "start": 0, "end": -1, "thought": ""}}, {"source": "agent", "observation": "read", "content": "This is a stupid typoo.\nReally?\nNo mor typos!\nEnjoy!\n", "extras": {"path": "./bad.txt"}}], [{"source": "agent", "action": "message", "args": {"content": "The content of 'bad.txt' has been read. The following typos have been identified:\n1. 'typoo' should be 'typo'\n2. 'mor' should be 'more'\nI will now proceed to fix these typos and overwrite the file with the corrected content.", "wait_for_response": false}}, {"observation": "null", "content": "", "extras": {}}], [{"source": "agent", "action": "write", "args": {"path": "./bad.txt", "content": "This is a stupid typo.\nReally?\nNo more typos!\nEnjoy!\n", "start": 0, "end": -1, "thought": ""}}, {"source": "agent", "observation": "write", "content": "", "extras": {"path": "./bad.txt"}}]]
 
 ## Format
 Your response MUST be in JSON format. It must be an object, and it must contain two fields:
17 changes: 16 additions & 1 deletion tests/integration/mock/ManagerAgent/test_edits/prompt_006.log
@@ -11,6 +11,15 @@ can do the actual work. A description of each agent is provided below. You MUST
 select one of the delegates below to move towards accomplishing the task, and you MUST
 provide the correct inputs for the delegate you select.
 
+Note: the delegated agent either returns "finish" or "reject".
+- If the action is "finish", but the full task is not done yet, you should
+continue to delegate to one of the agents below to until the full task is finished.
+- If the action is "reject", it means the delegated agent is not capable of the
+task you send to. You should revisit the input you send to the delegate, and consider
+whether any other delegate would be able to solve the task. If you cannot find
+a proper delegate agent, or the delegate attempts keep failing, call the `reject`
+action.
+
 ## Agents
 
 ### CoderAgent
@@ -60,7 +69,10 @@ as well as observations you've made. This only includes the MOST RECENT
 actions and observations--more may have happened before that.
 They are time-ordered, with your most recent action at the bottom.
 
-[[{"source": "user", "action": "message", "args": {"content": "Fix typos in bad.txt. Do not ask me for confirmation at any point.", "wait_for_response": false}}, {"observation": "null", "content": "", "extras": {}}], [{"source": "agent", "action": "delegate", "args": {"agent": "TypoFixerAgent", "inputs": {"task": "Fix typos in bad.txt"}, "thought": ""}}, {"observation": "null", "content": "", "extras": {}}], [{"action": "null", "args": {}}, {"source": "agent", "observation": "delegate", "content": "", "extras": {"outputs": {"summary": {"file": "./bad.txt", "typos_fixed": [{"original": "typoo", "fixed": "typo"}, {"original": "mor", "fixed": "more"}]}}}}]]
+[[{"source": "user", "action": "message", "args": {"content": "Fix typos in bad.txt. Do not ask me for confirmation at any point.", "wait_for_response": false}}, {"observation": "null", "content": "", "extras": {}}], [{"source": "agent", "action": "delegate", "args": {"agent": "TypoFixerAgent", "inputs": {"task": "Fix typos in bad.txt"}, "thought": ""}}, {"observation": "null", "content": "", "extras": {}}], [{"action": "null", "args": {}}, {"source": "agent", "observation": "delegate", "content": "", "extras": {"outputs": {"summary": {"file": "bad.txt", "typos_fixed": [{"original": "typoo", "corrected": "typo"}, {"original": "mor", "corrected": "more"}]}}}}]]
 
+If the last item in the history is an error, you should try to fix it. If you
+cannot fix it, call the `reject` action.
+
 ## Available Actions
 * `delegate` - send a task to another agent from the list provided. Arguments:
@@ -70,6 +82,9 @@ They are time-ordered, with your most recent action at the bottom.
 * `finish` - if you're absolutely certain that you've completed your task, use the finish action to stop working. Arguments:
   * `outputs` - a dictionary representing the outputs of your task, if any
 
+* `reject` - reject the task. Arguments:
+  * `outputs` - a dictionary with only a `reason` attribute
+
 
 ## Format
 Your response MUST be in JSON format. It must be an object, and it must contain two fields:
@@ -1,7 +1,7 @@
 {
   "action": "message",
   "args": {
-    "content": "The content of 'bad.txt' has been read. The next step is to identify and fix the typos.",
+    "content": "The content of 'bad.txt' has been read. The following typos have been identified:\n1. 'typoo' should be 'typo'\n2. 'mor' should be 'more'\nI will now proceed to fix these typos and overwrite the file with the corrected content.",
     "wait_for_response": false
   }
 }
@@ -3,15 +3,15 @@
   "args": {
     "outputs": {
       "summary": {
-        "file": "./bad.txt",
+        "file": "bad.txt",
         "typos_fixed": [
           {
             "original": "typoo",
-            "fixed": "typo"
+            "corrected": "typo"
           },
           {
             "original": "mor",
-            "fixed": "more"
+            "corrected": "more"
           }
         ]
       }
@@ -3,15 +3,15 @@
   "action": "finish",
   "args": {
     "outputs": {
-      "file": "./bad.txt",
+      "file": "bad.txt",
       "typos_fixed": [
         {
           "original": "typoo",
-          "fixed": "typo"
+          "corrected": "typo"
         },
         {
           "original": "mor",
-          "fixed": "more"
+          "corrected": "more"
         }
       ]
     }
@@ -0,0 +1,94 @@


----------

# Task
You are in charge of accomplishing the following task:
Write a git commit message for the current staging area. Do not ask me for confirmation at any point.

In order to accomplish this goal, you must delegate tasks to one or more agents, who
can do the actual work. A description of each agent is provided below. You MUST
select one of the delegates below to move towards accomplishing the task, and you MUST
provide the correct inputs for the delegate you select.

Note: the delegated agent either returns "finish" or "reject".
- If the action is "finish", but the full task is not done yet, you should
continue to delegate to one of the agents below to until the full task is finished.
- If the action is "reject", it means the delegated agent is not capable of the
task you send to. You should revisit the input you send to the delegate, and consider
whether any other delegate would be able to solve the task. If you cannot find
a proper delegate agent, or the delegate attempts keep failing, call the `reject`
action.

## Agents

### CoderAgent
Given a particular task, and a detailed description of the codebase, accomplishes the task
#### Inputs
{"task": "string", "summary": "string"}

### CommitWriterAgent
Write a git commit message for files in the git staging area
#### Inputs
{}

### MathAgent
Solves simple and complex math problems using python
#### Inputs
{"task": "string"}

### PostgresAgent
Writes and maintains PostgreSQL migrations
#### Inputs
{"task": "string"}

### RepoExplorerAgent
Generates a detailed summary of an existing codebase
#### Inputs
{}

### StudyRepoForTaskAgent
Given a particular task, finds and describes all relevant parts of the codebase
#### Inputs
{"task": "string"}

### TypoFixerAgent
Fixes typos in files in the current working directory
#### Inputs
{"task": "string"}

### VerifierAgent
Given a particular task, verifies that the task has been completed
#### Inputs
{"task": "string"}


## History
Here is a recent history of actions you've taken in service of this plan,
as well as observations you've made. This only includes the MOST RECENT
actions and observations--more may have happened before that.
They are time-ordered, with your most recent action at the bottom.

[[{"source": "user", "action": "message", "args": {"content": "Write a git commit message for the current staging area. Do not ask me for confirmation at any point.", "wait_for_response": false}}, {"observation": "null", "content": "", "extras": {}}]]

If the last item in the history is an error, you should try to fix it. If you
cannot fix it, call the `reject` action.

## Available Actions
* `delegate` - send a task to another agent from the list provided. Arguments:
* `agent` - the agent to which the task is delegated. MUST match a name in the list of agents provided.
* `inputs` - a dictionary of input parameters to the agent, as specified in the list

* `finish` - if you're absolutely certain that you've completed your task, use the finish action to stop working. Arguments:
* `outputs` - a dictionary representing the outputs of your task, if any

* `reject` - reject the task. Arguments:
* `outputs` - a dictionary with only a `reason` attribute


## Format
Your response MUST be in JSON format. It must be an object, and it must contain two fields:
* `action`, which is one of the actions specified here
* `args`, which is a map of key-value pairs, specifying the arguments for that action

You MUST NOT include any other text besides the JSON response
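The new mock prompt hands `ManagerAgent` a commit-message task. A sketch of the kind of final-state check the `test_simple_task_rejection` commits mention, under an explicit assumption: the run is replayed from a scripted list of `(agent, action)` pairs, and the expected script (the commit writer rejects, then the manager rejects) is inferred from the PR's intent rather than copied from the test file. `replay` is hypothetical.

```python
from enum import Enum
from typing import Iterable, Tuple


class AgentState(str, Enum):
    RUNNING = 'running'
    FINISHED = 'finished'
    REJECTED = 'rejected'


def replay(actions: Iterable[Tuple[str, str]]) -> AgentState:
    """Replay scripted (agent, action) pairs; return the top-level final state.

    Hypothetical stand-in for a mock-LLM integration run: a delegate's reject
    only ends that delegate, while a finish or reject from the top-level
    agent ('ManagerAgent' here) ends the whole run.
    """
    for agent, action in actions:
        if agent != 'ManagerAgent':
            continue  # delegate outcomes are reported back to the manager
        if action == 'finish':
            return AgentState.FINISHED
        if action == 'reject':
            return AgentState.REJECTED
    return AgentState.RUNNING


# Assumed script: nothing is staged, so the commit writer rejects and the
# manager, finding no better delegate, rejects as well.
script = [
    ('ManagerAgent', 'delegate'),
    ('CommitWriterAgent', 'reject'),
    ('ManagerAgent', 'reject'),
]
assert replay(script) is AgentState.REJECTED
```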