All-Hands-AI
diff --git a/agenthub/micro/README.md b/agenthub/micro/README.md (+3)
diff --git a/agenthub/micro/_instructions/actions/reject.md b/agenthub/micro/_instructions/actions/reject.md (+1 -1)
diff --git a/agenthub/micro/commit_writer/agent.yaml b/agenthub/micro/commit_writer/agent.yaml (+1)
diff --git a/agenthub/micro/commit_writer/prompt.md b/agenthub/micro/commit_writer/prompt.md (+1 -1)
diff --git a/agenthub/micro/manager/agent.yaml b/agenthub/micro/manager/agent.yaml (+3 -1)
diff --git a/agenthub/micro/manager/prompt.md b/agenthub/micro/manager/prompt.md (+13)
diff --git a/opendevin/controller/agent_controller.py b/opendevin/controller/agent_controller.py (+5 -1)
diff --git a/opendevin/core/main.py b/opendevin/core/main.py (+1)
diff --git a/opendevin/core/schema/agent.py b/opendevin/core/schema/agent.py (+4)
diff --git a/tests/integration/README.md b/tests/integration/README.md (+9)
diff --git a/tests/integration/conftest.py b/tests/integration/conftest.py (+11 -4)
diff --git a/tests/integration/mock/ManagerAgent/test_edits/prompt_001.log b/tests/integration/mock/ManagerAgent/test_edits/prompt_001.log (+15)
diff --git a/tests/integration/mock/ManagerAgent/test_edits/prompt_004.log b/tests/integration/mock/ManagerAgent/test_edits/prompt_004.log (+1 -1)
diff --git a/tests/integration/mock/ManagerAgent/test_edits/prompt_005.log b/tests/integration/mock/ManagerAgent/test_edits/prompt_005.log (+1 -1)
diff --git a/tests/integration/mock/ManagerAgent/test_edits/prompt_006.log b/tests/integration/mock/ManagerAgent/test_edits/prompt_006.log (+16 -1)
diff --git a/tests/integration/mock/ManagerAgent/test_edits/response_003.log b/tests/integration/mock/ManagerAgent/test_edits/response_003.log (+1 -1)
diff --git a/tests/integration/mock/ManagerAgent/test_edits/response_005.log b/tests/integration/mock/ManagerAgent/test_edits/response_005.log (+3 -3)
diff --git a/tests/integration/mock/ManagerAgent/test_edits/response_006.log b/tests/integration/mock/ManagerAgent/test_edits/response_006.log (+3 -3)
@@ -12,3 +12,6 @@ in the following structure:
 Note that `prompt.md` could use jinja2 template syntax. During runtime, `prompt.md`
 is loaded and rendered, and used together with `agent.yaml` to initialize a
 micro-agent.
+
+Micro-agents can be used independently. You can also use `ManagerAgent`, which knows
+how to coordinate the agents so that they collaboratively finish a task.
@@ -1,2 +1,2 @@
 * `reject` - reject the task. Arguments:
-  * `outputs` - a dictionary representing the outputs of your task, if any
+  * `outputs` - a dictionary with only a `reason` attribute
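
For illustration, a reject action that follows this instruction might be built as shown below. This is a minimal sketch: the `action`/`args` envelope mirrors the JSON responses recorded in the integration-test logs of this change, while the helper name `make_reject_action` and the reason text are hypothetical and not part of the repository.

```python
import json


def make_reject_action(reason: str) -> dict:
    """Build a reject action whose outputs carry only a `reason`.

    Hypothetical helper, used here only to show the expected shape.
    """
    return {'action': 'reject', 'args': {'outputs': {'reason': reason}}}


# Example reason text is made up for illustration.
action = make_reject_action('Not a valid git repository.')
print(json.dumps(action, indent=2))
```

Keeping `outputs` down to a single `reason` key gives the delegating agent one predictable field to read when it decides what to do next.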
@@ -3,3 +3,4 @@ description: "Write a git commit message for files in the git staging area"
 inputs: {}
 outputs:
   answer: string
+  reason: string
@@ -14,7 +14,7 @@ changes. The commit message should include:
 You should find the diff using `git diff --cached`, compile a commit message,
 and call the `finish` action with `outputs.answer` set to the answer. If current
 repo is not a valid git repo, or there is no diff in the staging area, please call
-the `reject` action with `outputs.answer` set to the reason.
+the `reject` action.
 
 ## History
 {{ instructions.history_truncated }}
 
@@ -3,4 +3,6 @@ description: Delegates tasks to microagents based on their area of expertise
 generates: Action
 inputs:
   task: string
-outputs: {}
+outputs:
+  summary: string # if finished
+  reason: string # if rejected
@@ -7,6 +7,15 @@ can do the actual work. A description of each agent is provided below. You MUST
 select one of the delegates below to move towards accomplishing the task, and you MUST
 provide the correct inputs for the delegate you select.
 
+Note: the delegated agent either returns "finish" or "reject".
+- If the action is "finish", but the full task is not done yet, you should
+continue to delegate to one of the agents below until the full task is finished.
+- If the action is "reject", it means the delegated agent is not capable of the
+task you sent it. You should revisit the input you sent to the delegate, and consider
+whether any other delegate would be able to solve the task. If you cannot find
+a proper delegate agent, or the delegate attempts keep failing, call the `reject`
+action.
+
 ## Agents
 {% for name, details in delegates.items() %}
 ### {{ name }}
@@ -19,9 +28,13 @@ provide the correct inputs for the delegate you select.
 {{ instructions.history_truncated }}
 {{ history_to_json(state.history[-10:]) }}
 
+If the last item in the history is an error, you should try to fix it. If you
+cannot fix it, call the `reject` action.
+
 ## Available Actions
 {{ instructions.actions.delegate }}
 {{ instructions.actions.finish }}
+{{ instructions.actions.reject }}
 
 ## Format
 {{ instructions.format.action }}
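
The delegation rules added to this prompt can be read as a small decision procedure. The sketch below encodes them in plain Python purely as an illustration: in the actual system the language model makes this choice from the prompt, and the names `next_step`, `MAX_ATTEMPTS`, and `other_delegate` are assumptions, as is the retry budget (the prompt only says "keep failing").

```python
from typing import Optional

# Assumed retry budget; the prompt does not specify a number.
MAX_ATTEMPTS = 3


def next_step(
    delegate_action: str,
    task_done: bool,
    attempts: int,
    other_delegate: Optional[str],
) -> str:
    """Return the manager's next action name under the prompt's rules."""
    if delegate_action == 'finish':
        # A finished delegate only ends the run if the whole task is done.
        return 'finish' if task_done else 'delegate'
    if delegate_action == 'reject':
        # Give up when no suitable delegate remains or attempts keep failing.
        if other_delegate is None or attempts >= MAX_ATTEMPTS:
            return 'reject'
        return 'delegate'
    raise ValueError(f'unexpected delegate action: {delegate_action}')


print(next_step('finish', task_done=False, attempts=1, other_delegate=None))
print(next_step('reject', task_done=False, attempts=1, other_delegate='CoderAgent'))
print(next_step('reject', task_done=False, attempts=3, other_delegate='CoderAgent'))
```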
@@ -19,6 +19,7 @@
     AddTaskAction,
     AgentDelegateAction,
     AgentFinishAction,
+    AgentRejectAction,
     ChangeAgentStateAction,
     MessageAction,
     ModifyTaskAction,
@@ -164,6 +165,9 @@ async def on_event(self, event: Event):
         elif isinstance(event, AgentFinishAction):
             self.state.outputs = event.outputs  # type: ignore[attr-defined]
             await self.set_agent_state_to(AgentState.FINISHED)
+        elif isinstance(event, AgentRejectAction):
+            self.state.outputs = event.outputs  # type: ignore[attr-defined]
+            await self.set_agent_state_to(AgentState.REJECTED)
         elif isinstance(event, Observation):
             if self._pending_action and self._pending_action.id == event.cause:
                 await self.add_history(self._pending_action, event)
@@ -252,7 +256,7 @@ async def _step(self):
                 # propagate error state until an agent or user can handle it
                 await self.set_agent_state_to(AgentState.ERROR)
                 return
-            delegate_done = delegate_state == AgentState.FINISHED
+            delegate_done = delegate_state in (AgentState.FINISHED, AgentState.REJECTED)
             if delegate_done:
                 logger.info(
                     f'[Agent Controller {self.id}] Delegate agent has finished execution'
 
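
To make the controller change easier to follow, here is a stripped-down, synchronous sketch of the same idea: a reject event stores its outputs and moves the agent to `REJECTED`, and a parent treats both `FINISHED` and `REJECTED` as "delegate done". The classes below are simplified stand-ins written for this example (the real controller is asynchronous and has many more states); `MiniController` and the `RUNNING` and `FINISHED` values are assumptions.

```python
from dataclasses import dataclass, field
from enum import Enum


class AgentState(str, Enum):
    RUNNING = 'running'    # assumed value, for illustration
    FINISHED = 'finished'  # assumed value, for illustration
    REJECTED = 'rejected'


@dataclass
class AgentFinishAction:
    outputs: dict = field(default_factory=dict)


@dataclass
class AgentRejectAction:
    outputs: dict = field(default_factory=dict)


class MiniController:
    """Hypothetical stand-in for the agent controller."""

    def __init__(self) -> None:
        self.state = AgentState.RUNNING
        self.outputs: dict = {}

    def on_event(self, event: object) -> None:
        # Both terminal actions record their outputs before changing state.
        if isinstance(event, AgentFinishAction):
            self.outputs = event.outputs
            self.state = AgentState.FINISHED
        elif isinstance(event, AgentRejectAction):
            self.outputs = event.outputs
            self.state = AgentState.REJECTED

    def delegate_done(self) -> bool:
        # A rejected delegate is as "done" as a finished one.
        return self.state in (AgentState.FINISHED, AgentState.REJECTED)


delegate = MiniController()
delegate.on_event(AgentRejectAction(outputs={'reason': 'no staged changes'}))
print(delegate.state.value, delegate.delegate_done(), delegate.outputs)
```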
@@ -127,6 +127,7 @@ async def on_event(event: Event):
     event_stream.subscribe(EventStreamSubscriber.MAIN, on_event)
     while controller.get_agent_state() not in [
         AgentState.FINISHED,
+        AgentState.REJECTED,
         AgentState.ERROR,
         AgentState.PAUSED,
         AgentState.STOPPED,
 
@@ -30,6 +30,10 @@ class AgentState(str, Enum):
     """The agent is finished with the current task.
     """
 
+    REJECTED = 'rejected'
+    """The agent rejects the task.
+    """
+
     ERROR = 'error'
     """An error occurred during the task.
     """
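
Because `AgentState` mixes in `str`, the new member compares equal to its string value, and the main loop can stop on it simply by listing it among the end states. The sketch below shows both points; only `REJECTED = 'rejected'` and `ERROR = 'error'` are taken from this diff, and the other member values as well as the names `END_STATES` and `should_keep_waiting` are assumptions made for the example.

```python
from enum import Enum


class AgentState(str, Enum):
    RUNNING = 'running'    # assumed value
    FINISHED = 'finished'  # assumed value
    REJECTED = 'rejected'
    ERROR = 'error'
    PAUSED = 'paused'      # assumed value
    STOPPED = 'stopped'    # assumed value


# States on which the main loop stops waiting (mirrors the list in main.py).
END_STATES = [
    AgentState.FINISHED,
    AgentState.REJECTED,
    AgentState.ERROR,
    AgentState.PAUSED,
    AgentState.STOPPED,
]


def should_keep_waiting(state: AgentState) -> bool:
    """Return True while the agent has not reached an end state."""
    return state not in END_STATES


print(AgentState('rejected') is AgentState.REJECTED)
print(AgentState.REJECTED == 'rejected')
print(should_keep_waiting(AgentState.RUNNING), should_keep_waiting(AgentState.REJECTED))
```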
@@ -55,6 +55,15 @@ TEST_ONLY=true ./tests/integration/regenerate.sh
 
 to run all integration tests until the first failure.
 
+If you only want to run a specific test, set the environment variable
+`ONLY_TEST_NAME` to the test name. If you only want to run a specific agent,
+set the environment variable `ONLY_TEST_AGENT` to the agent name. You can also use both,
+e.g.
+
+```bash
+TEST_ONLY=true ONLY_TEST_NAME="test_simple_task_rejection" ONLY_TEST_AGENT="ManagerAgent" ./tests/integration/regenerate.sh
+```
+
 
 ## Regenerate Integration Tests
 When you make changes to an agent's prompt, the integration tests will fail. You'll need to regenerate them
 
@@ -1,9 +1,9 @@
 import io
 import os
 import re
+import subprocess
 import sys
 import tempfile
-import subprocess
 from functools import partial
 from http.server import HTTPServer, SimpleHTTPRequestHandler
 from threading import Thread
@@ -18,7 +18,8 @@
 
 
 def filter_out_symbols(input):
-    return ' '.join([char for char in input if char.isalnum()])
+    input = re.sub(r'\\n|\\r\\n|\\r|\s+', '', input)
+    return input
 
 
 def get_log_id(prompt_log_name):
@@ -84,13 +85,19 @@ def get_mock_response(test_name: str, messages: str, id: int) -> str:
             print('Mismatched Prompt File path', prompt_file_path)
             print('---' * 10)
             # Create a temporary file to store messages
-            with tempfile.NamedTemporaryFile(delete=False, mode='w', encoding='utf-8') as tmp_file:
+            with tempfile.NamedTemporaryFile(
+                delete=False, mode='w', encoding='utf-8'
+            ) as tmp_file:
                 tmp_file_path = tmp_file.name
                 tmp_file.write(messages)
 
             try:
                 # Use diff command to compare files and capture the output
-                result = subprocess.run(['diff', '-u', prompt_file_path, tmp_file_path], capture_output=True, text=True)
+                result = subprocess.run(
+                    ['diff', '-u', prompt_file_path, tmp_file_path],
+                    capture_output=True,
+                    text=True,
+                )
                 if result.returncode != 0:
                     print('Diff:')
                     print(result.stdout)
 
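
The `filter_out_symbols` change in this file makes prompt matching stricter: the old version kept only alphanumeric characters, so punctuation differences were ignored, whereas the new version strips whitespace and escaped newline sequences only. The sketch below contrasts the two; the function names `old_filter` and `new_filter` and the sample strings are made up for this comparison.

```python
import re


def old_filter(text: str) -> str:
    # Previous behaviour: keep alphanumeric characters, joined by spaces.
    return ' '.join([char for char in text if char.isalnum()])


def new_filter(text: str) -> str:
    # New behaviour: drop literal "\n", "\r\n", "\r" sequences and all whitespace.
    return re.sub(r'\\n|\\r\\n|\\r|\s+', '', text)


a = 'Fix typos, now.\n'
b = 'Fix typos now'

# The old filter treats the two prompts as identical; the new one does not.
print(old_filter(a) == old_filter(b))
print(new_filter(a) == new_filter(b))

# Escaped newlines (as they appear inside JSON strings) and real whitespace both vanish.
print(new_filter('line one\\nline  two\r\n'))
```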
@@ -11,6 +11,15 @@ can do the actual work. A description of each agent is provided below. You MUST
 select one of the delegates below to move towards accomplishing the task, and you MUST
 provide the correct inputs for the delegate you select.
 
+Note: the delegated agent either returns "finish" or "reject".
+- If the action is "finish", but the full task is not done yet, you should
+continue to delegate to one of the agents below until the full task is finished.
+- If the action is "reject", it means the delegated agent is not capable of the
+task you sent it. You should revisit the input you sent to the delegate, and consider
+whether any other delegate would be able to solve the task. If you cannot find
+a proper delegate agent, or the delegate attempts keep failing, call the `reject`
+action.
+
 ## Agents
 
 ### CoderAgent
@@ -62,6 +71,9 @@ They are time-ordered, with your most recent action at the bottom.
 
 [[{"source": "user", "action": "message", "args": {"content": "Fix typos in bad.txt. Do not ask me for confirmation at any point.", "wait_for_response": false}}, {"observation": "null", "content": "", "extras": {}}]]
 
+If the last item in the history is an error, you should try to fix it. If you
+cannot fix it, call the `reject` action.
+
 ## Available Actions
 * `delegate` - send a task to another agent from the list provided. Arguments:
   * `agent` - the agent to which the task is delegated. MUST match a name in the list of agents provided.
@@ -70,6 +82,9 @@ They are time-ordered, with your most recent action at the bottom.
 * `finish` - if you're absolutely certain that you've completed your task, use the finish action to stop working. Arguments:
   * `outputs` - a dictionary representing the outputs of your task, if any
 
+* `reject` - reject the task. Arguments:
+  * `outputs` - a dictionary with only a `reason` attribute
+
 
 ## Format
 Your response MUST be in JSON format. It must be an object, and it must contain two fields:
 
@@ -52,7 +52,7 @@ as well as observations you've made. This only includes the MOST RECENT
 actions and observations--more may have happened before that.
 They are time-ordered, with your most recent action at the bottom.
 
-[[{"source": "agent", "action": "read", "args": {"path": "./bad.txt", "start": 0, "end": -1, "thought": ""}}, {"source": "agent", "observation": "read", "content": "This is a stupid typoo.\nReally?\nNo mor typos!\nEnjoy!\n", "extras": {"path": "./bad.txt"}}], [{"source": "agent", "action": "message", "args": {"content": "The content of 'bad.txt' has been read. The next step is to identify and fix the typos.", "wait_for_response": false}}, {"observation": "null", "content": "", "extras": {}}]]
+[[{"source": "agent", "action": "read", "args": {"path": "./bad.txt", "start": 0, "end": -1, "thought": ""}}, {"source": "agent", "observation": "read", "content": "This is a stupid typoo.\nReally?\nNo mor typos!\nEnjoy!\n", "extras": {"path": "./bad.txt"}}], [{"source": "agent", "action": "message", "args": {"content": "The content of 'bad.txt' has been read. The following typos have been identified:\n1. 'typoo' should be 'typo'\n2. 'mor' should be 'more'\nI will now proceed to fix these typos and overwrite the file with the corrected content.", "wait_for_response": false}}, {"observation": "null", "content": "", "extras": {}}]]
 
 ## Format
 Your response MUST be in JSON format. It must be an object, and it must contain two fields:
 
@@ -52,7 +52,7 @@ as well as observations you've made. This only includes the MOST RECENT
 actions and observations--more may have happened before that.
 They are time-ordered, with your most recent action at the bottom.
 
-[[{"source": "agent", "action": "read", "args": {"path": "./bad.txt", "start": 0, "end": -1, "thought": ""}}, {"source": "agent", "observation": "read", "content": "This is a stupid typoo.\nReally?\nNo mor typos!\nEnjoy!\n", "extras": {"path": "./bad.txt"}}], [{"source": "agent", "action": "message", "args": {"content": "The content of 'bad.txt' has been read. The next step is to identify and fix the typos.", "wait_for_response": false}}, {"observation": "null", "content": "", "extras": {}}], [{"source": "agent", "action": "write", "args": {"path": "./bad.txt", "content": "This is a stupid typo.\nReally?\nNo more typos!\nEnjoy!\n", "start": 0, "end": -1, "thought": ""}}, {"source": "agent", "observation": "write", "content": "", "extras": {"path": "./bad.txt"}}]]
+[[{"source": "agent", "action": "read", "args": {"path": "./bad.txt", "start": 0, "end": -1, "thought": ""}}, {"source": "agent", "observation": "read", "content": "This is a stupid typoo.\nReally?\nNo mor typos!\nEnjoy!\n", "extras": {"path": "./bad.txt"}}], [{"source": "agent", "action": "message", "args": {"content": "The content of 'bad.txt' has been read. The following typos have been identified:\n1. 'typoo' should be 'typo'\n2. 'mor' should be 'more'\nI will now proceed to fix these typos and overwrite the file with the corrected content.", "wait_for_response": false}}, {"observation": "null", "content": "", "extras": {}}], [{"source": "agent", "action": "write", "args": {"path": "./bad.txt", "content": "This is a stupid typo.\nReally?\nNo more typos!\nEnjoy!\n", "start": 0, "end": -1, "thought": ""}}, {"source": "agent", "observation": "write", "content": "", "extras": {"path": "./bad.txt"}}]]
 
 ## Format
 Your response MUST be in JSON format. It must be an object, and it must contain two fields:
 
@@ -11,6 +11,15 @@ can do the actual work. A description of each agent is provided below. You MUST
 select one of the delegates below to move towards accomplishing the task, and you MUST
 provide the correct inputs for the delegate you select.
 
+Note: the delegated agent either returns "finish" or "reject".
+- If the action is "finish", but the full task is not done yet, you should
+continue to delegate to one of the agents below until the full task is finished.
+- If the action is "reject", it means the delegated agent is not capable of the
+task you sent it. You should revisit the input you sent to the delegate, and consider
+whether any other delegate would be able to solve the task. If you cannot find
+a proper delegate agent, or the delegate attempts keep failing, call the `reject`
+action.
+
 ## Agents
 
 ### CoderAgent
@@ -60,7 +69,10 @@ as well as observations you've made. This only includes the MOST RECENT
 actions and observations--more may have happened before that.
 They are time-ordered, with your most recent action at the bottom.
 
-[[{"source": "user", "action": "message", "args": {"content": "Fix typos in bad.txt. Do not ask me for confirmation at any point.", "wait_for_response": false}}, {"observation": "null", "content": "", "extras": {}}], [{"source": "agent", "action": "delegate", "args": {"agent": "TypoFixerAgent", "inputs": {"task": "Fix typos in bad.txt"}, "thought": ""}}, {"observation": "null", "content": "", "extras": {}}], [{"action": "null", "args": {}}, {"source": "agent", "observation": "delegate", "content": "", "extras": {"outputs": {"summary": {"file": "./bad.txt", "typos_fixed": [{"original": "typoo", "fixed": "typo"}, {"original": "mor", "fixed": "more"}]}}}}]]
+[[{"source": "user", "action": "message", "args": {"content": "Fix typos in bad.txt. Do not ask me for confirmation at any point.", "wait_for_response": false}}, {"observation": "null", "content": "", "extras": {}}], [{"source": "agent", "action": "delegate", "args": {"agent": "TypoFixerAgent", "inputs": {"task": "Fix typos in bad.txt"}, "thought": ""}}, {"observation": "null", "content": "", "extras": {}}], [{"action": "null", "args": {}}, {"source": "agent", "observation": "delegate", "content": "", "extras": {"outputs": {"summary": {"file": "bad.txt", "typos_fixed": [{"original": "typoo", "corrected": "typo"}, {"original": "mor", "corrected": "more"}]}}}}]]
+
+If the last item in the history is an error, you should try to fix it. If you
+cannot fix it, call the `reject` action.
 
 ## Available Actions
 * `delegate` - send a task to another agent from the list provided. Arguments:
@@ -70,6 +82,9 @@ They are time-ordered, with your most recent action at the bottom.
 * `finish` - if you're absolutely certain that you've completed your task, use the finish action to stop working. Arguments:
   * `outputs` - a dictionary representing the outputs of your task, if any
 
+* `reject` - reject the task. Arguments:
+  * `outputs` - a dictionary with only a `reason` attribute
+
 
 ## Format
 Your response MUST be in JSON format. It must be an object, and it must contain two fields:
 
@@ -1,7 +1,7 @@
 {
   "action": "message",
   "args": {
-    "content": "The content of 'bad.txt' has been read. The next step is to identify and fix the typos.",
+    "content": "The content of 'bad.txt' has been read. The following typos have been identified:\n1. 'typoo' should be 'typo'\n2. 'mor' should be 'more'\nI will now proceed to fix these typos and overwrite the file with the corrected content.",
     "wait_for_response": false
   }
 }
@@ -3,15 +3,15 @@
   "args": {
     "outputs": {
       "summary": {
-        "file": "./bad.txt",
+        "file": "bad.txt",
         "typos_fixed": [
           {
             "original": "typoo",
-            "fixed": "typo"
+            "corrected": "typo"
           },
           {
             "original": "mor",
-            "fixed": "more"
+            "corrected": "more"
           }
         ]
       }
 
@@ -3,15 +3,15 @@
   "action": "finish",
   "args": {
     "outputs": {
-      "file": "./bad.txt",
+      "file": "bad.txt",
       "typos_fixed": [
         {
           "original": "typoo",
-          "fixed": "typo"
+          "corrected": "typo"
         },
         {
           "original": "mor",
-          "fixed": "more"
+          "corrected": "more"
         }
       ]
     }