Fix parallel final answers #1482


Merged
merged 23 commits into main on Jun 27, 2025

Conversation


@aymeric-roucher aymeric-roucher commented Jun 24, 2025

Fixing #1481

Along the way:

  • improved tool call management to handle call ids and show observations in call_id order
  • added streaming of ToolCall objects
  • added systematic checking of input types, raising AgentToolCallError on mismatch

else:
final_answer = answers

# TODO: Make sure this is not a problem: https://github.com/huggingface/smolagents/pull/1255#pullrequestreview-2822036066
Collaborator Author

@albertvillanova WDYT? In #1255 (review), you had fixed an issue with return types by skipping the execution of the final answer tool if the output was not a dict or string.
But certain custom FinalAnswerTool implementations might very well expect an image or audio to perform post-processing on, so I think it's better to keep running the tool. Tests now pass; please make sure none of the usages you were thinking of has been broken by my changes.

Member

Yes, reviewing my previous modification, I realize I may have gone a bit too fast: the actual issue was indeed with the error message! Your fix using str instead of json.loads makes sense and addresses it properly.

Thanks for catching this and ensuring compatibility with broader FinalAnswerTool implementations. I've re-checked the relevant usages, and nothing appears to be broken by your changes.

@@ -1470,12 +1458,12 @@ def execute_tool_call(self, tool_name: str, arguments: dict[str, str] | str) ->
# Handle execution errors
if is_managed_agent:
error_msg = (
f"Error executing request to team member '{tool_name}' with arguments {json.dumps(arguments)}: {e}\n"
f"Error executing request to team member '{tool_name}' with arguments {str(arguments)}: {e}\n"
Collaborator Author

Changing to str so that no error is raised when the arguments are not serializable, e.g. PIL images or audio.

Member

That was the issue I found. Good solution.

Maybe we could improve the __str__ methods for AgentImage and AgentAudio, so we avoid having their memory address in the error message. Currently, we would get:

Error executing tool 'final_answer' with arguments {'answer': <smolagents.agent_types.AgentImage image mode= size=0x0 at 0x7EFCDD78E190>}: FileNotFoundError: [Errno 2] No such file or directory: 'path.png'
Please try again or use another tool
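The improvement suggested above could look like the following minimal sketch: override __str__ so error messages show a stable, readable label instead of the default repr with a memory address. The class here is a simplified stand-in mirroring smolagents.agent_types.AgentImage, not the actual implementation.

```python
# Hypothetical sketch: a readable __str__ for agent media types, so tool-call
# error messages don't embed the object's memory address.

class AgentImage:
    """Minimal stand-in for smolagents.agent_types.AgentImage (illustrative only)."""

    def __init__(self, value):
        self._value = value  # a path, PIL image, raw bytes, etc.

    def __str__(self):
        # Default object repr would include "at 0x7EFC..."; return a stable
        # description of the wrapped value instead.
        return f"AgentImage(source={type(self._value).__name__})"


img = AgentImage(value="path.png")
error_msg = f"Error executing tool 'final_answer' with arguments {{'answer': {img}}}"
```

With this, the error message above would read `{'answer': AgentImage(source=str)}` rather than exposing an address that changes between runs.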

answer = agent.run("Fake task.")
assert answer == "1CUSTOM2"
assert answer == ["1 and 2", "3 and 4"]
Collaborator Author

This tests that two final answer tool calls can run at once.

tool_arguments = tool_call.function.arguments
model_outputs.append(str(f"Called Tool: '{tool_name}' with arguments: {tool_arguments}"))
tool_calls.append(ToolCall(name=tool_name, arguments=tool_arguments, id=tool_call.id))
# Track final_answer separately, add others to parallel processing list
Collaborator Author

This is not needed anymore: in short, now that we have a ToolOutput object recording whether the tool output is a final answer, we can just yield ToolOutput objects and then process possible final answers in step_stream, which makes more sense since this is a step-level concern.
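The pattern described here can be sketched roughly as follows: each executed tool yields a ToolOutput tagged with whether it is a final answer, and the step-level consumer decides when to stop. The names ToolOutput and step_stream follow the discussion, but the bodies are simplified stand-ins, not the smolagents implementation.

```python
# Hedged sketch: uniform tool execution yielding ToolOutput objects, with
# final-answer handling done at the step level rather than per-tool.
from dataclasses import dataclass
from typing import Any, Iterator

@dataclass
class ToolOutput:
    id: str
    output: Any
    is_final_answer: bool

def execute_tools(calls) -> Iterator[ToolOutput]:
    # No special-casing of final_answer here: every tool is run the same way.
    for call_id, (name, func, args) in calls.items():
        yield ToolOutput(id=call_id, output=func(**args),
                         is_final_answer=(name == "final_answer"))

def step_stream(calls) -> Iterator[ToolOutput]:
    # The step-level loop inspects the flag and stops on a final answer.
    for tool_output in execute_tools(calls):
        yield tool_output
        if tool_output.is_final_answer:
            return

# Hypothetical calls: id -> (tool name, callable, arguments)
calls = {
    "call_1": ("web_search", lambda query: f"results for {query}", {"query": "x"}),
    "call_2": ("final_answer", lambda answer: answer, {"answer": "42"}),
}
outputs = list(step_stream(calls))
```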

yield ToolOutput(output=None, is_final_answer=False)

# Process final_answer call if present
if final_answer_call:
Collaborator Author

No need to split the logic: we process all tools the same way.

)

# Use stream_to_gradio to capture the output
outputs = list(
stream_to_gradio(
agent,
task="Test task",
additional_args=dict(image=AgentImage(value="path.png")),
additional_args=dict(image=PIL.Image.new("RGB", (100, 100))),
Collaborator Author

Since we now always run the final_answer tool, this test was trying to create an AgentImage from an image whose path is fake, which raised an error: instead, I create a new PIL image (which matches the real use case).
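The changed fixture can be reproduced in isolation: an in-memory PIL image needs no backing file, which is why it avoids the FileNotFoundError that a fake path triggers when the image is materialized. This assumes Pillow is installed; AgentImage itself is not exercised here.

```python
# Sketch of the new test fixture: a valid in-memory image with no path on disk.
import PIL.Image

image = PIL.Image.new("RGB", (100, 100))  # fully in-memory, no file access needed
```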

@aymeric-roucher aymeric-roucher marked this pull request as ready for review June 24, 2025 18:36
Member

@albertvillanova albertvillanova left a comment

Thanks! Good fix and refactoring!

Comment on lines 1 to 4
from smolagents import GradioUI, InferenceClientModel, ToolCallingAgent, WebSearchTool


agent = CodeAgent(
agent = ToolCallingAgent(
Member

Why this change?

Collaborator Author

This is just for testing, I will revert it before merging!

title="Output message of the LLM:",
level=LogLevel.DEBUG,
)

# Record model output
memory_step.model_output_message = chat_message
memory_step.model_output = chat_message.content
memory_step.model_output = str(chat_message.content)
Member

Note that chat_message.content might be None, so memory_step.model_output would be "None" in that case. Is this expected?
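One possible guard for the None case raised here: only stringify the content when it is present, so model_output stays None instead of the string "None". The chat_message and memory_step objects below are hypothetical stand-ins for the smolagents types.

```python
# Sketch: avoid turning a missing model output into the literal string "None".
from types import SimpleNamespace

chat_message = SimpleNamespace(content=None)     # stand-in for the chat message
memory_step = SimpleNamespace(model_output=None) # stand-in for the memory step

memory_step.model_output = (
    str(chat_message.content) if chat_message.content is not None else None
)
```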

Comment on lines 1421 to 1423
error_msg = check_tool_arguments(tool, arguments)
if error_msg:
raise AgentToolCallError(error_msg, self.logger)
Member

@albertvillanova albertvillanova Jun 25, 2025

What about raising directly the error from check_tool_arguments? And maybe renaming it to validate_tool_arguments?

Collaborator Author

The error needs to be given the argument self.logger; this is why I preferred to keep the error creation within the agent rather than passing around the logger object.
Good point, renaming it to validate_tool_arguments!

Member

Well, I think you'll agree that your proposal creates a somewhat awkward flow: having a function that either returns an error message or None, and relying on the caller to interpret and raise it, doesn't feel clean or consistent.

I understand the technical constraint you mentioned around needing access to self.logger. In my opinion, this points to a deeper architectural concern: the error generation is too tightly coupled to the agent's internal self.logger.

Perhaps this could be addressed in a future PR to improve separation of concerns: we should be able to decouple this logic.

Collaborator Author

Agreed! If you have ideas for decoupling, it would indeed be great to implement them!
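One way the decoupling discussed above might look: validate_tool_arguments raises a plain ValueError with no logger dependency, and the agent's call site catches it and re-raises its logger-aware AgentToolCallError. Only the validate_tool_arguments name comes from the thread; the schema format, AgentToolCallError body, and call site are hypothetical stand-ins.

```python
# Sketch: a logger-free validator, with logger attachment done by the agent.

def validate_tool_arguments(expected: dict, arguments: dict) -> None:
    """Raise ValueError if arguments don't match a simple {name: type} schema."""
    for name, expected_type in expected.items():
        if name not in arguments:
            raise ValueError(f"Missing required argument: '{name}'")
        if not isinstance(arguments[name], expected_type):
            raise ValueError(
                f"Argument '{name}' should be {expected_type.__name__}, "
                f"got {type(arguments[name]).__name__}"
            )

class AgentToolCallError(Exception):
    """Stand-in for the agent-side error that carries the agent's logger."""
    def __init__(self, message, logger=None):
        super().__init__(message)
        self.logger = logger  # the logger stays out of the validator entirely

# Agent-side call site: translate the plain error into the logger-aware one.
try:
    validate_tool_arguments({"query": str}, {"query": 123})
    err = None
except ValueError as e:
    err = AgentToolCallError(str(e), logger="agent-logger")
```

This keeps the validator reusable and testable on its own, while the agent remains the only place that knows about its logger.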

final_output = list(tool_outputs.values())[0].output
else:
# ToolCallingAgent can return several final answers in parallel
final_output = [tool_outputs[k].output for k in sorted(tool_outputs.keys())]
Member

Just a small observation here: by sorting tool_outputs.keys(), we're effectively ordering the final outputs by tool_call.id (which is what output.id mirrors). This might not always reflect the original order in which the tool calls were made. If the logical or intended sequence depends on the call order rather than the ID's lexicographic order, would it make sense to preserve that instead?

Member

@albertvillanova albertvillanova Jun 25, 2025

I checked it experimentally: call order may be different from call_id's lexicographic order:

>>> chat_message.tool_calls
[
    ChatMessageToolCall(
        function=ChatMessageToolCallFunction(
            arguments={'query': 'current age of Pope Leo XIV', 'filter_year': 2023}, name='web_search', description=None), 
        id='call_z6H1L4WqXbdfKvibbjZBAcIp', type='function'
    ), 
    ChatMessageToolCall(
        function=ChatMessageToolCallFunction(
            arguments={'query': 'current age of Andrew Garfield', 'filter_year': 2023}, name='web_search', description=None), 
        id='call_NhAIjB8HfSf8RgL8kSuIznrg', type='function'
    )
]
>>> list(sorted(tool_outputs.keys()))
[
    'call_NhAIjB8HfSf8RgL8kSuIznrg',  # Andrew Garfield
    'call_z6H1L4WqXbdfKvibbjZBAcIp'   # Pope Leo XIV
]

Member

@albertvillanova albertvillanova left a comment

The call order should be preserved when returning call results.
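The reviewer's point can be sketched as follows: order results by each id's position in chat_message.tool_calls rather than lexicographically, so that executor completion order and id formatting never leak into the final output. The objects below are simplified stand-ins for the smolagents types, reusing the ids from the experiment above.

```python
# Sketch: preserve the original request order of tool calls when assembling
# outputs, instead of sorting ids lexicographically.
from types import SimpleNamespace

# Order in which the model requested the calls (as in chat_message.tool_calls):
tool_calls = [
    SimpleNamespace(id="call_z6H1L4WqXbdfKvibbjZBAcIp"),  # first: Pope Leo XIV
    SimpleNamespace(id="call_NhAIjB8HfSf8RgL8kSuIznrg"),  # second: Andrew Garfield
]
# Outputs may be collected in any order from the ThreadPoolExecutor:
tool_outputs = {
    "call_NhAIjB8HfSf8RgL8kSuIznrg": "Andrew Garfield result",
    "call_z6H1L4WqXbdfKvibbjZBAcIp": "Pope Leo XIV result",
}

# Map each id to its request-order index, then sort outputs by that index.
order = {call.id: i for i, call in enumerate(tool_calls)}
final_output = [tool_outputs[k] for k in sorted(tool_outputs, key=order.__getitem__)]
```

This stays deterministic for tests (the key is the model's own call order) without introducing any extra call_order attribute.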

outputs[tool_output.id] = tool_output
yield tool_output

memory_step.tool_calls = [parallel_calls[k] for k in sorted(parallel_calls.keys())]
Member

The same here: you are sorting by lexicographical order, not by call order.

Member

We need a regression test for this.

Member

> How would you make sure that tests are consistent, if not ordering by an attribute of the tool call? Assign a specific call_order attribute? If so, is that really better than sorting by call_id?

Just to clarify, I wasn't suggesting introducing or using a custom call-order attribute, but preserving the original call order as represented in chat_message.tool_calls.

@@ -1738,7 +1703,7 @@ def to_dict(self) -> dict[str, Any]:
return agent_dict

@classmethod
def from_dict(cls, agent_dict: dict[str, Any], **kwargs) -> "CodeAgent":
def from_dict(cls, agent_dict: dict[str, Any], **kwargs) -> "MultiStepAgent":
Member

I think this change should be reverted.

Collaborator Author

Renaming the return type to MultiStepAgent is less clear about which type of agent this is, but it fixes a linting error; this is why I did it.

Collaborator Author

No strong opinion on this though, I can revert this if you think it's better

Member

While renaming the return type to MultiStepAgent does silence the linting error, it actually masks the underlying issue rather than resolving it:

  • The root cause of the error is actually in the type hint of the from_dict method in the MultiStepAgent abstract base class
  • By updating the subclass to return MultiStepAgent, we're introducing a second issue: CodeAgent.from_dict actually returns a CodeAgent instance, not a MultiStepAgent one (indeed, MultiStepAgent is abstract, so it can't be instantiated).

I'd suggest reverting the change so that CodeAgent.from_dict continues to return a CodeAgent instance. We can then address the root cause (the type hint in MultiStepAgent.from_dict) in a follow-up PR.

Collaborator Author

Agree, reverting the change!

@aymeric-roucher
Collaborator Author

@albertvillanova all tool calls are performed in the same ThreadPoolExecutor, so their order relative to chat_message.tool_calls could be reversed when performing the calls: I had the issue that tests wouldn't pass because of this. How would you make sure that tests are consistent, if not by ordering by an attribute of the tool call? Assign a specific call_order attribute? If so, is that really better than sorting by call_id?

@aymeric-roucher
Collaborator Author

aymeric-roucher commented Jun 25, 2025

I've reverted a change that let the agent return several final answers and aggregate them into a list.
This is because, upon further thought, I found that returning several final_answer calls in parallel did not really make sense: if the framework just accepts them all and aggregates them into a list, then the framework, rather than the agent, ends up deciding the final answer.
So I now just raise an error when there are several parallel calls to final_answer. The agent can then take one more step to aggregate its parallel final answers into a single consistent one.
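The revised behavior described here can be sketched as a small check: if more than one final_answer output arrives in a single step, raise instead of silently aggregating, so the agent itself must produce one consistent answer. ToolOutput, the helper name, and the ValueError are simplified stand-ins, not the exact smolagents code.

```python
# Sketch: reject multiple parallel final_answer calls in one step.
from dataclasses import dataclass
from typing import Any

@dataclass
class ToolOutput:
    id: str
    output: Any
    is_final_answer: bool

def resolve_final_answer(tool_outputs: list[ToolOutput]) -> Any:
    """Return the single final answer, or None; raise on several in parallel."""
    final = [t for t in tool_outputs if t.is_final_answer]
    if len(final) > 1:
        raise ValueError(
            "Multiple final_answer calls in one step: the agent, not the "
            "framework, should decide on a single final answer."
        )
    return final[0].output if final else None

outputs = [
    ToolOutput(id="a", output="1 and 2", is_final_answer=True),
    ToolOutput(id="b", output="3 and 4", is_final_answer=True),
]
try:
    resolve_final_answer(outputs)
    raised = False
except ValueError:
    raised = True
```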

Member

@albertvillanova albertvillanova left a comment

Thanks for the fix and all the good refactoring!

The issue with the call order not being preserved has not been addressed (#1482 (review)), but this could be done in a subsequent PR.

Feel free to merge as it is.

@aymeric-roucher
Collaborator Author

@albertvillanova about call order: I've made an answer here: #1482 (comment). But indeed we can keep handling this in a follow-up PR!

@albertvillanova
Member

albertvillanova commented Jun 27, 2025

@aymeric-roucher I replied to your answer above here, in the dedicated thread: #1482 (comment)

@aymeric-roucher aymeric-roucher merged commit d6f9708 into main Jun 27, 2025
4 checks passed

Successfully merging this pull request may close these issues.

[BUG] Final answer is too greedy in ToolCallingAgent parallel calls