-
Notifications
You must be signed in to change notification settings - Fork 1.9k
Fix parallel final answers #1482
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix parallel final answers #1482
Conversation
src/smolagents/agents.py
Outdated
else: | ||
final_answer = answers | ||
|
||
# TODO: Make sure this is not a problem: https://github.com/huggingface/smolagents/pull/1255#pullrequestreview-2822036066 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@albertvillanova WDYT? In #1255 (review), you had fixed an issue with return types by skipping the execution of final answer tool if the output was not dict or string.
But certain custom finalAnswerTool implementations might very well expect an image or audio to perform post processing on them, thus I think it's better to keep this. Now tests do pass, please make sure none of the usages you were thinking of has been broken by my changes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, reviewing my previous modification, I realize maybe I went a bit too fast: the actual issue was with the error message indeed! Your fix using str
instead of json.loads
makes sense and addresses it properly.
Thanks for catching this and ensuring compatibility with broader FinalAnswerTool
implementations. I've re-checked the relevant usages, and nothing appears to be broken by your changes.
@@ -1470,12 +1458,12 @@ def execute_tool_call(self, tool_name: str, arguments: dict[str, str] | str) -> | |||
# Handle execution errors | |||
if is_managed_agent: | |||
error_msg = ( | |||
f"Error executing request to team member '{tool_name}' with arguments {json.dumps(arguments)}: {e}\n" | |||
f"Error executing request to team member '{tool_name}' with arguments {str(arguments)}: {e}\n" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changing to str to not raise issue in cases the arguments are not serializable, e.g. PIL Images or audio.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That was the issue I found. Good solution.
Maybe we could improve the __str__
methods for AgentImage and AgentAudio, so we avoid having their memory address in the message error. Currently, we would get:
Error executing tool 'final_answer' with arguments {'answer': <smolagents.agent_types.AgentImage image mode= size=0x0 at 0x7EFCDD78E190>}: FileNotFoundError: [Errno 2] No such file or directory: 'path.png'
Please try again or use another tool
tests/test_agents.py
Outdated
answer = agent.run("Fake task.") | ||
assert answer == "1CUSTOM2" | ||
assert answer == ["1 and 2", "3 and 4"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This tests that two final answer tool calls can run at once.
tool_arguments = tool_call.function.arguments | ||
model_outputs.append(str(f"Called Tool: '{tool_name}' with arguments: {tool_arguments}")) | ||
tool_calls.append(ToolCall(name=tool_name, arguments=tool_arguments, id=tool_call.id)) | ||
# Track final_answer separately, add others to parallel processing list |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not needed anymore: in short, now that we have a ToolOutput object logging if the tool output is a final answer, we can just yield ToolOutput objects then process possible final answers in step_stream
, which makes more sense to process at the step level.
yield ToolOutput(output=None, is_final_answer=False) | ||
|
||
# Process final_answer call if present | ||
if final_answer_call: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No need to split the logic: we process all tools the same way.
) | ||
|
||
# Use stream_to_gradio to capture the output | ||
outputs = list( | ||
stream_to_gradio( | ||
agent, | ||
task="Test task", | ||
additional_args=dict(image=AgentImage(value="path.png")), | ||
additional_args=dict(image=PIL.Image.new("RGB", (100, 100))), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since we now always run the final_answer tool, thus was trying to create an AgentImage from this image where the path is fake, thus raising an error: instead I create a new PIL image (which matches the real use case).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! Good fix and refactoring!
examples/gradio_ui.py
Outdated
from smolagents import GradioUI, InferenceClientModel, ToolCallingAgent, WebSearchTool | ||
|
||
|
||
agent = CodeAgent( | ||
agent = ToolCallingAgent( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why this change?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is just for testing, will rever before merge !
src/smolagents/agents.py
Outdated
title="Output message of the LLM:", | ||
level=LogLevel.DEBUG, | ||
) | ||
|
||
# Record model output | ||
memory_step.model_output_message = chat_message | ||
memory_step.model_output = chat_message.content | ||
memory_step.model_output = str(chat_message.content) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that chat_message.content
might be None
, so memory_step.model_output would be
"None"` in that case. Is this expected?
src/smolagents/agents.py
Outdated
else: | ||
final_answer = answers | ||
|
||
# TODO: Make sure this is not a problem: https://github.com/huggingface/smolagents/pull/1255#pullrequestreview-2822036066 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, reviewing my previous modification, I realize maybe I went a bit too fast: the actual issue was with the error message indeed! Your fix using str
instead of json.loads
makes sense and addresses it properly.
Thanks for catching this and ensuring compatibility with broader FinalAnswerTool
implementations. I've re-checked the relevant usages, and nothing appears to be broken by your changes.
@@ -1470,12 +1458,12 @@ def execute_tool_call(self, tool_name: str, arguments: dict[str, str] | str) -> | |||
# Handle execution errors | |||
if is_managed_agent: | |||
error_msg = ( | |||
f"Error executing request to team member '{tool_name}' with arguments {json.dumps(arguments)}: {e}\n" | |||
f"Error executing request to team member '{tool_name}' with arguments {str(arguments)}: {e}\n" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That was the issue I found. Good solution.
Maybe we could improve the __str__
methods for AgentImage and AgentAudio, so we avoid having their memory address in the message error. Currently, we would get:
Error executing tool 'final_answer' with arguments {'answer': <smolagents.agent_types.AgentImage image mode= size=0x0 at 0x7EFCDD78E190>}: FileNotFoundError: [Errno 2] No such file or directory: 'path.png'
Please try again or use another tool
src/smolagents/agents.py
Outdated
error_msg = check_tool_arguments(tool, arguments) | ||
if error_msg: | ||
raise AgentToolCallError(error_msg, self.logger) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about raising directly the error from check_tool_arguments
? And maybe renaming it to validate_tool_arguments
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The error needs to be given argument self.logger
, this is why I preferred to keep the error creation within the agent rather than passing around the logger oboject.
Good point, renaming it to validate_tool_arguments
!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, I think you agree that your proposal creates a somewhat awkward flow: having a function that either returns an error message or None
, and relying on the caller to interpret and raise that, doesn’t feel clean or consistent.
I understand the technical constraint you mentioned around needing access to self.logger
. In my opinion, this points to a deeper architectural concern, with a too tight coupling between the error generation and the agent internal self.logger
.
Perhaps this could be addressed in a future PR to improve separation of concerns: we should be able to decouple this logic.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree with you, if you have ideas for decoupling it would be great to implement indeed!
src/smolagents/agents.py
Outdated
final_output = list(tool_outputs.values())[0].output | ||
else: | ||
# ToolCallingAgent can return several final answers in parallel | ||
final_output = [tool_outputs[k].output for k in sorted(tool_outputs.keys())] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a small observation here: by sorting tool_outputs.keys()
, we're effectively ordering the final outputs by tool_call.id
(which is what output.id
mirrors). This might not always reflect the original order in which the tool calls were made. If the logical or intended sequence depends on the call order rather than the ID's lexicographic order, would it make sense to preserve that instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I checked it experimentally: call order may be different from call_id's lexicographic order:
>>> chat_message.tool_calls
[
ChatMessageToolCall(
function=ChatMessageToolCallFunction(
arguments={'query': 'current age of Pope Leo XIV', 'filter_year': 2023}, name='web_search', description=None),
id='call_z6H1L4WqXbdfKvibbjZBAcIp', type='function'
),
ChatMessageToolCall(
function=ChatMessageToolCallFunction(
arguments={'query': 'current age of Andrew Garfield', 'filter_year': 2023}, name='web_search', description=None),
id='call_NhAIjB8HfSf8RgL8kSuIznrg', type='function'
)
]
>>> list(sorted(tool_outputs.keys()))
[
'call_NhAIjB8HfSf8RgL8kSuIznrg', # Andrew Garfield
'call_z6H1L4WqXbdfKvibbjZBAcIp' # Pope Leo XIV
]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The call order should be preserved when returning call results.
outputs[tool_output.id] = tool_output | ||
yield tool_output | ||
|
||
memory_step.tool_calls = [parallel_calls[k] for k in sorted(parallel_calls.keys())] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The same here: you are sorting by lexicographical order, not by call order.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need a regression test for this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How would you make sure that tests are consistent, if not ordering by an attribute of the tool call? Assign a specific call_order attribute? If so, is that really better than sorting by call_id?
Just to clarify, I wasn't suggesting introducing or using a custom call order attribute, but preserve the original call order as represented in chat_message.tool_calls
.
src/smolagents/agents.py
Outdated
@@ -1738,7 +1703,7 @@ def to_dict(self) -> dict[str, Any]: | |||
return agent_dict | |||
|
|||
@classmethod | |||
def from_dict(cls, agent_dict: dict[str, Any], **kwargs) -> "CodeAgent": | |||
def from_dict(cls, agent_dict: dict[str, Any], **kwargs) -> "MultiStepAgent": |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this change should be reverted.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Renaming to MultiStepAgent is less clear on which type of agent this is, but it fixes a linting error, this is why I did it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No strong opinion on this though, I can revert this if you think it's better
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While renaming the return type to MultiStepAgent
does silence the linting error, it actually masks the underlying issue rather than resolving it:
- The root cause of the error is actually in the type hint of the
from_dict
method in theMultiStepAgent
abstract base class - By updating the subclass to return
MultiStepAgent
, we're introducing a second issue:CodeAgent.from_dict
actually returns aCodeAgent
instance, not anMultiStepAgent
one (indeed,MultiStepAgent
is abstract so can't be instantiated).
I'd suggest reverting the change so that CodeAgent.from_dict
continues to return a CodeAgent
instance. We can then address the root cause (the type hint in MultiStepAgent.from_dict
) in a follow-up PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree, reverting the change!
@albertvillanova all tool calls are performed in the same |
I've reversed a change that let the agent return several final answers and would aggregate them into a list. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the fix and the all the good refactoring!
The issue with the call order not being preserved has not been addressed (#1482 (review)), but this could be done in a subsequent PR.
Feel free to merge as it is.
@albertvillanova about call order: I've made an answer here: #1482 (comment). But indeed we can keep handling this in a follow-up PR! |
@aymeric-roucher and I replied to your answer above here, in the dedicated thread: #1482 (comment) |
Fixing #1481
Along the way:
ToolCall
objetsAgentToolCallError