-
Notifications
You must be signed in to change notification settings - Fork 75
[WIP] Multiaction Support, CUA and refactoring #257
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
…multi-action configuration
…nd added computer_call handling.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Review by Korbit AI
Korbit automatically attempts to detect when you fix issues in new commits.
Category | Issue | Status |
---|---|---|
Unreachable Conditional Logic ▹ view | ||
Inefficient Token Detail Extraction ▹ view | ||
Insufficient error context ▹ view | ||
Incomplete Multi-key Processing ▹ view | ||
Inefficient Message Type Checking in Loop ▹ view | ||
Missing CUA to Environment Action Conversion ▹ view |
Files scanned
File Path | Reviewed |
---|---|
src/agentlab/agents/tool_use_agent/openai_cua.py | ✅ |
src/agentlab/llm/tracking.py | ✅ |
src/agentlab/agents/tool_use_agent/tool_use_agent.py | ✅ |
src/agentlab/analyze/agent_xray.py | ✅ |
src/agentlab/llm/response_api.py | ✅ |
Explore our documentation to understand the languages and file types we support and the files we ignore.
Check out our docs on how you can make Korbit work best for you and your team.
api_type = 'chatcompletion' if hasattr(usage, "prompt_tokens_details") else 'response' | ||
if api_type == 'chatcompletion': | ||
total_input_tokens = usage.prompt_tokens | ||
output_tokens = usage.completion_tokens | ||
cached_input_tokens = usage.prompt_tokens_details.cached_tokens | ||
non_cached_input_tokens = total_input_tokens - cached_input_tokens | ||
elif api_type == 'response': | ||
total_input_tokens = usage.input_tokens | ||
output_tokens = usage.output_tokens | ||
cached_input_tokens = usage.input_tokens_details.cached_tokens | ||
non_cached_input_tokens = total_input_tokens - cached_input_tokens | ||
else: | ||
logging.warning(f"Unsupported API type: {api_type}. Defaulting cost to 0.0.") | ||
return 0.0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unreachable Conditional Logic 
Tell me more
What is the issue?
Redundant API type check that can never reach the else branch due to the binary ternary operator assignment.
Why this matters
Unnecessary conditional logic and error handling that adds computational overhead with no possible benefit since api_type can only be 'chatcompletion' or 'response'.
Suggested change ∙ Feature Preview
Remove the else branch since it's impossible to reach and simplify the code to:
api_type = 'chatcompletion' if hasattr(usage, "prompt_tokens_details") else 'response'
if api_type == 'chatcompletion':
...
else: # api_type == 'response'
...
Provide feedback to improve future suggestions
💬 Looking for more details? Reply to this comment to chat with Korbit.
usage = dict(getattr(response, "usage", {})) | ||
if 'prompt_tokens_details' in usage: | ||
usage['cached_tokens'] = usage['prompt_token_details'].cached_tokens | ||
if 'input_tokens_details' in usage: | ||
usage['cached_tokens'] = usage['input_tokens_details'].cached_tokens |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Inefficient Token Detail Extraction 
Tell me more
What is the issue?
Dictionary conversion and multiple attribute lookups for cached tokens are performed unnecessarily.
Why this matters
Converting usage to dict and performing multiple lookups creates unnecessary object creation and memory allocation. The second condition overwrites the first one if both are true.
Suggested change ∙ Feature Preview
Optimize the code to avoid dict conversion and prevent redundant assignments:
usage = getattr(response, "usage", {})
cached_tokens = None
if hasattr(usage, "prompt_tokens_details"):
cached_tokens = usage.prompt_tokens_details.cached_tokens
elif hasattr(usage, "input_tokens_details"):
cached_tokens = usage.input_tokens_details.cached_tokens
if cached_tokens is not None:
usage['cached_tokens'] = cached_tokens
Provide feedback to improve future suggestions
💬 Looking for more details? Reply to this comment to chat with Korbit.
try: | ||
action_mapping = { | ||
"click": lambda: self._handle_click_action(action), | ||
"scroll": lambda: self._handle_scroll_action(action), | ||
"keypress": lambda: self._handle_keypress_action(action), | ||
"type": lambda: self._handle_type_action(action), | ||
"wait": lambda: self._handle_wait_action(action), | ||
"screenshot": lambda: self._handle_screenshot_action(action), | ||
"drag": lambda: self._handle_drag_action(action), | ||
} | ||
|
||
if action_type in action_mapping: | ||
return action_mapping[action_type]() | ||
else: | ||
raise ValueError(f"Unrecognized openAI CUA action type: {action_type}") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Insufficient error context 
Tell me more
What is the issue?
The error message for unrecognized action types doesn't include the list of valid actions.
Why this matters
Without knowing the valid options, developers will have to search through code to understand what actions are supported, making debugging and usage more difficult.
Suggested change ∙ Feature Preview
Include valid actions in error message:
valid_actions = list(action_mapping.keys())
if action_type in action_mapping:
return action_mapping[action_type]()
else:
raise ValueError(f"Unrecognized openAI CUA action type: {action_type}. Valid actions are: {valid_actions}")
Provide feedback to improve future suggestions
💬 Looking for more details? Reply to this comment to chat with Korbit.
def _handle_keypress_action(self, action): | ||
keys = action.keys | ||
#TODO: Check this if is suitable for BGYM env. | ||
for k in keys: | ||
print(f"Action: keypress '{k}'") | ||
if k.lower() == "enter": | ||
key = "Enter" | ||
elif k.lower() == "space": | ||
key = " " | ||
return "keyboard_press", {"key": key} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Incomplete Multi-key Processing 
Tell me more
What is the issue?
The keypress handler returns inside the for loop after processing only the first key, ignoring all subsequent keys in the action.keys list.
Why this matters
This will cause multi-key combinations (like Ctrl+C) to fail as only the first key will be processed.
Suggested change ∙ Feature Preview
def _handle_keypress_action(self, action):
keys = action.keys
processed_keys = []
for k in keys:
if k.lower() == "enter":
processed_keys.append("Enter")
elif k.lower() == "space":
processed_keys.append(" ")
else:
processed_keys.append(k)
return "keyboard_press", {"keys": processed_keys}
Provide feedback to improve future suggestions
💬 Looking for more details? Reply to this comment to chat with Korbit.
if not isinstance(messages[i], ToolCalls): | ||
messages[i].mark_all_previous_msg_for_caching() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Inefficient Message Type Checking in Loop 
Tell me more
What is the issue?
Type checking in a loop to determine caching behavior creates unnecessary overhead for each message iteration.
Why this matters
Frequent type checking in loops can impact performance, especially with large message histories. The caching logic could be optimized to avoid repeated type checks.
Suggested change ∙ Feature Preview
Consider restructuring the caching logic to handle message types more efficiently. One approach is to separate ToolCalls messages earlier:
if messages and not isinstance(messages[-1], ToolCalls):
messages[-1].mark_all_previous_msg_for_caching()
Provide feedback to improve future suggestions
💬 Looking for more details? Reply to this comment to chat with Korbit.
def cua_action_to_env_tool_name_and_args(self, action: str) -> tuple[str, Dict[str, Any]]: | ||
pass |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Missing CUA to Environment Action Conversion 
Tell me more
What is the issue?
Empty function implementation for handling Computer Use Actions (CUA) to environment tool name and arguments conversion.
Why this matters
Will cause system failure when OpenAI returns a computer_call action type since the conversion functionality is not implemented.
Suggested change ∙ Feature Preview
Implement the CUA conversion function to properly handle OpenAI's Computer Use Actions format:
def cua_action_to_env_tool_name_and_args(self, action: str) -> tuple[str, Dict[str, Any]]:
# Example implementation - adapt based on actual CUA format
action_parts = action.split('(', 1)
action_name = action_parts[0].strip()
args_str = action_parts[1].rstrip(')')
# Parse arguments string to dictionary
args_dict = {}
if args_str:
for arg in args_str.split(','):
key, value = arg.split('=', 1)
args_dict[key.strip()] = value.strip().strip('"')
return action_name, args_dict
Provide feedback to improve future suggestions
💬 Looking for more details? Reply to this comment to chat with Korbit.
This pull request introduces a new
OpenAICUAModel
for handling OpenAI's Computer Use Agent (CUA) functionality, along with several updates to improve tool call handling, multi-action support, and cost calculation. The changes span across multiple files, enhancing the agent's capabilities and refining its behavior.New OpenAICUAModel Implementation:
OpenAICUAModel
class insrc/agentlab/agents/tool_use_agent/openai_cua.py
to support OpenAI's CUA, including methods for API calls and handling various computer actions like clicks, scrolls, and drags. This also includes a default configuration for CUA-specific behavior.Enhancements to Tool Use Agent:
contains_image
method intool_use_agent.py
to detect the presence of images in groups, and updated theapply
method in theObs
class to handle OpenAI CUA mode-specific behavior. [1] [2]Goal
class to allow custom system messages, enabling dynamic behavior based on configuration.get_action
method, allowing multiple actions in a single step if configured. [1] [2]WIP Config Classes:
Testing Improvements:
test_response_api.py
to align with the new tool call handling and action representation, ensuring robust validation of the changes. [1] [2] [3]Refinements to Cost Calculation:
tracking.py
to handle both chat completion and response APIs, ensuring accurate token usage and caching details. [1] [2]Testing Improvements:
test_response_api.py
to align with the new tool call handling and action representation, ensuring robust validation of the changes. [1] [2] [3]Minor Updates:
ToolCalls
import to relevant files for improved tool call handling. [1] [2]ToolCall
class intool_use_agent.py
by commenting out its implementation.These changes collectively enhance the agent's functionality and adaptability, particularly for OpenAI CUA scenarios, while improving maintainability and test coverage.
Description by Korbit AI
What change is being made?
Add multiaction support for the Computer Use Agent (CUA), integrate OpenAI API enhancements, and refactor tool call handling alongside configuration updates.
Why are these changes being made?
The addition of multiaction capabilities enhances agent functionality by allowing multiple actions per step, aligning with more complex real-world scenarios. The integration of OpenAI API enrichments introduces robust handling and conversion of tool calls, while refactoring improves maintainability, code clarity, and positioning for future feature expansions.