[WIP] Multiaction Support, CUA and refactoring #257

amanjaiswal73892 · 2025-06-30T18:09:22Z

This pull request introduces a new OpenAICUAModel for handling OpenAI's Computer Use Agent (CUA) functionality, along with several updates to improve tool call handling, multi-action support, and cost calculation. The changes span across multiple files, enhancing the agent's capabilities and refining its behavior.

New OpenAICUAModel Implementation:

Added OpenAICUAModel class in src/agentlab/agents/tool_use_agent/openai_cua.py to support OpenAI's CUA, including methods for API calls and handling various computer actions like clicks, scrolls, and drags. This also includes a default configuration for CUA-specific behavior.

Enhancements to Tool Use Agent:

Introduced contains_image method in tool_use_agent.py to detect the presence of images in groups, and updated the apply method in the Obs class to handle OpenAI CUA mode-specific behavior. [1] [2]
Modified the Goal class to allow custom system messages, enabling dynamic behavior based on configuration.
Added multi-action support in the get_action method, allowing multiple actions in a single step if configured. [1] [2]

WIP Config Classes:

Added WIP config classes for APIPayload config for llm calls and other config management.

Testing Improvements:

Adjusted test cases in test_response_api.py to align with the new tool call handling and action representation, ensuring robust validation of the changes. [1] [2] [3]
WIP Tests need to be updated for multiactions representation which is independent of the environment.

Refinements to Cost Calculation:

Updated cost calculation logic in tracking.py to handle both chat completion and response APIs, ensuring accurate token usage and caching details. [1] [2]

Testing Improvements:

Adjusted test cases in test_response_api.py to align with the new tool call handling and action representation, ensuring robust validation of the changes. [1] [2] [3]

Minor Updates:

Added ToolCalls import to relevant files for improved tool call handling. [1] [2]
Deprecated the ToolCall class in tool_use_agent.py by commenting out its implementation.

These changes collectively enhance the agent's functionality and adaptability, particularly for OpenAI CUA scenarios, while improving maintainability and test coverage.

Description by Korbit AI

What change is being made?

Add multiaction support for the Computer Use Agent (CUA), integrate OpenAI API enhancements, and refactor tool call handling alongside configuration updates.

Why are these changes being made?

The addition of multiaction capabilities enhances agent functionality by allowing multiple actions per step, aligning with more complex real-world scenarios. The integration of OpenAI API enrichments introduces robust handling and conversion of tool calls, while refactoring improves maintainability, code clarity, and positioning for future feature expansions.

Is this description stale? Ask me to generate a new description by commenting /korbit-generate-pr-description

…multi-action configuration

…rophic

…ponses API

…nd added computer_call handling.

…e organization

korbit-ai

Review by Korbit AI

Korbit automatically attempts to detect when you fix issues in new commits.

Category	Issue	Status
	Unreachable Conditional Logic ▹ view
	Inefficient Token Detail Extraction ▹ view
	Insufficient error context ▹ view
	Incomplete Multi-key Processing ▹ view
	Inefficient Message Type Checking in Loop ▹ view
	Missing CUA to Environment Action Conversion ▹ view

Files scanned

File Path	Reviewed
src/agentlab/agents/tool_use_agent/openai_cua.py	✅
src/agentlab/llm/tracking.py	✅
src/agentlab/agents/tool_use_agent/tool_use_agent.py	✅
src/agentlab/analyze/agent_xray.py	✅
src/agentlab/llm/response_api.py	✅

Explore our documentation to understand the languages and file types we support and the files we ignore.

Check out our docs on how you can make Korbit work best for you and your team.

Loving Korbit!? Share us on LinkedIn Reddit and X

korbit-ai · 2025-06-30T18:15:32Z

src/agentlab/llm/tracking.py

+        api_type = 'chatcompletion' if hasattr(usage, "prompt_tokens_details") else 'response'
+        if api_type == 'chatcompletion':
+            total_input_tokens = usage.prompt_tokens
+            output_tokens = usage.completion_tokens
+            cached_input_tokens = usage.prompt_tokens_details.cached_tokens
+            non_cached_input_tokens = total_input_tokens - cached_input_tokens
+        elif api_type == 'response':
+            total_input_tokens = usage.input_tokens
+            output_tokens = usage.output_tokens
+            cached_input_tokens = usage.input_tokens_details.cached_tokens
+            non_cached_input_tokens = total_input_tokens - cached_input_tokens
+        else:
+            logging.warning(f"Unsupported API type: {api_type}. Defaulting cost to 0.0.")
+            return 0.0


Unreachable Conditional Logic

Tell me more

What is the issue?

Redundant API type check that can never reach the else branch due to the binary ternary operator assignment.

Why this matters

Unnecessary conditional logic and error handling that adds computational overhead with no possible benefit since api_type can only be 'chatcompletion' or 'response'.

Suggested change ∙ Feature Preview

Remove the else branch since it's impossible to reach and simplify the code to:

api_type = 'chatcompletion' if hasattr(usage, "prompt_tokens_details") else 'response' if api_type == 'chatcompletion': ... else: # api_type == 'response' ...

Provide feedback to improve future suggestions

_{💬 Looking for more details? Reply to this comment to chat with Korbit.}

korbit-ai · 2025-06-30T18:15:32Z

src/agentlab/llm/tracking.py

        usage = dict(getattr(response, "usage", {}))
+        if 'prompt_tokens_details' in usage:
+            usage['cached_tokens'] = usage['prompt_token_details'].cached_tokens
+        if 'input_tokens_details' in usage:
+            usage['cached_tokens'] = usage['input_tokens_details'].cached_tokens


Inefficient Token Detail Extraction

Tell me more

What is the issue?

Dictionary conversion and multiple attribute lookups for cached tokens are performed unnecessarily.

Why this matters

Converting usage to dict and performing multiple lookups creates unnecessary object creation and memory allocation. The second condition overwrites the first one if both are true.

Suggested change ∙ Feature Preview

Optimize the code to avoid dict conversion and prevent redundant assignments:

usage = getattr(response, "usage", {}) cached_tokens = None if hasattr(usage, "prompt_tokens_details"): cached_tokens = usage.prompt_tokens_details.cached_tokens elif hasattr(usage, "input_tokens_details"): cached_tokens = usage.input_tokens_details.cached_tokens if cached_tokens is not None: usage['cached_tokens'] = cached_tokens

Provide feedback to improve future suggestions

_{💬 Looking for more details? Reply to this comment to chat with Korbit.}

korbit-ai · 2025-06-30T18:15:32Z

src/agentlab/agents/tool_use_agent/openai_cua.py

+        try:
+            action_mapping = {
+                "click": lambda: self._handle_click_action(action),
+                "scroll": lambda: self._handle_scroll_action(action),
+                "keypress": lambda: self._handle_keypress_action(action),
+                "type": lambda: self._handle_type_action(action),
+                "wait": lambda: self._handle_wait_action(action),
+                "screenshot": lambda: self._handle_screenshot_action(action),
+                "drag": lambda: self._handle_drag_action(action),
+            }
+
+            if action_type in action_mapping:
+                return action_mapping[action_type]()
+            else:
+                raise ValueError(f"Unrecognized openAI CUA action type: {action_type}")


Insufficient error context

Tell me more

What is the issue?

The error message for unrecognized action types doesn't include the list of valid actions.

Why this matters

Without knowing the valid options, developers will have to search through code to understand what actions are supported, making debugging and usage more difficult.

Suggested change ∙ Feature Preview

Include valid actions in error message:

valid_actions = list(action_mapping.keys()) if action_type in action_mapping: return action_mapping[action_type]() else: raise ValueError(f"Unrecognized openAI CUA action type: {action_type}. Valid actions are: {valid_actions}")

Provide feedback to improve future suggestions

_{💬 Looking for more details? Reply to this comment to chat with Korbit.}

korbit-ai · 2025-06-30T18:15:32Z

src/agentlab/agents/tool_use_agent/openai_cua.py

+    def _handle_keypress_action(self, action):
+        keys = action.keys
+        #TODO: Check this if is suitable for BGYM env.
+        for k in keys:
+            print(f"Action: keypress '{k}'")
+            if k.lower() == "enter":
+                key = "Enter"
+            elif k.lower() == "space":
+                key = " "
+            return "keyboard_press", {"key": key}


Incomplete Multi-key Processing

Tell me more

What is the issue?

The keypress handler returns inside the for loop after processing only the first key, ignoring all subsequent keys in the action.keys list.

Why this matters

This will cause multi-key combinations (like Ctrl+C) to fail as only the first key will be processed.

Suggested change ∙ Feature Preview

def _handle_keypress_action(self, action): keys = action.keys processed_keys = [] for k in keys: if k.lower() == "enter": processed_keys.append("Enter") elif k.lower() == "space": processed_keys.append(" ") else: processed_keys.append(k) return "keyboard_press", {"keys": processed_keys}

Provide feedback to improve future suggestions

_{💬 Looking for more details? Reply to this comment to chat with Korbit.}

korbit-ai · 2025-06-30T18:15:33Z

src/agentlab/agents/tool_use_agent/tool_use_agent.py

+                if not isinstance(messages[i], ToolCalls):
+                    messages[i].mark_all_previous_msg_for_caching()


Inefficient Message Type Checking in Loop

Tell me more

What is the issue?

Type checking in a loop to determine caching behavior creates unnecessary overhead for each message iteration.

Why this matters

Frequent type checking in loops can impact performance, especially with large message histories. The caching logic could be optimized to avoid repeated type checks.

Suggested change ∙ Feature Preview

Consider restructuring the caching logic to handle message types more efficiently. One approach is to separate ToolCalls messages earlier:

if messages and not isinstance(messages[-1], ToolCalls): messages[-1].mark_all_previous_msg_for_caching()

Provide feedback to improve future suggestions

_{💬 Looking for more details? Reply to this comment to chat with Korbit.}

korbit-ai · 2025-06-30T18:15:33Z

src/agentlab/llm/response_api.py

+    def cua_action_to_env_tool_name_and_args(self, action: str) -> tuple[str, Dict[str, Any]]:
+        pass


Missing CUA to Environment Action Conversion

Tell me more

What is the issue?

Empty function implementation for handling Computer Use Actions (CUA) to environment tool name and arguments conversion.

Why this matters

Will cause system failure when OpenAI returns a computer_call action type since the conversion functionality is not implemented.

Suggested change ∙ Feature Preview

Implement the CUA conversion function to properly handle OpenAI's Computer Use Actions format:

def cua_action_to_env_tool_name_and_args(self, action: str) -> tuple[str, Dict[str, Any]]: # Example implementation - adapt based on actual CUA format action_parts = action.split('(', 1) action_name = action_parts[0].strip() args_str = action_parts[1].rstrip(')') # Parse arguments string to dictionary args_dict = {} if args_str: for arg in args_str.split(','): key, value = arg.split('=', 1) args_dict[key.strip()] = value.strip().strip('"') return action_name, args_dict

Provide feedback to improve future suggestions

_{💬 Looking for more details? Reply to this comment to chat with Korbit.}

recursix and others added 14 commits June 20, 2025 09:03

Enhance goal application to support dynamic system messages based on …

f57a919

…multi-action configuration

Merge branch 'main' into allac/next-agent

3a21dc7

Modify API for parallel tool calls add parallel tool support for anth…

0b9a985

…rophic

enable screenshot tagging in xray for mutli-actions

89376ab

add tool call and tool response markdown in xray

81b1a88

update tests for parallel tool calls in claude for responses_api.py

c244db3

Add support for openai CUA for tool use agent

e7613d6

update openai cache tracking usage to support Chat Completion and Res…

b4c283f

…ponses API

Merge remote-tracking branch 'origin' into allac/next-agent

c15331a

fix: join action_list into a single string with new-lines

00da78a

Refactor OpenAiResponsesModel to isolate env specific functionality a…

1d7060c

…nd added computer_call handling.

Change func names to use 'env' instead of 'bgym'

997cc7b

Add TODO's and WIP config classes

b3d409f

Refactor OpenAICUAModel to streamline action handling and improve cod…

f5443b6

…e organization

amanjaiswal73892 requested a review from recursix June 30, 2025 18:09

amanjaiswal73892 marked this pull request as draft June 30, 2025 18:10

korbit-ai bot reviewed Jun 30, 2025

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[WIP] Multiaction Support, CUA and refactoring #257

[WIP] Multiaction Support, CUA and refactoring #257

Uh oh!

amanjaiswal73892 commented Jun 30, 2025 •

edited by korbit-ai bot

Loading

Uh oh!

korbit-ai bot left a comment •

edited

Loading

Uh oh!

korbit-ai bot Jun 30, 2025

Uh oh!

korbit-ai bot Jun 30, 2025

Uh oh!

korbit-ai bot Jun 30, 2025

Uh oh!

korbit-ai bot Jun 30, 2025

Uh oh!

korbit-ai bot Jun 30, 2025

Uh oh!

korbit-ai bot Jun 30, 2025

Uh oh!

Uh oh!

		if not isinstance(messages[i], ToolCalls):
		messages[i].mark_all_previous_msg_for_caching()

		def cua_action_to_env_tool_name_and_args(self, action: str) -> tuple[str, Dict[str, Any]]:
		pass

[WIP] Multiaction Support, CUA and refactoring #257

Are you sure you want to change the base?

[WIP] Multiaction Support, CUA and refactoring #257

Uh oh!

Conversation

amanjaiswal73892 commented Jun 30, 2025 • edited by korbit-ai bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

New OpenAICUAModel Implementation:

Enhancements to Tool Use Agent:

WIP Config Classes:

Testing Improvements:

Refinements to Cost Calculation:

Testing Improvements:

Minor Updates:

Description by Korbit AI

What change is being made?

Why are these changes being made?

Uh oh!

korbit-ai bot left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Review by Korbit AI

Korbit automatically attempts to detect when you fix issues in new commits.

Uh oh!

korbit-ai bot Jun 30, 2025

Choose a reason for hiding this comment

Unreachable Conditional Logic

What is the issue?

Why this matters

Suggested change ∙ Feature Preview

Provide feedback to improve future suggestions

Uh oh!

korbit-ai bot Jun 30, 2025

Choose a reason for hiding this comment

Inefficient Token Detail Extraction

What is the issue?

Why this matters

Suggested change ∙ Feature Preview

Provide feedback to improve future suggestions

Uh oh!

korbit-ai bot Jun 30, 2025

Choose a reason for hiding this comment

Insufficient error context

What is the issue?

Why this matters

Suggested change ∙ Feature Preview

Provide feedback to improve future suggestions

Uh oh!

korbit-ai bot Jun 30, 2025

Choose a reason for hiding this comment

Incomplete Multi-key Processing

What is the issue?

Why this matters

Suggested change ∙ Feature Preview

Provide feedback to improve future suggestions

Uh oh!

korbit-ai bot Jun 30, 2025

Choose a reason for hiding this comment

Inefficient Message Type Checking in Loop

What is the issue?

Why this matters

Suggested change ∙ Feature Preview

Provide feedback to improve future suggestions

Uh oh!

korbit-ai bot Jun 30, 2025

Choose a reason for hiding this comment

Missing CUA to Environment Action Conversion

What is the issue?

Why this matters

Suggested change ∙ Feature Preview

Provide feedback to improve future suggestions

Uh oh!

Uh oh!

amanjaiswal73892 commented Jun 30, 2025 •

edited by korbit-ai bot

Loading

korbit-ai bot left a comment •

edited

Loading