excluding circuit breaker for Agent #3814

Merged: 1 commit merged into opensearch-project:main on Apr 30, 2025

Conversation

dhrubo-os
Collaborator

Description

[excluding circuit breaker for Agent]

Related Issues

Resolves #[Issue number to be closed when this PR is merged]

Check List

  • New functionality includes testing.
  • New functionality has been documented.
  • API changes companion pull request created.
  • Commits are signed per the DCO using --signoff.
  • Public documentation issue/PR created.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@@ -112,8 +112,7 @@ public void dispatchTask(
         if (clusterService.localNode().getId().equals(nodeId)) {
             // Execute ML task locally
             log.debug("Execute ML request {} locally on node {}", request.getRequestID(), nodeId);
-            checkOpenCircuitBreaker(mlCircuitBreakerService, mlStats);
-            executeTask(request, listener);
+            checkCBAndExecute(functionName, request, listener);
Collaborator

Nit: I'd prefer to avoid abbreviations. Could we use CircuitBreaker instead of CB?

Collaborator Author

This is an existing method, so I'm leaving the name as is for now.

Collaborator

Are you using the latest main branch? I think this change is redundant on main, since checkCBAndExecute is already there.

Collaborator Author

public void run(FunctionName functionName, Request request, TransportService transportService, ActionListener<Response> listener) {
    if (!request.isDispatchTask()) {
        log.debug("Run ML request {} locally", request.getRequestID());
        checkCBAndExecute(functionName, request, listener);
        return;
    }
    dispatchTask(functionName, request, transportService, listener);
}

This is where we applied it.

Collaborator Author

I'm using the latest main branch.
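
For readers following the thread, here is a minimal sketch of the point being made: run() already goes through checkCBAndExecute for non-dispatched requests, and the diff makes the local branch of dispatchTask go through the same gate. Everything below is a simplified stand-in, not the ml-commons code: TaskRunnerSketch, the string request IDs, and passing the target node ID as a parameter are assumptions made for illustration (the real dispatchTask resolves the node itself, and the body of checkCBAndExecute is not shown in this PR).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the task runner discussed above; not the ml-commons class.
class TaskRunnerSketch {
    // Records what happened to each request so the flow can be inspected.
    final List<String> calls = new ArrayList<>();
    private final String localNodeId;

    TaskRunnerSketch(String localNodeId) {
        this.localNodeId = localNodeId;
    }

    // Mirrors the shape of run(): non-dispatched requests go straight to the shared gate.
    void run(String requestId, boolean dispatchTask, String targetNodeId) {
        if (!dispatchTask) {
            checkCBAndExecute(requestId);
            return;
        }
        dispatchTask(requestId, targetNodeId);
    }

    // Mirrors the changed branch: a task dispatched to the local node uses the same gate.
    void dispatchTask(String requestId, String targetNodeId) {
        if (localNodeId.equals(targetNodeId)) {
            checkCBAndExecute(requestId);
        } else {
            // Assumption: remote dispatch is reduced to a log entry in this sketch.
            calls.add("forward:" + requestId + "->" + targetNodeId);
        }
    }

    // Single place where the circuit breaker decision and the execution happen.
    void checkCBAndExecute(String requestId) {
        calls.add("check:" + requestId);
        calls.add("execute:" + requestId);
    }

    public static void main(String[] args) {
        TaskRunnerSketch runner = new TaskRunnerSketch("node-1");
        runner.run("r1", false, null);     // local run, no dispatch
        runner.run("r2", true, "node-1");  // dispatched, lands on the local node
        runner.run("r3", true, "node-2");  // dispatched to another node
        System.out.println(runner.calls);
    }
}
```

With both entry points sharing one gate, any rule about when to skip the breaker only has to live in one method.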

-if (Objects.nonNull(mlTask) && mlTask.getFunctionName() != FunctionName.REMOTE) {
+// for agent and remote model prediction we don't need to check circuit breaker
+if (Objects.nonNull(mlTask) && mlTask.getFunctionName() != FunctionName.REMOTE && mlTask.getFunctionName() != FunctionName.AGENT) {
Collaborator

For an agent using a pre-trained model, does this skip the CB?

Collaborator Author

That's a good question. It skips the check for the agent-level execution. But when a model prediction runs under the agent execution, the function name is neither REMOTE nor AGENT, so the CB is checked at that point.

Collaborator

Sure, we can merge for now, but we need to remember that this would cause problems when an agent is not using a remote model.
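
The behavior described in this thread can be sketched as follows. This is an illustration under stated assumptions, not the project's implementation: AgentBreakerSketch, needsBreakerCheck, and runAgent are hypothetical names, the enum is a reduced stand-in for FunctionName, and a null argument stands in for a missing mlTask. The predicate follows the condition in the diff above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the agent discussion above; not the ml-commons classes.
class AgentBreakerSketch {
    // Reduced stand-in for the FunctionName enum.
    enum FunctionName { TEXT_EMBEDDING, REMOTE, AGENT }

    // Function names for which the circuit breaker was actually consulted.
    final List<FunctionName> checked = new ArrayList<>();

    // Same shape as the condition in the diff: skip REMOTE and AGENT, check everything else.
    // A null function name stands in for a null mlTask.
    static boolean needsBreakerCheck(FunctionName functionName) {
        return functionName != null
            && functionName != FunctionName.REMOTE
            && functionName != FunctionName.AGENT;
    }

    // Every execution passes through the same predicate.
    void execute(FunctionName functionName) {
        if (needsBreakerCheck(functionName)) {
            checked.add(functionName);
        }
    }

    // Assumption: an agent run triggers one nested model prediction with its own function name.
    void runAgent(FunctionName modelFunction) {
        execute(FunctionName.AGENT); // agent-level execution: check skipped
        execute(modelFunction);      // nested prediction: checked unless REMOTE
    }

    public static void main(String[] args) {
        AgentBreakerSketch local = new AgentBreakerSketch();
        local.runAgent(FunctionName.TEXT_EMBEDDING);
        System.out.println("agent with local model, checks: " + local.checked);

        AgentBreakerSketch remote = new AgentBreakerSketch();
        remote.runAgent(FunctionName.REMOTE);
        System.out.println("agent with remote model, checks: " + remote.checked);
    }
}
```

In this model an agent backed by a local model still hits the breaker once, at the nested prediction, which is the case the reviewer asks the team to keep in mind.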

Collaborator

For old versions, can users update a setting to disable circuit breaker checking?

Collaborator Author

From the code, it looks like we only check memory and disk.

So if somebody sets jvm_heap_memory_threshold to 100 and disk_free_space_threshold to -1, the circuit breaker will be disabled.

Does that answer your question, @ylwu-amzn?
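
A rough sketch of the threshold semantics described in this comment, assuming that a heap threshold of 100 and a disk threshold of -1 act as "never open" sentinels. ThresholdSketch and its method names are hypothetical, the comparison operators and units are assumptions, and the real breakers read their values from cluster settings rather than method arguments.

```java
// Hypothetical illustration of the two thresholds mentioned above; not the ml-commons breakers.
class ThresholdSketch {
    // Assumption: the heap breaker opens when used heap exceeds the threshold,
    // and a threshold of 100 percent disables it.
    static boolean heapBreakerOpen(int thresholdPercent, int usedPercent) {
        return thresholdPercent < 100 && usedPercent > thresholdPercent;
    }

    // Assumption: the disk breaker opens when free space drops below the threshold,
    // and a threshold of -1 disables it. Units are arbitrary here (for example GB).
    static boolean diskBreakerOpen(long thresholdFree, long actualFree) {
        return thresholdFree != -1 && actualFree < thresholdFree;
    }

    // The overall check trips if either breaker is open.
    static boolean anyBreakerOpen(int heapThreshold, int heapUsed, long diskThreshold, long diskFree) {
        return heapBreakerOpen(heapThreshold, heapUsed) || diskBreakerOpen(diskThreshold, diskFree);
    }

    public static void main(String[] args) {
        // Tight thresholds on a busy node: the check trips.
        System.out.println(anyBreakerOpen(85, 95, 5, 1));
        // Thresholds set to 100 and -1 on the same node: the check never trips.
        System.out.println(anyBreakerOpen(100, 95, -1, 1));
    }
}
```

Under these assumptions, setting both values as described turns the check into a no-op without any code change, which is what makes it usable as a workaround on older versions.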

Collaborator

Sounds good. @dhrubo-os, can you check whether we have this in our documentation?

Collaborator Author

Which document are you referring to? I shared the memory and disk links from docs.opensearch.org.

@dhrubo-os dhrubo-os temporarily deployed to ml-commons-cicd-env April 30, 2025 18:31 — with GitHub Actions Inactive
codecov bot commented Apr 30, 2025

Codecov Report

Attention: Patch coverage is 66.66667% with 1 line in your changes missing coverage. Please review.

Project coverage is 78.00%. Comparing base (ed4f09f) to head (1701c23).
Report is 8 commits behind head on main.

Files with missing lines | Patch % | Lines
...n/java/org/opensearch/ml/model/MLModelManager.java | 0.00% | 0 Missing and 1 partial ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##               main    #3814      +/-   ##
============================================
- Coverage     78.20%   78.00%   -0.21%     
- Complexity     7162     7317     +155     
============================================
  Files           631      655      +24     
  Lines         32248    32997     +749     
  Branches       3666     3708      +42     
============================================
+ Hits          25219    25738     +519     
- Misses         5465     5673     +208     
- Partials       1564     1586      +22     
Flag | Coverage Δ
ml-commons | 78.00% <66.66%> (-0.21%) ⬇️

@dhrubo-os dhrubo-os temporarily deployed to ml-commons-cicd-env April 30, 2025 19:34 — with GitHub Actions Inactive
@dhrubo-os dhrubo-os merged commit f01de7f into opensearch-project:main Apr 30, 2025
12 of 14 checks passed
opensearch-trigger-bot bot pushed a commit that referenced this pull request Apr 30, 2025
Signed-off-by: Dhrubo Saha <[email protected]>
(cherry picked from commit f01de7f)
peterzhuamazon pushed a commit that referenced this pull request Apr 30, 2025
Signed-off-by: Dhrubo Saha <[email protected]>
(cherry picked from commit f01de7f)

Co-authored-by: Dhrubo Saha <[email protected]>
dhrubo-os added a commit that referenced this pull request May 8, 2025
* [BUG] Agent Framework: Handle model response when toolUse is not accompanied by text (#3755)

* fix: handle model response when toolUse is not accompanied by text

Signed-off-by: Pavan Yekbote <[email protected]>

* feat: add test case for parseLLMOutput

Signed-off-by: Pavan Yekbote <[email protected]>

---------

Signed-off-by: Pavan Yekbote <[email protected]>

* [BUG] Allow user to control react agent max_interations value to prevent empty response (#3756)

* fix: expose max_iteration for react

Signed-off-by: Pavan Yekbote <[email protected]>

* fix: defaults for agent execution and differentiate between step and step result

Signed-off-by: Pavan Yekbote <[email protected]>

* fix: return react agent id in agent response to expose more details

Signed-off-by: Pavan Yekbote <[email protected]>

* spotless

Signed-off-by: Pavan Yekbote <[email protected]>

* fix: remove test prompt from react system prompt

Signed-off-by: Pavan Yekbote <[email protected]>

* refactor: rename parameters exposed to user to executor

Signed-off-by: Pavan Yekbote <[email protected]>

* fix: give user complete control over planner system prompt

Signed-off-by: Pavan Yekbote <[email protected]>

---------

Signed-off-by: Pavan Yekbote <[email protected]>

* Clean up JSM from MCP (#3773)

Signed-off-by: rithin-pullela-aws <[email protected]>

* [Bug] ListTools call does not return tool attributes (#3785)

* initial commit for MCP server in OpenSearch (#3781)

* initial commit for MCP server in OpenSearch

Signed-off-by: zane-neo <[email protected]>

* Make change to support register or remove tools across cluster

Signed-off-by: zane-neo <[email protected]>

* format code

Signed-off-by: zane-neo <[email protected]>

* fix UT failure caused by code change

Signed-off-by: zane-neo <[email protected]>

* format code

Signed-off-by: zane-neo <[email protected]>

* format code

Signed-off-by: zane-neo <[email protected]>

* add license header

Signed-off-by: zane-neo <[email protected]>

* fix notifications initialized not respond issue

Signed-off-by: zane-neo <[email protected]>

* fix minor issues and add UTs

Signed-off-by: zane-neo <[email protected]>

* Add more UTs

Signed-off-by: zane-neo <[email protected]>

---------

Signed-off-by: zane-neo <[email protected]>

* Remove beta1 qualifier (#3794) (#3795)

(cherry picked from commit 3f503f1)

Signed-off-by: Peter Zhu <[email protected]>
Co-authored-by: Peter Zhu <[email protected]>

* [AUTO] Increment version to 3.1.0-SNAPSHOT (#3789)

* Increment version to 3.1.0-SNAPSHOT

Signed-off-by: opensearch-ci-bot <[email protected]>

* Update build.gradle

Signed-off-by: Peter Zhu <[email protected]>

---------

Signed-off-by: opensearch-ci-bot <[email protected]>
Signed-off-by: Peter Zhu <[email protected]>
Co-authored-by: opensearch-ci-bot <[email protected]>
Co-authored-by: Peter Zhu <[email protected]>

* add release note for 3.0 (#3792)

Signed-off-by: Mingshi Liu <[email protected]>

* support MCP session management (#3803)

* support MCP session management

Signed-off-by: zane-neo <[email protected]>

* Addressing comments

Signed-off-by: zane-neo <[email protected]>

* add feature flag for mcp server and renaming mcp connector feature flag

Signed-off-by: zane-neo <[email protected]>

* Address critical comments in #3781

Signed-off-by: zane-neo <[email protected]>

---------

Signed-off-by: zane-neo <[email protected]>

* upgrade http client to version align with core (#3809)

* upgrade http client to versoin align with core

Signed-off-by: zane-neo <[email protected]>

* upgrade httpclient-h2 to correct versiono

Signed-off-by: zane-neo <[email protected]>

* use placeholder approach

Signed-off-by: zane-neo <[email protected]>

---------

Signed-off-by: zane-neo <[email protected]>

* support customized message endpoint and addressing comments (#3810)

* support customized message endpoint and addressing comments

Signed-off-by: zane-neo <[email protected]>

* fix UT failures

Signed-off-by: zane-neo <[email protected]>

* add files to jacoco exception

Signed-off-by: zane-neo <[email protected]>

* fix tool name issue and optimize register tool api

Signed-off-by: zane-neo <[email protected]>

* fix schema not parsed correctly issue and NPE when parameters is null

Signed-off-by: zane-neo <[email protected]>

* fix failure UT

Signed-off-by: zane-neo <[email protected]>

---------

Signed-off-by: zane-neo <[email protected]>

* excluding circuit breaker for Agent (#3814)

Signed-off-by: Dhrubo Saha <[email protected]>

* change release note (#3811)

* change release note

Signed-off-by: zane-neo <[email protected]>

* Update opensearch-ml-common.release-notes-3.0.0.0.md

* Update opensearch-ml-common.release-notes-3.0.0.0.md

* Update opensearch-ml-common.release-notes-3.0.0.0.md

---------

Signed-off-by: zane-neo <[email protected]>
Co-authored-by: Peter Zhu <[email protected]>

* Downgrade MCP version to 0.9 (#3821)

Signed-off-by: rithin-pullela-aws <[email protected]>

* remove libs folder (#3824)

Signed-off-by: Yaliang Wu <[email protected]>

* add more logging to deploy/undeploy flows for better debugging (#3825)

* add more logging to deploy/undeploy flows for better debugging

Signed-off-by: Bhavana Goud Ramaram <[email protected]>

* Fix python client not able to connect to MCP server issue (#3822)

Signed-off-by: zane-neo <[email protected]>
Co-authored-by: Dhrubo Saha <[email protected]>

* exclude trusted connector check for hidden model (#3838)

Signed-off-by: Dhrubo Saha <[email protected]>

* adding tenantId to the connector executor when this is inline connector (#3837)

* adding tenantId to the connector executor when this is inline connector

Signed-off-by: Dhrubo Saha <[email protected]>

* added more unit tests

Signed-off-by: Dhrubo Saha <[email protected]>

---------

Signed-off-by: Dhrubo Saha <[email protected]>

---------

Signed-off-by: Pavan Yekbote <[email protected]>
Signed-off-by: rithin-pullela-aws <[email protected]>
Signed-off-by: zane-neo <[email protected]>
Signed-off-by: Peter Zhu <[email protected]>
Signed-off-by: opensearch-ci-bot <[email protected]>
Signed-off-by: Mingshi Liu <[email protected]>
Signed-off-by: Dhrubo Saha <[email protected]>
Signed-off-by: Yaliang Wu <[email protected]>
Signed-off-by: Bhavana Goud Ramaram <[email protected]>
Co-authored-by: Pavan Yekbote <[email protected]>
Co-authored-by: Rithin Pullela <[email protected]>
Co-authored-by: zane-neo <[email protected]>
Co-authored-by: opensearch-trigger-bot[bot] <98922864+opensearch-trigger-bot[bot]@users.noreply.github.com>
Co-authored-by: Peter Zhu <[email protected]>
Co-authored-by: opensearch-ci-bot <[email protected]>
Co-authored-by: Mingshi Liu <[email protected]>
Co-authored-by: Yaliang Wu <[email protected]>
Co-authored-by: Bhavana Goud Ramaram <[email protected]>
7 participants