[Feature Request] Write shard failures (ie: partial failures) to Cloudwatch Logs #18025

sander-bol · 2025-04-22T08:37:30Z

Is your feature request related to a problem? Please describe

During a recent incident we experienced partial failures caused by an unhandled exception at the shard level. Even though we had full logging enabled on the OpenSearch service, the exception was not logged to CloudWatch. The only place the error surfaced was the HTTP response. Because we were not aware that per-shard errors could appear in the response, our application did not log them.

Our understanding is that this happens because OpenSearch treats a partial failure as a successful response. AWS Support pointed us at allow_partial_search_results, and based on their guidance we are implementing it now. However, a solution at the OpenSearch level that would have helped us identify the issue sooner would be appreciated.
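For readers unfamiliar with the setting, here is a minimal sketch of passing allow_partial_search_results as a query parameter on a search request. The endpoint and index name are hypothetical placeholders, and `build_search_url` is an illustrative helper, not part of any client SDK. To our understanding, setting the parameter to false makes the request fail outright instead of returning partial results, but verify this against the documentation for your OpenSearch version.

```python
from urllib.parse import quote, urlencode


def build_search_url(endpoint: str, index: str, allow_partial: bool = False) -> str:
    """Build a _search URL that sets allow_partial_search_results explicitly.

    `endpoint` and `index` are caller-supplied; the values used below are
    hypothetical examples, not taken from the original report.
    """
    # The API expects the literal strings "true" or "false".
    params = urlencode({"allow_partial_search_results": str(allow_partial).lower()})
    return f"{endpoint.rstrip('/')}/{quote(index, safe='')}/_search?{params}"


if __name__ == "__main__":
    # Hypothetical domain endpoint and index name.
    print(build_search_url("https://search.example.com", "logs"))
```

With the parameter set to false, a shard-level failure should show up as a failed request in the application's normal error handling, rather than hiding inside an otherwise successful response.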

Describe the solution you'd like

It would be preferable if shard-level partial failures were logged to CloudWatch Logs, so that they surface immediately when investigating a failure, instead of requiring us to deploy changes to our running application and to understand the client SDK's event hook mechanism.

Related component

Search:Resiliency

Describe alternatives you've considered

We have since implemented an on-response listener that handles this and logs the errors at the application level. I imagine most users don't implement such a listener, given that the examples don't cover it and the relevant elements only appear in the response when failures actually happen.
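As an illustration of the application-level workaround, here is a minimal sketch of a check that could run on every search response. It is not our actual listener and does not use any client SDK hook. It assumes the response body has already been parsed into a dict, and it reads the `_shards.failed` counter and `_shards.failures` list from that body. The function name `log_shard_failures` and the logger name are hypothetical.

```python
import logging
from typing import Any, Mapping

# Hypothetical logger name; route it wherever application logs are shipped.
logger = logging.getLogger("search.shard_failures")


def log_shard_failures(response: Mapping[str, Any]) -> int:
    """Log per-shard failures found in a parsed search response.

    Returns the number of failed shards reported in `_shards.failed`,
    or 0 when the section is absent or reports no failures.
    """
    shards = response.get("_shards") or {}
    failed = int(shards.get("failed", 0))
    if failed == 0:
        return 0

    logger.warning(
        "search returned partial results: %d of %s shards failed",
        failed,
        shards.get("total", "?"),
    )
    # Each entry describes one failing shard; the keys are read defensively
    # because the exact shape may vary between versions.
    for failure in shards.get("failures", []):
        reason = failure.get("reason") or {}
        logger.warning(
            "shard failure index=%s shard=%s node=%s type=%s reason=%s",
            failure.get("index"),
            failure.get("shard"),
            failure.get("node"),
            reason.get("type"),
            reason.get("reason"),
        )
    return failed
```

Calling this function from whatever response hook the client library offers keeps the check in one place. The return value can also feed a metric or alarm.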

Additional context

This request is based on an AWS Support case. The case number can be provided on request; I'm not sure whether case numbers are potentially sensitive.

sandeshkr419 · 2025-04-23T16:24:53Z

Thanks @sander-bol for bringing it here.
This relates to Amazon OpenSearch Service rather than to OpenSearch itself, so you may want to reach out to AWS Support directly for this.

sander-bol · 2025-04-23T16:33:20Z

No problem! As with the other issue, I appreciate your response and will push it back to AWS Support for internal follow-up as improvements to the service.

sander-bol added the enhancement and untriaged labels Apr 22, 2025

github-actions bot added the Search:Resiliency label Apr 22, 2025

github-project-automation bot added this to Search Project Board Apr 22, 2025

github-project-automation bot moved this to 🆕 New in Search Project Board Apr 22, 2025

sandeshkr419 removed the untriaged label Apr 23, 2025

sandeshkr419 closed this as completed Apr 23, 2025

github-project-automation bot moved this from 🆕 New to ✅ Done in Search Project Board Apr 23, 2025
