[Feature Request] Write shard failures (i.e. partial failures) to CloudWatch Logs #18025

Closed
sander-bol opened this issue Apr 22, 2025 · 2 comments
Labels: enhancement (Enhancement or improvement to existing feature or request), Search:Resiliency

Comments

@sander-bol
Is your feature request related to a problem? Please describe

During a recent incident we experienced partial search failures caused by an unhandled exception at the shard level. Even though we had full logging enabled on the OpenSearch service, the exception was not logged to CloudWatch. The only place the error surfaced was in the HTTP response. Because we were not aware that per-shard errors could appear in an otherwise successful response, our application did not log them.

Our understanding is that this happens because OpenSearch treats a partial failure as a successful response. AWS Support pointed us to allow_partial_search_results. Based on their guidance we are implementing this now; however, we would appreciate a solution at the OpenSearch level that would have helped us identify the issue sooner.
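
For readers unfamiliar with the setting, here is a minimal Python sketch of a search request that opts out of partial results, so that a shard failure fails the whole request instead of being folded into a successful response. The endpoint, index name, and the helper `build_strict_search_request` are hypothetical placeholders; authentication (for example request signing on a managed service) is deliberately left out, and the request is only built, not sent.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request


def build_strict_search_request(endpoint: str, index: str, query: dict) -> Request:
    """Build (but do not send) a _search request that disallows partial results.

    `endpoint` and `index` are placeholders for illustration only.
    """
    # allow_partial_search_results=false asks the cluster to return an error
    # when any shard fails, rather than a partial result set.
    params = urlencode({"allow_partial_search_results": "false"})
    url = f"{endpoint.rstrip('/')}/{index}/_search?{params}"
    body = json.dumps({"query": query}).encode("utf-8")
    return Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical endpoint and index, used only to show the resulting URL.
req = build_strict_search_request(
    "https://search.example.invalid", "my-index", {"match_all": {}}
)
print(req.full_url)
```

Note that this changes failure behaviour rather than logging: the caller then sees an error response for the request and still has to handle and record it.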

Describe the solution you'd like

It would be preferable if shard-level partial failures were logged to CloudWatch Logs, so that they surface immediately when investigating a failure, instead of requiring us to deploy changes to our running application (and to understand the client SDK's event hook mechanism).

Related component

Search:Resiliency

Describe alternatives you've considered

We have since implemented an on-response listener that handles this and logs the errors at the application level. I imagine most users don't implement such an event listener, given that the examples don't cover it and the relevant elements only appear in the response when failures actually occur.
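
A rough idea of what such a listener checks, sketched in Python rather than in any particular client SDK: it inspects the `_shards` section of a search response and logs each entry under `failures`. The function name `log_shard_failures`, the logger name, and the sample response are assumptions made for illustration; how the function gets wired into a client's response hook depends on the SDK and is not shown.

```python
import logging

# Hypothetical logger name; route it wherever application logs normally go.
logger = logging.getLogger("search.shard_failures")


def log_shard_failures(response: dict) -> int:
    """Log per-shard failures found in a search response body.

    Returns the number of failed shards reported by the response.
    """
    shards = response.get("_shards", {})
    failed = shards.get("failed", 0)
    if not failed:
        return 0

    # The "failures" array is only present when at least one shard failed.
    for failure in shards.get("failures", []):
        reason = failure.get("reason", {})
        logger.warning(
            "shard failure index=%s shard=%s type=%s reason=%s",
            failure.get("index"),
            failure.get("shard"),
            reason.get("type"),
            reason.get("reason"),
        )
    return failed


# Made-up response body showing the shape of a partial failure.
sample_response = {
    "_shards": {
        "total": 5,
        "successful": 4,
        "skipped": 0,
        "failed": 1,
        "failures": [
            {
                "shard": 2,
                "index": "my-index",
                "reason": {"type": "example_exception", "reason": "illustrative failure"},
            }
        ],
    },
    "hits": {"hits": []},
}

failed_count = log_shard_failures(sample_response)
```

Calling the function on every response is cheap, since the loop only runs when `_shards.failed` is non-zero.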

Additional context

Based on an AWS Support case; the case number can be provided on request, as I'm not sure whether case numbers are sensitive.

@sander-bol sander-bol added enhancement Enhancement or improvement to existing feature or request untriaged labels Apr 22, 2025
@sandeshkr419
Member

Thanks @sander-bol for bringing it here.
This is related to AWS OpenSearch Service and not OpenSearch, so you may want to reach out to AWS support directly for this.

@github-project-automation github-project-automation bot moved this from 🆕 New to ✅ Done in Search Project Board Apr 23, 2025
@sander-bol
Author

No problem! As with the other issue, I appreciate your response and will push it back to AWS Support for internal follow-up as improvements to the service.

Projects
Status: Done
2 participants