Description
**Please make sure you read the contribution guide and file the issues in the right place.**
Statement: Before submitting this request, I reviewed the ADK documentation, the Python source code, and existing issues, and did not find any readily available or clearly documented built-in retry mechanism for the LLM Agent, specifically for handling API errors like 429 Too Many Requests. If such an implementation exists, please point me to the relevant documentation.
Is your feature request related to a problem? Please describe.
I am frequently encountering 429 Too Many Requests errors when using the LLM Agent within the Google ADK. The current implementation provides no direct or convenient mechanism for retrying these requests. This leads to failed requests and interruptions in our application's workflow, requiring manual intervention or complex, external retry logic.
Describe the solution you'd like
I would like to request the addition of a built-in, configurable retry mechanism for the LLM Agent within the Google ADK. This mechanism should automatically handle transient API errors, such as 429 Too Many Requests, 500 Internal Server Error, 503 Service Unavailable, etc., by implementing an exponential backoff strategy with configurable parameters.
Ideally, this would involve:
- Automatic Retries: The LLM Agent should automatically retry failed requests based on a predefined set of HTTP status codes.
- Configurable Backoff: Users should be able to configure parameters like:
  - Maximum number of retries.
  - Initial backoff delay.
  - Maximum backoff delay.
- Error Handling: A clear way to differentiate between retriable and non-retriable errors, allowing for appropriate handling of persistent issues.
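As a rough sketch of the behavior requested above (this is not an existing ADK API — the exception class, status-code set, and parameter names are all hypothetical), the retry logic could look something like this:

```python
import random
import time


class TransientAPIError(Exception):
    """Hypothetical stand-in for an error carrying an HTTP status code."""

    def __init__(self, status_code):
        super().__init__(f"HTTP {status_code}")
        self.status_code = status_code


# Predefined set of retriable HTTP status codes, per the request above.
RETRIABLE_STATUS_CODES = {429, 500, 503}


def call_with_backoff(fn, *, max_retries=5, initial_delay=1.0,
                      max_delay=30.0, sleep=time.sleep):
    """Call fn(), retrying retriable errors with capped exponential backoff."""
    delay = initial_delay
    for attempt in range(max_retries + 1):
        try:
            return fn()
        except TransientAPIError as exc:
            # Non-retriable status, or attempts exhausted: surface the error.
            if exc.status_code not in RETRIABLE_STATUS_CODES or attempt == max_retries:
                raise
            # Jitter spreads out retries from concurrent clients.
            sleep(delay * random.uniform(0.5, 1.5))
            delay = min(delay * 2, max_delay)
```

In a built-in implementation, `max_retries`, `initial_delay`, and `max_delay` would be the user-configurable knobs, and the agent would wrap its model calls in something equivalent to `call_with_backoff`.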
Describe alternatives you've considered
I have considered using the `HttpOptions` `base_url` parameter to route requests through a custom proxy or service that would implement the retry logic. However, this approach introduces unnecessary overhead and complexity:
- Additional Hop: It adds an extra network hop between our application and the Google API, potentially increasing latency and introducing a new point of failure.
- Maintenance Burden: It requires us to deploy, manage, and maintain an additional proxy service, increasing operational overhead.
- Security Concerns: Depending on its implementation, the proxy may expand the attack surface and require additional security review.
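For context, the workaround considered above amounts to something like the following configuration fragment (assuming the google-genai client's `types.HttpOptions`; the proxy URL is a placeholder):

```python
from google import genai
from google.genai import types

# Route all requests through a self-hosted proxy that implements the retry
# logic externally -- the approach described (and rejected) above.
client = genai.Client(
    http_options=types.HttpOptions(base_url="https://retry-proxy.internal.example"),
)
```

Every concern listed above stems from this indirection, which a built-in retry option would make unnecessary.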
Additional context
The lack of a built-in retry mechanism makes integrating the LLM Agent into robust, production-ready applications challenging. Many other SDKs and client libraries for interacting with cloud services provide such functionality out-of-the-box, significantly simplifying error handling and improving application resilience. Implementing this feature would greatly enhance the developer experience and the reliability of applications built using the Google ADK's LLM Agent.