async generate_content is very slow #557

Open

ascillitoe opened this issue Mar 22, 2025 · 7 comments
Labels

priority: p2 (Moderately-important priority. Fix may not be included in next release.)
type: feature request ('Nice-to-have' improvement, new feature or different behavior or design.)

Comments

@ascillitoe

The async performance of the new SDK still seems to be much worse than the old SDK (with transport='grpc_asyncio').

We can do 1000 text classifications in ~5s with the old client, but this consistently takes over 30s with the new client.

Is this a known issue? Or are there settings that need to be configured with the new client? (e.g. we found the transport option was very important with the previous client).

Environment details

  • Programming language: Python
  • OS: Ubuntu 22.04.5 LTS
  • Language runtime version: 3.10.12
  • Package version: 1.7.0

Steps to reproduce

  1. Run N basic text completions async with the new google.genai.client.aio.models.generate_content
  2. Compare with the old google.generativeai generate_content_async (a minimal benchmark sketch follows below)
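
For reference, a minimal benchmark sketch along these lines (model name, prompt, and API-key handling are illustrative, not taken from the original report):

```python
import asyncio
import os
import time

import google.generativeai as legacy_genai  # old SDK (google-generativeai)
from google import genai                    # new SDK (google-genai)

API_KEY = os.environ["GOOGLE_API_KEY"]
N = 1000                                    # number of concurrent requests
MODEL = "gemini-2.0-flash"                  # illustrative model name
PROMPT = "Classify the sentiment of: 'I love this product.'"

async def run_new_sdk() -> float:
    """Fire N concurrent requests through google-genai's async surface."""
    client = genai.Client(api_key=API_KEY)
    start = time.perf_counter()
    await asyncio.gather(
        *(client.aio.models.generate_content(model=MODEL, contents=PROMPT) for _ in range(N))
    )
    return time.perf_counter() - start

async def run_old_sdk() -> float:
    """Same workload through the legacy SDK with the gRPC asyncio transport."""
    legacy_genai.configure(api_key=API_KEY, transport="grpc_asyncio")
    model = legacy_genai.GenerativeModel(MODEL)
    start = time.perf_counter()
    await asyncio.gather(*(model.generate_content_async(PROMPT) for _ in range(N)))
    return time.perf_counter() - start

async def main():
    print(f"new SDK: {await run_new_sdk():.1f}s for {N} requests")
    print(f"old SDK: {await run_old_sdk():.1f}s for {N} requests")

asyncio.run(main())
```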
@ascillitoe ascillitoe added the labels priority: p2 (Moderately-important priority. Fix may not be included in next release.) and type: bug (Error or flaw in code with unintended results or allowing sub-optimal usage patterns.) on Mar 22, 2025
@andrew-stelmach

Compare the runtimes before and after the call to generate_content. LLM response speeds are stochastic, since every provider changes how many GPUs are being used, etc.
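
One way to separate provider-side variance from client-side overhead is to record per-request latencies alongside the total wall time. A rough sketch, assuming the new SDK and a placeholder model name:

```python
import asyncio
import os
import statistics
import time

from google import genai

client = genai.Client(api_key=os.environ["GOOGLE_API_KEY"])
MODEL = "gemini-2.0-flash"  # illustrative

async def timed_call(prompt: str) -> float:
    """Time a single generate_content call so per-request latency can be inspected."""
    start = time.perf_counter()
    await client.aio.models.generate_content(model=MODEL, contents=prompt)
    return time.perf_counter() - start

async def main(n: int = 100):
    start = time.perf_counter()
    latencies = await asyncio.gather(*(timed_call("ping") for _ in range(n)))
    total = time.perf_counter() - start
    # If the median per-request latency stays flat while the total wall time grows
    # much faster than expected, the overhead is in the client, not the model.
    print(f"total {total:.1f}s, median per-request {statistics.median(latencies):.2f}s")

asyncio.run(main())
```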

@yinghsienwu
Contributor

Note that google.genai uses the REST transport. In general, gRPC is faster than REST. We'll benchmark the runtime performance and see how to improve it. Thanks for raising this.

@ascillitoe
Author

Hi @yinghsienwu, thanks for the response. Are there no plans to add support for gRPC AsyncIO like the old SDK? This seems like a pretty big regression?

@yinghsienwu yinghsienwu added the label type: feature request ('Nice-to-have' improvement, new feature or different behavior or design.) and removed the label type: bug (Error or flaw in code with unintended results or allowing sub-optimal usage patterns.) on Apr 1, 2025
@hugbubby

Same problem here. It's an extremely slow interface. We're getting 100% cpu usage with only a handful of concurrent requests, which is practically unusable for our purposes.

@yinghsienwu
Contributor

yinghsienwu commented Apr 15, 2025

I compared the Vertex SDK (google-cloud-aiplatform) with (#1) its default grpc_asyncio transport and (#2) its rest_asyncio transport against (#3) the google-genai SDK v1.9 (REST over httpx) and (#4) an aiohttp prototype, sending 100, 500, and 1000 async generateContent requests.

  1. In the Vertex SDK, the rest_asyncio and grpc_asyncio transports perform similarly (within the standard deviation), so gRPC itself should not be the key to better runtime performance.
  2. The current google-genai SDK (httpx) runtime is ~6x the grpc_asyncio runtime when sending 1000 async requests (the same as the observation above, async generate_content is very slow #557 (comment)).
  3. To improve runtime performance, using aiohttp in the google-genai SDK's AsyncClient implementation may achieve performance similar to the Vertex SDK's rest_asyncio (a rough sketch of such a prototype follows below).

We'll try to put it into our roadmap.
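
For illustration only, this is roughly the kind of aiohttp-based prototype described in point 4. It is not the SDK's implementation; it assumes the public Gemini REST endpoint and an API key in GOOGLE_API_KEY, with an illustrative model name:

```python
import asyncio
import os

import aiohttp  # the transport proposed above, not yet wired into google-genai here

API_KEY = os.environ["GOOGLE_API_KEY"]
MODEL = "gemini-2.0-flash"  # illustrative
URL = f"https://generativelanguage.googleapis.com/v1beta/models/{MODEL}:generateContent"

async def generate(session: aiohttp.ClientSession, prompt: str) -> dict:
    """POST a single generateContent request over a shared, pooled connection."""
    body = {"contents": [{"parts": [{"text": prompt}]}]}
    async with session.post(URL, json=body, headers={"x-goog-api-key": API_KEY}) as resp:
        resp.raise_for_status()
        return await resp.json()

async def main(n: int = 100):
    # A single shared ClientSession keeps connections pooled, which is typically
    # where most of the gap against a naive per-request HTTP client comes from.
    async with aiohttp.ClientSession() as session:
        results = await asyncio.gather(*(generate(session, "ping") for _ in range(n)))
    print(len(results), "responses")

asyncio.run(main())
```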

@ascillitoe
Author

Thanks for confirming @yinghsienwu! Do you think this situation might be improved by the time the old SDK reaches end of life on Aug 31st?

@yinghsienwu
Contributor

I think it's likely to be available in Q2 2025. I'll attach a PR here.
