feat: online inferencing with gpus (downloader) #138


Merged
merged 1 commit into from
Apr 23, 2025

Conversation

@ferrarimarco (Member) commented Apr 16, 2025

Implement a Kubernetes Job to download models from Hugging Face to Cloud Storage.

@ferrarimarco ferrarimarco changed the base branch from main to int-inference-ref-arch April 16, 2025 13:23
@ferrarimarco ferrarimarco force-pushed the ferrarimarco-online-inference-gpu branch 2 times, most recently from b2b97e6 to 185c429 Compare April 16, 2025 17:07
@arueth arueth force-pushed the int-inference-ref-arch branch 6 times, most recently from fc05f24 to 0295256 Compare April 16, 2025 19:44
@ferrarimarco ferrarimarco force-pushed the ferrarimarco-online-inference-gpu branch from 1f1b2ca to 34f0bdb Compare April 17, 2025 07:27
@arueth arueth force-pushed the int-inference-ref-arch branch 2 times, most recently from 06b3137 to cc1ffc1 Compare April 17, 2025 20:42
@ferrarimarco ferrarimarco force-pushed the ferrarimarco-online-inference-gpu branch 3 times, most recently from b1b62b7 to 14271a4 Compare April 18, 2025 09:49
@ferrarimarco ferrarimarco changed the title feat: online inferencing with gpus reference architecture feat: online inferencing with gpus (downloader) Apr 18, 2025
@ferrarimarco ferrarimarco marked this pull request as ready for review April 18, 2025 09:50
@ferrarimarco ferrarimarco force-pushed the ferrarimarco-online-inference-gpu branch from 14271a4 to 5fb8060 Compare April 18, 2025 12:37
@arueth (Collaborator) commented Apr 18, 2025

Increasing the disk size on the node pool is going to increase the cost quite a bit for something that will be used infrequently. Instead of increasing the disk size, I think we should investigate something more event-based, such as Cloud Build or possibly Cloud Run Jobs.

@ferrarimarco ferrarimarco force-pushed the ferrarimarco-online-inference-gpu branch 3 times, most recently from 521f0fe to 811f98a Compare April 18, 2025 18:02
@ferrarimarco ferrarimarco force-pushed the ferrarimarco-online-inference-gpu branch from 811f98a to 36e6e23 Compare April 18, 2025 20:53
@ferrarimarco (Member, Author) replied, quoting @arueth:

> Increasing the disk size on the node pool is going to increase the cost quite a bit for something that will be used infrequently. I think we should investigate something more event-based using Cloud Build or possibly Cloud Run Jobs instead of increasing the disk size.

Refactored to use Cloud Storage directly, so no need to increase the boot disk size.
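A minimal sketch of what such a downloader Job could look like, writing straight to a bucket via the Cloud Storage FUSE CSI driver so no large boot disk is needed. All names here (the image, service account, Secret, bucket, and model ID) are illustrative assumptions, not the actual manifest from this PR:

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: model-downloader  # hypothetical name
spec:
  backoffLimit: 3
  template:
    metadata:
      annotations:
        gke-gcsfuse/volumes: "true"  # enable the Cloud Storage FUSE sidecar on GKE
    spec:
      restartPolicy: OnFailure
      serviceAccountName: model-downloader  # assumed KSA with access to the bucket
      containers:
        - name: downloader
          image: python:3.12-slim  # assumed image; the PR may use a purpose-built one
          command:
            - sh
            - -c
            - |
              pip install --no-cache-dir huggingface_hub
              # snapshot_download writes directly into the FUSE-mounted bucket,
              # so model files never land on the node's boot disk.
              python -c "from huggingface_hub import snapshot_download; \
                snapshot_download('${MODEL_ID}', local_dir='/gcs/${MODEL_ID}')"
          env:
            - name: MODEL_ID
              value: meta-llama/Llama-3.1-8B-Instruct  # placeholder model
            - name: HF_TOKEN  # read by huggingface_hub for gated models
              valueFrom:
                secretKeyRef:
                  name: hf-token  # assumed Secret holding a Hugging Face token
                  key: token
          volumeMounts:
            - name: model-bucket
              mountPath: /gcs
      volumes:
        - name: model-bucket
          csi:
            driver: gcsfuse.csi.storage.gke.io
            volumeAttributes:
              bucketName: my-model-bucket  # placeholder bucket name
```

Mounting the bucket with Cloud Storage FUSE (rather than staging on local disk and copying) is what makes the larger node-pool boot disk unnecessary.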

@ferrarimarco ferrarimarco force-pushed the ferrarimarco-online-inference-gpu branch from 36e6e23 to ca01fc7 Compare April 18, 2025 21:10
@fernandorubbo (Member) left a comment


Minor comments, questions, and suggestions; other than that, LGTM.

Implement a Kubernetes Job to download models from Hugging Face to Cloud Storage.
@arueth arueth merged commit 04f44f7 into int-inference-ref-arch Apr 23, 2025
22 checks passed
@arueth arueth deleted the ferrarimarco-online-inference-gpu branch April 23, 2025 17:42
arueth pushed commits referencing this pull request on Apr 23, Apr 29 (twice), May 6, and May 7, 2025, each with the message: Implement a Kubernetes Job to download models from Hugging Face to Cloud Storage.
3 participants