feat: online inferencing with gpus (downloader) #138


Merged
merged 1 commit into from
Apr 23, 2025

Conversation

@ferrarimarco (Member) commented Apr 16, 2025

Implement a Kubernetes Job to download models from Hugging Face to Cloud Storage.

@ferrarimarco ferrarimarco changed the base branch from main to int-inference-ref-arch April 16, 2025 13:23
@ferrarimarco ferrarimarco force-pushed the ferrarimarco-online-inference-gpu branch 2 times, most recently from b2b97e6 to 185c429 Compare April 16, 2025 17:07
@arueth arueth force-pushed the int-inference-ref-arch branch 6 times, most recently from fc05f24 to 0295256 Compare April 16, 2025 19:44
@ferrarimarco ferrarimarco force-pushed the ferrarimarco-online-inference-gpu branch from 1f1b2ca to 34f0bdb Compare April 17, 2025 07:27
@arueth arueth force-pushed the int-inference-ref-arch branch 2 times, most recently from 06b3137 to cc1ffc1 Compare April 17, 2025 20:42
@ferrarimarco ferrarimarco force-pushed the ferrarimarco-online-inference-gpu branch 3 times, most recently from b1b62b7 to 14271a4 Compare April 18, 2025 09:49
@ferrarimarco ferrarimarco changed the title feat: online inferencing with gpus reference architecture feat: online inferencing with gpus (downloader) Apr 18, 2025
@ferrarimarco ferrarimarco marked this pull request as ready for review April 18, 2025 09:50
@ferrarimarco ferrarimarco force-pushed the ferrarimarco-online-inference-gpu branch from 14271a4 to 5fb8060 Compare April 18, 2025 12:37
@arueth (Collaborator) commented Apr 18, 2025

Increasing the disk size on the node pool is going to increase the cost quite a bit for something that will be used infrequently. Instead of increasing the disk size, I think we should investigate something more event-based, such as Cloud Build or possibly Cloud Run Jobs.

@ferrarimarco ferrarimarco force-pushed the ferrarimarco-online-inference-gpu branch 3 times, most recently from 521f0fe to 811f98a Compare April 18, 2025 18:02
@ferrarimarco ferrarimarco force-pushed the ferrarimarco-online-inference-gpu branch from 811f98a to 36e6e23 Compare April 18, 2025 20:53
@ferrarimarco (Member, Author) replied, quoting @arueth:

> Increasing the disk size on the node pool is going to increase the cost quite a bit for something that will be used infrequently. I think we should investigate something more event-based using Cloud Build or possibly Cloud Run Jobs instead of increasing the disk size.

Refactored to use Cloud Storage directly, so no need to increase the boot disk size.
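A minimal sketch of what such a downloader Job could look like, writing straight to a bucket via the Cloud Storage FUSE CSI driver so no large boot disk is needed. All names here (the image, service account, Secret, bucket, and model ID) are illustrative assumptions, not the actual manifest from this PR:

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: model-downloader  # hypothetical name
spec:
  backoffLimit: 3
  template:
    metadata:
      annotations:
        gke-gcsfuse/volumes: "true"  # enable the Cloud Storage FUSE sidecar on GKE
    spec:
      restartPolicy: OnFailure
      serviceAccountName: model-downloader  # assumed KSA with access to the bucket
      containers:
        - name: downloader
          image: python:3.12-slim  # assumed image; the PR may use a purpose-built one
          command:
            - sh
            - -c
            - |
              pip install --no-cache-dir huggingface_hub
              # snapshot_download writes directly into the FUSE-mounted bucket,
              # so model files never land on the node's boot disk.
              python -c "from huggingface_hub import snapshot_download; \
                snapshot_download('${MODEL_ID}', local_dir='/gcs/${MODEL_ID}')"
          env:
            - name: MODEL_ID
              value: meta-llama/Llama-3.1-8B-Instruct  # placeholder model
            - name: HF_TOKEN  # read by huggingface_hub for gated models
              valueFrom:
                secretKeyRef:
                  name: hf-token  # assumed Secret holding a Hugging Face token
                  key: token
          volumeMounts:
            - name: model-bucket
              mountPath: /gcs
      volumes:
        - name: model-bucket
          csi:
            driver: gcsfuse.csi.storage.gke.io
            volumeAttributes:
              bucketName: my-model-bucket  # placeholder bucket name
```

Mounting the bucket with Cloud Storage FUSE (rather than staging on local disk and copying) is what makes the larger node-pool boot disk unnecessary.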

@ferrarimarco ferrarimarco force-pushed the ferrarimarco-online-inference-gpu branch from 36e6e23 to ca01fc7 Compare April 18, 2025 21:10
@fernandorubbo (Member) left a comment


Minor comments, questions, and suggestions; other than that, LGTM.

Implement a Kubernetes Job to download models from Hugging Face to Cloud Storage.
@arueth arueth merged commit 04f44f7 into int-inference-ref-arch Apr 23, 2025
22 checks passed
@arueth arueth deleted the ferrarimarco-online-inference-gpu branch April 23, 2025 17:42
arueth pushed commits referencing this pull request on Apr 23, Apr 29 (twice), May 6, and May 7, 2025, each with the message: Implement a Kubernetes Job to download models from Hugging Face to Cloud Storage.
3 participants