In virtualized environments on DGX/HGX A100/H100 systems, NVIDIA provides the Shared NVSwitch Virtualization Model to enable NVLink connections for multi-GPU VMs. This model requires that all GPUs assigned to a VM belong to the same partition.
What is the Shared NVSwitch Virtualization Model?
Only GPUs are passed through to the guests.
NVSwitch memory fabrics are managed by a dedicated trusted VM called the Service VM.
NVSwitch memory fabrics are shared by the guest VMs, but the fabrics are not visible to guests.
Requires the tightest integration with the hypervisor.
Full bandwidth for two-GPU and four-GPU VMs.
No need for direct communication between the guest VM and the Service VM.
The GPUs assigned to the VM must belong to the same partition.
How to assign GPUs that belong to the same partition
Implement the GetDevicePluginOptions interface to enable GetPreferredAllocationAvailable, so that kubelet calls GetPreferredAllocation before allocating GPUs.
The GetPreferredAllocation interface recommends H100/H800 GPUs based on GPU partitioning.
The Allocate interface verifies that the GPUs belong to the same partition during allocation.
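The recommendation and verification steps above could be sketched roughly as follows. This is a minimal illustration, not the plugin's actual code: the `Partition` type, the helper names `preferred` and `matchPartition`, and the GPU IDs are all hypothetical, and the real methods would work with the kubelet device plugin request and response types instead of plain slices. It also assumes a request must match a partition's GPU set exactly, since a partition that contains every GPU would otherwise accept any combination.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// Partition is a hypothetical record of one NVSwitch partition:
// an ID and the GPUs (for example PCI addresses) it contains.
type Partition struct {
	ID   int
	GPUs []string
}

// key builds an order-independent key for a set of GPU IDs.
func key(gpus []string) string {
	s := append([]string(nil), gpus...)
	sort.Strings(s)
	return strings.Join(s, ",")
}

// matchPartition mimics the check done in Allocate: it reports whether
// the requested GPUs are exactly the GPU set of one known partition.
func matchPartition(parts []Partition, requested []string) (Partition, bool) {
	want := key(requested)
	for _, p := range parts {
		if key(p.GPUs) == want {
			return p, true
		}
	}
	return Partition{}, false
}

// preferred mimics GetPreferredAllocation: it returns the GPUs of the
// first partition of the requested size whose GPUs are all available.
func preferred(parts []Partition, available []string, size int) []string {
	free := make(map[string]bool, len(available))
	for _, g := range available {
		free[g] = true
	}
	for _, p := range parts {
		if len(p.GPUs) != size {
			continue
		}
		ok := true
		for _, g := range p.GPUs {
			if !free[g] {
				ok = false
				break
			}
		}
		if ok {
			return p.GPUs
		}
	}
	return nil
}

func main() {
	// Hypothetical partition table for a four-GPU example.
	parts := []Partition{
		{ID: 0, GPUs: []string{"gpu0", "gpu1", "gpu2", "gpu3"}},
		{ID: 1, GPUs: []string{"gpu0", "gpu1"}},
		{ID: 2, GPUs: []string{"gpu2", "gpu3"}},
	}
	fmt.Println(preferred(parts, []string{"gpu1", "gpu2", "gpu3"}, 2)) // [gpu2 gpu3]
	_, ok := matchPartition(parts, []string{"gpu1", "gpu2"})
	fmt.Println(ok) // false: gpu1 and gpu2 span two partitions
}
```

Keeping the check in Allocate as well as in GetPreferredAllocation matters because the preferred allocation is only a recommendation, so a mixed set of GPUs can still reach Allocate.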
The diagram below illustrates the partition tree for H100/H800. If partition 4 has already been allocated, partition 3 will be prioritized for the next allocation.
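One way to read the prioritization rule is as a best-fit choice: among the fully free partitions of the requested size, prefer the one whose parent partition has the fewest free GPUs, so that larger partitions stay intact for later requests. The sketch below follows that interpretation only. The `Node` type, the `nextPartition` helper, and the tree layout (partitions 1 to 4 as two-GPU leaves under four-GPU partitions 5 and 6) are assumptions made for illustration and may not match the actual H100/H800 partition numbering.

```go
package main

import "fmt"

// Node is a hypothetical entry in the partition tree. Parent is the ID
// of the enclosing partition, or -1 if there is none.
type Node struct {
	ID     int
	Parent int
	GPUs   []string
}

// freeCount reports how many of the node's GPUs are still unallocated.
func freeCount(n Node, used map[string]bool) int {
	c := 0
	for _, g := range n.GPUs {
		if !used[g] {
			c++
		}
	}
	return c
}

// nextPartition picks a fully free partition of the requested size,
// preferring the one whose parent has the fewest free GPUs. Ties go to
// the partition that appears first in the tree slice.
func nextPartition(tree []Node, used map[string]bool, size int) (int, bool) {
	byID := make(map[int]Node, len(tree))
	for _, n := range tree {
		byID[n.ID] = n
	}
	best, bestFree, found := 0, 0, false
	for _, n := range tree {
		if len(n.GPUs) != size || freeCount(n, used) != size {
			continue // wrong size, or partly allocated
		}
		parentFree := size // without a parent, score the node by itself
		if p, ok := byID[n.Parent]; ok {
			parentFree = freeCount(p, used)
		}
		if !found || parentFree < bestFree {
			best, bestFree, found = n.ID, parentFree, true
		}
	}
	return best, found
}

func main() {
	// Assumed layout, for illustration only.
	tree := []Node{
		{ID: 5, Parent: -1, GPUs: []string{"gpu0", "gpu1", "gpu2", "gpu3"}},
		{ID: 6, Parent: -1, GPUs: []string{"gpu4", "gpu5", "gpu6", "gpu7"}},
		{ID: 1, Parent: 5, GPUs: []string{"gpu0", "gpu1"}},
		{ID: 2, Parent: 5, GPUs: []string{"gpu2", "gpu3"}},
		{ID: 3, Parent: 6, GPUs: []string{"gpu4", "gpu5"}},
		{ID: 4, Parent: 6, GPUs: []string{"gpu6", "gpu7"}},
	}
	// Partition 4 is already allocated, so its GPUs are in use.
	used := map[string]bool{"gpu6": true, "gpu7": true}
	id, ok := nextPartition(tree, used, 2)
	fmt.Println(id, ok) // 3 true
}
```

Under this layout, choosing partition 3 after partition 4 leaves partition 5 whole, so a later four-GPU request can still be satisfied.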