
0.19.3-v1

@r4victor released this 10 Apr 10:49
89d5de7

Optimized networking for GCP H100 clusters

dstack now automatically sets up GCP A3 Mega instances with GPUDirect-TCPXO-optimized NCCL communication to take advantage of the maximum network bandwidth of 1,800 Gbps. Here are the NCCL test results on an A3 Mega cluster provisioned with dstack:

✗ dstack apply -f examples/misc/a3mega-clusters/nccl-tests.dstack.yml 

nccl-tests provisioning completed (running)
nThread 1 nGpus 1 minBytes 8388608 maxBytes 8589934592 step: 2(factor) warmup iters: 5 iters: 200 agg iters: 1 validation: 0 graph: 0

                                                             out-of-place                       in-place          
      size         count      type   redop    root     time   algbw   busbw #wrong     time   algbw   busbw #wrong
       (B)    (elements)                               (us)  (GB/s)  (GB/s)            (us)  (GB/s)  (GB/s)       
     8388608        131072     float    none      -1    166.6   50.34   47.19    N/A    164.1   51.11   47.92    N/A
    16777216        262144     float    none      -1    204.6   82.01   76.89    N/A    203.8   82.30   77.16    N/A
    33554432        524288     float    none      -1    284.0  118.17  110.78    N/A    281.7  119.12  111.67    N/A
    67108864       1048576     float    none      -1    447.4  150.00  140.62    N/A    443.5  151.31  141.86    N/A
   134217728       2097152     float    none      -1    808.3  166.05  155.67    N/A    801.9  167.38  156.92    N/A
   268435456       4194304     float    none      -1   1522.1  176.36  165.34    N/A   1518.7  176.76  165.71    N/A
   536870912       8388608     float    none      -1   2892.3  185.62  174.02    N/A   2894.4  185.49  173.89    N/A
  1073741824      16777216     float    none      -1   5532.7  194.07  181.94    N/A   5530.7  194.14  182.01    N/A
  2147483648      33554432     float    none      -1    10863  197.69  185.34    N/A    10837  198.17  185.78    N/A
  4294967296      67108864     float    none      -1    21481  199.94  187.45    N/A    21466  200.08  187.58    N/A
  8589934592     134217728     float    none      -1    42713  201.11  188.54    N/A    42701  201.16  188.59    N/A
Out of bounds values : 0 OK
Avg bus bandwidth    : 146.948 

Done
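As a sanity check on the table above, the busbw column follows from algbw via the standard nccl-tests bus-bandwidth correction factor. A minimal sketch, assuming 16 ranks (2 A3 Mega nodes × 8 GPUs each) and an all-gather-style collective, whose correction factor is (n-1)/n:

```python
# nccl-tests reports busbw = algbw * (n - 1) / n for all_gather / reduce_scatter,
# where n is the number of ranks. Assuming 2 nodes x 8 GPUs = 16 ranks:
n = 16
algbw = 201.11                  # GB/s, out-of-place, largest message size
busbw = algbw * (n - 1) / n
print(round(busbw, 2))          # matches the 188.54 GB/s shown in the table
```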

For more information on how to provision and use A3 Mega clusters with GPUDirect-TCPXO, see the A3 Mega example.
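To provision such a cluster yourself, a fleet configuration along these lines should work. This is a minimal sketch; the fleet name and node count are illustrative, and a3-megagpu-8g is GCP's A3 Mega machine type:

```yaml
type: fleet
name: a3mega-fleet        # illustrative name
nodes: 2                  # any number of A3 Mega nodes
placement: cluster        # co-locate nodes for fast inter-node networking
backends: [gcp]
instance_types: [a3-megagpu-8g]
```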

H200 and B200 support on DataCrunch

You can now provision H200 and B200 instances on DataCrunch. DataCrunch is the first dstack backend to support B200:

✗ dstack apply --gpu B200
 Project              main                                   
 User                 admin                                  
 Configuration        .dstack.yml                            
 Type                 dev-environment                        
 Resources            1..xCPU, 2GB.., 1xB200, 100GB.. (disk) 
 Max price            -                                      
 Max duration         -                                      
 Inactivity duration  -                                      
 Spot policy          auto                                   
 Retry policy         -                                      
 Creation policy      reuse-or-create                        
 Idle duration        5m                                     
 Reservation          -                                      

 #  BACKEND     REGION  INSTANCE   RESOURCES                                      SPOT  PRICE                
 1  datacrunch  FIN-03  1B200.31V  31xCPU, 250GB, 1xB200 (180GB), 100.0GB (disk)  yes   $1.3                 
 2  datacrunch  FIN-03  1B200.31V  31xCPU, 250GB, 1xB200 (180GB), 100.0GB (disk)  no    $4.49
 3  datacrunch  FIN-01  1B200.31V  31xCPU, 250GB, 1xB200 (180GB), 100.0GB (disk)  yes   $1.3   not available 
    ...                                                                                                      
 Shown 3 of 8 offers, $4.49 max

Submit a new run? [y/n]:                        
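The same run can be reproduced without CLI flags by putting the GPU requirement in the configuration itself. A minimal sketch, assuming a VS Code dev environment:

```yaml
type: dev-environment
ide: vscode
resources:
  gpu: B200
```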

CUDO improvements

The CUDO backend has been updated to support H100, A100, A40, and all other GPUs currently offered by CUDO.
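To target these GPUs on CUDO specifically, the backend can be pinned in the configuration. A hypothetical sketch, assuming a VS Code dev environment:

```yaml
type: dev-environment
ide: vscode
backends: [cudo]          # only consider CUDO offers
resources:
  gpu: H100
```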

fleets configuration property

With the new fleets property and the --fleet option of dstack apply, it's now possible to restrict the set of fleets considered for reuse:

type: task

fleets: [my-fleet-1, my-fleet-2]

or

dstack apply --fleet my-fleet-1 --fleet my-fleet-2
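In context, the property sits alongside the rest of a task configuration. A minimal sketch, with illustrative fleet names and command:

```yaml
type: task
name: train                          # illustrative name
fleets: [my-fleet-1, my-fleet-2]     # only these fleets are considered for reuse
commands:
  - python train.py                  # illustrative command
```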


Full Changelog: dstackai/dstack@0.19.2...0.19.3