[Feature request]: Expose vGPU licensing info via Kubernetes API #1477

Open
@rosenhouse

Problem

I am developing a Kubernetes Operator that deploys workloads that use vGPUs. A common error our users hit is that they have misconfigured licensing for the GPU Operator or have run out of seats, but they don't know that and can't immediately diagnose it.

Today, I have to take some awkward operational steps to discover this information, such as running nvidia-smi and parsing its output. This can be particularly difficult in a "getting started" setup where invalid licensing config does not immediately prevent usage: the vGPU works for a little while and then slows to a crawl.
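To illustrate the kind of parsing this requires today, here is a minimal Python sketch. It assumes the `nvidia-smi -q` output contains one `License Status : ...` line per vGPU, with values shaped like the status string used in the examples in this issue. The sample text and the helper names are hypothetical, not taken from any NVIDIA tool or library.

```python
import re
from typing import List

# Assumption: each vGPU reports a line such as
#   "License Status : Licensed (Expiry: 2025-6-26 21:46:51 GMT)"
LICENSE_LINE = re.compile(
    r"^[ \t]*License Status[ \t]*:[ \t]*(?P<status>.+?)[ \t]*$",
    re.MULTILINE,
)

# Hypothetical sample output, trimmed to the lines that matter here.
SAMPLE = """
GPU 00000000:02:01.0
    License Status  : Licensed (Expiry: 2025-6-26 21:46:51 GMT)
GPU 00000000:02:02.0
    License Status  : Unlicensed (Restricted)
"""


def license_statuses(nvidia_smi_output: str) -> List[str]:
    """Return the raw status string of every 'License Status' line."""
    return [m.group("status") for m in LICENSE_LINE.finditer(nvidia_smi_output)]


def is_licensed(status: str) -> bool:
    """Treat only statuses that begin with 'Licensed' as licensed."""
    return status.startswith("Licensed")
```

Every consumer has to run the tool inside the driver container and maintain its own copy of a text scraper like this, which is the fragility this request is trying to avoid.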

Proposed solution

I'd like to enhance GPU Operator so that it will surface up-to-date licensing information somewhere in the Kubernetes API.

I could imagine a couple of different places where that could happen:

  1. An annotation on each Kubernetes Node resource, for example a plain string:

    nvidia.com/gpu.0.license-status: "Licensed (Expiry: 2025-6-26 21:46:51 GMT)"

    Or a JSON value:

    nvidia.com/gpu-license-statuses: '[{ "id": "00000000:02:01.0", "licensed": true, "expiry": "2025-6-26 21:46:51 GMT" }]'

  2. Alternatively, this could be an element in status.conditions on the ClusterPolicy or Driver Custom Resource. For example:

    status:
      conditions:
      - type: Licensed
        status: "True"
        reason: LicenseOK
        message: All GPUs are licensed (Expiry: 2025-6-26 21:46:51 GMT)

These are sketches of API design. The real field names and shapes could be different.
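As a rough illustration of how a consumer might read the JSON-valued annotation from option 1, here is a minimal Python sketch. The annotation key, the field names (`id`, `licensed`, `expiry`), and the expiry format are all the hypothetical ones from the sketch above, not an existing GPU Operator API. The function names are also made up for this example.

```python
import json
from datetime import datetime, timezone
from typing import Dict, List

# Hypothetical annotation key from the sketch in this issue.
ANNOTATION = "nvidia.com/gpu-license-statuses"


def parse_expiry(text: str) -> datetime:
    """Parse an expiry such as '2025-6-26 21:46:51 GMT' as a UTC timestamp."""
    parsed = datetime.strptime(text, "%Y-%m-%d %H:%M:%S GMT")
    return parsed.replace(tzinfo=timezone.utc)


def gpus_needing_attention(annotations: Dict[str, str], now: datetime) -> List[str]:
    """Return the ids of GPUs that are unlicensed or whose license has expired.

    A missing annotation is treated as "nothing reported" and yields an empty list.
    """
    entries = json.loads(annotations.get(ANNOTATION, "[]"))
    flagged = []
    for entry in entries:
        expired = "expiry" in entry and parse_expiry(entry["expiry"]) <= now
        if not entry.get("licensed", False) or expired:
            flagged.append(entry["id"])
    return flagged
```

The `annotations` argument stands in for the `metadata.annotations` map of a Node object, however the caller obtained it.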

Either way, a Kubernetes user could then easily discover the licensing status of their vGPUs, and third-party controllers could check it before attempting to launch a workload that requests a vGPU.
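For example, a third-party controller could gate workload creation on the condition from option 2. The sketch below is a minimal Python illustration that works on the resource as a plain dict. The `Licensed` condition type is the hypothetical one proposed above, and `license_gate` is an invented helper name.

```python
from typing import Tuple


def license_gate(resource: dict) -> Tuple[bool, str]:
    """Return (ok, message) based on a hypothetical 'Licensed' status condition.

    `resource` is the custom resource (e.g. a ClusterPolicy) decoded into a dict.
    If no such condition is reported, the gate stays closed.
    """
    conditions = resource.get("status", {}).get("conditions", [])
    for cond in conditions:
        if cond.get("type") == "Licensed":
            ok = cond.get("status") == "True"
            return ok, cond.get("message", cond.get("reason", ""))
    return False, "no Licensed condition reported"
```

Treating a missing condition as "not licensed" is a design choice for this sketch. A real controller might prefer to treat it as unknown and proceed with a warning.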
