In a Kubernetes cluster, inconsistencies in GPU models and CUDA driver versions across different GPU nodes lead to the following issues:
Version mismatch: The CUDA runtime version used by applications may be incompatible with the CUDA driver version on certain nodes, resulting in failures or performance problems.
Scheduling challenges: The native Kubernetes scheduler is unaware of CUDA version dependencies and cannot guarantee that applications are scheduled onto GPU nodes with compatible versions.
High maintenance overhead: Manually managing the CUDA version dependencies between nodes and applications increases operational complexity.
This document provides a step-by-step guide for accurately scheduling inference services based on the CUDA runtime version and NVIDIA driver version. With these settings, you can resolve CUDA Runtime and CUDA Driver version mismatches at the Kubernetes scheduling level and ensure that applications are scheduled onto compatible GPU nodes.
On each GPU node, run the following command to retrieve the supported CUDA runtime version:
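A minimal sketch, assuming the NVIDIA driver and `nvidia-smi` are installed on the node; the banner printed by `nvidia-smi` includes the highest CUDA runtime version the installed driver supports:

```bash
# Extract the "CUDA Version" field from the nvidia-smi banner, which
# reports the highest CUDA runtime version the installed driver supports.
nvidia-smi | grep -oP 'CUDA Version:\s*\K[0-9]+\.[0-9]+'
```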
For example, the output might be 12.4.
On the control node, label the GPU node with the corresponding major and minor version:
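For example (a sketch; `<gpu-node-name>` is a placeholder, and the label key `cpaas.io/cuda-version` matches the one referenced later in this guide):

```bash
# Label the node with the major.minor CUDA version found in the previous step.
kubectl label node <gpu-node-name> cpaas.io/cuda-version=12.4
```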
If your cluster has many GPU nodes, labeling them manually is impractical. Instead, you can install the Node Feature Discovery (NFD) cluster plugin: after you deploy NFD and enable the GPU Feature Discovery (GFD) extension, GPU nodes are automatically labeled with their CUDA version.
The Node Feature Discovery cluster plugin can be retrieved from the Customer Portal. Contact Customer Support for more information.
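To confirm that the labels have been applied (an illustrative check; the exact label set depends on the plugin version):

```bash
# List the CUDA-related labels that NFD/GFD attached to each node.
kubectl get nodes --show-labels | grep -i cuda
```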
Starting from Alauda AI 1.5, the product automatically schedules inference service pods based on the CUDA version. For earlier versions, follow these steps:
If the node label cpaas.io/accelerator-type is nvidia, further parse the cpaas.io/cuda-version label (for example, 11.8) and constrain the inference service pod to nodes with a compatible version, as shown in the sketch below.
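A minimal sketch of such a constraint, assuming a representative pod manifest (the pod name, image, and version list are placeholders you must adapt). Because the node affinity `Gt`/`Lt` operators only compare integer values, not `major.minor` strings, an explicit `In` list of compatible versions is used:

```bash
# Apply a Pod that is only scheduled onto NVIDIA nodes whose
# cpaas.io/cuda-version label is one of the listed compatible versions.
kubectl apply -f - <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: inference-demo          # hypothetical name
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: cpaas.io/accelerator-type
            operator: In
            values: ["nvidia"]
          - key: cpaas.io/cuda-version
            operator: In
            values: ["11.8", "12.2", "12.4"]  # driver-supported versions >= the image's CUDA runtime
  containers:
  - name: server
    image: registry.example.com/inference:latest   # placeholder image
    resources:
      limits:
        nvidia.com/gpu: 1
EOF
```

With this affinity in place, the scheduler rejects nodes whose labeled CUDA version is absent from the list, which prevents the runtime/driver mismatch described at the start of this document.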