NVIDIA Cloud-Native Technologies

From the data center and cloud to the desktop and edge, NVIDIA cloud-native technologies make it possible to run deep learning, machine learning, and other GPU-accelerated workloads under Kubernetes on systems with NVIDIA GPUs. They also support developing containerized software that deploys seamlessly on enterprise cloud-native management frameworks.

Open Source

Open-source software is the foundation for NVIDIA cloud-native technologies. NVIDIA contributes to open-source projects and communities, including container runtimes, Kubernetes operators and extensions, and monitoring tools.

Enterprise-Ready

Containerized applications developed with NVIDIA cloud-native technologies can seamlessly run on enterprise cloud-native management frameworks, including Red Hat OpenShift and VMware vSphere with Tanzu, as well as NVIDIA Base Command™ and NVIDIA Fleet Command™.

Robust Ecosystem

NVIDIA cloud-native technologies support all NVIDIA enterprise GPUs and network adapters, no matter where they run. NVIDIA-Certified Systems™, available from a large list of global system manufacturers, are validated to work well with cloud-native technologies. The software is also available on cloud instances from leading cloud service providers and can be deployed on embedded systems.

NVIDIA GPU Operator

The NVIDIA GPU Operator automates the lifecycle management of the software required to expose GPUs on Kubernetes. It enables advanced functionality, including better GPU performance, utilization, and telemetry. Certified and validated for compatibility with industry-leading Kubernetes solutions, GPU Operator allows organizations to focus on building applications, rather than managing Kubernetes infrastructure.
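Once GPUs are exposed to Kubernetes, a workload typically asks for them through the `nvidia.com/gpu` extended resource in its container limits. The following is a minimal sketch of that request, not an official NVIDIA sample: the helper name `gpu_pod_manifest`, the Pod name, and the image reference `example.com/cuda-app:latest` are hypothetical placeholders chosen for illustration.

```python
import json


def gpu_pod_manifest(name: str, image: str, gpus: int = 1) -> dict:
    """Build a minimal Pod manifest that requests `gpus` NVIDIA GPUs.

    Hypothetical helper for illustration; it only assembles a dictionary
    and does not contact a cluster.
    """
    if gpus < 1:
        raise ValueError("gpus must be at least 1")
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "restartPolicy": "Never",
            "containers": [
                {
                    "name": name,
                    "image": image,  # placeholder image, assumed to hold a GPU workload
                    # GPUs are requested as an extended resource under limits.
                    "resources": {"limits": {"nvidia.com/gpu": gpus}},
                }
            ],
        },
    }


if __name__ == "__main__":
    manifest = gpu_pod_manifest("gpu-demo", "example.com/cuda-app:latest")
    print(json.dumps(manifest, indent=2))
```

The printed JSON can be saved to a file and applied with `kubectl apply -f`, since Kubernetes accepts manifests in JSON as well as YAML. Whether the Pod is scheduled depends on the cluster actually advertising `nvidia.com/gpu` on at least one node.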


NVIDIA Network Operator

The NVIDIA Network Operator simplifies scale-out network design for Kubernetes by automating the deployment and configuration of the software required for accelerated networking. Paired with the GPU Operator, the Network Operator enables NVIDIA GPUDirect® RDMA, a key technology that accelerates cloud-native AI workloads by orders of magnitude. The Network Operator is also useful for enabling accelerated Kubernetes network environments for telco NFV applications, establishing RDMA connectivity for fast access to NVMe storage, and more.


NVIDIA NIM Operator

NVIDIA NIM™ Operator automates the deployment and lifecycle management of generative AI applications built with NVIDIA NIM microservices on Kubernetes. NIM Operator delivers a better MLOps/LLMOps experience and improves performance by abstracting the deployment, configuration, and management of NIM microservices, allowing users to focus on the end-to-end application.

NVIDIA Cloud-Native Stack

NVIDIA Cloud-Native Stack is a reference architecture that enables easy access to NVIDIA GPU and Network Operators running on upstream Kubernetes. It provides a quick way to deploy Kubernetes on x86 and Arm-based systems and experience the latest NVIDIA features, such as Multi-Instance GPU (MIG), GPUDirect RDMA, GPUDirect Storage, and GPU monitoring capabilities.

Cloud-Native Stack enables developers to build, test, and run GPU-accelerated containerized applications that work with NVIDIA Operators. These applications can work seamlessly in production on enterprise Kubernetes-based platforms, such as NVIDIA Base Command, NVIDIA Fleet Command, Red Hat OpenShift, and VMware vSphere with Tanzu. Developers can deploy Cloud-Native Stack onto GPU-accelerated servers, workstations, cloud instances, or embedded systems, or they can use preconfigured Cloud-Native Stack Virtual Machine Images (VMIs) in leading cloud service providers.

Cloud-Native Stack on GitHub

Cloud-Native Stack Virtual Machine Images

NVIDIA Container Toolkit

The NVIDIA Container Toolkit allows users to build and run GPU-accelerated containers. The toolkit includes a container runtime library and utilities to automatically configure containers to leverage NVIDIA GPUs.

Containerizing GPU applications provides several benefits, including ease of deployment, the ability to run across heterogeneous environments, reproducibility, and ease of collaboration.
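With Docker as the container engine, GPU access is usually requested with the `--gpus` option of `docker run`. The sketch below assembles such a command line as an illustration only. The helper name `docker_gpu_command` is hypothetical, and the `ubuntu` image with `nvidia-smi` as the workload is assumed here as a simple smoke test; it presumes the NVIDIA driver and Container Toolkit are already installed on the host.

```python
from __future__ import annotations

import shlex


def docker_gpu_command(image: str, workload: list[str], gpus: str = "all") -> list[str]:
    """Return a `docker run` argument list that exposes NVIDIA GPUs.

    Hypothetical helper for illustration; it builds the command but does
    not execute it.
    """
    return ["docker", "run", "--rm", "--gpus", gpus, image, *workload]


if __name__ == "__main__":
    cmd = docker_gpu_command("ubuntu", ["nvidia-smi"])
    # Print a shell-safe rendering of the command for copy and paste.
    print(shlex.join(cmd))
```

Run as a script, this prints `docker run --rm --gpus all ubuntu nvidia-smi`. Returning a list rather than a single string keeps the arguments safe to pass directly to `subprocess.run` without shell quoting concerns.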


NVIDIA GPU Operator: Simplifying GPU Management in Kubernetes

Streamlining Kubernetes Networking in Scale-out GPU Clusters With the new NVIDIA Network Operator 1.0

How to easily use GPUs on Kubernetes (on-demand webinar)