Glossary — NCA-AIIO Acronyms and Terms

All acronyms and key terms from the three exam domains, alphabetical order.


A

AI (Artificial Intelligence) — Any technique enabling machines to perform tasks that normally require human intelligence. Umbrella term encompassing ML and DL.

AI Enterprise (NVIDIA AI Enterprise) — NVIDIA’s full-stack, enterprise-grade AI software suite. Includes TensorRT, Triton, NeMo, RAPIDS. Comes with enterprise support SLA. Required for vGPU deployments.

AI Factory — An AI data center purpose-built for single or few extremely large training workloads. Uses NVLink + InfiniBand. Contrast with AI Cloud.

AI Cloud — A multi-tenant hyperscale AI infrastructure serving many diverse concurrent workloads. Uses Ethernet + RoCE. Examples: AWS, Azure, GCP GPU clouds.

ATS (Automatic Transfer Switch) — Electrical switch that automatically transfers power from utility to generator during an outage. Transfer time < 10 seconds.

AOC (Active Optical Cable) — Fiber optic cable with active (powered) transceivers at each end. Used for InfiniBand runs > 3 meters.


B

BF16 (Brain Float 16) — 16-bit floating point format with 8 exponent bits and 7 mantissa bits. Preferred over FP16 for training due to wider dynamic range. Native to Tensor Cores from Ampere onward.

BlueField (BlueField-3) — NVIDIA’s DPU product line. BlueField-3 provides 400 GbE networking + Arm cores + hardware accelerators. Programmed via DOCA.

BMC (Baseboard Management Controller) — Dedicated microcontroller on server motherboard providing out-of-band management. Enables power control, console access, and hardware monitoring independent of the host OS.


C

CapEx (Capital Expenditure) — Upfront cost to purchase hardware/infrastructure. On-premises deployments are CapEx-heavy.

C2C (Chip-to-Chip) — NVLink variant for connecting CPU and GPU dies directly within a Superchip (e.g., Grace+H100 in GH200). Provides coherent memory access.

CRAC (Computer Room Air Conditioning) — Self-contained cooling unit for data centers. Practical limit ~30 kW/rack.

CRAH (Computer Room Air Handler) — Air handler connected to a building’s chilled water plant. More efficient than CRAC for large deployments.

CUDA (Compute Unified Device Architecture) — NVIDIA’s parallel computing platform and programming model. Foundation for all NVIDIA AI software.

cuDNN — NVIDIA’s library of GPU-accelerated deep learning primitives. Used by PyTorch, TensorFlow, JAX under the hood.


D

DAC (Direct Attach Copper) — Passive copper cable for short-reach InfiniBand or Ethernet connections (up to ~3 meters).

DCGM (Data Center GPU Manager) — NVIDIA’s GPU monitoring and diagnostics framework. Provides health monitoring, telemetry, active diagnostics, and policy management.

DCIM (Data Center Infrastructure Management) — Software for managing physical data center assets: power, cooling, space, and connectivity.

DCN (Data Center Networking) — General term for the network infrastructure within a data center.

DL (Deep Learning) — Subset of ML using neural networks with many layers. Foundation of modern AI (transformers, CNNs, diffusion models).

DLC (Direct Liquid Cooling) — Cooling method where coolant flows directly to cold plates on GPU/CPU. Required for racks > 40 kW.

DMA (Direct Memory Access) — A device accesses host memory directly without CPU involvement.

DOCA (Data Center Infrastructure on a Chip Architecture) — NVIDIA’s SDK for programming BlueField DPUs. Provides unified API across BlueField generations.

DPU (Data Processing Unit) — Third pillar of computing (alongside CPU and GPU). Offloads, accelerates, and isolates infrastructure tasks (networking, storage, security). NVIDIA’s product: BlueField.


E

E-W (East-West) — Traffic flowing between servers within the same cluster. In AI: GPU-to-GPU gradient exchange. Requires high bandwidth and low latency.

ECC (Error Correction Code) — Memory protection technology. Single-bit errors (SBE) are corrected; double-bit errors (DBE) are uncorrectable and require page retirement.

ECN (Explicit Congestion Notification) — Network mechanism where switches mark congested packets; receivers signal senders to slow down (used in DCQCN for RoCE congestion control).


F

FP4 / FP8 / FP16 / FP32 — Floating point precision formats. Lower precision = smaller memory footprint + higher throughput. FP4 introduced in Blackwell.

Fat-tree — Network topology where aggregate uplink bandwidth equals downlink bandwidth. Non-blocking: any node can communicate with any other at full line rate simultaneously.

FDR InfiniBand — 56 Gbps InfiniBand generation. Legacy.


G

GRES (Generic Resource Scheduling) — Slurm mechanism for tracking and allocating non-CPU resources including GPUs.

GPU (Graphics Processing Unit) — Massively parallel processor optimized for matrix/tensor operations. Primary compute engine for AI workloads.

GPU Operator (NVIDIA GPU Operator) — Kubernetes operator that automates installation of the full NVIDIA GPU software stack (drivers, CUDA, DCGM, device plugin, MIG manager).

GPUDirect RDMA — Technology enabling the network adapter to read/write GPU memory directly, bypassing CPU memory. Eliminates the GPU→CPU→NIC copy in distributed training.


H

HBM (High Bandwidth Memory) — Stacked DRAM mounted on the GPU package. Provides very high bandwidth (3–8 TB/s). Used in data center GPUs (H100, B200).

HDR InfiniBand — 200 Gbps InfiniBand generation. Previous standard for H100 clusters.

HPC (High-Performance Computing) — Computing for scientific simulation and research. Often runs alongside AI workloads on shared GPU clusters.


I

IB (InfiniBand) — High-throughput, low-latency networking technology with native RDMA support. Industry standard for HPC and AI Factory scale-out networks.

IBTA (InfiniBand Trade Association) — Organization that maintains the InfiniBand specification.

iDRAC (Integrated Dell Remote Access Controller) — Dell’s BMC implementation. Provides out-of-band management for Dell servers.

iLO (Integrated Lights-Out) — HPE’s BMC implementation.

IPMI (Intelligent Platform Management Interface) — Industry-standard protocol for communicating with BMCs. Version 2.0 is widely supported. Being superseded by Redfish.


J

JBOD (Just a Bunch of Disks) — Storage enclosure without RAID. Used for raw capacity.


K

K8s (Kubernetes) — Container orchestration platform. Used for AI inference serving and increasingly for training. GPU support via NVIDIA GPU Operator.

KV cache (Key-Value cache) — In transformer inference, stores previously computed key and value tensors for each layer and token. Memory-intensive at long context lengths.


L

LoRA (Low-Rank Adaptation) — Parameter-efficient fine-tuning technique. Trains only a small set of adapter weights, leaving most model parameters frozen. Dramatically reduces fine-tuning compute cost.


M

MFU (Model FLOP Utilization) — Ratio of actual compute operations to theoretical peak GPU FLOPS. Healthy training runs achieve 40–60% MFU.

MIG (Multi-Instance GPU) — Hardware GPU partitioning technology (Ampere/Hopper). Divides one GPU into up to 7 fully isolated instances with dedicated SM, HBM, L2, and bandwidth.

ML (Machine Learning) — Subset of AI where systems learn from data without explicit programming.

MLOps — Application of DevOps principles to machine learning — automating the lifecycle of training, validation, deployment, and monitoring of ML models.

MMR (Meet-Me Room) — Physical location in a data center where carrier fiber connects to the building’s network.


N

N-S (North-South) — Traffic flowing between the data center and users/internet. In AI: management traffic, user access, result retrieval. Standard Ethernet.

NCCL (NVIDIA Collective Communications Library) — Library for multi-GPU and multi-node collective operations (all-reduce, all-gather, broadcast). Essential for distributed training.

NDR InfiniBand — 400 Gbps InfiniBand generation. Current standard for H100/B200 clusters.

NGC (NVIDIA GPU Cloud) — NVIDIA’s catalog of GPU-optimized containers, pre-trained models, and SDKs. Free to access.

NVLink — NVIDIA’s high-speed GPU interconnect. NVLink 4 (Hopper): 900 GB/s; NVLink 5 (Blackwell): 1.8 TB/s.

NVMe (Non-Volatile Memory Express) — PCIe-based SSD protocol. Very high IOPS and bandwidth vs SATA/SAS.

NVMe-oF (NVMe over Fabrics) — Extension of NVMe protocol over network fabrics (RDMA/TCP). BlueField-3 DPU offloads NVMe-oF processing.

NVSwitch — NVIDIA switch chip that creates fully non-blocking all-to-all NVLink fabric within a system (e.g., DGX H100 uses 4× NVSwitch for 8-GPU all-to-all).


O

OAM (OCP Accelerator Module) — Server form factor used in hyperscaler platforms for GPU hosting.

OOB (Out-of-Band) — Management traffic on a dedicated separate network, independent of production traffic.

OpEx (Operating Expenditure) — Ongoing costs (power, maintenance, licensing, cloud subscription). Cloud deployments are OpEx-heavy.


P

PCIe (Peripheral Component Interconnect Express) — Standard interface for connecting GPUs, NICs, and NVMe drives to servers. PCIe Gen5: 128 GB/s x16; PCIe Gen6: 256 GB/s.

PDU (Power Distribution Unit) — Rack-level power distribution. Intelligent PDUs monitor per-outlet power consumption.

PEFT (Parameter-Efficient Fine-Tuning) — Family of techniques for fine-tuning large models by training only a small subset of parameters. Includes LoRA, prefix tuning, adapters.

PFC (Priority Flow Control) — Ethernet mechanism that sends per-priority pause frames to prevent packet drops. Required for lossless Ethernet (RoCE).

PUE (Power Usage Effectiveness) — Data center efficiency metric: Total Facility Power / IT Equipment Power. Lower = better. Hyperscale targets 1.1–1.2.


R

RDMA (Remote Direct Memory Access) — Network transfer mechanism allowing one machine to directly read/write memory on another without involving the remote CPU or OS. Zero-copy, kernel-bypass.

Redfish — Modern REST/JSON-based API standard for server management (BMC). Supersedes IPMI.

RLHF (Reinforcement Learning from Human Feedback) — Technique for aligning LLMs to human preferences using a reward model trained on human rankings.

RoCE (RDMA over Converged Ethernet) — Takes RDMA semantics and runs them over Ethernet. Requires lossless Ethernet (PFC). Two versions: RoCEv1 (L2 only) and RoCEv2 (IP-routable).


S

SEL (System Event Log) — BMC log of hardware events (errors, power events, temperature alerts).

SM (Streaming Multiprocessor) — Basic compute unit in an NVIDIA GPU. Each SM contains CUDA cores, Tensor Cores, and a cache. Thousands of SMs work in parallel.

SHARP (Scalable Hierarchical Aggregation and Reduction Protocol) — NVIDIA InfiniBand feature that performs collective operations (all-reduce) inside network switches, reducing GPU data traffic.

Slurm — Open-source HPC job scheduler. Dominant in AI training clusters. Manages job queuing, resource allocation, and gang scheduling for multi-node jobs.

SXM — High-performance GPU slot form factor used in DGX systems. Supports full NVLink bandwidth and highest TDP. Not PCIe-compatible.


T

TDP (Thermal Design Power) — Maximum heat a component generates under sustained workload. Data center cooling and power circuits must handle TDP plus system overhead.

Tensor Core — Specialized matrix multiply-accumulate (GEMM) units within NVIDIA GPU SMs. Core accelerator for AI workloads. Each generation adds precision formats (FP4 in Blackwell).

TensorRT — NVIDIA SDK for deep learning inference optimization. Layer fusion, precision calibration (FP32→INT8), kernel auto-tuning.

TensorRT-LLM — NVIDIA’s LLM inference library. Adds paged KV cache, continuous batching, in-flight batching.

Triton (NVIDIA Triton Inference Server) — Open-source inference serving framework. Supports TensorRT, ONNX, PyTorch, TensorFlow backends. HTTP/gRPC API, dynamic batching.

TTFT (Time to First Token) — Inference latency from request receipt to first generated token. Critical metric for interactive LLM applications.


U

UPS (Uninterruptible Power Supply) — Battery-backed power supply providing bridge power during utility failure while generators start.


V

vGPU (Virtual GPU) — NVIDIA technology for sharing a GPU across multiple VMs via time-slicing. Requires NVIDIA AI Enterprise license. Supports vCS, vWS, vPC profiles.

VRAM (Video RAM) — GPU memory (also called framebuffer or GPU memory). In data center GPUs: HBM type. Must fit the model weights for training/inference.


X

XDR InfiniBand — 800 Gbps InfiniBand generation. Used in NVIDIA Quantum-X800 switches.

XID Error — NVIDIA GPU driver error code. XID 79 = GPU fell off the bus (hardware failure). XID 48 = double-bit ECC error.


Back to top

Licensed under CC BY 4.0. Notes based on NVIDIA course materials and original field experience. Not affiliated with or endorsed by NVIDIA Corporation. No exam-dump material — all practice questions are original.