Glossary — NCA-AIIO Acronyms and Terms
All acronyms and key terms from the three exam domains, in alphabetical order.
A
AI (Artificial Intelligence) — Any technique enabling machines to perform tasks that normally require human intelligence. Umbrella term encompassing ML and DL.
AI Enterprise (NVIDIA AI Enterprise) — NVIDIA’s full-stack, enterprise-grade AI software suite. Includes TensorRT, Triton, NeMo, and RAPIDS. Comes with an enterprise support SLA. Required for vGPU deployments.
AI Cloud — A multi-tenant hyperscale AI infrastructure serving many diverse concurrent workloads. Uses Ethernet + RoCE. Examples: AWS, Azure, GCP GPU clouds. Contrast with AI Factory.
AI Factory — An AI data center purpose-built for one or a few extremely large training workloads. Uses NVLink + InfiniBand. Contrast with AI Cloud.
AOC (Active Optical Cable) — Fiber optic cable with active (powered) transceivers at each end. Used for InfiniBand runs > 3 meters.
ATS (Automatic Transfer Switch) — Electrical switch that automatically transfers power from utility to generator during an outage. Transfer time < 10 seconds.
B
BF16 (Brain Float 16) — 16-bit floating point format with 8 exponent bits and 7 mantissa bits. Preferred over FP16 for training due to wider dynamic range. Native to Tensor Cores from Ampere onward.
BlueField (BlueField-3) — NVIDIA’s DPU product line. BlueField-3 provides 400 GbE networking + Arm cores + hardware accelerators. Programmed via DOCA.
BMC (Baseboard Management Controller) — Dedicated microcontroller on server motherboard providing out-of-band management. Enables power control, console access, and hardware monitoring independent of the host OS.
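The BF16 layout above (1 sign bit, 8 exponent bits, 7 mantissa bits) is the upper half of an FP32 value, which is where its wide dynamic range comes from. The sketch below illustrates this with plain bit manipulation. It assumes simple truncation of the low 16 bits, whereas real hardware and frameworks typically round to nearest, and the function names are illustrative only.

```python
import struct


def float_to_bf16_bits(x: float) -> int:
    """Return a 16-bit BF16 pattern for x by truncating its FP32 encoding.

    Assumption: truncation (round toward zero); hardware usually rounds to nearest.
    """
    (fp32_bits,) = struct.unpack(">I", struct.pack(">f", x))
    return fp32_bits >> 16  # keep sign (1) + exponent (8) + top 7 mantissa bits


def bf16_bits_to_float(bits: int) -> float:
    """Expand a BF16 bit pattern back to a Python float by zero-filling the low 16 bits."""
    (value,) = struct.unpack(">f", struct.pack(">I", (bits & 0xFFFF) << 16))
    return value


def split_fields(bits: int) -> tuple:
    """Split a BF16 bit pattern into (sign, exponent, mantissa)."""
    sign = bits >> 15
    exponent = (bits >> 7) & 0xFF
    mantissa = bits & 0x7F
    return sign, exponent, mantissa


if __name__ == "__main__":
    bits = float_to_bf16_bits(3.14159)
    print(hex(bits), split_fields(bits), bf16_bits_to_float(bits))
    # 1e38 is far beyond the FP16 maximum (65504) but still finite in BF16.
    print(bf16_bits_to_float(float_to_bf16_bits(1e38)))
```

Only 7 mantissa bits survive, so 3.14159 comes back as 3.140625: BF16 trades precision for range.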
C
C2C (Chip-to-Chip) — NVLink variant for connecting CPU and GPU dies directly within a Superchip (e.g., Grace CPU + Hopper GPU in GH200). Provides coherent memory access.
CapEx (Capital Expenditure) — Upfront cost to purchase hardware/infrastructure. On-premises deployments are CapEx-heavy.
CRAC (Computer Room Air Conditioning) — Self-contained cooling unit for data centers. Practical limit ~30 kW/rack.
CRAH (Computer Room Air Handler) — Air handler connected to a building’s chilled water plant. More efficient than CRAC for large deployments.
CUDA (Compute Unified Device Architecture) — NVIDIA’s parallel computing platform and programming model. Foundation for all NVIDIA AI software.
cuDNN — NVIDIA’s library of GPU-accelerated deep learning primitives. Used by PyTorch, TensorFlow, JAX under the hood.
D
DAC (Direct Attach Copper) — Passive copper cable for short-reach InfiniBand or Ethernet connections (up to ~3 meters).
DCGM (Data Center GPU Manager) — NVIDIA’s GPU monitoring and diagnostics framework. Provides health monitoring, telemetry, active diagnostics, and policy management.
DCIM (Data Center Infrastructure Management) — Software for managing physical data center assets: power, cooling, space, and connectivity.
DCN (Data Center Networking) — General term for the network infrastructure within a data center.
DL (Deep Learning) — Subset of ML using neural networks with many layers. Foundation of modern AI (transformers, CNNs, diffusion models).
DLC (Direct Liquid Cooling) — Cooling method where coolant flows directly to cold plates on GPU/CPU. Required for racks > 40 kW.
DMA (Direct Memory Access) — A device accesses host memory directly without CPU involvement.
DOCA (Data Center Infrastructure on a Chip Architecture) — NVIDIA’s SDK for programming BlueField DPUs. Provides unified API across BlueField generations.
DPU (Data Processing Unit) — Third pillar of computing (alongside CPU and GPU). Offloads, accelerates, and isolates infrastructure tasks (networking, storage, security). NVIDIA’s product: BlueField.
E
E-W (East-West) — Traffic flowing between servers within the same cluster. In AI: GPU-to-GPU gradient exchange. Requires high bandwidth and low latency.
ECC (Error Correction Code) — Memory protection technology. Single-bit errors (SBE) are corrected; double-bit errors (DBE) are uncorrectable and require page retirement (or row remapping on newer GPUs).
ECN (Explicit Congestion Notification) — Network mechanism where switches mark congested packets; receivers signal senders to slow down (used in DCQCN for RoCE congestion control).
F
Fat-tree — Network topology where aggregate uplink bandwidth equals downlink bandwidth. Non-blocking: any node can communicate with any other at full line rate simultaneously.
FDR InfiniBand — 56 Gbps InfiniBand generation. Legacy.
FP4 / FP8 / FP16 / FP32 — Floating point precision formats. Lower precision = smaller memory footprint + higher throughput. FP4 introduced in Blackwell.
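The fat-tree condition (aggregate uplink bandwidth equals downlink bandwidth) can be checked per switch as an oversubscription ratio. The sketch below is a minimal illustration; the function names and the port counts in the usage lines are hypothetical and do not describe any specific switch model.

```python
def oversubscription_ratio(down_ports: int, down_gbps: float,
                           up_ports: int, up_gbps: float) -> float:
    """Downlink bandwidth divided by uplink bandwidth for one leaf switch."""
    if up_ports <= 0 or up_gbps <= 0:
        raise ValueError("switch needs at least one uplink with positive speed")
    return (down_ports * down_gbps) / (up_ports * up_gbps)


def is_non_blocking(down_ports: int, down_gbps: float,
                    up_ports: int, up_gbps: float) -> bool:
    """A ratio of 1.0 or less means uplinks can carry all downlink traffic."""
    return oversubscription_ratio(down_ports, down_gbps, up_ports, up_gbps) <= 1.0


if __name__ == "__main__":
    # Hypothetical leaf: 32 x 400G down, 32 x 400G up -> 1:1, non-blocking.
    print(oversubscription_ratio(32, 400, 32, 400), is_non_blocking(32, 400, 32, 400))
    # Hypothetical leaf: 48 x 400G down, 16 x 400G up -> 3:1, oversubscribed.
    print(oversubscription_ratio(48, 400, 16, 400), is_non_blocking(48, 400, 16, 400))
```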
G
GPU (Graphics Processing Unit) — Massively parallel processor optimized for matrix/tensor operations. Primary compute engine for AI workloads.
GPU Operator (NVIDIA GPU Operator) — Kubernetes operator that automates installation of the full NVIDIA GPU software stack (drivers, CUDA, DCGM, device plugin, MIG manager).
GPUDirect RDMA — Technology enabling the network adapter to read/write GPU memory directly, bypassing CPU memory. Eliminates the GPU→CPU→NIC copy in distributed training.
GRES (Generic Resource Scheduling) — Slurm mechanism for tracking and allocating non-CPU resources including GPUs.
H
HBM (High Bandwidth Memory) — Stacked DRAM mounted on the GPU package. Provides very high bandwidth (3–8 TB/s). Used in data center GPUs (H100, B200).
HDR InfiniBand — 200 Gbps InfiniBand generation. Previous standard, typical of A100-generation clusters.
HPC (High-Performance Computing) — Computing for scientific simulation and research. Often runs alongside AI workloads on shared GPU clusters.
I
IB (InfiniBand) — High-throughput, low-latency networking technology with native RDMA support. Industry standard for HPC and AI Factory scale-out networks.
IBTA (InfiniBand Trade Association) — Organization that maintains the InfiniBand specification.
iDRAC (Integrated Dell Remote Access Controller) — Dell’s BMC implementation. Provides out-of-band management for Dell servers.
iLO (Integrated Lights-Out) — HPE’s BMC implementation.
IPMI (Intelligent Platform Management Interface) — Industry-standard protocol for communicating with BMCs. Version 2.0 is widely supported. Being superseded by Redfish.
J
JBOD (Just a Bunch of Disks) — Storage enclosure without RAID. Used for raw capacity.
K
K8s (Kubernetes) — Container orchestration platform. Used for AI inference serving and increasingly for training. GPU support via NVIDIA GPU Operator.
KV cache (Key-Value cache) — In transformer inference, stores previously computed key and value tensors for each layer and token. Memory-intensive at long context lengths.
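To see why the KV cache becomes memory-intensive at long context lengths, its size can be estimated from the model shape: two tensors (key and value) per layer, each holding one vector per KV head per token. The sketch below uses this standard estimate. The model dimensions in the usage line are hypothetical and not taken from this glossary.

```python
def kv_cache_bytes(num_layers: int, num_kv_heads: int, head_dim: int,
                   seq_len: int, batch_size: int = 1,
                   bytes_per_element: int = 2) -> int:
    """Estimate KV cache size in bytes.

    The factor 2 accounts for storing both keys and values.
    bytes_per_element=2 corresponds to FP16 or BF16 storage.
    """
    return (2 * num_layers * num_kv_heads * head_dim
            * seq_len * batch_size * bytes_per_element)


if __name__ == "__main__":
    # Hypothetical model: 32 layers, 32 KV heads, head dimension 128, FP16 cache.
    size = kv_cache_bytes(num_layers=32, num_kv_heads=32, head_dim=128, seq_len=4096)
    print(f"{size / 1024**3:.1f} GiB per sequence at 4096 tokens")  # 2.0 GiB
```

The size grows linearly with both context length and batch size, which is why serving many long conversations at once can exhaust GPU memory before compute becomes the limit.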
L
LoRA (Low-Rank Adaptation) — Parameter-efficient fine-tuning technique. Trains only a small set of adapter weights, leaving most model parameters frozen. Dramatically reduces fine-tuning compute cost.
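The saving from LoRA can be made concrete by counting parameters. For a frozen weight matrix of shape d_out × d_in, LoRA trains two small matrices of shapes d_out × r and r × d_in, where the rank r is much smaller than either dimension. The sketch below is a minimal count for a single matrix; the 4096 × 4096 layer and rank 8 are illustrative values only.

```python
def full_params(d_out: int, d_in: int) -> int:
    """Parameters in the frozen weight matrix W (d_out x d_in)."""
    return d_out * d_in


def lora_trainable_params(d_out: int, d_in: int, rank: int) -> int:
    """Parameters in the adapter pair B (d_out x rank) and A (rank x d_in)."""
    return rank * (d_out + d_in)


def trainable_fraction(d_out: int, d_in: int, rank: int) -> float:
    """Share of this layer's parameters that LoRA actually trains."""
    return lora_trainable_params(d_out, d_in, rank) / full_params(d_out, d_in)


if __name__ == "__main__":
    # Illustrative layer: 4096 x 4096 projection with rank-8 adapters.
    print(full_params(4096, 4096))               # 16777216
    print(lora_trainable_params(4096, 4096, 8))  # 65536
    print(f"{trainable_fraction(4096, 4096, 8):.2%}")  # 0.39%
```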
M
MFU (Model FLOP Utilization) — Ratio of the model FLOPs per second a training run actually achieves to the theoretical peak FLOPS of the GPUs in use. Healthy training runs achieve 40–60% MFU.
MIG (Multi-Instance GPU) — Hardware GPU partitioning technology (Ampere and later). Divides one GPU into up to 7 fully isolated instances, each with dedicated SMs, HBM capacity, L2 cache, and memory bandwidth.
ML (Machine Learning) — Subset of AI where systems learn from data without explicit programming.
MLOps — Application of DevOps principles to machine learning: automating the lifecycle of training, validation, deployment, and monitoring of ML models.
MMR (Meet-Me Room) — Physical location in a data center where carrier fiber connects to the building’s network.
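The MFU entry above can be turned into a quick calculation. The sketch below assumes the common approximation of about 6 FLOPs per parameter per training token for dense transformers (forward plus backward pass, ignoring attention-specific terms). The model size, throughput, and per-GPU peak in the usage line are illustrative and not the specification of any particular GPU.

```python
def achieved_flops_per_second(num_params: float, tokens_per_second: float) -> float:
    """Approximate training FLOP/s, assuming ~6 FLOPs per parameter per token."""
    return 6.0 * num_params * tokens_per_second


def mfu(num_params: float, tokens_per_second: float,
        num_gpus: int, peak_flops_per_gpu: float) -> float:
    """Model FLOP Utilization: achieved FLOP/s divided by aggregate peak FLOP/s."""
    if num_gpus <= 0 or peak_flops_per_gpu <= 0:
        raise ValueError("need at least one GPU with positive peak FLOP/s")
    achieved = achieved_flops_per_second(num_params, tokens_per_second)
    return achieved / (num_gpus * peak_flops_per_gpu)


if __name__ == "__main__":
    # Illustrative run: 7e9 parameters, 100,000 tokens/s across 8 GPUs,
    # each with an assumed peak of 1e15 FLOP/s.
    print(f"MFU = {mfu(7e9, 1e5, 8, 1e15):.1%}")  # 52.5%
```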
N
N-S (North-South) — Traffic flowing between the data center and users/internet. In AI: management traffic, user access, result retrieval. Standard Ethernet.
NCCL (NVIDIA Collective Communications Library) — Library for multi-GPU and multi-node collective operations (all-reduce, all-gather, broadcast). Essential for distributed training.
NDR InfiniBand — 400 Gbps InfiniBand generation. Current standard for H100/B200 clusters.
NGC (NVIDIA GPU Cloud) — NVIDIA’s catalog of GPU-optimized containers, pre-trained models, and SDKs. Free to access.
NVLink — NVIDIA’s high-speed GPU interconnect. Total bidirectional bandwidth per GPU: 900 GB/s for NVLink 4 (Hopper); 1.8 TB/s for NVLink 5 (Blackwell).
NVMe (Non-Volatile Memory Express) — PCIe-based SSD protocol. Very high IOPS and bandwidth vs SATA/SAS.
NVMe-oF (NVMe over Fabrics) — Extension of NVMe protocol over network fabrics (RDMA/TCP). BlueField-3 DPU offloads NVMe-oF processing.
NVSwitch — NVIDIA switch chip that creates fully non-blocking all-to-all NVLink fabric within a system (e.g., DGX H100 uses 4× NVSwitch for 8-GPU all-to-all).
O
OAM (OCP Accelerator Module) — Open Compute Project form factor for accelerator modules. Used in hyperscaler platforms for GPU hosting.
OOB (Out-of-Band) — Management traffic on a dedicated separate network, independent of production traffic.
OpEx (Operating Expenditure) — Ongoing costs (power, maintenance, licensing, cloud subscription). Cloud deployments are OpEx-heavy.
P
PCIe (Peripheral Component Interconnect Express) — Standard interface for connecting GPUs, NICs, and NVMe drives to servers. Approximate bidirectional bandwidth of an x16 link: 128 GB/s for PCIe Gen5, 256 GB/s for PCIe Gen6 (about half that in each direction).
PDU (Power Distribution Unit) — Rack-level power distribution. Intelligent PDUs monitor per-outlet power consumption.
PEFT (Parameter-Efficient Fine-Tuning) — Family of techniques for fine-tuning large models by training only a small subset of parameters. Includes LoRA, prefix tuning, adapters.
PFC (Priority Flow Control) — Ethernet mechanism that sends per-priority pause frames to prevent packet drops. Required for lossless Ethernet (RoCE).
PUE (Power Usage Effectiveness) — Data center efficiency metric: Total Facility Power / IT Equipment Power. Lower = better. Hyperscale targets 1.1–1.2.
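The PUE formula above (Total Facility Power / IT Equipment Power) is easy to apply directly. The sketch below is a minimal illustration; the kilowatt figures in the usage line are made-up example values.

```python
def pue(total_facility_kw: float, it_equipment_kw: float) -> float:
    """Power Usage Effectiveness: total facility power divided by IT power."""
    if it_equipment_kw <= 0:
        raise ValueError("IT equipment power must be positive")
    if total_facility_kw < it_equipment_kw:
        raise ValueError("total facility power cannot be below IT power")
    return total_facility_kw / it_equipment_kw


def overhead_kw(total_facility_kw: float, it_equipment_kw: float) -> float:
    """Power spent on cooling, power conversion, lighting, and other non-IT loads."""
    return total_facility_kw - it_equipment_kw


if __name__ == "__main__":
    # Example values: 1,200 kW at the utility meter, 1,000 kW reaching IT equipment.
    print(f"PUE = {pue(1200, 1000):.2f}")               # PUE = 1.20
    print(f"Overhead = {overhead_kw(1200, 1000)} kW")   # Overhead = 200 kW
```

A PUE of 1.0 would mean every watt reaches IT equipment, so the value can never fall below 1.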
R
RDMA (Remote Direct Memory Access) — Network transfer mechanism allowing one machine to directly read/write memory on another without involving the remote CPU or OS. Zero-copy, kernel-bypass.
Redfish — Modern REST/JSON-based API standard for server management (BMC). Supersedes IPMI.
RLHF (Reinforcement Learning from Human Feedback) — Technique for aligning LLMs to human preferences using a reward model trained on human rankings.
RoCE (RDMA over Converged Ethernet) — Takes RDMA semantics and runs them over Ethernet. Requires lossless Ethernet (PFC). Two versions: RoCEv1 (L2 only) and RoCEv2 (IP-routable).
S
SEL (System Event Log) — BMC log of hardware events (errors, power events, temperature alerts).
SHARP (Scalable Hierarchical Aggregation and Reduction Protocol) — NVIDIA InfiniBand feature that performs collective operations (all-reduce) inside network switches, reducing GPU data traffic.
Slurm — Open-source HPC job scheduler. Dominant in AI training clusters. Manages job queuing, resource allocation, and gang scheduling for multi-node jobs.
SM (Streaming Multiprocessor) — Basic compute unit in an NVIDIA GPU. Each SM contains CUDA cores, Tensor Cores, and L1 cache/shared memory. A data center GPU has on the order of a hundred SMs (132 on H100 SXM), which together provide thousands of CUDA cores working in parallel.
SXM — High-performance GPU socket (mezzanine module) form factor used in DGX systems. Supports full NVLink bandwidth and the highest TDP. Does not fit a standard PCIe card slot.
T
TDP (Thermal Design Power) — Maximum heat a component generates under sustained workload. Data center cooling and power circuits must handle TDP plus system overhead.
Tensor Core — Specialized matrix multiply-accumulate (GEMM) units within NVIDIA GPU SMs. Core accelerator for AI workloads. Each generation adds precision formats (FP4 in Blackwell).
TensorRT — NVIDIA SDK for deep learning inference optimization. Layer fusion, precision calibration (FP32→INT8), kernel auto-tuning.
TensorRT-LLM — NVIDIA’s LLM inference library. Adds paged KV cache and in-flight batching (also called continuous batching).
Triton (NVIDIA Triton Inference Server) — Open-source inference serving framework. Supports TensorRT, ONNX, PyTorch, TensorFlow backends. HTTP/gRPC API, dynamic batching.
TTFT (Time to First Token) — Inference latency from request receipt to first generated token. Critical metric for interactive LLM applications.
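TTFT can be measured on the client side by timing a streamed response. The sketch below assumes the response arrives as any Python iterable of tokens. `fake_token_stream` is a hypothetical stand-in for a real model server, with a slow first token followed by steady decoding.

```python
import time
from typing import Iterable, Iterator, Tuple


def measure_stream(tokens: Iterable[str]) -> Tuple[float, float, int]:
    """Return (ttft_seconds, total_seconds, token_count) for one streamed response."""
    start = time.perf_counter()  # stand-in for the moment the request is sent
    first_token_at = None
    count = 0
    for _ in tokens:
        if first_token_at is None:
            first_token_at = time.perf_counter()
        count += 1
    end = time.perf_counter()
    if first_token_at is None:
        raise ValueError("stream produced no tokens")
    return first_token_at - start, end - start, count


def fake_token_stream(prefill_s: float, per_token_s: float, n: int) -> Iterator[str]:
    """Hypothetical server: wait prefill_s before token 0, then per_token_s between tokens."""
    time.sleep(prefill_s)
    for i in range(n):
        if i > 0:
            time.sleep(per_token_s)
        yield f"tok{i}"


if __name__ == "__main__":
    ttft, total, n = measure_stream(fake_token_stream(0.05, 0.01, 5))
    print(f"TTFT {ttft * 1000:.0f} ms, total {total * 1000:.0f} ms, {n} tokens")
```

Because the timer starts before the first token is requested, the measured TTFT includes the whole wait for that token, which is the delay an interactive user perceives.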
U
UPS (Uninterruptible Power Supply) — Battery-backed power supply providing bridge power during utility failure while generators start.
V
vGPU (Virtual GPU) — NVIDIA technology for sharing a GPU across multiple VMs via time-slicing. Requires NVIDIA AI Enterprise license. Supports vCS, vWS, vPC profiles.
VRAM (Video RAM) — GPU memory, also called the framebuffer. In data center GPUs it is HBM. Must hold the model weights plus working data (activations, optimizer state, or KV cache) for training/inference.
X
XDR InfiniBand — 800 Gbps InfiniBand generation. Used in NVIDIA Quantum-X800 switches.
XID Error — NVIDIA GPU driver error code. XID 79 = GPU fell off the bus (hardware failure). XID 48 = double-bit ECC error.
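XID codes usually surface as lines in the kernel log, so a small parser can flag the two codes listed above. The sketch below assumes the driver's usual `NVRM: Xid (<device>): <code>, ...` line shape. The sample line is illustrative, and the lookup table contains only the two codes from this glossary.

```python
import re
from typing import Optional, Tuple

# Only the two codes defined in this glossary; other codes exist.
XID_MEANINGS = {
    48: "double-bit ECC error",
    79: "GPU fell off the bus",
}

# Assumed log shape: "NVRM: Xid (PCI:0000:3b:00): 79, pid=1234, ..."
XID_PATTERN = re.compile(r"NVRM: Xid \((?P<device>[^)]+)\): (?P<code>\d+)")


def parse_xid(line: str) -> Optional[Tuple[str, int, str]]:
    """Return (device, code, meaning) for an XID log line, or None if it is not one."""
    match = XID_PATTERN.search(line)
    if match is None:
        return None
    code = int(match.group("code"))
    meaning = XID_MEANINGS.get(code, "not listed in this glossary")
    return match.group("device"), code, meaning


if __name__ == "__main__":
    sample = "kernel: NVRM: Xid (PCI:0000:3b:00): 79, pid=1234, GPU has fallen off the bus."
    print(parse_xid(sample))
```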