3.4 Virtualizing Accelerated Infrastructure

What the exam tests

The three GPU virtualization methods (MIG, vGPU, passthrough), their use cases, trade-offs, and which NVIDIA products support each.

Why virtualize GPUs?

A physical GPU is expensive ($10K–$30K). Without virtualization:

One GPU per user/workload — most users can’t fully utilize a GPU
No isolation between workloads on shared infrastructure
No quality-of-service (QoS) guarantees

GPU virtualization enables:

Multi-tenancy: Multiple users/VMs/containers share one GPU
Isolation: Workloads cannot interfere with each other
QoS: Guaranteed resource allocation per workload
Better utilization: Several small jobs can share one GPU instead of each leaving most of a dedicated GPU idle

Three virtualization methods

┌────────────────────────────────────────────────────────────────┐
│  Physical GPU (e.g., H100 80GB)                                 │
│                                                                  │
│  Method 1: GPU Passthrough                                      │
│  ┌──────────────────────────────────────────────────────────┐   │
│  │          VM gets the ENTIRE GPU — no sharing             │   │
│  └──────────────────────────────────────────────────────────┘   │
│                                                                  │
│  Method 2: vGPU (NVIDIA Virtual GPU)                            │
│  ┌────────────┐ ┌────────────┐ ┌────────────┐ ┌────────────┐   │
│  │  VM 1      │ │  VM 2      │ │  VM 3      │ │  VM 4      │   │
│  │ vGPU 20GB  │ │ vGPU 20GB  │ │ vGPU 20GB  │ │ vGPU 20GB  │   │
│  └────────────┘ └────────────┘ └────────────┘ └────────────┘   │
│  (time-sliced, shared GPU — software-based partitioning)        │
│                                                                  │
│  Method 3: MIG (Multi-Instance GPU)                             │
│  ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐          │
│  │  MIG 1   │ │  MIG 2   │ │  MIG 3   │ │  MIG 4   │  ...     │
│  │ 20GB HBM │ │ 20GB HBM │ │ 10GB HBM │ │ 10GB HBM │          │
│  │ SM slice │ │ SM slice │ │ SM slice │ │ SM slice │          │
│  └──────────┘ └──────────┘ └──────────┘ └──────────┘          │
│  (hardware-partitioned — fully isolated, no interference)       │
└────────────────────────────────────────────────────────────────┘

Method 1: GPU Passthrough

What it is

A hypervisor (VMware ESXi, KVM, Hyper-V) assigns an entire physical GPU to a single VM using PCIe passthrough (called DirectPath I/O on ESXi, VFIO on KVM, and Discrete Device Assignment on Hyper-V).

Characteristics

The VM has exclusive, dedicated access to the full GPU
Behaves identically to bare-metal GPU access inside the VM
No sharing — one GPU, one VM
Near-native performance, with minimal virtualization overhead
The host must release the GPU (unbind it from the host driver) before it can be assigned to the VM

Use case

When a workload needs the full GPU and maximum performance
Single high-priority training job on a dedicated VM
Legacy applications that don’t support MIG or vGPU

Limitations

No sharing → poor utilization if the workload doesn’t use the GPU fully
No live migration of VMs (GPU state can’t be migrated)
Scaling to N VMs requires N physical GPUs

Method 2: vGPU — NVIDIA Virtual GPU

What it is

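NVIDIA vGPU is a software-based GPU virtualization technology in which multiple VMs share a single physical GPU. Each VM gets a fixed slice of GPU memory (framebuffer), while compute is time-sliced: VMs take turns on the GPU, and how much time each VM gets depends on the vGPU scheduler policy (best effort, equal share, or fixed share).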

How it works

NVIDIA Virtual GPU Manager (vGPU Manager) runs as a plugin in the hypervisor
The physical GPU is exposed as multiple virtual GPU instances to VMs
Each VM installs a vGPU guest driver
Compute is shared via time-multiplexing, while framebuffer memory is statically divided among the vGPUs (compute is not spatially partitioned as it is with MIG)

vGPU profiles (example with A100 80GB)

Profile	Framebuffer per VM	Use case
A100-80C	80 GB	Full GPU for compute (1 VM)
A100-40C	40 GB	2 VMs sharing one GPU
A100-20C	20 GB	4 VMs sharing one GPU
A100-10C	10 GB	8 VMs sharing one GPU
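To see how the profile size and the scheduler policy combine, here is a minimal Python sketch under simplifying assumptions. It models NVIDIA's three vGPU scheduler policies (best effort, equal share, fixed share) as plain fractions of GPU time and ignores context-switch overhead and the L2 cache and memory bandwidth that vGPUs still share. The class name, field names, and policy strings are illustrative labels, not vGPU configuration values.

```python
from dataclasses import dataclass


@dataclass
class VGpuHost:
    framebuffer_gb: int  # physical GPU memory, e.g. 80 for an A100 80GB
    profile_gb: int      # framebuffer per vGPU, e.g. 20 for an A100-20C profile
    powered_on: int      # VMs currently running with this vGPU profile
    busy: int            # of those, how many are submitting GPU work

    @property
    def max_vgpus(self) -> int:
        # Framebuffer is divided statically, so the profile size caps the VM count.
        return self.framebuffer_gb // self.profile_gb

    def compute_share(self, policy: str) -> float:
        """Fraction of GPU time one busy VM gets (simplified model)."""
        if policy == "best_effort":
            return 1 / self.busy        # idle VMs give up their turns
        if policy == "equal_share":
            return 1 / self.powered_on  # every running VM gets a turn, busy or not
        if policy == "fixed_share":
            return 1 / self.max_vgpus   # set by the profile, not by neighbors
        raise ValueError(f"unknown policy: {policy}")


host = VGpuHost(framebuffer_gb=80, profile_gb=20, powered_on=3, busy=2)
print(host.max_vgpus)  # 4
for policy in ("best_effort", "equal_share", "fixed_share"):
    print(policy, round(host.compute_share(policy), 3))
# best_effort 0.5, equal_share 0.333, fixed_share 0.25
```

In this model, fixed share pins each 20 GB VM at 1/4 of GPU time however many neighbors are busy, which is the closest vGPU comes to a compute guarantee; best effort hands out more time when neighbors are idle, at the cost of predictability.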

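vGPU editions

NVIDIA Virtual Applications (vApps): A-series profiles, for published applications and RDSH sessions
NVIDIA Virtual PC (vPC): B-series profiles, for knowledge-worker Windows desktops
NVIDIA RTX Virtual Workstation (vWS): Q-series profiles, for professional graphics and design workstations
NVIDIA AI Enterprise: C-series profiles (such as those in the profile table above), for AI and compute workloads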

Use case

VDI (Virtual Desktop Infrastructure) — multiple users sharing GPUs for graphics-accelerated virtual desktops
Multi-tenant AI inference — many users sharing inference capacity
Enterprise AI without dedicated per-user hardware

Limitations

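Time-slicing: VMs take turns on the GPU, so latency can spike when many busy VMs compete
Weaker isolation than MIG: each vGPU has its own framebuffer, but L2 cache, memory bandwidth, and compute engines are shared, so a heavy or misbehaving VM can degrade its neighbors
Requires a paid NVIDIA vGPU software license: vApps, vPC, or RTX vWS for graphics profiles; NVIDIA AI Enterprise for compute (C-series) profiles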
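The latency point can be made concrete with a simple round-robin model: a VM that has just missed its turn waits while every other busy VM runs one slice. This is a sketch, not the vGPU scheduler itself; the 2 ms slice length is a hypothetical number chosen for illustration.

```python
def worst_case_wait_ms(busy_vms: int, slice_ms: float) -> float:
    """Longest wait for a VM's next turn when busy VMs share the GPU round-robin."""
    if busy_vms < 1:
        raise ValueError("need at least one busy VM")
    # Every other busy VM runs one full slice before this VM runs again.
    return (busy_vms - 1) * slice_ms


SLICE_MS = 2.0  # hypothetical slice length, for illustration only

for n in (1, 2, 4, 8):
    print(f"{n} busy VM(s): up to {worst_case_wait_ms(n, SLICE_MS):.0f} ms extra wait")
```

The wait grows linearly with the number of busy neighbors. A MIG instance has no equivalent term, because its SMs are not time-shared with other instances.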

Method 3: MIG — Multi-Instance GPU

What it is

MIG (Multi-Instance GPU) is a hardware-level GPU partitioning technology introduced with the Ampere (A100) architecture. The GPU is divided into up to 7 hardware-isolated GPU instances, each with its own:

Dedicated SM slice
Dedicated HBM memory partition
Dedicated L2 cache slice
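Dedicated memory controllers, and therefore dedicated memory bandwidth
Dedicated copy engines (and, depending on the profile, NVDEC/NVJPG media engines)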

Why hardware isolation matters

In vGPU time-slicing, a runaway process on one VM can cause latency spikes for all VMs (shared hardware state, context switches). In MIG, each instance has physically separate resources — one instance cannot affect the performance or memory of another.

Supported GPUs

A100 (SXM4 and PCIe): up to 7 MIG instances
H100 (SXM5 and PCIe): up to 7 MIG instances
A30: up to 4 MIG instances
H200: up to 7 MIG instances

MIG instance profiles (A100 80GB example)

Instance profile	Compute (fraction of SMs)	GPU memory	Max instances per GPU
1g.10gb	1/7	10 GB	7
2g.20gb	2/7	20 GB	3
3g.40gb	3/7	40 GB	2
4g.40gb	4/7	40 GB	1
7g.80gb	7/7	80 GB	1 (full GPU in MIG mode)
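A quick way to reason about which combinations fit is a slice budget: on an 80 GB A100 or H100, MIG exposes 7 compute slices and 8 memory slices of 10 GB each, and each profile consumes a fixed number of both. The sketch below checks only those totals; real MIG also enforces placement rules, so passing this check is necessary but not sufficient. The 1g.20gb entry is not in the table above and is included as an assumption to verify against `nvidia-smi mig -lgip` on your system.

```python
from collections import Counter

# (compute slices, memory slices) consumed by each profile on an 80 GB A100/H100.
PROFILES = {
    "1g.10gb": (1, 1),
    "1g.20gb": (1, 2),  # assumption: check availability with `nvidia-smi mig -lgip`
    "2g.20gb": (2, 2),
    "3g.40gb": (3, 4),
    "4g.40gb": (4, 4),
    "7g.80gb": (7, 8),
}
TOTAL_COMPUTE, TOTAL_MEMORY = 7, 8


def fits(requested: list) -> bool:
    """Budget check only: True if the slice totals fit (placement rules ignored)."""
    counts = Counter(requested)
    compute = sum(PROFILES[p][0] * n for p, n in counts.items())
    memory = sum(PROFILES[p][1] * n for p, n in counts.items())
    return compute <= TOTAL_COMPUTE and memory <= TOTAL_MEMORY


print(fits(["2g.20gb"] * 4))           # False: needs 8 compute slices, only 7 exist
print(fits(["1g.20gb"] * 4))           # True: 4 compute slices, 8 memory slices
print(fits(["3g.40gb", "4g.40gb"]))    # True: 7 compute slices, 8 memory slices
```

The same totals reproduce the "Max instances per GPU" column: 2g.20gb stops at 3 because a fourth would need an eighth compute slice.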

MIG and Kubernetes

With the NVIDIA GPU Operator and MIG Manager:

resources:
  limits:
    nvidia.com/mig-3g.40gb: "1"  # one 3g.40gb MIG instance ("mixed" MIG strategy); GPU resources must be set in limits

Each container that requests it gets a dedicated, hardware-isolated 3g.40gb instance: 3/7 of the GPU's compute and a 40 GB memory partition.
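If pod specs are generated from code, the same MIG request can be built as a plain manifest and piped to `kubectl apply -f -`, which accepts JSON as well as YAML. This is a minimal standard-library sketch: the pod name, container name, and image are hypothetical placeholders, and the `nvidia.com/mig-<profile>` resource name assumes the GPU Operator's "mixed" MIG strategy. The MIG resource goes under `limits`, since Kubernetes does not accept a GPU request without a matching limit.

```python
import json


def mig_pod(name: str, image: str, profile: str = "3g.40gb", count: int = 1) -> dict:
    """Build a Pod manifest that asks for `count` MIG instances of `profile`."""
    resource = f"nvidia.com/mig-{profile}"  # resource naming under the "mixed" strategy
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "restartPolicy": "Never",
            "containers": [
                {
                    "name": "main",
                    "image": image,
                    # Extended resources such as MIG devices are set in limits;
                    # Kubernetes then uses the limit as the request.
                    "resources": {"limits": {resource: str(count)}},
                }
            ],
        },
    }


# Hypothetical names; pipe the output to `kubectl apply -f -`.
manifest = mig_pod("mig-demo", "registry.example.com/inference:latest")
print(json.dumps(manifest, indent=2))
```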

Use case

Multi-tenant AI inference with strict QoS and isolation
Running multiple small models simultaneously with guaranteed throughput
Development environments where many data scientists share a production GPU cluster
Research clusters where a single GPU can serve multiple simultaneous experiments

Comparison matrix

Dimension	Passthrough	vGPU	MIG
Isolation type	Full GPU, hardware	Time-sliced, software	Hardware-partitioned
Number of tenants	1 per GPU	Depends on profile (GPU memory ÷ profile size)	Up to 7 per GPU (4 on A30)
Performance isolation	N/A (dedicated)	Soft (time share)	Hard (hardware)
Memory isolation	Full	Partitioned (soft)	Partitioned (hard)
Live VM migration	No	Yes (with some constraints)	No
Supported on	Any NVIDIA GPU	vGPU-supported data-center GPUs	A100, H100, A30, H200
License required	No	NVIDIA vGPU license (vApps, vPC, RTX vWS, or NVIDIA AI Enterprise)	No (built into supported GPUs)
Best for	Max performance, 1 tenant	VDI, soft multi-tenancy	Production multi-tenancy, QoS

Decision guide

Is GPU isolation/QoS critical (regulated, SLA-bound)?
    YES → Use MIG (if GPU supports it) or passthrough (1 tenant)
    
Does the GPU support MIG (A100, A30, H100, H200)?
    YES → Use MIG for multi-tenancy with hardware isolation
    NO  → Use vGPU for soft multi-tenancy
    
Do you need full GPU performance for one workload?
    YES → Passthrough (or 7g.80gb MIG profile)
    
VDI / multiple Windows desktop users?
    → vGPU (vPC or vWS profile)
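The decision guide can also be written as a small function, which makes the order of the questions explicit. This is one possible reading of the guide, not an official selection rule: it checks VDI first (MIG targets compute workloads), then single-tenant workloads, then whether the GPU supports MIG with enough instances. The `Need` fields and the example GPU names are illustrative inputs.

```python
from dataclasses import dataclass

# Max MIG instances for the MIG-capable GPUs listed in this section.
MIG_MAX = {"A100": 7, "A30": 4, "H100": 7, "H200": 7}


@dataclass
class Need:
    gpu: str                # GPU model, e.g. "H100" or "L40" (illustrative)
    tenants: int            # workloads or VMs that must share one GPU
    strict_isolation: bool  # regulated or SLA-bound QoS
    vdi: bool = False       # graphics-accelerated virtual desktops


def choose_method(need: Need) -> str:
    if need.vdi:
        return "vGPU"  # vPC or vWS profiles
    if need.tenants <= 1:
        return "passthrough"  # or the full-GPU MIG profile
    if need.tenants <= MIG_MAX.get(need.gpu, 0):
        return "MIG"  # hardware-isolated multi-tenancy
    if need.strict_isolation:
        return "passthrough"  # no hard sharing available: one tenant per GPU
    return "vGPU"  # soft multi-tenancy


print(choose_method(Need("H100", tenants=4, strict_isolation=True)))   # MIG
print(choose_method(Need("L40", tenants=4, strict_isolation=False)))   # vGPU
print(choose_method(Need("L40", tenants=4, strict_isolation=True)))    # passthrough
```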

Self-check questions

What is the maximum number of MIG instances on an A100 or H100 GPU?
What is the key difference between MIG isolation and vGPU isolation?
Which virtualization method requires no special NVIDIA license?
A data science team of 4 users needs to share one H100, each with guaranteed 20 GB memory. Which method?
Which virtualization method (passthrough, vGPU, MIG) supports live VM migration?

Answers

1. Up to 7 MIG instances (with the 1g.10gb profile on an A100/H100 80GB).
2. MIG uses hardware partitioning — each instance has physically dedicated SMs, HBM memory, L2 cache, and bandwidth. One instance truly cannot affect another. vGPU uses software time-slicing — instances share the hardware state in turns; a misbehaving instance can cause latency spikes for others.
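3. GPU passthrough, and MIG used on bare metal or inside a passthrough VM: both run on NVIDIA's standard data-center driver. vGPU requires a paid NVIDIA vGPU software license (vApps, vPC, or RTX vWS for graphics profiles; NVIDIA AI Enterprise for compute profiles).
4. MIG: configure 4× 1g.20gb instances on the H100 80GB (a profile not shown in the A100 table above, available on 80 GB A100 and H100 GPUs). Each user gets a hardware-isolated 20 GB partition with 1/7 of the compute and no interference between users. 4× 2g.20gb does not fit: 2g.20gb allows at most 3 instances per GPU, because four would need 8 compute slices and only 7 exist.
5. vGPU (with NVIDIA Virtual GPU Manager on supported hypervisors) supports live VM migration, with some constraints. Passthrough and MIG do not, because the GPU state is bound to a specific physical GPU.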