Domain 2 — AI Infrastructure (40%)
The most heavily weighted domain (40%). It covers what you need to specify, build, network, and provide storage for an AI data center. Rich in screenshots from the official NVIDIA course.
Subdomains
| # | Topic | File |
|---|---|---|
| 2.1 | Hardware requirements for specific AI training use cases | hardware-requirements |
| 2.2 | Scaling GPU infrastructure | gpu-scaling |
| 2.3 | Power and cooling requirements | power-and-cooling |
| 2.4 | On-prem vs cloud | onprem-vs-cloud |
| 2.5 | Cluster components of accelerated infrastructure | cluster-components |
| 2.6 | Facility requirements | facility-requirements |
| 2.7 | Networking requirements for AI workloads | networking-requirements |
| 2.8 | Data center networking protocols and key concepts | networking-protocols |
| 2.9 | High-speed data center network options and use cases | high-speed-network-options |
| 2.10 | DPU purpose and benefits | dpu |
Study tip: 2.7–2.10 contain the most screenshots and the most specific product names. Know the difference between the AI fabric (east-west, E-W) and the control/user-access network (north-south, N-S). Know which hardware pairing (BlueField-3 + Spectrum-4) enables RoCE adaptive routing and congestion control.
Table of contents
- 2.1 Hardware Requirements for AI Training
- 2.2 Scaling GPU Infrastructure
- 2.3 Power and Cooling Requirements
- 2.4 On-Premises vs Cloud
- 2.5 Cluster Components
- 2.6 Facility Requirements
- 2.7 Networking Requirements for AI Workloads
- 2.8 Data Center Networking Protocols
- 2.9 High-Speed Network Options
- 2.10 DPU Purpose and Benefits