1.7 AI Development and Deployment Lifecycle Software Components

What the exam tests

The stages of the AI/ML lifecycle, which tools operate at each stage, and NVIDIA’s specific products that support production ML workflows (MLOps).

The AI lifecycle

┌─────────────┐    ┌─────────────┐    ┌─────────────┐    ┌─────────────┐
│  1. Data    │───▶│  2. Model   │───▶│  3. Model   │───▶│  4. Deploy  │
│  Collection │    │  Training   │    │  Validation │    │  & Serve    │
│  & Prep     │    │             │    │             │    │             │
└─────────────┘    └─────────────┘    └─────────────┘    └─────────────┘
       │                  │                  │                  │
       ▼                  ▼                  ▼                  ▼
  Data pipelines     NeMo, PyTorch     Experiment         TensorRT +
  RAPIDS (GPU         Megatron-LM       tracking           Triton
  data prep)          NCCL (dist.)      MLflow             Kubernetes
                       DGX cluster      W&B                Base Command
                                                                │
                                                                ▼
                                                    ┌──────────────────┐
                                                    │  5. Monitor &    │
                                                    │  Iterate         │
                                                    │  DCGM, Grafana   │
                                                    │  model drift,    │
                                                    │  retraining      │
                                                    └──────────────────┘

Stage 1: Data collection and preparation

Key activities

Data ingestion from multiple sources (databases, object storage, real-time streams)
Cleaning, deduplication, normalization, augmentation
Labeling (for supervised learning) — manual or AI-assisted
Feature engineering (for traditional ML) or tokenization (for LLMs)

NVIDIA tools

RAPIDS (cuDF, cuML): GPU-accelerated data processing — pandas/scikit-learn workflows on GPU
DALI (Data Loading Library): GPU-accelerated data augmentation pipeline, removes CPU bottleneck in vision training

Storage requirements

Training datasets can reach petabyte scale
Requires parallel/distributed file systems (Lustre, GPFS, WEKA) for high-throughput reads
Hot storage (NVMe/SSD) for active training datasets; object storage (S3-compatible) for cold archive

Stage 2: Model training

Key activities

Hyperparameter selection (learning rate, batch size, architecture choices)
Distributed training across multiple GPUs/nodes
Checkpointing (saving model state periodically to recover from failures)
Experiment tracking

NVIDIA tools

NeMo: training and fine-tuning LLMs, ASR, TTS
Megatron-LM: core of NVIDIA’s large-scale transformer training
NCCL: collective communications for multi-GPU training
NVIDIA Base Command Platform: manages DGX clusters, job scheduling, monitoring during training

Infrastructure

DGX systems with NVSwitch (8 GPUs, all-to-all 900GB/s)
InfiniBand fabric for multi-node (NDR 400G or 800G/port)
Slurm or Kubernetes for job scheduling

Stage 3: Model validation and experimentation

Key activities

Evaluate model accuracy on held-out test set
Compare runs (experiment tracking)
Hyperparameter tuning (manual or automated — grid search, Bayesian)
Model interpretability / explainability

Common tools

MLflow: open-source experiment tracking, model registry, artifact management
Weights & Biases (W&B): experiment tracking, visualization, hyperparameter sweeps
NVIDIA Triton model analyzer: profiles models to find optimal batch sizes

Stage 4: Deployment and serving

Key activities

Model optimization for target hardware
Setting up inference serving infrastructure
A/B testing, canary deployments
SLA definition (latency p99, throughput target)

NVIDIA tools

TensorRT: optimize model for production GPU; FP16/INT8/FP8/FP4 quantization
TensorRT-LLM: specifically for LLM inference (paged KV cache, continuous batching)
Triton Inference Server: production serving; dynamic batching, multi-model, multi-backend
NVIDIA Fleet Command: manage inference deployments at edge/branch locations

Deployment targets

Cloud: Kubernetes pods with GPU nodes; autoscaling on KEDA/HPA based on queue depth
On-prem: DGX or NVIDIA-Certified Servers; Triton + Kubernetes or bare-metal
Edge: NVIDIA Jetson (embedded AI); Orin modules; Fleet Command for management

Stage 5: Monitoring and iteration

Key activities

Track model accuracy in production (detect data drift, concept drift)
Monitor infrastructure health (GPU utilization, temperature, memory)
Trigger retraining when model performance degrades
A/B test new model versions

NVIDIA tools

DCGM (Data Center GPU Manager): GPU health, utilization, error tracking
Prometheus + Grafana: time-series metrics, dashboards (DCGM Exporter bridges DCGM → Prometheus)
NVIDIA AI Enterprise: includes monitoring integrations and lifecycle management

MLOps pipeline summary

Data Prep → Training → Validation → Registry → Deploy → Monitor → (retrain)
RAPIDS      NeMo       MLflow       NGC         TensorRT   DCGM
DALI        Megatron   W&B          Triton      Triton     Grafana
            NCCL                    Base Cmd    Kubernetes  MLflow

Self-check questions

Which NVIDIA library handles GPU-to-GPU communication during distributed training?
What is the purpose of TensorRT in the deployment stage?
Which NVIDIA tool manages DGX cluster jobs and monitoring during training?
What type of storage is required for high-throughput training data access?
What is model drift and why does it trigger retraining?

Answers

1. NCCL (NVIDIA Collective Communications Library)
2. TensorRT optimizes a trained model for a specific NVIDIA GPU — it fuses layers, calibrates precision, and auto-tunes kernels to maximize throughput and minimize latency at inference time.
3. NVIDIA Base Command Platform
4. Parallel/distributed file systems (Lustre, GPFS, WEKA, WekaFS) with NVMe-backed storage — they provide the high IOPS and bandwidth needed to keep GPUs fed during training.
5. Model drift: the distribution of real-world input data shifts away from the training distribution, causing accuracy to degrade over time. When accuracy drops below an acceptable threshold, the model needs to be retrained on fresh data.