1.4 Factors Behind Recent AI Improvements and Adoption
What the exam tests
Why AI has improved so dramatically in the last decade — the intersection of data, compute, and algorithm advances that made modern deep learning possible.
The three pillars of modern AI progress
Data × Compute × Algorithms → Modern AI capabilities
- Data: labeled datasets, internet scale, synthetic data
- Compute: GPU parallelism, Tensor Cores, cloud AI infra
- Algorithms: Transformers, attention, RLHF, scaling laws
- Result: modern AI capabilities (LLMs, image generation, AlphaFold, autonomous driving, drug discovery)
1. Data — the fuel
Scale
- Pre-2010: AI research used thousands to millions of labeled examples
- Today: LLMs train on trillions of tokens from internet text; vision models use billions of images
- ImageNet (2009): the roughly 1.2M labeled training images in its competition subset enabled the 2012 deep learning breakthrough (AlexNet)
Data types
- Labeled data (supervised learning): requires human annotation — expensive, but necessary for high-accuracy classifiers
- Unlabeled data with self-supervised objectives (language models such as BERT and GPT): the text supplies its own training signal, so objectives like “predict the next word” (GPT) or “predict a masked word” (BERT) need no human labels
- Synthetic data: AI-generated data for training (simulation, diffusion models generating training images)
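To make the self-supervised idea concrete, here is a minimal sketch of how raw text can be turned into training pairs without any annotation. The function name `next_token_pairs`, the whitespace tokenizer, and the `context_size` parameter are illustrative assumptions; real LLM pipelines use subword tokenizers and much longer contexts.

```python
def next_token_pairs(text, context_size=3):
    """Turn raw text into (context, target) pairs; the text itself supplies the labels."""
    tokens = text.split()  # toy whitespace tokenizer (assumption, for illustration only)
    pairs = []
    for i in range(1, len(tokens)):
        context = tokens[max(0, i - context_size):i]  # the preceding tokens
        pairs.append((context, tokens[i]))            # the next token is the target
    return pairs


pairs = next_token_pairs("the cat sat on the mat")
for context, target in pairs:
    print(context, "->", target)
```

Every position in the text yields one training example, which is why unlabeled corpora scale so much more cheaply than hand-annotated datasets.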
Data velocity
Real-time data from IoT devices, logs, cameras, and user interactions creates opportunities for continuous model improvement and online learning.
2. Compute — the engine
GPU parallelism
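The single biggest enabler. Deep learning is fundamentally matrix multiplication at massive scale, and GPUs run these highly parallel operations far faster than CPUs. Commonly cited speedups for typical DL workloads range from tens to hundreds of times, depending on the model, numeric precision, and hardware being compared.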
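A rough sense of the "matrix multiplication at massive scale" claim comes from counting operations. The sketch below uses the standard estimate of about 2·m·k·n floating-point operations for an (m × k) by (k × n) product; the layer dimensions are hypothetical and chosen only for illustration.

```python
def matmul_flops(m, k, n):
    """Approximate FLOPs for an (m x k) @ (k x n) matrix product.

    Each of the m*n output elements needs k multiplies and about k adds.
    """
    return 2 * m * k * n


# Hypothetical layer: 2048 token vectors projected from 4096 to 4096 features
flops = matmul_flops(2048, 4096, 4096)
print(f"{flops / 1e9:.1f} GFLOPs for one projection")
```

Because each output element can be computed independently of the others, this work maps naturally onto thousands of GPU cores running in parallel.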
Tensor Core evolution
Each NVIDIA data center GPU generation in the table below raises peak FP16 Tensor Core throughput by roughly 2–3×:
| Year | GPU | Peak Tensor Core TFLOPS (FP16) |
|---|---|---|
| 2017 | V100 | 125 |
| 2020 | A100 | 312 |
| 2022 | H100 | 989 |
| 2024 | B200 | ~2,250 |
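The generation-over-generation ratios can be checked with a few lines of arithmetic. This sketch uses only the figures from the table (the B200 value is approximate, as noted there). These are peak numbers, so measured training throughput on a real workload may differ.

```python
# (year, GPU, peak FP16 Tensor Core TFLOPS) copied from the table above
peak_tflops = [
    (2017, "V100", 125),
    (2020, "A100", 312),
    (2022, "H100", 989),
    (2024, "B200", 2250),  # approximate value
]

ratios = []
for (_, prev_name, prev), (_, name, cur) in zip(peak_tflops, peak_tflops[1:]):
    ratio = cur / prev
    ratios.append((prev_name, name, round(ratio, 2)))
    print(f"{prev_name} -> {name}: {ratio:.2f}x")

overall = peak_tflops[-1][2] / peak_tflops[0][2]
years = peak_tflops[-1][0] - peak_tflops[0][0]
print(f"V100 -> B200: {overall:.0f}x over {years} years")
```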
Scaling laws (Chinchilla / GPT)
Empirically observed: model performance improves predictably, roughly as a power law, as compute, dataset size, and parameter count grow. This gave researchers a roadmap, “add more compute + more data = better model”, which justified massive GPU investments.
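As a worked illustration of how scaling laws turn a compute budget into a plan, the sketch below applies two widely quoted approximations that are not stated in this section and should be treated as assumptions: training cost C ≈ 6·N·D FLOPs (N parameters, D tokens), and the Chinchilla-style rule of thumb of about 20 training tokens per parameter.

```python
import math

TOKENS_PER_PARAM = 20       # assumed rule of thumb (Chinchilla-style), not an exact law
FLOPS_PER_PARAM_TOKEN = 6   # assumed approximation: training FLOPs ~ 6 * params * tokens


def compute_optimal(budget_flops):
    """Split a training FLOP budget into a parameter count and a token count.

    Solves C = 6 * N * D with D = 20 * N, which gives N = sqrt(C / 120).
    """
    params = math.sqrt(budget_flops / (FLOPS_PER_PARAM_TOKEN * TOKENS_PER_PARAM))
    tokens = TOKENS_PER_PARAM * params
    return params, tokens


for budget in (1e21, 1e23, 1e25):
    n, d = compute_optimal(budget)
    print(f"C={budget:.0e} FLOPs -> ~{n / 1e9:.1f}B params, ~{d / 1e9:.0f}B tokens")
```

Under these assumptions, a 100× larger budget supports a model about 10× larger trained on about 10× more tokens, which is the sense in which model size and data should grow together.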
Cloud AI infrastructure
AWS, Azure, GCP, and Oracle Cloud made GPU clusters accessible without upfront capital investment, lowering the barrier for organizations to experiment with large AI models.
3. Algorithms — the architecture
Transformer (2017 — “Attention Is All You Need”)
Before Transformers, RNNs processed sequences token by token, which was slow and made long-range dependencies hard to learn. The Transformer’s self-attention mechanism processes all tokens in parallel and can relate tokens at any distance within the context window. This enabled:
- Parallelism during training (more GPU utilization)
- Models to scale to trillions of parameters
- GPT, BERT, LLaMA, Gemini, Claude, and every modern LLM
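The self-attention mechanism described above can be sketched in plain Python as scaled dot-product attention, softmax(Q·Kᵀ / √d)·V. This is a teaching sketch under simplifying assumptions: queries, keys, and values are all set to the raw input vectors, with no learned projections, no multiple heads, and no masking. A practical implementation would use an array library on a GPU.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    peak = max(row)
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(queries, keys, values):
    """Scaled dot-product attention over lists of token vectors."""
    d = len(keys[0])
    outputs, all_weights = [], []
    for q in queries:  # each query row is independent, so rows can run in parallel
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)  # how strongly this token attends to each token
        out = [sum(w * v[j] for w, v in zip(weights, values))
               for j in range(len(values[0]))]
        outputs.append(out)
        all_weights.append(weights)
    return outputs, all_weights


# Three toy token vectors; Q = K = V = x (no learned projections in this sketch)
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outputs, weights = self_attention(x, x, x)
for row in weights:
    print([round(w, 3) for w in row])
```

Each token's attention weights cover every position at once, so distance in the sequence does not limit which tokens can influence each other.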
Transfer learning and fine-tuning
Pre-train once on massive data → fine-tune cheaply for specific tasks. This made large models economically practical: one training run produces a foundation model that powers hundreds of downstream applications.
RLHF (Reinforcement Learning from Human Feedback)
Technique for aligning LLMs to human preferences (ChatGPT’s breakthrough ingredient). Enables models to follow instructions, be helpful, and avoid harmful outputs.
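One common RLHF recipe first trains a reward model on human rankings of candidate answers. A frequently used objective for that step is the pairwise loss −log σ(r_chosen − r_rejected). The sketch below shows that loss on scalar rewards; the function name `preference_loss` and the example reward values are illustrative assumptions, and the later RL optimization stage is not shown.

```python
import math


def preference_loss(reward_chosen, reward_rejected):
    """Pairwise preference loss: -log(sigmoid(r_chosen - r_rejected)).

    The loss is small when the reward model scores the human-preferred
    answer higher than the rejected one.
    """
    margin = reward_chosen - reward_rejected
    return math.log1p(math.exp(-margin))  # algebraically equal to -log(sigmoid(margin))


agree = preference_loss(2.0, -1.0)      # reward model agrees with the human ranking
disagree = preference_loss(-1.0, 2.0)   # reward model disagrees
print(f"agree: {agree:.3f}  disagree: {disagree:.3f}")
```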
PEFT / LoRA (Parameter-Efficient Fine-Tuning)
Fine-tune only a small fraction of parameters (adapters), dramatically reducing fine-tuning compute and storage cost. Made LLM customization accessible to organizations without massive GPU clusters.
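The savings are easy to quantify for LoRA, which freezes a weight matrix W (d_out × d_in) and trains two low-rank factors B (d_out × r) and A (r × d_in) instead. The layer size and rank below are hypothetical values chosen for illustration.

```python
def lora_param_counts(d_out, d_in, rank):
    """Compare trainable parameters: full weight matrix vs. LoRA factors B and A."""
    full = d_out * d_in            # parameters updated by full fine-tuning
    lora = rank * (d_out + d_in)   # parameters in B (d_out x rank) plus A (rank x d_in)
    return full, lora


full, lora = lora_param_counts(4096, 4096, rank=8)  # hypothetical layer and rank
print(f"full: {full:,}  LoRA: {lora:,}  fraction: {lora / full:.2%}")
```

Only the small factors need gradients, optimizer state, and per-task storage, which is where the reduction in fine-tuning cost comes from.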
4. Ecosystem maturity
| Factor | Impact |
|---|---|
| Open source frameworks (PyTorch, TensorFlow, JAX) | Lowered development barrier; massive community |
| HuggingFace / model hubs | Pre-trained models freely available; reduced time-to-value |
| MLOps tooling (MLflow, Kubeflow, W&B) | Industrialized model lifecycle management |
| Cloud APIs (OpenAI, Anthropic, Google) | AI capabilities without infrastructure expertise |
| NVIDIA NGC | Optimized, certified containers and models ready to deploy |
Self-check questions
1. What two resources do scaling laws say determine model quality?
2. What architectural innovation in 2017 enabled the modern LLM era?
3. Why is self-supervised learning important for large language models?
4. What is RLHF and what problem does it solve?
5. Name two factors that made GPU compute more accessible to enterprises.
Answers
1. Compute (FLOPs) and data (training tokens). Chinchilla scaling laws showed that, for a fixed compute budget, model size and training tokens should be scaled together.
2. The Transformer architecture (self-attention mechanism), which processes all tokens in parallel and captures long-range dependencies efficiently.
3. Language modeling (predict next token) is self-supervised — it doesn't require expensive human labeling. The internet provides essentially unlimited unlabeled text, enabling training on trillion-token corpora.
4. Reinforcement Learning from Human Feedback. It aligns a pre-trained LLM's outputs to human preferences by training a reward model on human rankings, then using RL to optimize the LLM against that reward model.
5. Any two of: cloud GPU rentals (AWS/Azure/GCP), NVIDIA NGC containers, decreasing GPU cost per FLOP, pre-trained models via HuggingFace.