1.3 AI vs Machine Learning vs Deep Learning
What the exam tests
The nested definitions of AI, ML, and DL — and knowing which technologies (neural networks, transformers, CNNs) fall into which category.
The three nested fields
┌─────────────────────────────────────────────────────────┐
│ Artificial Intelligence (AI) │
│ Any technique enabling machines to mimic human │
│ intelligence: reasoning, planning, perception, NLP │
│ │
│ ┌───────────────────────────────────────────────────┐ │
│ │ Machine Learning (ML) │ │
│ │ Systems that learn from data without being │ │
│ │ explicitly programmed for each task │ │
│ │ │ │
│ │ ┌──────────────────────────────────────────────┐ │ │
│ │ │ Deep Learning (DL) │ │ │
│ │ │ ML using neural networks with many layers │ │ │
│ │ │ (depth) — learns hierarchical features │ │ │
│ │ │ directly from raw data │ │ │
│ │ └──────────────────────────────────────────────┘ │ │
│ └───────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────┘
Definitions
Artificial Intelligence (AI)
The broadest category — any computer system that performs tasks that normally require human intelligence.
- Examples: rule-based expert systems, chess engines, speech recognition, image classification, language translation
- Does NOT require learning — a hard-coded decision tree qualifies as AI
- Does NOT require a neural network
Machine Learning (ML)
A subset of AI where the system learns from examples (data) rather than following explicit rules programmed by a human.
Three learning paradigms:
| Paradigm | How it works | Examples |
|---|---|---|
| Supervised | Learns from labeled (input, output) pairs | Image classification, regression, fraud detection |
| Unsupervised | Finds structure in unlabeled data | Clustering, anomaly detection, dimensionality reduction |
| Reinforcement | Agent learns by reward/penalty from environment | Game playing, robotic control, recommendation tuning |
Examples of ML algorithms (non-deep): Decision trees, random forests, gradient boosting (XGBoost), SVMs, k-means, PCA
Deep Learning (DL)
A subset of ML using artificial neural networks with many layers — the “depth” refers to the number of layers between input and output. More layers = the model learns increasingly abstract features.
Why deep learning dominates modern AI:
- Learns features automatically from raw data (no manual feature engineering)
- Scales with more data and compute in ways shallow ML cannot
- Enabled by GPUs (GPU parallelism makes training feasible) and large datasets
Key deep learning architectures
| Architecture | Full name | Primary use |
|---|---|---|
| CNN | Convolutional Neural Network | Image recognition, video, spatial data |
| RNN/LSTM | Recurrent NN / Long Short-Term Memory | Sequences (older approach, largely replaced) |
| Transformer | — | Language models, vision, multimodal (current dominant paradigm) |
| GAN | Generative Adversarial Network | Image synthesis, data augmentation |
| Diffusion Model | — | Image/video generation (Stable Diffusion, DALL-E) |
The Transformer architecture
The current foundation of virtually all large language models (GPT, LLaMA, Gemini) and many vision models (ViT). Key innovation: the attention mechanism — allows every token to attend to every other token in the sequence, capturing long-range dependencies.
Why Transformers need GPUs: The attention mechanism is an O(n²) matrix operation over sequence length n. For context windows of 128K tokens, this generates enormous matrix multiplications perfectly suited to GPU Tensor Cores.
Summary table
| AI | ML | DL | |
|---|---|---|---|
| Requires learning from data | No | Yes | Yes |
| Requires neural networks | No | No | Yes |
| Requires GPUs | No | Sometimes | Yes (at scale) |
| Feature engineering | Manual | Manual/Auto | Automatic |
| Example | Expert system | Random forest | GPT-4, ResNet |
Self-check questions
- Is a rule-based chatbot that uses if/else logic an example of AI, ML, or DL?
- What makes deep learning “deep”?
- Which neural network architecture currently dominates large language models?
- What type of ML does reinforcement learning fall under?
- Why did deep learning become practical only around 2012?
Answers
1. AI only — it follows explicit rules without learning from data.2. The number of layers (depth) between input and output. More layers enable learning increasingly abstract representations of data.
3. The Transformer architecture (with self-attention mechanism).
4. It is its own paradigm — separate from supervised and unsupervised learning. The agent learns through trial and error using reward signals from an environment.
5. Two factors converged: (1) GPUs became programmable for general compute (CUDA, 2007), making training feasible; (2) large labeled datasets became available (ImageNet, 2009). AlexNet (2012) demonstrated this combination conclusively.