Study Path Agent
Machine Learning
220 topics across 6 chapters
Chapter 1: Math & Data Foundations for ML

Linear Algebra Essentials (4 subtopics)
- Vectors, dot products, norms
- Matrices, transpose, inverse, rank
- Eigenvalues/eigenvectors & SVD intuition
- Geometric view of projections & least squares

Calculus & Optimization Basics (4 subtopics)
- Derivatives, partial derivatives, gradients
- Chain rule & backprop intuition
- Gradient descent variants (SGD, momentum, Adam)
- Convexity basics & why it matters
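To make the gradient-descent-variants subtopic above concrete, here is a minimal NumPy sketch comparing plain gradient descent, heavy-ball momentum, and Adam on a toy quadratic. The objective, step sizes, and iteration counts are illustrative assumptions, not part of the study path.

```python
import numpy as np

# Toy objective: f(w) = 0.5 * w.T @ A @ w, with gradient A @ w.
A = np.diag([1.0, 10.0])  # deliberately ill-conditioned quadratic

def grad(w):
    return A @ w

def gd(w, lr=0.05, steps=200):
    """Plain gradient descent (full gradient, so no sampling noise)."""
    for _ in range(steps):
        w = w - lr * grad(w)
    return w

def momentum(w, lr=0.05, beta=0.9, steps=200):
    """Heavy-ball momentum: accumulate a velocity, then step along it."""
    v = np.zeros_like(w)
    for _ in range(steps):
        v = beta * v + grad(w)
        w = w - lr * v
    return w

def adam(w, lr=0.05, b1=0.9, b2=0.999, eps=1e-8, steps=200):
    """Adam: per-coordinate steps from bias-corrected moment estimates."""
    m, s = np.zeros_like(w), np.zeros_like(w)
    for t in range(1, steps + 1):
        g = grad(w)
        m = b1 * m + (1 - b1) * g
        s = b2 * s + (1 - b2) * g * g
        w = w - lr * (m / (1 - b1**t)) / (np.sqrt(s / (1 - b2**t)) + eps)
    return w

w0 = np.array([5.0, 5.0])
for name, opt in [("gd", gd), ("momentum", momentum), ("adam", adam)]:
    print(f"{name:>8}: distance to optimum = {np.linalg.norm(opt(w0)):.4f}")
```

All three drive the iterate toward the optimum at the origin; the ill-conditioned diagonal is what makes the differences between the methods visible in practice.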
Probability & Statistics for ML (5 subtopics)
- Random variables, distributions, expectation, variance
- Bayes' rule & conditional independence
- Sampling, CLT, confidence intervals (intuition)
- Maximum likelihood & MAP estimation
- Hypothesis testing & common pitfalls (p-hacking, multiple tests)

Programming for ML (Python ecosystem) (5 subtopics)
- Python, NumPy arrays, vectorization
- Pandas for tabular data: joins, groupby, missing values
- Plotting & diagnostics (matplotlib/seaborn)
- scikit-learn API: estimators, pipelines, transformers
- Basic software engineering: packaging, tests, typing, notebooks vs scripts
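The estimators/pipelines/transformers subtopic above is the heart of the scikit-learn API; a minimal sketch, assuming scikit-learn is installed and using synthetic data in place of a real dataset:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic two-class data; any tabular X/y works the same way.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Transformer + estimator composed into one object: the scaler is fit on
# the training split only, and the fitted pipeline is reused at predict time.
pipe = Pipeline([("scale", StandardScaler()), ("clf", LogisticRegression())])
pipe.fit(X_train, y_train)
print("test accuracy:", pipe.score(X_test, y_test))
```

The same `fit`/`predict`/`score` interface applies to every estimator in the library, which is why pipelines compose so cleanly.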
Data Preparation & Feature Engineering (6 subtopics)
- Data cleaning: duplicates, outliers, missingness mechanisms
- Encoding categorical variables (one-hot, target, embeddings)
- Feature scaling & normalization (standardization, robust scaling)
- Feature generation: interactions, polynomials, datetime, text basics
- Dimensionality & curse of dimensionality (practical signs)
- Train-time transforms vs serving-time transforms (feature parity)

Experimentation Basics (reproducibility) (4 subtopics)
- Version control for data/code (Git + data versioning concepts)
- Random seeds, determinism, and run logging
- Notebook hygiene & experiment notes
- Baseline-first mindset & ablation studies
Chapter 2: Core ML Concepts & Workflow

Problem Framing & ML Use-Cases (4 subtopics)
- Types of ML problems (classification, regression, ranking, forecasting)
- Choosing ML vs rules vs analytics; ROI & constraints
- Label definition and label noise
- Data collection plan & measurement (instrumentation)

Train/Validation/Test & Data Leakage (4 subtopics)
- Holdout splits vs cross-validation (when to use which)
- Stratification, grouping, and time-aware splits
- Leakage patterns in features, labels, and preprocessing
- Data shift: covariate shift, concept drift, label shift

Loss Functions & Objective Design (4 subtopics)
- Common losses: MSE, MAE, log loss, hinge
- Bias-variance tradeoff (practical intuition)
- Surrogate losses & why we optimize them
- Cost-sensitive learning & custom objectives

Regularization & Generalization (5 subtopics)
- Overfitting/underfitting diagnostics (learning curves)
- L1/L2 regularization and sparsity
- Early stopping & checkpoints
- Dropout, data augmentation (when applicable)
- Ensembles and why they generalize better

Model Selection & Hyperparameter Tuning (4 subtopics)
- Search strategies (grid, random, Bayesian optimization)
- Hyperparameters vs parameters; what to tune first
- Cross-validation pitfalls & nested CV
- Pipelines for tuning without leakage
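The leakage-free-tuning subtopic above has a concrete shape in scikit-learn: put the preprocessing inside the pipeline that is cross-validated, so each fold re-fits the scaler on its own training split. A minimal sketch with synthetic data and an illustrative hyperparameter grid:

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic regression data (illustrative).
rng = np.random.default_rng(0)
X = rng.normal(size=(120, 5))
y = X @ rng.normal(size=5) + 0.1 * rng.normal(size=120)

# Scaling on the full data before CV would leak validation statistics into
# training; inside the pipeline, each fold sees only its own training split.
pipe = Pipeline([("scale", StandardScaler()), ("model", Ridge())])
search = GridSearchCV(pipe, {"model__alpha": [0.01, 0.1, 1.0, 10.0]}, cv=5)
search.fit(X, y)
print("best params:", search.best_params_)
```

The `step__param` naming convention is how grid keys reach parameters of individual pipeline steps.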
Evaluation Metrics & Error Analysis (5 subtopics)
- Classification metrics: precision/recall/F1, ROC-AUC, PR-AUC
- Regression metrics: RMSE/MAE/R² and residual analysis
- Calibration, thresholding, and decision curves
- Error analysis: slicing, confusion matrix deep-dives
- Uncertainty estimation basics (aleatoric vs epistemic)
Chapter 3: Supervised Learning

Linear Models (regression & classification) (4 subtopics)
- Ordinary least squares & gradient-based fitting
- Logistic regression: odds, logits, decision boundary
- Regularized linear models (ridge, lasso, elastic net)
- Common losses: MSE, MAE, log loss, hinge (see Chapter 2)

Tree-Based Models (5 subtopics)
- Decision trees: splitting criteria, depth, pruning
- Random forests: bagging, feature subsampling
- Gradient boosting: XGBoost/LightGBM/CatBoost concepts
- Feature importance & SHAP-style explanations (practical use)
- Tuning boosted trees: learning rate, depth, subsampling

Support Vector Machines & Kernels (4 subtopics)
- Max-margin intuition & soft margin (C)
- Kernel trick and common kernels
- SVM for classification vs SVR for regression
- When SVMs work well vs fail (scaling, interpretability)

Instance-Based & Probabilistic Baselines (4 subtopics)
- k-NN: distance metrics, scaling sensitivity
- Naive Bayes (text-friendly baseline)
- Linear discriminant analysis (LDA) intuition
- Baseline models & sanity checks for supervised learning

Imbalanced Learning & Calibration (4 subtopics)
- Imbalanced-data strategies: class weights, resampling, focal loss (idea)
- Precision-recall tradeoffs and selecting thresholds
- Probability calibration (Platt scaling, isotonic regression)
- Evaluation under imbalance (PR-AUC, costs, stratified CV)

Time Series Supervised Learning (4 subtopics)
- Time series basics: stationarity, trend/seasonality, autocorrelation
- Feature engineering for forecasting (lags, rolling stats, calendar)
- Backtesting & time series cross-validation
- Forecast evaluation metrics (sMAPE, MASE) and pitfalls
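The backtesting subtopic above boils down to splits that never peek into the future. A minimal sketch with an expanding training window and a persistence baseline; the series, fold count, and minimum training length are illustrative choices:

```python
import numpy as np

def expanding_window_splits(n, n_folds=4, min_train=20):
    """Yield (train_idx, test_idx) pairs that never peek into the future."""
    fold = (n - min_train) // n_folds
    for k in range(n_folds):
        end_train = min_train + k * fold
        yield np.arange(end_train), np.arange(end_train, end_train + fold)

# Toy series with a persistence baseline: forecast the last observed value.
series = np.sin(np.linspace(0, 8, 60))
errors = []
for train_idx, test_idx in expanding_window_splits(len(series)):
    forecast = np.full(len(test_idx), series[train_idx[-1]])
    errors.append(np.mean(np.abs(series[test_idx] - forecast)))
print("per-fold MAE:", np.round(errors, 3))
```

Reporting per-fold error rather than one pooled number is what makes degradation over time visible.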
Supervised Learning Practice Projects (6 subtopics)
- Project: Kaggle-style tabular classification with strong baseline + tuning
- Project: Regression with noisy labels (robust loss + diagnostics)
- Project: Interpretability report for a tree model (SHAP/feature importance)
- Project: Imbalanced classification (calibration + threshold selection)
- Project: Time series forecasting with proper backtesting
- Project: Build an end-to-end scikit-learn Pipeline + model card
Chapter 4: Unsupervised, Self-Supervised & Representation Learning

Clustering (4 subtopics)
- k-means: objective, initialization, choosing k
- Gaussian Mixture Models (EM) intuition
- Hierarchical clustering and linkage choices
- Cluster validation and interpretation (silhouette, stability)
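The k-means subtopics above (objective, initialization) fit in a short NumPy sketch of Lloyd's algorithm. Farthest-first initialization stands in for k-means++ here, and the two-blob data is illustrative; real use should also run several restarts and compare inertia when choosing k.

```python
import numpy as np

def kmeans(X, k, iters=50, seed=0):
    """Lloyd's algorithm: assign points to the nearest centroid, re-average."""
    rng = np.random.default_rng(seed)
    centers = [X[rng.integers(len(X))]]
    while len(centers) < k:
        d = np.min([np.linalg.norm(X - c, axis=1) for c in centers], axis=0)
        centers.append(X[d.argmax()])  # next center: farthest from current ones
    centers = np.array(centers)
    for _ in range(iters):
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Keep a centroid in place if its cluster ever empties out.
        centers = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                            else centers[j] for j in range(k)])
    return labels, centers

# Two well-separated blobs (illustrative data).
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.3, (50, 2)), rng.normal(5, 0.3, (50, 2))])
labels, centers = kmeans(X, k=2)
print("cluster sizes:", np.bincount(labels))
```

The alternation of assignment and averaging is exactly the coordinate descent on the k-means objective that the subtopic names.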
Dimensionality Reduction (4 subtopics)
- PCA: variance maximization, whitening (intuition)
- t-SNE/UMAP: visualization vs modeling cautions
- Autoencoders as learned representations (basic idea)
- Choosing dimensionality and avoiding information leakage

Anomaly & Novelty Detection (4 subtopics)
- Density-based anomaly detection (Gaussian, KDE) basics
- Isolation Forest and one-class SVM (idea + tradeoffs)
- Evaluation of anomalies without labels (proxies, human-in-the-loop)
- Practical pitfalls: contamination, drift, seasonality

Recommender Systems Basics (4 subtopics)
- Collaborative filtering basics (user-item matrix)
- Matrix factorization intuition and implicit feedback
- Ranking metrics (NDCG, MAP) and offline evaluation
- Cold start and hybrid recommenders (content + CF)

Self-Supervised Learning (core ideas) (4 subtopics)
- Pretext tasks & augmentations (contrastive, masked modeling)
- Contrastive learning: positives/negatives, temperature
- Representation evaluation: linear probe & transfer learning
- Common failure modes: collapse, shortcuts, leakage

Unsupervised Learning Practice Projects (6 subtopics)
- Project: Customer segmentation with clustering + narrative insights
- Project: Dimensionality reduction for visualization with proper interpretation
- Project: Anomaly detection for system logs (evaluation plan included)
- Project: Build a simple movie recommender with offline ranking evaluation
- Project: Train an autoencoder for representations + downstream classifier
- Project: Self-supervised pretraining on images or text + transfer to a task
Chapter 5: Deep Learning & Modern Architectures

Neural Network Fundamentals (5 subtopics)
- Perceptrons, activations, and universal approximation (intuition)
- Backprop in practice: computation graphs & autograd
- Initialization (Xavier/He) and why it matters
- Batching, epochs, and gradient noise
- Gradient descent variants (SGD, momentum, Adam) (see Chapter 1)

Training Deep Networks (6 subtopics)
- Normalization layers (BatchNorm, LayerNorm) and effects
- Learning rate schedules and warmup
- Overfitting controls for deep nets (augmentation, dropout, weight decay)
- Mixed precision training (fp16/bf16) basics
- Debugging training: exploding/vanishing gradients, NaNs, dead ReLUs
- Generalization in deep learning (double descent, inductive bias) overview

Convolutional Neural Networks (Vision) (4 subtopics)
- Convolutions, padding/stride, receptive fields
- Classic CNN blocks: pooling, residuals, depthwise separable convs
- Vision data augmentation and transfer learning
- Evaluation for vision (top-k, mAP) and common pitfalls

Sequence Models (RNNs, LSTMs) (4 subtopics)
- Sequence modeling basics: teacher forcing, exposure bias
- LSTMs/GRUs: gates and long-term dependencies
- Sequence-to-sequence and attention (pre-transformer view)
- When to use RNNs vs Transformers in practice

Transformers (NLP & beyond) (5 subtopics)
- Self-attention, positional encodings, multi-head attention
- Transformer training: masking, causal vs encoder-decoder objectives
- Fine-tuning vs prompting vs adapters/LoRA (overview)
- Tokenization basics and context windows
- Evaluation for NLP (BLEU/ROUGE vs task metrics) + hallucination awareness
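The self-attention and causal-masking subtopics above fit in a single-head NumPy sketch; the weight shapes and sequence length are illustrative assumptions for a one-head, unbatched version.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)  # subtract row max for stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def self_attention(X, Wq, Wk, Wv, causal=False):
    """Single-head scaled dot-product self-attention; X is (seq_len, d)."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])      # (seq_len, seq_len)
    if causal:
        # Causal mask: position i may attend only to positions <= i.
        mask = np.tril(np.ones_like(scores, dtype=bool))
        scores = np.where(mask, scores, -1e9)
    return softmax(scores) @ V

rng = np.random.default_rng(0)
d = 8
X = rng.normal(size=(5, d))
Wq, Wk, Wv = rng.normal(size=(3, d, d))
out = self_attention(X, Wq, Wk, Wv, causal=True)
print(out.shape)  # (5, 8)
```

With `causal=True`, perturbing a later token leaves all earlier outputs unchanged, which is the property that makes autoregressive training with next-token objectives work.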
Generative Modeling (4 subtopics)
- Autoregressive models and likelihood-based generation
- Variational autoencoders (VAE) intuition
- GANs: generator/discriminator game and stability issues
- Diffusion model basics (denoising, sampling) overview

Deep Learning Practice Projects (6 subtopics)
- Project: Image classifier with transfer learning + robust evaluation
- Project: Train a small transformer for text classification
- Project: Fine-tune a pretrained model and write an evaluation report
- Project: Build an embedding search / semantic retrieval demo
- Project: Train a simple diffusion model or GAN on a toy dataset
- Project: Reproduce a paper result on a small dataset (with ablations)
Chapter 6: ML Engineering, Deployment & Responsible AI

ML System Design & Data Pipelines (5 subtopics)
- Data ingestion and validation (schemas, checks)
- Feature stores & offline/online consistency (concepts)
- Batch vs streaming pipelines; latency and freshness tradeoffs
- Training pipelines: orchestration and retries (concepts)
- System design: SLAs/SLOs, fallbacks, and graceful degradation

Deployment, Inference & Monitoring (6 subtopics)
- Packaging models for serving (serialization, preprocessing)
- Serving patterns: online, batch, edge; choosing the right one
- Monitoring: data drift, concept drift, performance degradation
- A/B testing and experimentation in production
- Incident response for ML systems (rollbacks, guardrails)
- Human-in-the-loop review and feedback loops
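One way to make the data-drift monitoring subtopic above concrete is the Population Stability Index (PSI), a common numeric-drift score. The bin count, the usual "investigate above ~0.25" convention, and the synthetic data here are illustrative assumptions, not prescriptions from the syllabus.

```python
import numpy as np

def psi(expected, observed, bins=10):
    """Population Stability Index between a reference and a live sample.

    Bin the reference sample by its own quantiles, then compare bin
    frequencies: sum over bins of (p - q) * ln(p / q).
    """
    edges = np.quantile(expected, np.linspace(0, 1, bins + 1))
    edges[0], edges[-1] = edges[0] - 1e9, edges[-1] + 1e9  # catch outliers
    p = np.histogram(expected, bins=edges)[0] / len(expected)
    q = np.histogram(observed, bins=edges)[0] / len(observed)
    p, q = np.clip(p, 1e-6, None), np.clip(q, 1e-6, None)
    return float(np.sum((p - q) * np.log(p / q)))

rng = np.random.default_rng(0)
reference = rng.normal(0, 1, 5000)   # e.g. a feature at training time
stable = rng.normal(0, 1, 5000)      # same distribution in production
shifted = rng.normal(1, 1, 5000)     # simulated covariate shift
print("stable PSI:  %.3f" % psi(reference, stable))
print("shifted PSI: %.3f" % psi(reference, shifted))
```

Tracking a score like this per feature over time is what turns "monitor for drift" into an alertable signal.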
MLOps & Experiment Tracking (5 subtopics)
- Experiment tracking tool concepts (metrics, artifacts, lineage)
- Model registry & lifecycle (staging, prod, rollback)
- Version control for data/code (Git + data versioning concepts) (see Chapter 1)
- CI/CD for ML (tests for data, features, models)
- Reproducible training with containers (Docker basics for ML)

Performance, Scaling & Hardware Basics (4 subtopics)
- Compute basics: CPU vs GPU vs TPU; memory bandwidth
- Profiling training/inference and finding bottlenecks
- Distributed training basics (data/model parallelism concepts)
- Inference optimization: batching, quantization, caching

Privacy & Security in ML (4 subtopics)
- Privacy basics: PII, de-identification limits, governance
- Differential privacy (high-level) and tradeoffs
- Adversarial examples and robustness overview
- Secure ML pipelines: secrets, access control, supply chain risks

Fairness, Accountability & Transparency (4 subtopics)
- Bias sources: data, labels, measurement, objectives
- Fairness metrics and tradeoffs (group vs individual)
- Explainability: global vs local, pitfalls, stakeholder needs
- Documentation: datasheets, model cards, and audit trails

ML Productization & Communication (4 subtopics)
- Communicating results: plots, baselines, and uncertainty
- Writing a technical ML report (structure + reproducibility checklist)
- Choosing deployment constraints: latency, cost, privacy, UX
- Stakeholder alignment: success metrics, guardrails, and iteration plan