Study Path Agent
Machine Learning
142 topics across 7 chapters

Chapter 1: Math foundations for ML

1. Linear algebra essentials (4 subtopics)
   2. Vectors, matrices, and shapes (dimensions)
   3. Matrix multiplication, transpose, inverse (intuition + mechanics)
   4. Dot products, projections, cosine similarity
   5. Eigenvalues/eigenvectors and PCA intuition
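As a taste of the dot-product topics above, cosine similarity and projection fit in a few lines of NumPy (an illustrative sketch; the helper names are mine, not from any library):

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: dot product over norms."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def project(a, b):
    """Orthogonal projection of a onto the line spanned by b."""
    return (np.dot(a, b) / np.dot(b, b)) * b

a = np.array([3.0, 4.0])
b = np.array([4.0, 3.0])
print(cosine_similarity(a, b))           # 0.96
print(project(a, np.array([1.0, 0.0])))  # [3. 0.]
```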

6. Probability foundations (4 subtopics)
   7. Random variables, expectation, variance
   8. Common distributions (Bernoulli, Binomial, Normal, Poisson)
   9. Conditional probability and Bayes’ rule
   10. Independence, covariance, correlation
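Bayes’ rule from the probability topics above, worked as plain arithmetic (the screening numbers below are invented for illustration, not real test statistics):

```python
# A rare condition, a fairly accurate test:
p_d = 0.01                 # prior P(disease)            (assumed)
p_pos_given_d = 0.95       # sensitivity P(+ | disease)  (assumed)
p_pos_given_not_d = 0.05   # false-positive rate P(+ | healthy) (assumed)

# Law of total probability, then Bayes' rule:
p_pos = p_pos_given_d * p_d + p_pos_given_not_d * (1 - p_d)
p_d_given_pos = p_pos_given_d * p_d / p_pos
print(round(p_d_given_pos, 3))  # ~0.161: still unlikely despite a positive test
```

The low posterior is the point: with a rare condition, most positives come from the large healthy population.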

11. Statistics essentials (4 subtopics)
   12. Sampling, estimators, bias vs variance (statistical view)
   13. Hypothesis testing and confidence intervals (practical intuition)
   14. Maximum likelihood estimation (MLE) and MAP
   15. Overfitting, generalization, and cross-validation (statistical lens)

16. Calculus for optimization (4 subtopics)
   17. Derivatives and gradients (single + multivariate)
   18. Chain rule and backpropagation intuition
   19. Convexity (why some problems are easier)
   20. Gradient descent and learning-rate behavior
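Gradient descent, the last topic above, can be seen end to end on a one-dimensional convex problem (a minimal sketch; the function and learning rate are chosen for illustration):

```python
# Minimize f(w) = (w - 3)^2 by repeatedly stepping against the
# gradient f'(w) = 2 * (w - 3).
def grad(w):
    return 2.0 * (w - 3.0)

w, lr = 0.0, 0.1      # start away from the minimum; modest learning rate
for _ in range(100):
    w -= lr * grad(w)

print(round(w, 4))  # 3.0, the minimizer
```

With lr above 1.0 the same loop diverges, which is the learning-rate behavior the topic refers to.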

21. Information theory basics (useful for ML) (3 subtopics)
   22. Entropy and cross-entropy (classification loss intuition)
   23. KL divergence and why it shows up in ML
   24. Mutual information (feature relevance intuition)
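Entropy, cross-entropy, and their gap (KL divergence) from the topics above, computed directly from the definitions (a sketch in bits, using log base 2):

```python
import math

def entropy(p):
    """Shannon entropy H(p) in bits."""
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)

def cross_entropy(p, q):
    """H(p, q): expected code length when q is used to model samples from p."""
    return -sum(pi * math.log2(qi) for pi, qi in zip(p, q) if pi > 0)

p = [0.5, 0.5]   # true distribution: a fair coin
q = [0.9, 0.1]   # a confident, wrong model
print(entropy(p))                         # 1.0 bit
print(round(cross_entropy(p, q), 4))      # larger than H(p)
print(round(cross_entropy(p, q) - entropy(p), 4))  # the gap is KL(p || q)
```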

Chapter 2: Data, tooling, and ML workflow

25. Python + notebooks + environments (3 subtopics)
   26. NumPy arrays and vectorization basics
   27. pandas dataframes: joins, groupby, missing values
   28. Reproducible environments (venv/conda, requirements, seeds)

29. Data understanding and preparation (5 subtopics)
   30. Train/validation/test splits and leakage prevention
   31. Feature scaling and normalization (when/why)
   32. Categorical encoding (one-hot, target encoding caveats)
   33. Handling missing data and outliers (robust approaches)
   34. Feature engineering mindset (baseline-first)
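The split-then-scale ordering behind the leakage-prevention topic above can be sketched with plain NumPy (synthetic data; the point is that scaling statistics come from the training split only):

```python
import numpy as np

rng = np.random.default_rng(0)                 # fixed seed for reproducibility
X = rng.normal(loc=10.0, scale=2.0, size=(100, 3))

# Split BEFORE computing any statistics, so the test set never leaks
# into preprocessing.
idx = rng.permutation(len(X))
train_idx, test_idx = idx[:80], idx[80:]
X_train, X_test = X[train_idx], X[test_idx]

# Fit scaling parameters on the training split only...
mu, sigma = X_train.mean(axis=0), X_train.std(axis=0)

# ...then apply the SAME transform to both splits.
X_train_s = (X_train - mu) / sigma
X_test_s = (X_test - mu) / sigma

print(X_train_s.mean(axis=0).round(6))  # ~0 by construction
print(X_test_s.mean(axis=0).round(6))   # near 0, but not exactly
```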

35. Problem framing and baselines (4 subtopics)
   36. Choose task type: regression vs classification vs ranking
   37. Define success metrics and constraints (latency, cost, fairness)
   38. Create a simple baseline model (and beat it)
   39. Error analysis loop (slice-by-slice)

40. Model evaluation essentials (4 subtopics)
   41. Classification metrics: precision/recall/F1, ROC-AUC, PR-AUC
   42. Regression metrics: MAE, RMSE, R² (and when each misleads)
   43. Calibration and decision thresholds
   44. Statistical significance for model comparisons (practical)
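The precision/recall/F1 metrics above follow directly from confusion-matrix counts (a minimal sketch; the counts are made up for illustration):

```python
def precision_recall_f1(tp, fp, fn):
    """Classification metrics from confusion-matrix counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0   # of predicted positives, how many are right
    recall = tp / (tp + fn) if tp + fn else 0.0      # of actual positives, how many are found
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)            # harmonic mean of the two
    return precision, recall, f1

# e.g. 80 true positives, 20 false positives, 40 false negatives:
p, r, f1 = precision_recall_f1(tp=80, fp=20, fn=40)
print(p, round(r, 4), round(f1, 4))  # 0.8 0.6667 0.7273
```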

45. Experiment tracking and versioning (3 subtopics)
   46. Track data/model/code versions (what to record)
   47. Use an experiment tracker (e.g., MLflow/W&B) effectively
   48. Write a clean training script (config-driven)

49. Practical optimization & regularization tools (6 subtopics)
   50. L1/L2 regularization and weight decay
   51. Early stopping and checkpoints
   52. Learning rate schedules (step, cosine, warmup)
   53. Class imbalance handling (weights, sampling, focal loss idea)
   54. Hyperparameter search (random, Bayesian) basics
   55. Debugging training: sanity checks and failure modes
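The early-stopping topic above reduces to simple bookkeeping over validation losses (a sketch of the patience rule; real trainers also restore the best checkpoint's weights):

```python
def early_stopping_epoch(val_losses, patience=2):
    """Return (stop_epoch, best_epoch): stop the first time validation
    loss has failed to improve for `patience` consecutive epochs."""
    best, best_epoch, bad = float("inf"), 0, 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best, best_epoch, bad = loss, epoch, 0   # new best; reset patience
        else:
            bad += 1
            if bad >= patience:
                return epoch, best_epoch             # stop; keep best checkpoint
    return len(val_losses) - 1, best_epoch

losses = [1.0, 0.8, 0.7, 0.72, 0.74, 0.6]  # would improve again later, but...
stop, best = early_stopping_epoch(losses, patience=2)
print(stop, best)  # stops at epoch 4; best checkpoint is from epoch 2
```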

Chapter 3: Supervised learning core

56. Linear models (3 subtopics)
   57. Linear regression (least squares, regularized variants)
   58. Logistic regression (decision boundary + loss)
   59. Interpretability for linear models (coefficients, odds ratios)
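Least squares from the linear-models topics above, fitted on synthetic data so the recovered coefficients can be checked against the known truth (an illustrative sketch):

```python
import numpy as np

rng = np.random.default_rng(42)
X = rng.normal(size=(200, 2))
true_w, true_b = np.array([2.0, -1.0]), 0.5
y = X @ true_w + true_b + rng.normal(scale=0.01, size=200)  # tiny noise

# Ordinary least squares; a bias column turns the intercept into
# just another coefficient. lstsq is the numerically stable route.
Xb = np.column_stack([X, np.ones(len(X))])
w_hat = np.linalg.lstsq(Xb, y, rcond=None)[0]
print(w_hat.round(2))  # ~[ 2. -1. 0.5], matching the generating weights
```

Reading the coefficients directly off `w_hat` is exactly the interpretability advantage the third subtopic refers to.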

60. k-Nearest Neighbors (kNN) (1 subtopic)
   61. Efficient search intuition (KD-trees, approximate NN concept)

62. Decision trees and ensembles (3 subtopics)
   63. Random forests: bagging and feature subsampling
   64. XGBoost/LightGBM/CatBoost (when to choose which)
   65. Feature importance and SHAP basics (interpretability)

66. Support Vector Machines (SVM) (3 subtopics)
   67. Margins and hinge loss intuition
   68. Kernel trick conceptually (RBF, polynomial)
   69. When SVMs work well (small/medium data) and pitfalls

70. Neural networks basics (MLP) (2 subtopics)
   71. Perceptron to multilayer networks (what layers do)
   72. Activations (ReLU, sigmoid, tanh) and saturation issues
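A one-hidden-layer MLP forward pass, matching the two subtopics above (a sketch with random weights; real training would learn them by backpropagation):

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)   # the standard non-saturating activation

def mlp_forward(x, W1, b1, W2, b2):
    """One hidden layer: linear -> ReLU -> linear."""
    h = relu(x @ W1 + b1)       # hidden representation
    return h @ W2 + b2          # output logits

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(4, 8)), np.zeros(8)   # 4 inputs -> 8 hidden units
W2, b2 = rng.normal(size=(8, 3)), np.zeros(3)   # 8 hidden -> 3 outputs
x = rng.normal(size=(5, 4))                     # batch of 5 inputs
out = mlp_forward(x, W1, b1, W2, b2)
print(out.shape)  # (5, 3): one 3-way logit vector per input
```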

73. Model selection & bias-variance tradeoffs (2 subtopics)
   74. Learning curves (diagnose under/overfitting)
   75. Ensembling strategies (stacking/blending basics)

Chapter 4: Unsupervised learning

76. Clustering basics (3 subtopics)
   77. Hierarchical clustering (linkage + dendrogram reading)
   78. DBSCAN/HDBSCAN: density-based clustering and parameters
   79. Cluster evaluation (silhouette, stability) and caveats

80. Dimensionality reduction (2 subtopics)
   81. t-SNE and UMAP: visualization vs modeling (pitfalls)
   82. Manifold hypothesis (why non-linear methods can help)

83. Anomaly and novelty detection (1 subtopic)
   84. Z-score/IQR and robust statistics baselines

85. Topic modeling and representations (2 subtopics)
   86. Bag-of-words, TF-IDF, and sparse vectors
   87. Latent Dirichlet Allocation (LDA) intuition

88. Representation learning principles (3 subtopics)
   89. Inductive biases (why architectures matter)
   90. Self-supervised learning idea (contrastive, masked prediction)
   91. Transfer learning and fine-tuning (practical patterns)

Chapter 5: Deep learning for vision and language

92. Deep learning training stack (3 subtopics)
   93. GPU basics and batching (why it speeds training)
   94. Data loaders, augmentation, and shuffling correctness
   95. Mixed precision training and numerical stability basics

96. Computer vision basics (CNNs) (2 subtopics)
   97. Convolutions, padding, stride, receptive fields
   98. Classic CNN architectures (LeNet→ResNet intuition)

99. Natural language processing basics (3 subtopics)
   100. Tokenization, vocabularies, subwords (BPE idea)
   101. Transformers fundamentals (attention, positional encoding)
   102. NLP evaluation (BLEU/ROUGE vs task metrics)
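The attention mechanism in the Transformers subtopic above is compact enough to write out in NumPy: scaled dot-product attention, softmax(QK^T / sqrt(d)) V (a single-head sketch with random inputs; real models add projections, masking, and multiple heads):

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)   # subtract max for stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d)) V."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)        # similarity of each query to each key
    weights = softmax(scores, axis=-1)   # each row is a distribution over keys
    return weights @ V, weights          # weighted average of the values

rng = np.random.default_rng(0)
Q = rng.normal(size=(3, 8))   # 3 query positions, dimension 8
K = rng.normal(size=(5, 8))   # 5 key/value positions
V = rng.normal(size=(5, 8))
out, w_att = attention(Q, K, V)
print(out.shape, w_att.sum(axis=-1))  # (3, 8); each weight row sums to 1
```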

103. Generative modeling basics (1 subtopic)
   104. Diffusion models conceptually (denoising, guidance)

105. Practical fine-tuning and prompting (2 subtopics)
   106. Prompting patterns (zero/few-shot, chain-of-thought caution)
   107. Parameter-efficient fine-tuning (LoRA/adapters) conceptually

108. Deep learning engineering practices (2 subtopics)
   109. Efficient training (profiling, bottlenecks, data pipeline)
   110. Deployment-aware training (latency, quantization idea)

Chapter 6: ML engineering, deployment, and MLOps

111. Serving patterns and deployment basics (4 subtopics)
   112. Batch vs online inference (tradeoffs)
   113. Model packaging (Docker basics)
   114. REST/gRPC model serving concepts
   115. Latency budgeting and performance testing basics

116. Pipelines and orchestration (3 subtopics)
   117. ETL/ELT concepts and feature pipelines
   118. Orchestration tools (Airflow/Prefect/Kubeflow) overview
   119. Backfills, idempotency, and pipeline testing basics

120. Feature stores and data management (2 subtopics)
   121. Feature store concepts (entities, feature views)
   122. Data quality checks (schema, ranges, drift)

123. Monitoring, drift, and retraining (4 subtopics)
   124. Monitor inputs/outputs (data drift vs concept drift)
   125. Detect performance decay (delayed-label scenarios)
   126. Alerting and incident response runbooks for ML systems
   127. Retraining triggers and safe rollout strategies
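One common data-drift signal for the monitoring topics above is the population stability index (PSI), which compares the binned distribution of a feature in training versus production (a sketch; the decision thresholds in the comment are a common rule of thumb, not a standard, and teams tune them):

```python
import numpy as np

def population_stability_index(expected, actual, bins=10):
    """PSI between a reference sample (expected, e.g. training data) and a
    live sample (actual). Rule of thumb (an assumption; varies by team):
    < 0.1 stable, 0.1-0.25 moderate shift, > 0.25 likely drift."""
    # Bin edges from the reference sample's quantiles (interior cut points).
    edges = np.quantile(expected, np.linspace(0, 1, bins + 1))[1:-1]
    e = np.bincount(np.digitize(expected, edges), minlength=bins) / len(expected)
    a = np.bincount(np.digitize(actual, edges), minlength=bins) / len(actual)
    e, a = np.clip(e, 1e-6, None), np.clip(a, 1e-6, None)  # avoid log(0)
    return float(np.sum((a - e) * np.log(a / e)))

rng = np.random.default_rng(0)
train = rng.normal(0.0, 1.0, 10_000)
same = rng.normal(0.0, 1.0, 10_000)      # fresh data, same distribution
shifted = rng.normal(0.5, 1.0, 10_000)   # simulated mean drift
print(round(population_stability_index(train, same), 3))     # small
print(round(population_stability_index(train, shifted), 3))  # much larger
```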

128. Model governance and compliance basics (2 subtopics)
   129. Model cards and documentation (what to include)
   130. Privacy basics (PII, anonymization, differential privacy idea)

131. Testing and reliability for ML (4 subtopics)
   132. Unit tests for data transforms and feature logic
   133. Model validation tests (golden sets, invariances)
   134. Adversarial and robustness testing basics
   135. Canary deployments and A/B tests for models

136. Scaling systems for ML (3 subtopics)
   137. Distributed training basics (data vs model parallel idea)
   138. Compute/storage tradeoffs (caching, embedding indexes)
   139. Cost estimation and capacity planning for ML workloads

Chapter 7: Responsible AI and ethics

140. Fairness basics and common definitions
141. Bias sources (data, labeling, measurement, feedback loops)
142. Human-in-the-loop systems and oversight