25
Python + notebooks + environments
3 subtopics
26
NumPy arrays and vectorization basics
27
pandas dataframes: joins, groupby, missing values
28
Reproducible environments (venv/conda, requirements, seeds)
29
Data understanding and preparation
5 subtopics
30
Train/validation/test splits and leakage prevention
31
Feature scaling and normalization (when/why)
32
Categorical encoding (one-hot, target encoding caveats)
33
Handling missing data and outliers (robust approaches)
34
Feature engineering mindset (baseline-first)
35
Problem framing and baselines
4 subtopics
36
Choose task type: regression vs classification vs ranking
37
Define success metrics and constraints (latency, cost, fairness)
38
Create a simple baseline model (and beat it)
39
Error analysis loop (slice-by-slice)
40
Model evaluation essentials
4 subtopics
41
Classification metrics: precision/recall/F1, ROC-AUC, PR-AUC
42
Regression metrics: MAE, RMSE, R² (and when each misleads)
43
Calibration and decision thresholds
44
Statistical significance for model comparisons (practical)
45
Experiment tracking and versioning
3 subtopics
46
Track data/model/code versions (what to record)
47
Use an experiment tracker (e.g., MLflow/W&B) effectively
48
Write a clean training script (config-driven)
49
Practical optimization & regularization tools
6 subtopics
50
L1/L2 regularization and weight decay
51
Early stopping and checkpoints
52
Learning rate schedules (step, cosine, warmup)
53
Class imbalance handling (weights, sampling, focal loss idea)
54
Hyperparameter search (random, Bayesian) basics
55
Debugging training: sanity checks and failure modes