System Design with Machine Learning

122 topics across 7 chapters

Chapter 1

System Design Foundations (for ML systems)

Requirements, constraints, and trade-offs (latency, cost, accuracy, freshness)

Design docs and communication (diagrams, APIs, assumptions, failure modes)

Distributed Systems Basics

4 subtopics

Consistency models, CAP, and read/write trade-offs

Sharding, partitioning, replication, and rebalancing

Time, ordering, idempotency, retries, and deduplication

Fault tolerance patterns (timeouts, circuit breakers, bulkheads)

Storage Systems (choosing the right datastore)

4 subtopics

Relational modeling, indexes, transactions, and query planning basics

NoSQL patterns: key-value, document, wide-column (when and why)

Data lake vs warehouse concepts (batch analytics foundations)

Search and vector databases (ANN indexes, recall/latency trade-offs)

Messaging and Streaming Basics

4 subtopics

Queues vs pub/sub (work distribution vs fan-out)

Kafka-style concepts: partitions, consumer groups, offsets

Delivery semantics (at-most-once, at-least-once, exactly-once) and their implications

Stream processing basics (windows, watermarks, late events)

Caching and Performance Primitives

4 subtopics

Caching patterns (cache-aside, read-through, write-through, write-back)

Cache invalidation and consistency strategies (TTL, stampede protection)

CDN and edge concepts (latency reduction, global distribution)

Rate limiting, load shedding, backpressure, and graceful degradation

API Design and Service Interfaces

4 subtopics

REST vs gRPC and contract-driven APIs

Pagination, filtering, batching, and async APIs for heavy workloads

Versioning and backward compatibility strategies

AuthN/AuthZ integration points (tokens, scopes, service identity)

Chapter 2

ML and Data Fundamentals (for system designers)

Problem framing: objective, constraints, and success metrics

Data quality, labeling, bias, and sampling pitfalls

Evaluation basics: train/val/test, leakage, confidence intervals

Feature engineering basics (numerical, categorical, text, embeddings)

Model families and when to use them

4 subtopics

Linear models, tree-based models, and calibration basics

Deep learning basics (training dynamics, overfitting, generalization)

Ranking and recommenders (two-tower, matrix factorization, learning-to-rank)

LLM basics: prompting vs fine-tuning, context limits, hallucinations

Offline vs online inference trade-offs (latency, freshness, cost)

Responsible AI basics (fairness, transparency, human-in-the-loop)

Chapter 3

Data and Feature Pipelines

Ingestion design (batch vs streaming, CDC, event modeling)

Data validation (schema checks, constraints, anomaly detection)

ETL/ELT orchestration and pipeline correctness

4 subtopics

Orchestrators and DAGs (scheduling, dependencies, retries)

Idempotent pipelines, backfills, and reproducible reruns

Handling late/dirty data and schema evolution safely

Cost optimization (partitioning, file sizes, incremental processing)

Data versioning and lineage (datasets, code, configs, artifacts)

Feature stores and feature management

4 subtopics

Feature definitions, ownership, reuse, and documentation

Training set generation and point-in-time joins

Point-in-time correctness (preventing training/serving skew)

Online feature serving (latency budgets, caching, TTLs)

Online/offline feature consistency and skew detection

Real-time joins and freshness (state stores, enrichment, SLAs)

Chapter 4

Training and Experimentation Infrastructure

Experiment tracking and reproducibility (metrics, configs, seeds, artifacts)

Training data preparation (sampling, weighting, class imbalance)

Distributed training fundamentals

4 subtopics

Data parallel vs model parallel vs pipeline parallel (mental models)

Checkpointing, restartability, and handling preemptions

Mixed precision and throughput bottlenecks (I/O vs compute)

Distributed training stack concepts (collectives, parameter servers)

Hyperparameter tuning and reliable comparisons

4 subtopics

Search strategies (grid, random, Bayesian) and when to use each

Early stopping and pruning without biasing results

Parallelization and scheduling (trial concurrency, quotas, fairness)

Preventing leakage and over-tuning on validation/test sets

Compute management (GPU pools, quotas, scheduling, spot/preemptible)

Model registry and artifact management (versions, metadata, lineage)

CI/CD for ML (tests, validation gates, reproducible builds)

Chapter 5

Model Serving and Online ML Systems

Serving architectures (online, async, batch, edge) and SLA design

4 subtopics

Synchronous vs asynchronous inference (queues, callbacks, polling)

Request/response schemas (inputs, outputs, errors, metadata, tracing)

Multi-model routing and traffic splitting (by tenant, region, cohort)

Fallbacks and graceful degradation (stale model, heuristic, cached)

Model packaging and dependency management (containers, runtimes, ABI)

Low-latency inference optimization

4 subtopics

Quantization, pruning, distillation (accuracy vs latency trade-offs)

Hardware selection (CPU/GPU/accelerators) and concurrency models

Caching inference outputs and embeddings safely (keying, TTL, privacy)

Warmup, model loading, memory management, and tail latency controls

Scaling inference (autoscaling, batching, pooling, multi-tenancy)

Experimentation and safe rollouts (A/B, canary, shadow, holdouts)

Retrieval and ranking system design (search and recommendations)

4 subtopics

Candidate generation (rules, ANN, two-stage architectures)

Online feature computation for ranking (budgets, caching, consistency)

↗ Search and vector databases (ANN indexes, recall/latency trade-offs) (see Chapter 1)

Re-ranking, diversity, and multi-objective optimization (business + user)

LLM serving patterns (RAG, tools, guardrails)

4 subtopics

RAG architecture (chunking, retrieval, reranking, citations, caching)

Tool/function calling (schemas, sandboxing, timeouts, determinism)

Guardrails and policy enforcement (filters, routing, refusals, PII rules)

LLM security and safety (prompt injection, data exfiltration, jailbreaks)

Chapter 6

Operations: Reliability, Monitoring, and MLOps

SLOs/SLIs for ML products (latency, availability, quality, freshness)

Observability foundations (logs, metrics, traces, correlation IDs)

Model monitoring and feedback loops

4 subtopics

Data drift vs concept drift (detection signals and limitations)

Ground truth collection (delayed labels, human review, weak labels)

Alerting design (thresholds, burn rates, noise reduction, on-call)

Mitigating feedback loops and unintended behavior (exploration, guardrails)

↗ Experimentation and safe rollouts (A/B, canary, shadow, holdouts) (see Chapter 5)

Incident response for ML systems (triage, rollback, data/model blame)

Cost management and capacity planning (GPU cost, caching, batching)

Case studies and system design interview practice (ML-focused)

4 subtopics

Design exercise: end-to-end recommender (retrieval, ranking, evaluation)

Design exercise: real-time fraud detection (streaming, latency, labels)

Design exercise: LLM customer support bot (RAG, safety, monitoring, cost)

Interview frameworks and pitfalls (assumptions, bottlenecks, measurement)

Chapter 7

Security, Privacy, and Governance for ML Systems

Threat modeling for ML systems (assets, adversaries, attack surfaces)

Privacy engineering and PII handling

4 subtopics

De-identification/anonymization basics and common failure modes

Data retention, deletion, and subject rights workflows

Secure data sharing (least privilege, clean rooms, aggregate reporting)

Differential privacy concepts (noise, privacy budget, utility trade-offs)

Access control and secrets (service identity, key management, rotation)

Compliance and auditability (logging, approvals, traceability, evidence)

ML-specific threats (data poisoning, evasion, model stealing, membership inference)

↗ LLM security and safety (prompt injection, data exfiltration, jailbreaks) (see Chapter 5)

Governance workflows (review gates, model cards, data approvals, change control)