Study Path Agent
Agentic AI
181 topics across 7 chapters
Chapter 1
Foundations & mental models
1
LLM basics for agents (what matters in practice)
3 subtopics
2
Context windows, tokens, and truncation failure modes
3
Sampling basics (temperature/top-p) and determinism expectations
4
Structured vs free-form outputs (why agents need structure)
5
Decision-making under uncertainty (agent mindset)
3 subtopics
6
Heuristics, policies, and when to avoid “over-reasoning”
7
MDP/POMDP intuition for partial observability
8
Bandits & exploration vs exploitation (practical lens)
9
Human-in-the-loop UX for agents
3 subtopics
10
Approval flows (confirmations, two-person rules, break-glass)
11
Explanations & transparency (what the user needs to trust actions)
12
Feedback capture loops (labels, corrections, preferences)
13
Prompting & instruction design for controllable agents
4 subtopics
14
System prompts, constraints, and instruction hierarchy
15
Structured outputs: JSON schemas, validation, and repair prompts
16
Few-shot exemplars for tool usage and planning style
17
Prompt anti-patterns (leakage, ambiguity, brittle formatting)
Chapter 2
Agent loop & planning
18
The agent loop: sense → think → act → reflect
3 subtopics
19
Perception/input normalization (parsing, cleaning, grounding)
20
Actuation & side effects (designing safe actions)
21
Termination criteria and “done” detection
22
Planning methods
7 subtopics
23
ReAct-style interleaving of reasoning and acting
24
Tree/graph search planning (beam search, Tree-of-Thoughts intuition)
25
Monte Carlo Tree Search (MCTS) planning for agents
26
Hierarchical task network (HTN) planning for agent workflows
27
Constraint-based planning (tools, budgets, policies)
28
Plan execution, monitoring, and re-planning triggers
29
Classical planning (PDDL-style), plan validators, and hybrid LLM+planner loops
30
Memory & state management
7 subtopics
31
Short-term state (scratchpad/state object) vs chat history
32
Long-term memory stores (KV stores, vector DBs)
33
Retrieval / RAG for agents
3 subtopics
34
Indexing & chunking for retrieval quality
35
Query rewriting, multi-query, and decomposition for RAG
36
Grounding and source attribution (citations in agent outputs)
37
Summarization & compression strategies for long-running agents
38
Learned retrieval policies for memory (when/what to write, retrieve, and cite)
39
Memory consolidation, decay, and forgetting policies (stability vs freshness)
40
Knowledge-graph memory (entity linking, relation extraction, graph RAG)
41
Task decomposition & execution graphs
3 subtopics
42
Goal/requirements elicitation (clarifying questions)
43
Designing a subtask graph (dependencies, parallelism)
44
Mapping subtasks to tools/agents (capability-aware scheduling)
Prompting & instruction design for controllable agents (see Chapter 1)
Chapter 3
Tool use & environment interaction
45
Tool calling (functions), schemas, and routing
7 subtopics
46
Function schema design (inputs/outputs, enums, constraints)
47
Tool selection & routing (rules, classifiers, LLM router)
48
Validation, retries, and repair strategies for tool calls
49
Side-effect safety (idempotency keys, dry-runs, confirmations)
50
Code-as-policy: program synthesis for tool orchestration (generated workflows, typed wrappers)
51
Type-driven tool invocation and effect-aware schemas (capabilities, pre/post-conditions)
52
Tool-use provenance and interpretability (why this tool, why these args)
Retrieval / RAG for agents (see Chapter 2)
53
Web/GUI automation as actions
3 subtopics
54
Browser automation basics (navigation, forms, downloads)
55
Robust selectors & resilience (timeouts, retries, flaky UIs)
56
Constraints & ethics (anti-bot, terms, user consent)
57
Sandboxes, simulators & safe execution environments
4 subtopics
58
Deterministic sandboxing (resource limits, network egress control)
59
Test harnesses & simulators for agents
3 subtopics
60
Mock tools and deterministic fixtures
61
Synthetic users and scripted scenarios
62
Adversarial testing (jailbreaks, injections, worst-case tasks)
63
Fuzzing and differential testing of agent actions inside sandboxes
64
Record/replay infrastructure for deterministic debugging of long agent runs
65
Data/DB actions (transactions & permissions)
3 subtopics
66
Idempotency, transactions, and exactly-once illusions
67
Schema safety (migrations, forward/backward compatibility)
68
Permissioned queries (row-level security, least privilege)
Chapter 4
Architectures & design patterns
69
Single-agent vs multi-agent systems
5 subtopics
70
When multi-agent helps vs hurts (latency, accuracy, cost)
71
Communication protocols (messages, shared state, contracts)
72
Coordination failures (loops, deadlocks, collusion, drift)
73
Market-based coordination (auctions, bidding, task pricing) in multi-agent systems
74
Incentives, credit assignment, and anti-collusion design for multi-agent teams
75
Orchestrators & workflow engines
5 subtopics
76
Modeling workflows as DAGs (steps, retries, compensations)
77
Scheduling & queues (workers, priorities, rate limiting)
78
State persistence (checkpoints, resumability)
79
Temporal workflows and SLAs (deadlines, time windows, timeouts as first-class)
80
Adaptive workflows: dynamic DAG compilation and runtime graph rewriting
81
Reflection, critique & verification patterns
6 subtopics
82
Self-critique loops (review then revise)
83
Debate/consensus patterns (multiple proposals, arbitration)
84
Tool-assisted verification (unit checks, validators, theorem-prover-style checks)
85
SMT/constraint-solving for verification and plan checking (specs, invariants, counterexamples)
86
Counterexample-guided refinement loops (CEGAR-like) for agent policies and prompts
87
Ensemble/self-consistency verification (multiple solutions + arbitration)
88
Delegation, roles & handoffs
3 subtopics
89
Role prompting & responsibilities (contracts, boundaries)
90
Task handoff interfaces (artifacts, briefs, acceptance criteria)
91
Budgeting & quotas across sub-agents/tools
92
Event-driven agents (streams, triggers)
3 subtopics
93
Triggers & webhooks (reactive behaviors)
94
Streaming inputs (partial results, incremental decisions)
95
Backpressure and overload handling
Chapter 5
Reliability, safety & alignment
96
Guardrails & policy enforcement
3 subtopics
97
Output filtering & constrained generation (blocklists, allowlists)
98
Policy-as-code (rules engines, centralized governance)
99
Tool access control (capability tokens, scopes, approvals)
100
Error handling, retries & recovery
3 subtopics
101
Retry strategies (exponential backoff, jitter, circuit breakers)
102
Fallback modes (degraded responses, “safe answer only”)
103
Partial failure recovery (compensations, sagas)
104
Security for agents (prompt injection, secrets, tools)
6 subtopics
105
Prompt injection defenses (content isolation, instruction boundaries)
106
Secrets handling & exfiltration prevention (vaulting, redaction)
107
Tool supply-chain risks (untrusted plugins, dependency scanning)
108
RAG-specific attacks: data poisoning, malicious documents, and retrieval-time prompt injection
109
Untrusted I/O isolation (tool output sanitization, content-type boundaries, parser hardening)
110
Secure multi-tenant agent execution (isolation, noisy neighbors, per-tenant policies)
111
Privacy & data governance
5 subtopics
112
PII detection/redaction and minimization
113
Data retention, deletion, and user export requests
114
Consent and user controls for agent actions
115
Differential privacy for agent telemetry/logs (privacy budgets, utility tradeoffs)
116
Federated/on-device agent constraints (local-first memory, secure enclaves, sync policies)
117
Human oversight & approvals (operational safety)
3 subtopics
Approval flows (confirmations, two-person rules, break-glass) (see Chapter 1)
118
Escalation policies (when to page a human)
119
Audit trails for approvals (who/what/when/why)
120
Rate limits, latency & cost control
3 subtopics
121
Token/call budgeting per task (hard and soft limits)
122
Caching and memoization (tool results, retrieval, reasoning artifacts)
123
Batch vs real-time execution (latency-cost tradeoffs)
Chapter 6
Evaluation & observability
124
Agent evaluation methods (offline + online)
6 subtopics
125
Offline evaluation on task suites (replay, deterministic tests)
126
Online evaluation (A/B tests, shadow deployments, canaries)
127
Human evaluation & rubrics (calibration, inter-rater reliability)
128
Causal evaluation and counterfactual logging (bias, interference, uplift for agent changes)
129
Distribution shift and stress testing (OOD tasks, robustness curves, adversarial realism)
130
Cost-aware evaluation (Pareto frontiers across success/safety/latency/cost)
Test harnesses & simulators for agents (see Chapter 3)
131
Metrics: success, safety, latency, cost
4 subtopics
132
Defining “task success” (acceptance criteria, partial credit)
133
Latency SLOs and tail behavior (p95/p99)
134
Cost per task and budget burn-down monitoring
135
Safety metrics (policy violations, risky-tool attempts)
136
Tracing, logging & observability for agent runs
6 subtopics
137
Span tracing across steps and tool calls
138
Prompt/config/version logging for reproducibility
139
Redaction and access control for logs
140
OpenTelemetry-style instrumentation patterns for multi-step agent traces
141
Trace-based debugging at scale (failure clustering, root-cause mining, exemplar traces)
142
Privacy-preserving observability (selective logging, hashing, secure storage tiers)
143
Dataset curation & gold task suites
3 subtopics
144
Collecting real tasks (consent, sampling, representativeness)
145
Labeling guidelines and rubrics for agent outcomes
146
Splits and leakage prevention (train/val/test hygiene)
Chapter 7
Building & deployment (production systems)
147
System design for agentic applications
3 subtopics
148
Component architecture (model, tools, memory, orchestrator)
149
State management and resumability (checkpoints, idempotency keys)
150
Failure domains and blast radius (isolation, kill switches)
151
Production tool integrations
3 subtopics
152
Auth & credential management for tools (OAuth, service accounts)
153
Per-tool rate limits and adaptive throttling
154
Tool SLAs, timeouts, and fallbacks
155
Scalability & concurrency
6 subtopics
156
Parallel tool calls and dependency management
157
Queueing and worker pools (throughput engineering)
158
Concurrency control (locks, optimistic concurrency, dedupe)
159
Distributed state sharding and coordination for agents (consistent hashing, ownership, leases)
160
Backpressure-aware schedulers and admission control (queue-aware planning/execution)
161
Deterministic concurrency testing (linearizability checks, Jepsen-style fault injection)
162
Monitoring & incident response
3 subtopics
163
Alerting on agent metrics and anomaly detection
164
Runbooks and rollbacks (disable tools, degrade safely)
165
Postmortems and continuous improvement loops
166
CI/CD for prompts, tools, and agent configs
3 subtopics
167
Prompt diffing and regression tests in CI
168
Feature flags for agent behaviors and tools
169
Model/version pinning and compatibility testing
170
Compliance & auditing
3 subtopics
171
Compliance-ready logging (retention, immutability where needed)
172
Access reviews and least-privilege operations
173
Documentation (risk assessments, model cards, SOPs)