Agentic AI — Study Path Agent

Agentic AI

181 topics across 7 chapters

Chapter 1

Foundations & mental models

LLM basics for agents (what matters in practice)

3 subtopics

Context windows, tokens, and truncation failure modes

Sampling basics (temperature/top-p) and determinism expectations

Structured vs free-form outputs (why agents need structure)
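A minimal sketch of how temperature and top-p interact, using a toy list of logits rather than any real model API. The function names are illustrative. Treating temperature 0 as greedy argmax is an assumed convention. Hosted models may still not be bit-for-bit reproducible at temperature 0, so it is safer not to promise determinism.

```python
import math
import random


def softmax(logits, temperature=1.0):
    """Convert logits to probabilities, dividing by temperature first."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def top_p_filter(probs, top_p):
    """Keep the smallest set of most-likely tokens whose total mass reaches top_p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, mass = [], 0.0
    for i in order:
        kept.append(i)
        mass += probs[i]
        if mass >= top_p:
            break
    kept_mass = sum(probs[i] for i in kept)
    return {i: probs[i] / kept_mass for i in kept}  # renormalize the survivors


def sample(logits, temperature=1.0, top_p=1.0, rng=random):
    """Sample a token index; temperature 0 is treated as greedy decoding (assumption)."""
    if temperature == 0:
        return max(range(len(logits)), key=lambda i: logits[i])
    dist = top_p_filter(softmax(logits, temperature), top_p)
    indices = list(dist)
    weights = [dist[i] for i in indices]
    return rng.choices(indices, weights=weights, k=1)[0]
```

Lower temperature sharpens the distribution before top-p trims its tail. The two knobs therefore compound rather than act independently.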

Decision-making under uncertainty (agent mindset)

3 subtopics

Heuristics, policies, and when to avoid “over-reasoning”

MDP/POMDP intuition for partial observability

Bandits & exploration vs exploitation (practical lens)
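To make exploration vs exploitation concrete, here is a minimal epsilon-greedy bandit. It could, for example, choose among candidate tools or prompt variants by observed reward. The class name, the default epsilon, and the 0/1 reward scheme are assumptions for illustration.

```python
import random


class EpsilonGreedy:
    """Exploit the best-known arm, but explore a random arm with probability epsilon."""

    def __init__(self, arms, epsilon=0.1, rng=None):
        self.arms = list(arms)
        self.epsilon = epsilon
        self.rng = rng if rng is not None else random.Random()
        self.counts = {a: 0 for a in self.arms}
        self.values = {a: 0.0 for a in self.arms}  # running mean reward per arm

    def select(self):
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.arms)  # explore
        return max(self.arms, key=lambda a: self.values[a])  # exploit (ties: first arm)

    def update(self, arm, reward):
        self.counts[arm] += 1
        n = self.counts[arm]
        # incremental mean: v_n = v_{n-1} + (r - v_{n-1}) / n
        self.values[arm] += (reward - self.values[arm]) / n
```

A fixed epsilon keeps exploring forever. Common alternatives are to decay epsilon over time or to switch to an upper-confidence-bound rule.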

Human-in-the-loop UX for agents

3 subtopics

Approval flows (confirmations, two-person rules, break-glass)

Explanations & transparency (what the user needs to trust actions)

Feedback capture loops (labels, corrections, preferences)

Prompting & instruction design for controllable agents

4 subtopics

System prompts, constraints, and instruction hierarchy

Structured outputs: JSON schemas, validation, and repair prompts

Few-shot exemplars for tool usage and planning style

Prompt anti-patterns (leakage, ambiguity, brittle formatting)
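For the structured-output and repair-prompt topics, here is a minimal sketch. It parses the model's reply as JSON and checks a few required fields and types. If anything fails, it builds a repair prompt that quotes the errors back to the model. The hand-rolled `REQUIRED` spec is a hypothetical stand-in for a real JSON Schema validator, and the prompt wording is only an example.

```python
import json

# Hypothetical spec: field name -> expected Python type after json.loads
REQUIRED = {"action": str, "arguments": dict}


def validate(raw: str):
    """Return (parsed, errors); errors is empty when the output is usable."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        return None, [f"invalid JSON: {exc.msg} at position {exc.pos}"]
    if not isinstance(data, dict):
        return None, ["top-level value must be a JSON object"]
    errors = []
    for field, expected in REQUIRED.items():
        if field not in data:
            errors.append(f"missing field '{field}'")
        elif not isinstance(data[field], expected):
            errors.append(f"field '{field}' must be of type {expected.__name__}")
    return (data if not errors else None), errors


def repair_prompt(raw: str, errors):
    """Build a follow-up prompt asking the model to fix only the listed problems."""
    bullet_list = "\n".join(f"- {e}" for e in errors)
    return (
        "Your previous reply could not be used:\n"
        f"{bullet_list}\n"
        "Return only a corrected JSON object. Previous reply:\n"
        f"{raw}"
    )
```

In practice it is usually worth capping repair attempts, say one or two, before falling back to a safe default.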

Chapter 2

Agent loop & planning

The agent loop: sense → think → act → reflect

3 subtopics

Perception/input normalization (parsing, cleaning, grounding)

Actuation & side effects (designing safe actions)

Termination criteria and “done” detection

Planning methods

7 subtopics

ReAct-style interleaving of reasoning and acting

Tree/graph search planning (beam search, ToT intuition)

Monte Carlo Tree Search (MCTS) planning for agents

Hierarchical task network (HTN) planning for agent workflows

Constraint-based planning (tools, budgets, policies)

Plan execution, monitoring, and re-planning triggers

Classical planning (PDDL-style), plan validators, and hybrid LLM+planner loops

Memory & state management

7 subtopics

Short-term state (scratchpad/state object) vs chat history

Long-term memory stores (KV stores, vector DBs)

Retrieval / RAG for agents

3 subtopics

Indexing & chunking for retrieval quality

Query rewriting, multi-query, and decomposition for RAG

Grounding and source attribution (citations in agent outputs)

Summarization & compression strategies for long-running agents

Learned retrieval policies for memory (when/what to write, retrieve, and cite)

Memory consolidation, decay, and forgetting policies (stability vs freshness)

Knowledge-graph memory (entity linking, relation extraction, graph RAG)

Task decomposition & execution graphs

3 subtopics

Goal/requirements elicitation (clarifying questions)

Designing a subtask graph (dependencies, parallelism)

Mapping subtasks to tools/agents (capability-aware scheduling)

↗ Prompting & instruction design for controllable agents (see Chapter 1)
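For designing a subtask graph, Python's standard-library `graphlib` (Python 3.9+) can group subtasks into waves that may run in parallel. The task names below are hypothetical. A real scheduler would call `done()` as each task finishes rather than once per wave.

```python
from graphlib import TopologicalSorter


def parallel_batches(deps):
    """Group subtasks into waves; every task in a wave has all its dependencies done.

    deps maps task -> set of tasks it depends on (its predecessors).
    Raises graphlib.CycleError if the dependencies contain a cycle.
    """
    ts = TopologicalSorter(deps)
    ts.prepare()
    batches = []
    while ts.is_active():
        ready = sorted(ts.get_ready())  # tasks that could run concurrently now
        batches.append(ready)
        ts.done(*ready)  # here a whole wave is marked finished at once
    return batches


deps = {
    "fetch_docs": set(),
    "fetch_tickets": set(),
    "summarize": {"fetch_docs", "fetch_tickets"},
    "draft_reply": {"summarize"},
}
```

Calling `parallel_batches(deps)` puts the two fetches in the first wave, then `summarize`, then `draft_reply`.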

Chapter 3

Tool use & environment interaction

Tool calling (functions), schemas, and routing

7 subtopics

Function schema design (inputs/outputs, enums, constraints)

Tool selection & routing (rules, classifiers, LLM router)

Validation, retries, and repair strategies for tool calls

Side-effect safety (idempotency keys, dry-runs, confirmations)

Code-as-policy: program synthesis for tool orchestration (generated workflows, typed wrappers)

Type-driven tool invocation and effect-aware schemas (capabilities, pre/post-conditions)

Tool-use provenance and interpretability (why this tool, why these args)

↗ Retrieval / RAG for agents (see Chapter 2)
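For side-effect safety, here is a minimal in-memory sketch of idempotency keys. A retried call with the same key returns the stored result instead of repeating the side effect. `IdempotentExecutor` and `send_email` are hypothetical names. A production version would persist keys durably and handle concurrent callers.

```python
import hashlib
import json


class IdempotentExecutor:
    """Run side-effecting tool calls at most once per idempotency key (in-memory sketch)."""

    def __init__(self):
        self._results = {}  # key -> stored result; a real system would persist this

    @staticmethod
    def make_key(tool_name, args):
        # Derive a stable key from the tool name and canonicalized (sorted-key) arguments.
        payload = json.dumps({"tool": tool_name, "args": args}, sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def call(self, tool_name, fn, args, key=None):
        key = key or self.make_key(tool_name, args)
        if key in self._results:
            return self._results[key]  # replay: earlier result, no new side effect
        result = fn(**args)
        self._results[key] = result
        return result
```

Deriving the key from the arguments also deduplicates two *intentional* identical calls. For that reason, the key is often minted once per intended action (for example, per plan step) and reused across retries. This sketch also does not cover a crash between the side effect and storing the result, which is where exactly-once guarantees become illusory.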

Web/GUI automation as actions

3 subtopics

Browser automation basics (navigation, forms, downloads)

Robust selectors & resilience (timeouts, retries, flaky UIs)

Constraints & ethics (anti-bot measures, terms of service, user consent)

Sandboxes, simulators & safe execution environments

4 subtopics

Deterministic sandboxing (resource limits, network egress control)

Test harnesses & simulators for agents

3 subtopics

Mock tools and deterministic fixtures

Synthetic users and scripted scenarios

Adversarial testing (jailbreaks, injections, worst-case tasks)

Fuzzing and differential testing of agent actions inside sandboxes

Record/replay infrastructure for deterministic debugging of long agent runs

Data/DB actions (transactions & permissions)

3 subtopics

Idempotency, transactions, and exactly-once illusions

Schema safety (migrations, forward/backward compatibility)

Permissioned queries (row-level security, least privilege)
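For agent-issued database actions, the standard-library `sqlite3` connection can be used as a context manager. It commits if the block succeeds and rolls back if an exception escapes, so a half-applied action is undone. The `accounts` table and the `transfer` function are hypothetical.

```python
import sqlite3


def transfer(conn, src, dst, amount):
    """Move funds atomically: both updates commit together or neither does."""
    with conn:  # commits on success, rolls back if an exception escapes
        conn.execute("UPDATE accounts SET balance = balance - ? WHERE id = ?", (amount, src))
        (balance,) = conn.execute(
            "SELECT balance FROM accounts WHERE id = ?", (src,)
        ).fetchone()
        if balance < 0:
            raise ValueError("insufficient funds")  # triggers rollback of the debit
        conn.execute("UPDATE accounts SET balance = balance + ? WHERE id = ?", (amount, dst))
```

Pairing this with a database role that can only touch the tables the agent needs keeps a faulty action's blast radius small.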

Chapter 4

Architectures & design patterns

Single-agent vs multi-agent systems

5 subtopics

When multi-agent helps vs hurts (latency, accuracy, cost)

Communication protocols (messages, shared state, contracts)

Coordination failures (loops, deadlocks, collusion, drift)

Market-based coordination (auctions, bidding, task pricing) in multi-agent systems

Incentives, credit assignment, and anti-collusion design for multi-agent teams

Orchestrators & workflow engines

5 subtopics

Modeling workflows as DAGs (steps, retries, compensations)

Scheduling & queues (workers, priorities, rate limiting)

State persistence (checkpoints, resumability)

Temporal workflows and SLAs (deadlines, time windows, timeouts as first-class)

Adaptive workflows: dynamic DAG compilation and runtime graph rewriting

Reflection, critique & verification patterns

6 subtopics

Self-critique loops (review then revise)

Debate/consensus patterns (multiple proposals, arbitration)

Tool-assisted verification (unit checks, validators, lightweight formal checks)

SMT/constraint-solving for verification and plan checking (specs, invariants, counterexamples)

Counterexample-guided refinement loops (CEGAR-like) for agent policies and prompts

Ensemble/self-consistency verification (multiple solutions + arbitration)
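A minimal sketch of self-consistency arbitration: take final answers from several independently sampled solutions and accept the majority only if it clears an agreement threshold. Otherwise, signal that the agent should retry or escalate. The normalization step and the default threshold are illustrative assumptions.

```python
from collections import Counter


def self_consistency(answers, min_agreement=0.5):
    """Return the majority answer if enough samples agree, else None (retry or escalate).

    answers: final answers extracted from independently sampled solutions.
    min_agreement: fraction of samples the winner must exceed (illustrative default).
    """
    if not answers:
        return None
    normalized = [a.strip().lower() for a in answers]  # crude normalization (assumption)
    winner, votes = Counter(normalized).most_common(1)[0]
    return winner if votes / len(normalized) > min_agreement else None
```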

Delegation, roles & handoffs

3 subtopics

Role prompting & responsibilities (contracts, boundaries)

Task handoff interfaces (artifacts, briefs, acceptance criteria)

Budgeting & quotas across sub-agents/tools

Event-driven agents (streams, triggers)

3 subtopics

Triggers & webhooks (reactive behaviors)

Streaming inputs (partial results, incremental decisions)

Backpressure and overload handling

Chapter 5

Reliability, safety & alignment

Guardrails & policy enforcement

3 subtopics

Output filtering & constrained generation (blocklists, allowlists)

Policy-as-code (rules engines, centralized governance)

Tool access control (capability tokens, scopes, approvals)

Error handling, retries & recovery

3 subtopics

Retry strategies (exponential backoff, jitter, circuit breakers)

Fallback modes (degraded responses, “safe answer only”)

Partial failure recovery (compensations, sagas)
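A minimal sketch of the retry and circuit-breaker patterns named above, assuming transient failures surface as `TimeoutError`. The `sleep`, `rng`, and `clock` parameters are injected only so the behavior can be tested without real waiting. All defaults and thresholds are illustrative, not recommendations.

```python
import random
import time


def retry_with_backoff(fn, max_attempts=5, base=0.5, cap=8.0,
                       retry_on=(TimeoutError,), sleep=time.sleep, rng=random):
    """Call fn, retrying transient errors with exponential backoff and full jitter."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except retry_on:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            # full jitter: wait a random time in [0, min(cap, base * 2**attempt)]
            sleep(rng.uniform(0, min(cap, base * 2 ** attempt)))


class CircuitBreaker:
    """Fail fast after `threshold` consecutive failures; allow a trial call after `reset_after` s."""

    def __init__(self, threshold=3, reset_after=30.0, clock=time.monotonic):
        self.threshold = threshold
        self.reset_after = reset_after
        self.clock = clock
        self.failures = 0
        self.opened_at = None

    def call(self, fn):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                raise RuntimeError("circuit open")  # skip the call entirely
            self.opened_at = None  # half-open: let one trial call through
        try:
            result = fn()
        except Exception:
            self.failures += 1
            if self.failures >= self.threshold:
                self.opened_at = self.clock()  # (re)open the circuit
            raise
        self.failures = 0  # success closes the circuit
        return result
```

Jitter spreads out retries from many agents hitting the same tool, so they do not synchronize into load spikes. The breaker stops a failing dependency from consuming the whole retry budget.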

Security for agents (prompt injection, secrets, tools)

6 subtopics

Prompt injection defenses (content isolation, instruction boundaries)

Secrets handling & exfiltration prevention (vaulting, redaction)

Tool supply-chain risks (untrusted plugins, dependency scanning)

RAG-specific attacks: data poisoning, malicious documents, and retrieval-time prompt injection

Untrusted I/O isolation (tool output sanitization, content-type boundaries, parser hardening)

Secure multi-tenant agent execution (isolation, noisy neighbors, per-tenant policies)

Privacy & data governance

5 subtopics

PII detection/redaction and minimization

Data retention, deletion, and user export requests

Consent and user controls for agent actions

Differential privacy for agent telemetry/logs (privacy budgets, utility tradeoffs)

Federated/on-device agent constraints (local-first memory, secure enclaves, sync policies)

Human oversight & approvals (operational safety)

3 subtopics

↗ Approval flows (confirmations, two-person rules, break-glass) (see Chapter 1)

Escalation policies (when to page a human)

Audit trails for approvals (who/what/when/why)

Rate limits, latency & cost control

3 subtopics

Token/call budgeting per task (hard and soft limits)

Caching and memoization (tool results, retrieval, reasoning artifacts)

Batch vs real-time execution (latency-cost tradeoffs)
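For per-task token budgeting, here is a minimal sketch with a soft limit and a hard limit. Crossing the soft limit records a warning the orchestrator can act on, for example by switching to a cheaper model. Crossing the hard limit refuses the charge. The class names are hypothetical, and callers are assumed to charge estimated or provider-reported token counts.

```python
class BudgetExceeded(Exception):
    """Raised when a charge would push a task past its hard token limit."""


class TokenBudget:
    """Track token spend per task with a soft limit (warn) and a hard limit (stop)."""

    def __init__(self, soft_limit, hard_limit):
        if soft_limit > hard_limit:
            raise ValueError("soft_limit must not exceed hard_limit")
        self.soft_limit = soft_limit
        self.hard_limit = hard_limit
        self.used = 0
        self.warnings = []

    def charge(self, tokens, label=""):
        if self.used + tokens > self.hard_limit:
            raise BudgetExceeded(f"{label}: {self.used}+{tokens} > {self.hard_limit}")
        self.used += tokens
        if self.used > self.soft_limit:
            # soft limit: keep going, but leave a signal for the orchestrator
            self.warnings.append(f"{label}: soft limit passed ({self.used}/{self.soft_limit})")

    @property
    def remaining(self):
        return self.hard_limit - self.used
```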

Chapter 6

Evaluation & observability

Agent evaluation methods (offline + online)

6 subtopics

Offline evaluation on task suites (replay, deterministic tests)

Online evaluation (A/B tests, shadow deployments, canaries)

Human evaluation & rubrics (calibration, inter-rater reliability)

Causal evaluation and counterfactual logging (bias, interference, uplift for agent changes)

Distribution shift and stress testing (OOD tasks, robustness curves, adversarial realism)

Cost-aware evaluation (Pareto frontiers across success/safety/latency/cost)

↗ Test harnesses & simulators for agents (see Chapter 3)
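For cost-aware evaluation, here is a minimal two-axis Pareto-frontier sketch over agent configurations. Success and cost are the only axes here, although the topic lists four. Adding latency or safety means repeating the same per-axis comparison. The configuration names and numbers are made up for illustration.

```python
def pareto_frontier(configs):
    """Return names of configs not dominated by any other.

    Each config is (name, success_rate, cost); higher success and lower cost are better.
    A config is dominated if another is at least as good on both and strictly better on one.
    """
    frontier = []
    for name, success, cost in configs:
        dominated = any(
            s2 >= success and c2 <= cost and (s2 > success or c2 < cost)
            for _, s2, c2 in configs
        )
        if not dominated:
            frontier.append(name)
    return frontier


configs = [
    ("small-model", 0.70, 0.01),
    ("large-model", 0.85, 0.05),
    ("large+verifier", 0.84, 0.09),  # worse and pricier than large-model: dominated
    ("small+retries", 0.72, 0.02),
]
```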

Metrics: success, safety, latency, cost

4 subtopics

Defining “task success” (acceptance criteria, partial credit)

Latency SLOs and tail behavior (p95/p99)

Cost per task and budget burn-down monitoring

Safety metrics (policy violations, risky-tool attempts)
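For latency SLOs, a minimal nearest-rank percentile sketch shows why tail percentiles matter. In the synthetic sample below, the mean is 195 ms, yet p99 is 2000 ms. Other percentile definitions exist. For example, `statistics.quantiles` interpolates by default, so results can differ slightly on small samples.

```python
import math


def percentile_nearest_rank(samples, p):
    """Nearest-rank percentile: smallest value with at least p% of samples at or below it."""
    if not samples:
        raise ValueError("no samples")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


# Synthetic latencies in ms: 95 fast runs and 5 slow ones
samples = [100] * 95 + [2000] * 5
```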

Tracing, logging & observability for agent runs

6 subtopics

Span tracing across steps and tool calls

Prompt/config/version logging for reproducibility

Redaction and access control for logs

OpenTelemetry-style instrumentation patterns for multi-step agent traces

Trace-based debugging at scale (failure clustering, root-cause mining, exemplar traces)

Privacy-preserving observability (selective logging, hashing, secure storage tiers)

Dataset curation & gold task suites

3 subtopics

Collecting real tasks (consent, sampling, representativeness)

Labeling guidelines and rubrics for agent outcomes

Splits and leakage prevention (train/val/test hygiene)
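For split hygiene, one common approach is to assign splits by hashing a *group* key rather than shuffling individual examples. Near-duplicate tasks from the same source then land in the same split. The choice of group key, such as a source conversation or customer, is the important assumption here. `hashlib` is used because Python's built-in `hash()` for strings is salted per process and would not give stable splits.

```python
import hashlib


def assign_split(group_key, val_pct=10, test_pct=10):
    """Deterministically map a group (e.g. a source conversation or user) to a split."""
    bucket = int(hashlib.sha256(group_key.encode("utf-8")).hexdigest(), 16) % 100
    if bucket < test_pct:
        return "test"
    if bucket < test_pct + val_pct:
        return "val"
    return "train"
```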

Chapter 7

Building & deployment (production systems)

System design for agentic applications

3 subtopics

Component architecture (model, tools, memory, orchestrator)

State management and resumability (checkpoints, idempotency keys)

Failure domains and blast radius (isolation, kill switches)

Production tool integrations

3 subtopics

Auth & credential management for tools (OAuth, service accounts)

Per-tool rate limits and adaptive throttling

Tool SLAs, timeouts, and fallbacks
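For per-tool rate limits, a token bucket is a common building block: each tool gets its own bucket, and bursts are allowed up to `capacity`. The injectable `clock` exists only for testing. One way to make throttling adaptive, stated here as an assumption rather than a rule, is to lower `rate` when the provider signals throttling and raise it slowly afterwards.

```python
import time


class TokenBucket:
    """Allow bursts up to `capacity` calls, refilling at `rate` tokens per second."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self.tokens = capacity
        self.last = clock()

    def try_acquire(self, cost=1.0):
        now = self.clock()
        # refill in proportion to elapsed time, never above capacity
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False  # caller can wait, queue, or use a fallback
```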

Scalability & concurrency

6 subtopics

Parallel tool calls and dependency management

Queueing and worker pools (throughput engineering)

Concurrency control (locks, optimistic concurrency, dedupe)

Distributed state sharding and coordination for agents (consistent hashing, ownership, leases)

Backpressure-aware schedulers and admission control (queue-aware planning/execution)

Deterministic concurrency testing (linearizability checks, Jepsen-style fault injection)
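For distributed state sharding, here is a minimal consistent-hashing sketch that maps agent or session ids to worker nodes. Removing a node only moves the keys that node owned. Virtual nodes (`vnodes`) smooth out the load. The names are hypothetical. This sketch only decides placement; exclusive ownership during rebalancing would still need something like leases.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    return int(hashlib.sha256(key.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Consistent-hash ring mapping keys (e.g. session ids) to nodes, with virtual nodes."""

    def __init__(self, nodes, vnodes=100):
        self.vnodes = vnodes
        self._ring = []  # sorted list of (hash, node)
        for node in nodes:
            self.add(node)

    def add(self, node):
        for i in range(self.vnodes):
            bisect.insort(self._ring, (_hash(f"{node}#{i}"), node))

    def remove(self, node):
        self._ring = [(h, n) for h, n in self._ring if n != node]

    def owner(self, key):
        if not self._ring:
            raise LookupError("ring is empty")
        # first virtual node clockwise from the key's hash, wrapping around the ring
        idx = bisect.bisect_right(self._ring, (_hash(key), ""))
        return self._ring[idx % len(self._ring)][1]
```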

Monitoring & incident response

3 subtopics

Alerting on agent metrics and anomaly detection

Runbooks and rollbacks (disable tools, degrade safely)

Postmortems and continuous improvement loops

CI/CD for prompts, tools, and agent configs

3 subtopics

Prompt diffing and regression tests in CI

Feature flags for agent behaviors and tools

Model/version pinning and compatibility testing

Compliance & auditing

3 subtopics

Compliance-ready logging (retention, immutability where needed)

Access reviews and least-privilege operations

Documentation (risk assessments, model cards, SOPs)
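For compliance-ready logging and approval audit trails, one way to get tamper *evidence* is a hash chain. Each entry commits to the hash of the previous entry, so editing any past record breaks verification. This is a sketch under stated assumptions: anyone able to rewrite the whole chain could recompute every hash. Real deployments typically also anchor the latest hash elsewhere or use write-once storage. The record fields are hypothetical.

```python
import hashlib
import json


class AuditLog:
    """Append-only log where each entry commits to the previous entry's hash."""

    GENESIS = "0" * 64

    def __init__(self):
        self.entries = []

    @staticmethod
    def _digest(prev, record):
        body = json.dumps(record, sort_keys=True)  # canonical form of the record
        return hashlib.sha256((prev + body).encode("utf-8")).hexdigest()

    def append(self, record: dict):
        prev = self.entries[-1]["hash"] if self.entries else self.GENESIS
        self.entries.append({"record": record, "prev": prev, "hash": self._digest(prev, record)})

    def verify(self) -> bool:
        prev = self.GENESIS
        for entry in self.entries:
            if entry["prev"] != prev or entry["hash"] != self._digest(prev, entry["record"]):
                return False
            prev = entry["hash"]
        return True
```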