Agentic AI — Study Path Agent

149 topics across 7 chapters

Chapter 1

Foundations & mental models

LLM basics for agents (what matters in practice)

3 subtopics

Context windows, tokens, and truncation failure modes

Sampling basics (temperature/top-p) and determinism expectations

Structured vs free-form outputs (why agents need structure)

Decision-making under uncertainty (agent mindset)

3 subtopics

Heuristics, policies, and when to avoid “over-reasoning”

MDP/POMDP intuition for partial observability

Bandits & exploration vs exploitation (practical lens)

Human-in-the-loop UX for agents

3 subtopics

Approval flows (confirmations, two-person rules, break-glass)

Explanations & transparency (what the user needs to trust actions)

Feedback capture loops (labels, corrections, preferences)

Prompting & instruction design for controllable agents

4 subtopics

System prompts, constraints, and instruction hierarchy

Structured outputs: JSON schemas, validation, and repair prompts

Few-shot exemplars for tool usage and planning style

Prompt anti-patterns (leakage, ambiguity, brittle formatting)

Chapter 2

Agent loop & planning

The agent loop: sense → think → act → reflect

3 subtopics

Perception/input normalization (parsing, cleaning, grounding)

Actuation & side effects (designing safe actions)

Termination criteria and “done” detection

Planning methods

4 subtopics

ReAct-style interleaving of reasoning and acting

Tree/graph search planning (beam search, Tree-of-Thoughts intuition)

Constraint-based planning (tools, budgets, policies)

Plan execution, monitoring, and re-planning triggers

Memory & state management

4 subtopics

Short-term state (scratchpad/state object) vs chat history

Long-term memory stores (KV stores, vector DBs)

Retrieval / RAG for agents

3 subtopics

Indexing & chunking for retrieval quality

Query rewriting, multi-query, and decomposition for RAG

Grounding and source attribution (citations in agent outputs)

Summarization & compression strategies for long-running agents

Task decomposition & execution graphs

3 subtopics

Goal/requirements elicitation (clarifying questions)

Designing a subtask graph (dependencies, parallelism)

Mapping subtasks to tools/agents (capability-aware scheduling)

↗ Prompting & instruction design for controllable agents (see Chapter 1)

Chapter 3

Tool use & environment interaction

Tool calling (functions), schemas, and routing

4 subtopics

Function schema design (inputs/outputs, enums, constraints)

Tool selection & routing (rules, classifiers, LLM router)

Validation, retries, and repair strategies for tool calls

Side-effect safety (idempotency keys, dry-runs, confirmations)

↗ Retrieval / RAG for agents (see Chapter 2)

Web/GUI automation as actions

3 subtopics

Browser automation basics (navigation, forms, downloads)

Robust selectors & resilience (timeouts, retries, flaky UIs)

Constraints & ethics (anti-bot, terms, user consent)

Sandboxes, simulators & safe execution environments

2 subtopics

Deterministic sandboxing (resource limits, network egress control)

Test harnesses & simulators for agents

3 subtopics

Mock tools and deterministic fixtures

Synthetic users and scripted scenarios

Adversarial testing (jailbreaks, injections, worst-case tasks)

Data/DB actions (transactions & permissions)

3 subtopics

Idempotency, transactions, and exactly-once illusions

Schema safety (migrations, forward/backward compatibility)

Permissioned queries (row-level security, least privilege)

Chapter 4

Architectures & design patterns

Single-agent vs multi-agent systems

3 subtopics

When multi-agent helps vs hurts (latency, accuracy, cost)

Communication protocols (messages, shared state, contracts)

Coordination failures (loops, deadlocks, collusion, drift)

Orchestrators & workflow engines

3 subtopics

Modeling workflows as DAGs (steps, retries, compensations)

Scheduling & queues (workers, priorities, rate limiting)

State persistence (checkpoints, resumability)

Reflection, critique & verification patterns

3 subtopics

Self-critique loops (review then revise)

Debate/consensus patterns (multiple proposals, arbitration)

Tool-assisted verification (unit checks, validators, theorem-prover-style checks)

Delegation, roles & handoffs

3 subtopics

Role prompting & responsibilities (contracts, boundaries)

Task handoff interfaces (artifacts, briefs, acceptance criteria)

Budgeting & quotas across sub-agents/tools

Event-driven agents (streams, triggers)

3 subtopics

Triggers & webhooks (reactive behaviors)

Streaming inputs (partial results, incremental decisions)

Backpressure and overload handling

Chapter 5

Reliability, safety & alignment

Guardrails & policy enforcement

3 subtopics

Output filtering & constrained generation (blocklists, allowlists)

Policy-as-code (rules engines, centralized governance)

Tool access control (capability tokens, scopes, approvals)

Error handling, retries & recovery

3 subtopics

Retry strategies (exponential backoff, jitter, circuit breakers)

Fallback modes (degraded responses, “safe answer only”)

Partial failure recovery (compensations, sagas)

Security for agents (prompt injection, secrets, tools)

3 subtopics

Prompt injection defenses (content isolation, instruction boundaries)

Secrets handling & exfiltration prevention (vaulting, redaction)

Tool supply-chain risks (untrusted plugins, dependency scanning)

Privacy & data governance

3 subtopics

PII detection/redaction and minimization

Data retention, deletion, and user export requests

Consent and user controls for agent actions

Human oversight & approvals (operational safety)

3 subtopics

↗ Approval flows (confirmations, two-person rules, break-glass) (see Chapter 1)

Escalation policies (when to page a human)

Audit trails for approvals (who/what/when/why)

Rate limits, latency & cost control

3 subtopics

Token/call budgeting per task (hard and soft limits)

Caching and memoization (tool results, retrieval, reasoning artifacts)

Batch vs real-time execution (latency-cost tradeoffs)

Chapter 6

Evaluation & observability

Agent evaluation methods (offline + online)

3 subtopics

Offline evaluation on task suites (replay, deterministic tests)

Online evaluation (A/B tests, shadow deployments, canaries)

Human evaluation & rubrics (calibration, inter-rater reliability)

↗ Test harnesses & simulators for agents (see Chapter 3)

Metrics: success, safety, latency, cost

4 subtopics

Defining “task success” (acceptance criteria, partial credit)

Latency SLOs and tail behavior (p95/p99)

Cost per task and budget burn-down monitoring

Safety metrics (policy violations, risky-tool attempts)

Tracing, logging & observability for agent runs

3 subtopics

Span tracing across steps and tool calls

Prompt/config/version logging for reproducibility

Redaction and access control for logs

Dataset curation & gold task suites

3 subtopics

Collecting real tasks (consent, sampling, representativeness)

Labeling guidelines and rubrics for agent outcomes

Splits and leakage prevention (train/val/test hygiene)

Chapter 7

Building & deployment (production systems)

System design for agentic applications

3 subtopics

Component architecture (model, tools, memory, orchestrator)

State management and resumability (checkpoints, idempotency keys)

Failure domains and blast radius (isolation, kill switches)

Production tool integrations

3 subtopics

Auth & credential management for tools (OAuth, service accounts)

Per-tool rate limits and adaptive throttling

Tool SLAs, timeouts, and fallbacks

Scalability & concurrency

3 subtopics

Parallel tool calls and dependency management

Queueing and worker pools (throughput engineering)

Concurrency control (locks, optimistic concurrency, dedupe)

Monitoring & incident response

3 subtopics

Alerting on agent metrics and anomaly detection

Runbooks and rollbacks (disable tools, degrade safely)

Postmortems and continuous improvement loops

CI/CD for prompts, tools, and agent configs

3 subtopics

Prompt diffing and regression tests in CI

Feature flags for agent behaviors and tools

Model/version pinning and compatibility testing

Compliance & auditing

3 subtopics

Compliance-ready logging (retention, immutability where needed)

Access reviews and least-privilege operations

Documentation (risk assessments, model cards, SOPs)