AI Workflows — How LLM, RAG, Agent & Agentic AI Actually Work

How AI Actually Works

Visual workflows showing how LLM, RAG, AI Agent, and Agentic AI process a request from start to finish — with real-world scenarios that show when each approach shines.

LLM Workflow

Single-pass text generation — question in, answer out

What’s happening under the hood

A Large Language Model is the foundation of every other AI pattern on this page. When you type a question, the LLM tokenizes your input (breaks it into sub-word pieces), runs it through dozens of transformer layers that compute attention across the entire context, then predicts the most likely next token — one at a time — until it finishes its response.

There is no retrieval, no planning, no tool use. The model generates entirely from patterns it learned during training. This makes it fast and versatile, but also means it can confidently produce information that sounds right but is factually wrong (hallucination), and it has no access to information after its training cutoff date or to any private data.

1 · User Input

The human types a prompt — a question, instruction, or creative request. This is the only input the model receives.

Example: “Explain how photosynthesis works in simple terms”

2 · Tokenization

The text is split into tokens (sub-word units). The word “photosynthesis” might become [“photo”, “syn”, “thesis”]. Each token is mapped to a numerical ID the model can process.

Technical: BPE or SentencePiece tokenizer · Typical vocab size: 32K–128K tokens

3 · Transformer Processing

Tokens pass through the transformer’s attention layers. Each layer computes relationships between every token and every other token, building a rich understanding of context, meaning, and intent.

Technical: Self-attention + feed-forward networks · 32–128+ layers · Billions of parameters

4 · Token-by-Token Generation

The model predicts the most probable next token, appends it to the sequence, then predicts the next, and the next — auto-regressively building the full response one token at a time.

Technical: Softmax over vocabulary · Temperature/top-p sampling controls randomness

5 · Output Delivered

The completed text is returned to the user. The model has no memory of this interaction — the next conversation starts from scratch unless the full history is sent again.

Latency: ~0.5–5 seconds · Cost: Lowest (single API call)

RAG Workflow

Retrieve, then generate — grounding answers in real documents

What’s happening under the hood

Retrieval-Augmented Generation inserts a knowledge retrieval step before the LLM generates its answer. Instead of relying solely on what the model memorized during training, RAG searches your actual documents, finds the most relevant passages, and feeds them into the LLM’s prompt as context. The LLM then generates a response grounded in that real information.

This is the most effective way to reduce hallucination and give the model access to private, current, or domain-specific data without retraining. The trade-off: answer quality depends heavily on whether the retrieval step found the right documents. If it retrieves irrelevant chunks, the answer will be grounded in the wrong information.

Pre-step · Index Your Documents

Before any queries happen, your documents are prepared: split into chunks, converted to vector embeddings by an embedding model, and stored in a vector database. This only happens once (plus updates as documents change).

Tools: LangChain / LlamaIndex for chunking · OpenAI / Cohere embeddings · Pinecone / Chroma / Weaviate for storage

1 · User Query

The user asks a question. Unlike a plain LLM, this question will first be used to search for relevant information before any text is generated.

Example: “What is our company’s parental leave policy for contractors?”

2 · Embed the Query

The question is converted into a vector embedding using the same embedding model that was used to index the documents. This creates a mathematical representation of the question’s meaning.

Technical: Same embedding model as indexing · Output: high-dimensional vector (e.g., 1536 dimensions)

3 · Vector Search & Retrieval

The query vector is compared against all document vectors in the database using similarity search (typically cosine similarity). The top-K most relevant document chunks are retrieved. Advanced systems add re-ranking to further prioritize the best matches.

Technical: Approximate Nearest Neighbor search · Top-K (usually 3–10 chunks) · Optional re-ranking (Cohere Rerank, cross-encoder)

4 · Augment the Prompt

The retrieved document chunks are inserted into the LLM’s prompt as context, typically with an instruction like “Answer the question based only on the following context.” The original question is appended after the context.

Technical: Context window management · Chunk compression if context exceeds limits

5 · LLM Generates Grounded Response

The LLM generates its answer using the retrieved context. Because it’s working from actual documents rather than just training memory, the answer is more accurate and can include source citations pointing back to specific documents.

Latency: ~1–8 seconds total · Output: Answer + source references

6 · Deliver with Citations

The user receives an answer grounded in their actual documents, with references to exactly which sources informed the response. Trust is higher because the provenance is traceable.

Quality gate: If retrieval returns low-confidence matches, system can flag uncertainty instead of guessing

AI Agent Workflow

Plan, act, observe, reflect — an autonomous reasoning loop

What’s happening under the hood

An AI Agent wraps an LLM with autonomy. Instead of answering a question in one shot, the agent receives a goal, then enters a loop: it reasons about what to do next, selects and uses a tool (web search, code execution, API call, file access), observes the result, and decides whether the task is complete or needs more steps.

This is the key shift: the LLM is no longer just generating text — it’s making decisions and taking actions. The most common pattern is ReAct (Reason + Act), where the model alternates between thinking (“I need to search for the latest quarterly data”) and acting (calling a search tool). The agent can handle errors, retry with different approaches, and self-correct — capabilities no static LLM or RAG system has.

1 · Receive Goal

The user provides a high-level objective, not just a question. The agent needs to figure out how to accomplish it.

Example: “Research the top 5 competitors in our market, find their pricing, and create a comparison spreadsheet”

2 · Plan

The LLM breaks the goal into a sequence of steps. It identifies what tools it will need, what information it must gather, and in what order to proceed. The plan may be explicit (written out) or implicit (decided step-by-step).

Pattern: Chain-of-thought → Task decomposition → Tool selection

3 · Select Tool & Act

The agent chooses the most appropriate tool for the current step and executes it. This might be a web search, API call, code execution, database query, file read/write, or email send — whatever the task requires.

Tools available: Web search · Code interpreter · File system · APIs · Calculators · RAG retrieval · Email

4 · Observe Result

The tool returns a result. The agent reads and interprets the output — was the search helpful? Did the code run without errors? Is the data complete or does it need more?

Technical: Tool output injected back into LLM context as “observation”

5 · Reflect & Decide

The agent evaluates: Is the goal complete? Do I need more information? Did something fail that I need to retry? Should I adjust my approach? This self-reflection is what makes agents adaptive.

Pattern: ReAct (Reason + Act) · Self-critique · Error recovery logic

↻ Steps 3–5 repeat in a loop until the agent decides the goal is complete (or hits a maximum iteration limit)

6 · Deliver Final Output

Once the agent determines the task is complete, it assembles and delivers the final result — which might be a file, a report, a sent email, completed code, or a summary of actions taken.

Latency: 30 sec – several minutes · LLM calls: 5–50+ per task · Cost: 5–20× a single LLM call

Agentic AI Workflow

A coordinated team of specialized agents tackling complex projects

What’s happening under the hood

Agentic AI is the orchestration layer — multiple specialized AI agents working together like a team. Each agent has a defined role (researcher, writer, reviewer, coder), its own set of tools, and access to shared memory. An orchestrator agent (or manager) decomposes the overall goal into subtasks and assigns them to the right specialist.

The key advantage is specialization and parallel execution. A researcher agent can gather data while a writer agent starts drafting based on earlier findings. A reviewer agent checks quality. If something fails, the orchestrator can reassign the task to a different agent or have the original agent retry with new instructions. This mirrors how real human teams work — and can handle projects that would overwhelm any single agent.

1 · Complex Goal Received

A high-level objective that requires multiple distinct capabilities — research, analysis, writing, coding, review — arrives from the user or a triggering system.

Example: “Produce a full competitive analysis report: research 10 competitors, analyze their products, pricing and market positioning, write the report with charts, and prepare a slide deck”

2 · Orchestrator Decomposes & Plans

The manager/orchestrator agent breaks the goal into a task graph — identifying which subtasks exist, their dependencies, which can run in parallel, and which agent role is best suited for each.

Output: Task graph with assignments → Research Agent (gather data) → Analysis Agent (process data) → Writer Agent (draft report) → Chart Agent (create visuals) → Reviewer Agent (quality check)

3 · Agents Execute in Parallel / Sequence

Each specialist agent works on its assigned subtask using its own tools, LLM, and domain-specific instructions. Agents that don’t depend on each other can run simultaneously. Each agent follows its own internal Reason → Act → Observe loop.

Parallel: Research agents searching different competitors simultaneously
Sequential: Writer waits for research to finish before drafting

4 · Shared Memory & Communication

As agents complete subtasks, their outputs are written to shared memory — a common state store all agents can read. The research agent’s findings become available to the writer; the writer’s draft becomes available to the reviewer. Agents can also send messages to request clarification from each other.

Technical: Shared state store (Redis, SQLite, vector DB) · Message queues · Event-driven triggers

5 · Review & Validation

Dedicated reviewer agents evaluate the work of other agents — checking for accuracy, completeness, consistency, and quality. If an output doesn’t meet standards, it’s sent back to the responsible agent with specific feedback for revision.

Pattern: Peer review · Iterative debate · Consensus checking · Human-in-the-loop approval gates

↻ Steps 3–5 repeat: orchestrator monitors progress, reassigns failed tasks, triggers dependent tasks when prerequisites complete

6 · Assemble & Deliver

Once all subtasks pass review, the orchestrator assembles the final deliverable — combining outputs from all agents into a cohesive package. The user receives the completed project.

Latency: Minutes to hours · LLM calls: 50–500+ · Cost: 20–100× a single LLM call · Output: Full project deliverables

⧉

The Layer Map

How these four paradigms stack on top of each other

They’re not alternatives — they’re layers

The biggest misconception is that you have to choose one. In practice, these paradigms stack. An Agentic AI system contains multiple AI Agents. Each Agent uses an LLM as its reasoning engine. Many of those Agents use RAG to access knowledge bases. The LLM is the foundation everything else is built on.

Think of it like building construction: the LLM is the foundation (structural intelligence). RAG adds plumbing (connecting to external data sources). An Agent adds a worker who can use tools and make decisions. Agentic AI is the whole construction crew — multiple specialized workers coordinating to build the complete structure.

Layer 4
Orchestration

Agentic AI

Multi-agent orchestration. Decomposes complex goals across teams of specialized agents. Adds shared memory, inter-agent communication, parallel execution, review cycles, and project-level coordination.

CrewAI · AutoGen · LangGraph · MetaGPT · OpenAI Swarm

Layer 3
Autonomy

AI Agent

Wraps the LLM with agency: planning, tool use, observation loops, self-reflection, and error recovery. Transforms a text generator into an autonomous worker that can interact with the world.

ReAct · Function calling · MCP protocol · Code execution · API integration

Layer 2
Knowledge

RAG

Connects the LLM to external knowledge. Adds a retrieval step that searches vector databases, document stores, or live data sources to ground responses in real, current, verifiable information.

Pinecone · Chroma · Weaviate · FAISS · LlamaIndex · LangChain · Embedding models

Layer 1
Intelligence

LLM

The core reasoning engine. Understands language, generates text, follows instructions, and performs inference. Every layer above depends on this foundation. Without the LLM, nothing else works.

GPT-4 · Claude · Gemini · Llama · Mistral · Transformer architecture · Attention mechanism

Common real-world stacking patterns

LLM alone → ChatGPT answering general questions, GitHub Copilot suggesting code, a chatbot writing marketing copy. Fast, cheap, creative — but no access to your data and prone to hallucination.

LLM + RAG → Enterprise help desk that answers questions using your company’s internal documentation. Perplexity AI searching the web and citing sources. A legal research tool that retrieves relevant case law before generating analysis.

LLM + RAG + Agent → A coding assistant that reads your codebase (RAG), plans a fix (planning), writes the code (generation), runs the tests (tool use), and iterates until all tests pass (reflection loop). Claude with computer use is an example — it can browse, search, write files, and execute code.

LLM + RAG + Multi-Agent → A research team where one agent searches academic papers (RAG), another agent searches financial data (RAG + API tools), a writer agent synthesizes findings into a report, and a reviewer agent fact-checks everything before delivery. CrewAI and AutoGen enable this pattern.

Same Task, Four Approaches

See how each paradigm handles the same real-world scenario — and which is the right fit

📋

Scenario: “What’s our refund policy for enterprise clients?”

An employee needs an answer from internal company documentation

LLM

Generates a plausible-sounding answer from training data

The LLM has never seen your company’s internal documents. It will generate a generic refund policy that sounds reasonable but may be completely wrong for your specific company. There’s no way for it to know your actual terms.

⚠ Risk: Hallucinated policy details could lead to contractual issues

RAG ✓

Searches your policy documents and returns the exact answer with citations

RAG embeds the question, searches your vector database of company policies, retrieves the specific refund policy section, and generates a precise answer citing the exact document, section number, and effective date.

✅ Best fit — this is a knowledge retrieval task. RAG is built for exactly this.

Agent

Overkill — would work but adds unnecessary complexity and cost

An agent could search your docs, but for a single Q&A lookup, the planning and tool-selection overhead adds latency and cost without benefit. There are no multi-step actions needed here.

⚡ Works but wasteful — like hiring a project manager to answer one question

Agentic

Massively overkill — deploying a team for a single lookup

Spinning up multiple specialized agents for a simple document lookup would be like convening a board meeting to check a fact. Expensive, slow, and completely unnecessary.

🚫 Wrong tool for the job

📊

Scenario: “Research our top 5 competitors and create a pricing comparison spreadsheet”

A strategist needs multi-step research and file creation

LLM

Generates a comparison from training data — likely outdated and unverifiable

Can produce a comparison table based on what it learned during training, but pricing data changes constantly. The output might reflect last year’s pricing or entirely fabricated numbers. Cannot actually create a spreadsheet file.

⚠ Outdated data, no file creation, no ability to verify current pricing

RAG

Could answer if competitor pricing was already in your knowledge base

If you’ve already collected and indexed competitor pricing data, RAG can retrieve and present it. But it can’t go find new data, visit competitor websites, or create spreadsheet files. It only works with what’s already indexed.

⚠ Only works if the data is already collected — can’t gather new information

Agent ✓

Researches competitors, gathers pricing, and builds the spreadsheet autonomously

The agent plans the task: (1) search the web for each competitor’s pricing page, (2) extract pricing tiers and features, (3) organize the data into a structured comparison, (4) generate a spreadsheet file. It handles errors (page not found, data format changes) and iterates until complete.

✅ Best fit — multi-step task requiring research, tool use, and file creation

Agentic

Would work well but may be more than needed for 5 competitors

A multi-agent team could parallelize the research (one agent per competitor), but for just 5 competitors, the orchestration overhead likely doesn’t pay off. Better suited if the scope expanded to 50 competitors or added deeper analysis.

⚡ Viable if scope is large enough to justify the coordination overhead

🚀

Scenario: “Launch a full marketing campaign for our new product — research, copy, images, ads, and scheduling”

A marketing director needs a complete cross-functional campaign

LLM

Can draft individual pieces of copy but can’t execute anything

A plain LLM can brainstorm campaign ideas, write ad copy, suggest headlines — but it can’t research the market, generate images, schedule posts, or coordinate across channels. You’d need to manually prompt it dozens of times and handle all execution yourself.

⚠ Useful as a writing assistant within a larger manual workflow

RAG

Could inform the campaign with brand guidelines and past campaign data

RAG could retrieve your brand voice guidelines, past successful campaigns, audience research, and product specifications to ensure consistency — but it’s a knowledge source, not an executor. It can’t write, design, or schedule.

⚠ Valuable as a component within an agent system, not as the primary approach

Agent

Could handle pieces sequentially but would struggle with the full scope

A single agent could research the market, then write copy, then try to generate images — but it would work through everything sequentially. The task is broad enough that a single agent’s context window and focus would be stretched thin. Quality degrades on later steps.

⚠ Possible but slow, and quality drops as task complexity increases

Agentic ✓

Deploys a team: researcher, copywriter, designer, scheduler, reviewer

The orchestrator decomposes the campaign into parallel workstreams: a researcher agent analyzes the market and audience, a copywriter agent creates messaging (using RAG to reference brand guidelines), a designer agent generates visual assets, a scheduler agent plans the content calendar, and a reviewer agent checks everything for consistency and quality before launch.

✅ Best fit — complex, cross-functional project requiring multiple specialized skills working in coordination

The decision rule is simple

If you need a quick answer or creative text → LLM. It’s the fastest and cheapest option for anything that doesn’t require verified facts or actions.

If the answer must come from specific documents → RAG. Whenever accuracy matters and the information exists in a knowable corpus, retrieval-augmented generation is the right pattern.

If you need actions taken, not just text produced → AI Agent. Any task that requires multiple steps, tool use, web browsing, file creation, or API calls needs an agent.

If the project needs a team, not just one worker → Agentic AI. When the scope involves multiple distinct skill sets, parallel workstreams, and quality review — that’s when you bring in the multi-agent system.

And remember: in production, these stack together. The best agentic systems use RAG for knowledge retrieval, agents for autonomous execution, and the LLM as the brain powering everything.

<!DOCTYPE html> <html lang="en"> <head> <meta charset="UTF-8"> <meta name="viewport" content="width=device-width, initial-scale=1.0"> <title>AI Workflows — How LLM, RAG, Agent & Agentic AI Actually Work</title> <link href="https://fonts.googleapis.com/css2?family=IBM+Plex+Sans:ital,wght@0,300;0,400;0,500;0,600;0,700&family=IBM+Plex+Mono:wght@400;500;600&family=Outfit:wght@700;800;900&display=swap" rel="stylesheet"> <style> :root { --bg: #0c1117; --surface: #151b24; --surface2: #1a222d; --border: #252e3b; --border-light: #2d3848; --text: #c8d1dc; --text-bright: #e8edf4; --text-dim: #6b7a8d; --llm-color: #3b82f6; --llm-bg: rgba(59,130,246,0.08); --llm-border: rgba(59,130,246,0.25); --rag-color: #10b981; --rag-bg: rgba(16,185,129,0.08); --rag-border: rgba(16,185,129,0.25); --agent-color: #f59e0b; --agent-bg: rgba(245,158,11,0.08); --agent-border: rgba(245,158,11,0.25); --agentic-color: #ec4899; --agentic-bg: rgba(236,72,153,0.08); --agentic-border: rgba(236,72,153,0.25); --accent: #6366f1; } * { margin: 0; padding: 0; box-sizing: border-box; } body { background: var(--bg); color: var(--text); font-family: 'IBM Plex Sans', sans-serif; line-height: 1.65; min-height: 100vh; } /* ── HEADER ── */ .hero { text-align: center; padding: 60px 24px 40px; background: linear-gradient(180deg, #0f1620 0%, var(--bg) 100%); border-bottom: 1px solid var(--border); } .hero h1 { font-family: 'Outfit', sans-serif; font-size: clamp(28px, 4.5vw, 48px); font-weight: 900; color: var(--text-bright); letter-spacing: -1px; margin-bottom: 12px; } .hero h1 span { color: var(--accent); } .hero p { font-size: 16px; color: var(--text-dim); max-width: 680px; margin: 0 auto; } /* ── NAV TABS ── */ .nav { display: flex; justify-content: center; gap: 6px; padding: 20px 16px; position: sticky; top: 0; background: rgba(12,17,23,0.92); backdrop-filter: blur(12px); -webkit-backdrop-filter: blur(12px); z-index: 100; border-bottom: 1px solid var(--border); flex-wrap: wrap; } .nav button { font-family: 'IBM Plex Sans', sans-serif; font-size: 13px; font-weight: 600; padding: 8px 18px; border-radius: 8px; border: 1px solid var(--border); background: var(--surface); color: var(--text-dim); cursor: pointer; transition: all 0.2s; } .nav button:hover { border-color: var(--border-light); color: var(--text); } .nav button.active-llm { background: var(--llm-bg); border-color: var(--llm-border); color: var(--llm-color); } .nav button.active-rag { background: var(--rag-bg); border-color: var(--rag-border); color: var(--rag-color); } .nav button.active-agent { background: var(--agent-bg); border-color: var(--agent-border); color: var(--agent-color); } .nav button.active-agentic { background: var(--agentic-bg); border-color: var(--agentic-border); color: var(--agentic-color); } .nav button.active-layers { background: rgba(99,102,241,0.1); border-color: rgba(99,102,241,0.3); color: var(--accent); } .nav button.active-scenario { background: rgba(99,102,241,0.1); border-color: rgba(99,102,241,0.3); color: var(--accent); } /* ── SECTIONS ── */ .section { display: none; max-width: 1100px; margin: 0 auto; padding: 40px 24px 60px; } .section.active { display: block; } .section-head { display: flex; align-items: center; gap: 14px; margin-bottom: 28px; } .section-badge { display: inline-flex; align-items: center; justify-content: center; width: 44px; height: 44px; border-radius: 12px; font-family: 'Outfit', sans-serif; font-weight: 800; font-size: 18px; color: #fff; flex-shrink: 0; } .section-head h2 { font-family: 'Outfit', sans-serif; font-size: 28px; font-weight: 800; color: var(--text-bright); } .section-head p { font-size: 14px; color: var(--text-dim); margin-top: 2px; } /* ── FLOW DIAGRAM ── */ .flow { display: flex; flex-direction: column; gap: 0; margin: 32px 0; position: relative; } .flow-step { display: flex; align-items: stretch; gap: 0; opacity: 0; transform: translateY(16px); animation: fadeUp 0.4s ease forwards; } .flow-step:nth-child(1) { animation-delay: 0.1s; } .flow-step:nth-child(2) { animation-delay: 0.2s; } .flow-step:nth-child(3) { animation-delay: 0.3s; } .flow-step:nth-child(4) { animation-delay: 0.4s; } .flow-step:nth-child(5) { animation-delay: 0.5s; } .flow-step:nth-child(6) { animation-delay: 0.6s; } .flow-step:nth-child(7) { animation-delay: 0.7s; } .flow-step:nth-child(8) { animation-delay: 0.8s; } .flow-step:nth-child(9) { animation-delay: 0.9s; } @keyframes fadeUp { to { opacity: 1; transform: translateY(0); } } .flow-rail { width: 56px; display: flex; flex-direction: column; align-items: center; flex-shrink: 0; position: relative; } .flow-dot { width: 14px; height: 14px; border-radius: 50%; border: 2.5px solid; background: var(--bg); z-index: 2; flex-shrink: 0; margin-top: 18px; } .flow-line { width: 2px; flex: 1; min-height: 10px; } .flow-card { flex: 1; background: var(--surface); border: 1px solid var(--border); border-radius: 12px; padding: 18px 22px; margin: 6px 0; transition: border-color 0.2s, background 0.2s; } .flow-card:hover { border-color: var(--border-light); background: var(--surface2); } .flow-card h3 { font-family: 'IBM Plex Mono', monospace; font-size: 13px; font-weight: 600; text-transform: uppercase; letter-spacing: 0.5px; margin-bottom: 6px; } .flow-card p { font-size: 14px; color: var(--text); line-height: 1.6; } .flow-card .detail { margin-top: 8px; padding-top: 8px; border-top: 1px solid var(--border); font-size: 12.5px; color: var(--text-dim); font-family: 'IBM Plex Mono', monospace; } .flow-card .detail span { font-weight: 600; } /* Color themes for flow */ .flow-llm .flow-dot { border-color: var(--llm-color); } .flow-llm .flow-line { background: var(--llm-border); } .flow-llm .flow-card h3 { color: var(--llm-color); } .flow-rag .flow-dot { border-color: var(--rag-color); } .flow-rag .flow-line { background: var(--rag-border); } .flow-rag .flow-card h3 { color: var(--rag-color); } .flow-agent .flow-dot { border-color: var(--agent-color); } .flow-agent .flow-line { background: var(--agent-border); } .flow-agent .flow-card h3 { color: var(--agent-color); } .flow-agentic .flow-dot { border-color: var(--agentic-color); } .flow-agentic .flow-line { background: var(--agentic-border); } .flow-agentic .flow-card h3 { color: var(--agentic-color); } /* ── LAYER DIAGRAM ── */ .layers-stack { display: flex; flex-direction: column; gap: 0; margin: 32px 0; } .layer-row { display: flex; align-items: stretch; gap: 20px; opacity: 0; animation: fadeUp 0.4s ease forwards; } .layer-row:nth-child(1) { animation-delay: 0.15s; } .layer-row:nth-child(2) { animation-delay: 0.3s; } .layer-row:nth-child(3) { animation-delay: 0.45s; } .layer-row:nth-child(4) { animation-delay: 0.6s; } .layer-row:nth-child(5) { animation-delay: 0.75s; } .layer-label { width: 130px; flex-shrink: 0; display: flex; align-items: center; justify-content: flex-end; padding-right: 16px; font-family: 'IBM Plex Mono', monospace; font-size: 11px; font-weight: 600; text-transform: uppercase; letter-spacing: 1px; color: var(--text-dim); } .layer-block { flex: 1; border-radius: 12px; padding: 20px 24px; border: 1.5px solid; margin: 4px 0; position: relative; } .layer-block h3 { font-family: 'Outfit', sans-serif; font-size: 18px; font-weight: 700; margin-bottom: 6px; } .layer-block p { font-size: 13.5px; line-height: 1.6; } .layer-block .layer-tech { margin-top: 10px; font-family: 'IBM Plex Mono', monospace; font-size: 11.5px; opacity: 0.7; } .layer-connector { width: 130px; flex-shrink: 0; } .layer-connector-line { width: 2px; height: 16px; margin: 0 auto; } .layer-4 { background: var(--agentic-bg); border-color: var(--agentic-border); } .layer-4 h3 { color: var(--agentic-color); } .layer-3 { background: var(--agent-bg); border-color: var(--agent-border); } .layer-3 h3 { color: var(--agent-color); } .layer-2 { background: var(--rag-bg); border-color: var(--rag-border); } .layer-2 h3 { color: var(--rag-color); } .layer-1 { background: var(--llm-bg); border-color: var(--llm-border); } .layer-1 h3 { color: var(--llm-color); } .connector-bar { display: flex; justify-content: center; padding: 0; } .connector-bar .cbar { width: 2px; height: 20px; background: var(--border-light); } .connector-bar .clabel { display: none; } /* ── SCENARIO WALKTHROUGHS ── */ .scenario-grid { display: grid; grid-template-columns: 1fr; gap: 28px; margin-top: 28px; } .scenario-card { background: var(--surface); border: 1px solid var(--border); border-radius: 14px; overflow: hidden; opacity: 0; animation: fadeUp 0.4s ease forwards; } .scenario-card:nth-child(1) { animation-delay: 0.1s; } .scenario-card:nth-child(2) { animation-delay: 0.2s; } .scenario-card:nth-child(3) { animation-delay: 0.3s; } .scenario-header { padding: 20px 24px; border-bottom: 1px solid var(--border); display: flex; align-items: center; gap: 14px; } .scenario-icon { width: 40px; height: 40px; border-radius: 10px; display: flex; align-items: center; justify-content: center; font-size: 20px; flex-shrink: 0; } .scenario-header h3 { font-family: 'Outfit', sans-serif; font-size: 18px; font-weight: 700; color: var(--text-bright); } .scenario-header .scenario-subtitle { font-size: 13px; color: var(--text-dim); margin-top: 2px; } .scenario-body { padding: 0; } .scenario-approach { padding: 18px 24px; border-bottom: 1px solid var(--border); display: flex; gap: 16px; align-items: flex-start; } .scenario-approach:last-child { border-bottom: none; } .approach-badge { font-family: 'IBM Plex Mono', monospace; font-size: 10.5px; font-weight: 600; text-transform: uppercase; letter-spacing: 0.8px; padding: 4px 10px; border-radius: 6px; white-space: nowrap; flex-shrink: 0; margin-top: 2px; border: 1px solid; } .approach-badge.llm { color: var(--llm-color); background: var(--llm-bg); border-color: var(--llm-border); } .approach-badge.rag { color: var(--rag-color); background: var(--rag-bg); border-color: var(--rag-border); } .approach-badge.agent { color: var(--agent-color); background: var(--agent-bg); border-color: var(--agent-border); } .approach-badge.agentic { color: var(--agentic-color); background: var(--agentic-bg); border-color: var(--agentic-border); } .approach-content h4 { font-size: 14px; font-weight: 600; color: var(--text-bright); margin-bottom: 4px; } .approach-content p { font-size: 13px; color: var(--text); line-height: 1.6; } .approach-content .verdict { margin-top: 6px; font-size: 12px; font-family: 'IBM Plex Mono', monospace; color: var(--text-dim); } /* ── EXPLANATION TEXT BLOCKS ── */ .explanation { background: var(--surface); border: 1px solid var(--border); border-radius: 12px; padding: 24px 28px; margin: 24px 0; line-height: 1.75; } .explanation h3 { font-family: 'Outfit', sans-serif; font-size: 18px; font-weight: 700; color: var(--text-bright); margin-bottom: 12px; } .explanation p { font-size: 14.5px; color: var(--text); margin-bottom: 12px; } .explanation p:last-child { margin-bottom: 0; } .explanation strong { color: var(--text-bright); font-weight: 600; } /* ── LOOP INDICATOR ── */ .loop-indicator { display: flex; align-items: center; gap: 10px; padding: 12px 20px; background: rgba(245,158,11,0.06); border: 1px dashed var(--agent-border); border-radius: 10px; margin: 8px 0 8px 56px; font-family: 'IBM Plex Mono', monospace; font-size: 12px; color: var(--agent-color); } .loop-indicator.agentic-loop { background: rgba(236,72,153,0.06); border-color: var(--agentic-border); color: var(--agentic-color); } .loop-arrow { font-size: 16px; } /* ── FOOTER ── */ .page-footer { text-align: center; padding: 40px 24px; border-top: 1px solid var(--border); font-size: 12px; color: var(--text-dim); } @media (max-width: 700px) { .layer-label { width: 80px; font-size: 9px; } .layer-block { padding: 14px 16px; } .layer-block h3 { font-size: 15px; } .scenario-approach { flex-direction: column; gap: 8px; } .flow-rail { width: 40px; } .flow-card { padding: 14px 16px; } .loop-indicator { margin-left: 40px; } } </style> </head> <body>  <div class="hero"> <h1>How AI <span>Actually Works</span></h1> <p>Visual workflows showing how LLM, RAG, AI Agent, and Agentic AI process a request from start to finish — with real-world scenarios that show when each approach shines.</p> </div>  <div class="nav" id="nav"> <button onclick="show('llm')" id="btn-llm">① LLM</button> <button onclick="show('rag')" id="btn-rag">② RAG</button> <button onclick="show('agent')" id="btn-agent">③ AI Agent</button> <button onclick="show('agentic')" id="btn-agentic">④ Agentic AI</button> <button onclick="show('layers')" id="btn-layers">⑤ Layer Map</button> <button onclick="show('scenario')" id="btn-scenario">⑥ Scenarios</button> </div>    <div class="section active" id="sec-llm"> <div class="section-head"> <div class="section-badge" style="background:var(--llm-color)">1</div> <div> <h2>LLM Workflow</h2> <p>Single-pass text generation — question in, answer out</p> </div> </div> <div class="explanation"> <h3>What's happening under the hood</h3> <p>A Large Language Model is the <strong>foundation</strong> of every other AI pattern on this page. When you type a question, the LLM tokenizes your input (breaks it into sub-word pieces), runs it through dozens of transformer layers that compute attention across the entire context, then predicts the most likely next token — one at a time — until it finishes its response.</p> <p>There is <strong>no retrieval, no planning, no tool use</strong>. The model generates entirely from patterns it learned during training. This makes it fast and versatile, but also means it can confidently produce information that sounds right but is factually wrong (hallucination), and it has no access to information after its training cutoff date or to any private data.</p> </div> <div class="flow"> <div class="flow-step flow-llm"> <div class="flow-rail"> <div class="flow-dot"></div> <div class="flow-line"></div> </div> <div class="flow-card"> <h3>1 · User Input</h3> <p>The human types a prompt — a question, instruction, or creative request. This is the only input the model receives.</p> <div class="detail"><span>Example:</span> "Explain how photosynthesis works in simple terms"</div> </div> </div> <div class="flow-step flow-llm"> <div class="flow-rail"> <div class="flow-dot"></div> <div class="flow-line"></div> </div> <div class="flow-card"> <h3>2 · Tokenization</h3> <p>The text is split into tokens (sub-word units). The word "photosynthesis" might become ["photo", "syn", "thesis"]. Each token is mapped to a numerical ID the model can process.</p> <div class="detail"><span>Technical:</span> BPE or SentencePiece tokenizer · Typical vocab size: 32K–128K tokens</div> </div> </div> <div class="flow-step flow-llm"> <div class="flow-rail"> <div class="flow-dot"></div> <div class="flow-line"></div> </div> <div class="flow-card"> <h3>3 · Transformer Processing</h3> <p>Tokens pass through the transformer's attention layers. Each layer computes relationships between every token and every other token, building a rich understanding of context, meaning, and intent.</p> <div class="detail"><span>Technical:</span> Self-attention + feed-forward networks · 32–128+ layers · Billions of parameters</div> </div> </div> <div class="flow-step flow-llm"> <div class="flow-rail"> <div class="flow-dot"></div> <div class="flow-line"></div> </div> <div class="flow-card"> <h3>4 · Token-by-Token Generation</h3> <p>The model predicts the most probable next token, appends it to the sequence, then predicts the next, and the next — auto-regressively building the full response one token at a time.</p> <div class="detail"><span>Technical:</span> Softmax over vocabulary · Temperature/top-p sampling controls randomness</div> </div> </div> <div class="flow-step flow-llm"> <div class="flow-rail"> <div class="flow-dot"></div> <div class="flow-line" style="background:transparent"></div> </div> <div class="flow-card"> <h3>5 · Output Delivered</h3> <p>The completed text is returned to the user. The model has no memory of this interaction — the next conversation starts from scratch unless the full history is sent again.</p> <div class="detail"><span>Latency:</span> ~0.5–5 seconds · <span>Cost:</span> Lowest (single API call)</div> </div> </div> </div> </div>    <div class="section" id="sec-rag"> <div class="section-head"> <div class="section-badge" style="background:var(--rag-color)">2</div> <div> <h2>RAG Workflow</h2> <p>Retrieve, then generate — grounding answers in real documents</p> </div> </div> <div class="explanation"> <h3>What's happening under the hood</h3> <p>Retrieval-Augmented Generation inserts a <strong>knowledge retrieval step</strong> before the LLM generates its answer. Instead of relying solely on what the model memorized during training, RAG searches your actual documents, finds the most relevant passages, and feeds them into the LLM's prompt as context. The LLM then generates a response grounded in that real information.</p> <p>This is the most effective way to <strong>reduce hallucination</strong> and give the model access to private, current, or domain-specific data without retraining. The trade-off: answer quality depends heavily on whether the retrieval step found the right documents. If it retrieves irrelevant chunks, the answer will be grounded in the wrong information.</p> </div> <div class="flow"> <div class="flow-step flow-rag"> <div class="flow-rail"> <div class="flow-dot"></div> <div class="flow-line"></div> </div> <div class="flow-card"> <h3>Pre-step · Index Your Documents</h3> <p>Before any queries happen, your documents are prepared: split into chunks, converted to vector embeddings by an embedding model, and stored in a vector database. This only happens once (plus updates as documents change).</p> <div class="detail"><span>Tools:</span> LangChain / LlamaIndex for chunking · OpenAI / Cohere embeddings · Pinecone / Chroma / Weaviate for storage</div> </div> </div> <div class="flow-step flow-rag"> <div class="flow-rail"> <div class="flow-dot"></div> <div class="flow-line"></div> </div> <div class="flow-card"> <h3>1 · User Query</h3> <p>The user asks a question. Unlike a plain LLM, this question will first be used to search for relevant information before any text is generated.</p> <div class="detail"><span>Example:</span> "What is our company's parental leave policy for contractors?"</div> </div> </div> <div class="flow-step flow-rag"> <div class="flow-rail"> <div class="flow-dot"></div> <div class="flow-line"></div> </div> <div class="flow-card"> <h3>2 · Embed the Query</h3> <p>The question is converted into a vector embedding using the same embedding model that was used to index the documents. This creates a mathematical representation of the question's meaning.</p> <div class="detail"><span>Technical:</span> Same embedding model as indexing · Output: high-dimensional vector (e.g., 1536 dimensions)</div> </div> </div> <div class="flow-step flow-rag"> <div class="flow-rail"> <div class="flow-dot"></div> <div class="flow-line"></div> </div> <div class="flow-card"> <h3>3 · Vector Search & Retrieval</h3> <p>The query vector is compared against all document vectors in the database using similarity search (typically cosine similarity). The top-K most relevant document chunks are retrieved. Advanced systems add re-ranking to further prioritize the best matches.</p> <div class="detail"><span>Technical:</span> Approximate Nearest Neighbor search · Top-K (usually 3–10 chunks) · Optional re-ranking (Cohere Rerank, cross-encoder)</div> </div> </div> <div class="flow-step flow-rag"> <div class="flow-rail"> <div class="flow-dot"></div> <div class="flow-line"></div> </div> <div class="flow-card"> <h3>4 · Augment the Prompt</h3> <p>The retrieved document chunks are inserted into the LLM's prompt as context, typically with an instruction like "Answer the question based only on the following context." The original question is appended after the context.</p> <div class="detail"><span>Technical:</span> Context window management · Chunk compression if context exceeds limits</div> </div> </div> <div class="flow-step flow-rag"> <div class="flow-rail"> <div class="flow-dot"></div> <div class="flow-line"></div> </div> <div class="flow-card"> <h3>5 · LLM Generates Grounded Response</h3> <p>The LLM generates its answer using the retrieved context. Because it's working from actual documents rather than just training memory, the answer is more accurate and can include source citations pointing back to specific documents.</p> <div class="detail"><span>Latency:</span> ~1–8 seconds total · <span>Output:</span> Answer + source references</div> </div> </div> <div class="flow-step flow-rag"> <div class="flow-rail"> <div class="flow-dot"></div> <div class="flow-line" style="background:transparent"></div> </div> <div class="flow-card"> <h3>6 · Deliver with Citations</h3> <p>The user receives an answer grounded in their actual documents, with references to exactly which sources informed the response. Trust is higher because the provenance is traceable.</p> <div class="detail"><span>Quality gate:</span> If retrieval returns low-confidence matches, system can flag uncertainty instead of guessing</div> </div> </div> </div> </div>    <div class="section" id="sec-agent"> <div class="section-head"> <div class="section-badge" style="background:var(--agent-color)">3</div> <div> <h2>AI Agent Workflow</h2> <p>Plan, act, observe, reflect — an autonomous reasoning loop</p> </div> </div> <div class="explanation"> <h3>What's happening under the hood</h3> <p>An AI Agent wraps an LLM with <strong>autonomy</strong>. Instead of answering a question in one shot, the agent receives a <strong>goal</strong>, then enters a loop: it reasons about what to do next, selects and uses a tool (web search, code execution, API call, file access), observes the result, and decides whether the task is complete or needs more steps.</p> <p>This is the key shift: the LLM is no longer just generating text — it's <strong>making decisions and taking actions</strong>. The most common pattern is <strong>ReAct</strong> (Reason + Act), where the model alternates between thinking ("I need to search for the latest quarterly data") and acting (calling a search tool). The agent can handle errors, retry with different approaches, and self-correct — capabilities no static LLM or RAG system has.</p> </div> <div class="flow"> <div class="flow-step flow-agent"> <div class="flow-rail"> <div class="flow-dot"></div> <div class="flow-line"></div> </div> <div class="flow-card"> <h3>1 · Receive Goal</h3> <p>The user provides a high-level objective, not just a question. The agent needs to figure out how to accomplish it.</p> <div class="detail"><span>Example:</span> "Research the top 5 competitors in our market, find their pricing, and create a comparison spreadsheet"</div> </div> </div> <div class="flow-step flow-agent"> <div class="flow-rail"> <div class="flow-dot"></div> <div class="flow-line"></div> </div> <div class="flow-card"> <h3>2 · Plan</h3> <p>The LLM breaks the goal into a sequence of steps. It identifies what tools it will need, what information it must gather, and in what order to proceed. The plan may be explicit (written out) or implicit (decided step-by-step).</p> <div class="detail"><span>Pattern:</span> Chain-of-thought → Task decomposition → Tool selection</div> </div> </div> <div class="flow-step flow-agent"> <div class="flow-rail"> <div class="flow-dot"></div> <div class="flow-line"></div> </div> <div class="flow-card"> <h3>3 · Select Tool & Act</h3> <p>The agent chooses the most appropriate tool for the current step and executes it. This might be a web search, API call, code execution, database query, file read/write, or email send — whatever the task requires.</p> <div class="detail"><span>Tools available:</span> Web search · Code interpreter · File system · APIs · Calculators · RAG retrieval · Email</div> </div> </div> <div class="flow-step flow-agent"> <div class="flow-rail"> <div class="flow-dot"></div> <div class="flow-line"></div> </div> <div class="flow-card"> <h3>4 · Observe Result</h3> <p>The tool returns a result. The agent reads and interprets the output — was the search helpful? Did the code run without errors? Is the data complete or does it need more?</p> <div class="detail"><span>Technical:</span> Tool output injected back into LLM context as "observation"</div> </div> </div> <div class="flow-step flow-agent"> <div class="flow-rail"> <div class="flow-dot"></div> <div class="flow-line"></div> </div> <div class="flow-card"> <h3>5 · Reflect & Decide</h3> <p>The agent evaluates: Is the goal complete? Do I need more information? Did something fail that I need to retry? Should I adjust my approach? This self-reflection is what makes agents adaptive.</p> <div class="detail"><span>Pattern:</span> ReAct (Reason + Act) · Self-critique · Error recovery logic</div> </div> </div> </div> <div class="loop-indicator"> <span class="loop-arrow">↻</span> <span>Steps 3–5 repeat in a loop until the agent decides the goal is complete (or hits a maximum iteration limit)</span> </div> <div class="flow"> <div class="flow-step flow-agent"> <div class="flow-rail"> <div class="flow-dot"></div> <div class="flow-line" style="background:transparent"></div> </div> <div class="flow-card"> <h3>6 · Deliver Final Output</h3> <p>Once the agent determines the task is complete, it assembles and delivers the final result — which might be a file, a report, a sent email, completed code, or a summary of actions taken.</p> <div class="detail"><span>Latency:</span> 30 sec – several minutes · <span>LLM calls:</span> 5–50+ per task · <span>Cost:</span> 5–20× a single LLM call</div> </div> </div> </div> </div>    <div class="section" id="sec-agentic"> <div class="section-head"> <div class="section-badge" style="background:var(--agentic-color)">4</div> <div> <h2>Agentic AI Workflow</h2> <p>A coordinated team of specialized agents tackling complex projects</p> </div> </div> <div class="explanation"> <h3>What's happening under the hood</h3> <p>Agentic AI is the <strong>orchestration layer</strong> — multiple specialized AI agents working together like a team. Each agent has a defined role (researcher, writer, reviewer, coder), its own set of tools, and access to shared memory. An <strong>orchestrator agent</strong> (or manager) decomposes the overall goal into subtasks and assigns them to the right specialist.</p> <p>The key advantage is <strong>specialization and parallel execution</strong>. A researcher agent can gather data while a writer agent starts drafting based on earlier findings. A reviewer agent checks quality. If something fails, the orchestrator can reassign the task to a different agent or have the original agent retry with new instructions. This mirrors how real human teams work — and can handle projects that would overwhelm any single agent.</p> </div> <div class="flow"> <div class="flow-step flow-agentic"> <div class="flow-rail"> <div class="flow-dot"></div> <div class="flow-line"></div> </div> <div class="flow-card"> <h3>1 · Complex Goal Received</h3> <p>A high-level objective that requires multiple distinct capabilities — research, analysis, writing, coding, review — arrives from the user or a triggering system.</p> <div class="detail"><span>Example:</span> "Produce a full competitive analysis report: research 10 competitors, analyze their products, pricing and market positioning, write the report with charts, and prepare a slide deck"</div> </div> </div> <div class="flow-step flow-agentic"> <div class="flow-rail"> <div class="flow-dot"></div> <div class="flow-line"></div> </div> <div class="flow-card"> <h3>2 · Orchestrator Decomposes & Plans</h3> <p>The manager/orchestrator agent breaks the goal into a task graph — identifying which subtasks exist, their dependencies, which can run in parallel, and which agent role is best suited for each.</p> <div class="detail"><span>Output:</span> Task graph with assignments → Research Agent (gather data) → Analysis Agent (process data) → Writer Agent (draft report) → Chart Agent (create visuals) → Reviewer Agent (quality check)</div> </div> </div> <div class="flow-step flow-agentic"> <div class="flow-rail"> <div class="flow-dot"></div> <div class="flow-line"></div> </div> <div class="flow-card"> <h3>3 · Agents Execute in Parallel / Sequence</h3> <p>Each specialist agent works on its assigned subtask using its own tools, LLM, and domain-specific instructions. Agents that don't depend on each other can run simultaneously. Each agent follows its own internal Reason → Act → Observe loop.</p> <div class="detail"><span>Parallel:</span> Research agents searching different competitors simultaneously<br><span>Sequential:</span> Writer waits for research to finish before drafting</div> </div> </div> <div class="flow-step flow-agentic"> <div class="flow-rail"> <div class="flow-dot"></div> <div class="flow-line"></div> </div> <div class="flow-card"> <h3>4 · Shared Memory & Communication</h3> <p>As agents complete subtasks, their outputs are written to shared memory — a common state store all agents can read. The research agent's findings become available to the writer; the writer's draft becomes available to the reviewer. Agents can also send messages to request clarification from each other.</p> <div class="detail"><span>Technical:</span> Shared state store (Redis, SQLite, vector DB) · Message queues · Event-driven triggers</div> </div> </div> <div class="flow-step flow-agentic"> <div class="flow-rail"> <div class="flow-dot"></div> <div class="flow-line"></div> </div> <div class="flow-card"> <h3>5 · Review & Validation</h3> <p>Dedicated reviewer agents evaluate the work of other agents — checking for accuracy, completeness, consistency, and quality. If an output doesn't meet standards, it's sent back to the responsible agent with specific feedback for revision.</p> <div class="detail"><span>Pattern:</span> Peer review · Iterative debate · Consensus checking · Human-in-the-loop approval gates</div> </div> </div> </div> <div class="loop-indicator agentic-loop"> <span class="loop-arrow">↻</span> <span>Steps 3–5 repeat: orchestrator monitors progress, reassigns failed tasks, triggers dependent tasks when prerequisites complete</span> </div> <div class="flow"> <div class="flow-step flow-agentic"> <div class="flow-rail"> <div class="flow-dot"></div> <div class="flow-line" style="background:transparent"></div> </div> <div class="flow-card"> <h3>6 · Assemble & Deliver</h3> <p>Once all subtasks pass review, the orchestrator assembles the final deliverable — combining outputs from all agents into a cohesive package. The user receives the completed project.</p> <div class="detail"><span>Latency:</span> Minutes to hours · <span>LLM calls:</span> 50–500+ · <span>Cost:</span> 20–100× a single LLM call · <span>Output:</span> Full project deliverables</div> </div> </div> </div> </div>    <div class="section" id="sec-layers"> <div class="section-head"> <div class="section-badge" style="background:var(--accent)">⧉</div> <div> <h2>The Layer Map</h2> <p>How these four paradigms stack on top of each other</p> </div> </div> <div class="explanation"> <h3>They're not alternatives — they're layers</h3> <p>The biggest misconception is that you have to <strong>choose one</strong>. In practice, these paradigms <strong>stack</strong>. An Agentic AI system contains multiple AI Agents. Each Agent uses an LLM as its reasoning engine. Many of those Agents use RAG to access knowledge bases. The LLM is the foundation everything else is built on.</p> <p>Think of it like building construction: the <strong>LLM is the foundation</strong> (structural intelligence). <strong>RAG adds plumbing</strong> (connecting to external data sources). An <strong>Agent adds a worker</strong> who can use tools and make decisions. <strong>Agentic AI is the whole construction crew</strong> — multiple specialized workers coordinating to build the complete structure.</p> </div> <div class="layers-stack">  <div class="layer-row"> <div class="layer-label" style="color:var(--agentic-color)">Layer 4<br>Orchestration</div> <div class="layer-block layer-4"> <h3>Agentic AI</h3> <p>Multi-agent orchestration. Decomposes complex goals across teams of specialized agents. Adds shared memory, inter-agent communication, parallel execution, review cycles, and project-level coordination.</p> <div class="layer-tech">CrewAI · AutoGen · LangGraph · MetaGPT · OpenAI Swarm</div> </div> </div> <div class="connector-bar"><div class="cbar"></div></div>  <div class="layer-row"> <div class="layer-label" style="color:var(--agent-color)">Layer 3<br>Autonomy</div> <div class="layer-block layer-3"> <h3>AI Agent</h3> <p>Wraps the LLM with agency: planning, tool use, observation loops, self-reflection, and error recovery. Transforms a text generator into an autonomous worker that can interact with the world.</p> <div class="layer-tech">ReAct · Function calling · MCP protocol · Code execution · API integration</div> </div> </div> <div class="connector-bar"><div class="cbar"></div></div>  <div class="layer-row"> <div class="layer-label" style="color:var(--rag-color)">Layer 2<br>Knowledge</div> <div class="layer-block layer-2"> <h3>RAG</h3> <p>Connects the LLM to external knowledge. Adds a retrieval step that searches vector databases, document stores, or live data sources to ground responses in real, current, verifiable information.</p> <div class="layer-tech">Pinecone · Chroma · Weaviate · FAISS · LlamaIndex · LangChain · Embedding models</div> </div> </div> <div class="connector-bar"><div class="cbar"></div></div>  <div class="layer-row"> <div class="layer-label" style="color:var(--llm-color)">Layer 1<br>Intelligence</div> <div class="layer-block layer-1"> <h3>LLM</h3> <p>The core reasoning engine. Understands language, generates text, follows instructions, and performs inference. Every layer above depends on this foundation. Without the LLM, nothing else works.</p> <div class="layer-tech">GPT-4 · Claude · Gemini · Llama · Mistral · Transformer architecture · Attention mechanism</div> </div> </div> </div> <div class="explanation" style="margin-top:32px;"> <h3>Common real-world stacking patterns</h3> <p><strong>LLM alone →</strong> ChatGPT answering general questions, GitHub Copilot suggesting code, a chatbot writing marketing copy. Fast, cheap, creative — but no access to your data and prone to hallucination.</p> <p><strong>LLM + RAG →</strong> Enterprise help desk that answers questions using your company's internal documentation. Perplexity AI searching the web and citing sources. A legal research tool that retrieves relevant case law before generating analysis.</p> <p><strong>LLM + RAG + Agent →</strong> A coding assistant that reads your codebase (RAG), plans a fix (planning), writes the code (generation), runs the tests (tool use), and iterates until all tests pass (reflection loop). Claude with computer use is an example — it can browse, search, write files, and execute code.</p> <p><strong>LLM + RAG + Multi-Agent →</strong> A research team where one agent searches academic papers (RAG), another agent searches financial data (RAG + API tools), a writer agent synthesizes findings into a report, and a reviewer agent fact-checks everything before delivery. CrewAI and AutoGen enable this pattern.</p> </div> </div>    <div class="section" id="sec-scenario"> <div class="section-head"> <div class="section-badge" style="background:var(--accent)">?</div> <div> <h2>Same Task, Four Approaches</h2> <p>See how each paradigm handles the same real-world scenario — and which is the right fit</p> </div> </div> <div class="scenario-grid">  <div class="scenario-card"> <div class="scenario-header"> <div class="scenario-icon" style="background:rgba(99,102,241,0.15)">📋</div> <div> <h3>Scenario: "What's our refund policy for enterprise clients?"</h3> <div class="scenario-subtitle">An employee needs an answer from internal company documentation</div> </div> </div> <div class="scenario-body"> <div class="scenario-approach"> <div class="approach-badge llm">LLM</div> <div class="approach-content"> <h4>Generates a plausible-sounding answer from training data</h4> <p>The LLM has never seen your company's internal documents. It will generate a generic refund policy that sounds reasonable but may be completely wrong for your specific company. There's no way for it to know your actual terms.</p> <div class="verdict">⚠ Risk: Hallucinated policy details could lead to contractual issues</div> </div> </div> <div class="scenario-approach"> <div class="approach-badge rag">RAG ✓</div> <div class="approach-content"> <h4>Searches your policy documents and returns the exact answer with citations</h4> <p>RAG embeds the question, searches your vector database of company policies, retrieves the specific refund policy section, and generates a precise answer citing the exact document, section number, and effective date.</p> <div class="verdict">✅ Best fit — this is a knowledge retrieval task. RAG is built for exactly this.</div> </div> </div> <div class="scenario-approach"> <div class="approach-badge agent">Agent</div> <div class="approach-content"> <h4>Overkill — would work but adds unnecessary complexity and cost</h4> <p>An agent could search your docs, but for a single Q&A lookup, the planning and tool-selection overhead adds latency and cost without benefit. There are no multi-step actions needed here.</p> <div class="verdict">⚡ Works but wasteful — like hiring a project manager to answer one question</div> </div> </div> <div class="scenario-approach"> <div class="approach-badge agentic">Agentic</div> <div class="approach-content"> <h4>Massively overkill — deploying a team for a single lookup</h4> <p>Spinning up multiple specialized agents for a simple document lookup would be like convening a board meeting to check a fact. Expensive, slow, and completely unnecessary.</p> <div class="verdict">🚫 Wrong tool for the job</div> </div> </div> </div> </div>  <div class="scenario-card"> <div class="scenario-header"> <div class="scenario-icon" style="background:rgba(99,102,241,0.15)">📊</div> <div> <h3>Scenario: "Research our top 5 competitors and create a pricing comparison spreadsheet"</h3> <div class="scenario-subtitle">A strategist needs multi-step research and file creation</div> </div> </div> <div class="scenario-body"> <div class="scenario-approach"> <div class="approach-badge llm">LLM</div> <div class="approach-content"> <h4>Generates a comparison from training data — likely outdated and unverifiable</h4> <p>Can produce a comparison table based on what it learned during training, but pricing data changes constantly. The output might reflect last year's pricing or entirely fabricated numbers. Cannot actually create a spreadsheet file.</p> <div class="verdict">⚠ Outdated data, no file creation, no ability to verify current pricing</div> </div> </div> <div class="scenario-approach"> <div class="approach-badge rag">RAG</div> <div class="approach-content"> <h4>Could answer if competitor pricing was already in your knowledge base</h4> <p>If you've already collected and indexed competitor pricing data, RAG can retrieve and present it. But it can't go find new data, visit competitor websites, or create spreadsheet files. It only works with what's already indexed.</p> <div class="verdict">⚠ Only works if the data is already collected — can't gather new information</div> </div> </div> <div class="scenario-approach"> <div class="approach-badge agent">Agent ✓</div> <div class="approach-content"> <h4>Researches competitors, gathers pricing, and builds the spreadsheet autonomously</h4> <p>The agent plans the task: (1) search the web for each competitor's pricing page, (2) extract pricing tiers and features, (3) organize the data into a structured comparison, (4) generate a spreadsheet file. It handles errors (page not found, data format changes) and iterates until complete.</p> <div class="verdict">✅ Best fit — multi-step task requiring research, tool use, and file creation</div> </div> </div> <div class="scenario-approach"> <div class="approach-badge agentic">Agentic</div> <div class="approach-content"> <h4>Would work well but may be more than needed for 5 competitors</h4> <p>A multi-agent team could parallelize the research (one agent per competitor), but for just 5 competitors, the orchestration overhead likely doesn't pay off. Better suited if the scope expanded to 50 competitors or added deeper analysis.</p> <div class="verdict">⚡ Viable if scope is large enough to justify the coordination overhead</div> </div> </div> </div> </div>  <div class="scenario-card"> <div class="scenario-header"> <div class="scenario-icon" style="background:rgba(99,102,241,0.15)">🚀</div> <div> <h3>Scenario: "Launch a full marketing campaign for our new product — research, copy, images, ads, and scheduling"</h3> <div class="scenario-subtitle">A marketing director needs a complete cross-functional campaign</div> </div> </div> <div class="scenario-body"> <div class="scenario-approach"> <div class="approach-badge llm">LLM</div> <div class="approach-content"> <h4>Can draft individual pieces of copy but can't execute anything</h4> <p>A plain LLM can brainstorm campaign ideas, write ad copy, suggest headlines — but it can't research the market, generate images, schedule posts, or coordinate across channels. You'd need to manually prompt it dozens of times and handle all execution yourself.</p> <div class="verdict">⚠ Useful as a writing assistant within a larger manual workflow</div> </div> </div> <div class="scenario-approach"> <div class="approach-badge rag">RAG</div> <div class="approach-content"> <h4>Could inform the campaign with brand guidelines and past campaign data</h4> <p>RAG could retrieve your brand voice guidelines, past successful campaigns, audience research, and product specifications to ensure consistency — but it's a knowledge source, not an executor. It can't write, design, or schedule.</p> <div class="verdict">⚠ Valuable as a component within an agent system, not as the primary approach</div> </div> </div> <div class="scenario-approach"> <div class="approach-badge agent">Agent</div> <div class="approach-content"> <h4>Could handle pieces sequentially but would struggle with the full scope</h4> <p>A single agent could research the market, then write copy, then try to generate images — but it would work through everything sequentially. The task is broad enough that a single agent's context window and focus would be stretched thin. Quality degrades on later steps.</p> <div class="verdict">⚠ Possible but slow, and quality drops as task complexity increases</div> </div> </div> <div class="scenario-approach"> <div class="approach-badge agentic">Agentic ✓</div> <div class="approach-content"> <h4>Deploys a team: researcher, copywriter, designer, scheduler, reviewer</h4> <p>The orchestrator decomposes the campaign into parallel workstreams: a researcher agent analyzes the market and audience, a copywriter agent creates messaging (using RAG to reference brand guidelines), a designer agent generates visual assets, a scheduler agent plans the content calendar, and a reviewer agent checks everything for consistency and quality before launch.</p> <div class="verdict">✅ Best fit — complex, cross-functional project requiring multiple specialized skills working in coordination</div> </div> </div> </div> </div> </div> <div class="explanation" style="margin-top:36px;"> <h3>The decision rule is simple</h3> <p><strong>If you need a quick answer or creative text →</strong> LLM. It's the fastest and cheapest option for anything that doesn't require verified facts or actions.</p> <p><strong>If the answer must come from specific documents →</strong> RAG. Whenever accuracy matters and the information exists in a knowable corpus, retrieval-augmented generation is the right pattern.</p> <p><strong>If you need actions taken, not just text produced →</strong> AI Agent. Any task that requires multiple steps, tool use, web browsing, file creation, or API calls needs an agent.</p> <p><strong>If the project needs a team, not just one worker →</strong> Agentic AI. When the scope involves multiple distinct skill sets, parallel workstreams, and quality review — that's when you bring in the multi-agent system.</p> <p>And remember: in production, these <strong>stack together</strong>. The best agentic systems use RAG for knowledge retrieval, agents for autonomous execution, and the LLM as the brain powering everything.</p> </div> </div>  <div class="page-footer"> AI Workflow Reference · February 2026 </div> <script> function show(id) { document.querySelectorAll('.section').forEach(s => s.classList.remove('active')); document.querySelectorAll('.nav button').forEach(b => { b.className = ''; }); document.getElementById('sec-' + id).classList.add('active'); document.getElementById('btn-' + id).classList.add('active-' + id); // Re-trigger animations const section = document.getElementById('sec-' + id); section.querySelectorAll('.flow-step, .layer-row, .scenario-card').forEach(el => { el.style.animation = 'none'; el.offsetHeight; // trigger reflow el.style.animation = ''; }); } // Initialize show('llm'); </script> </body> </html>