How AI Actually Works
Visual workflows showing how LLM, RAG, AI Agent, and Agentic AI process a request from start to finish — with real-world scenarios that show when each approach shines.
LLM Workflow
Single-pass text generation — question in, answer out
What’s happening under the hood
A Large Language Model is the foundation of every other AI pattern on this page. When you type a question, the LLM tokenizes your input (breaks it into sub-word pieces), runs it through dozens of transformer layers that compute attention across the entire context, then predicts the most likely next token — one at a time — until it finishes its response.
There is no retrieval, no planning, no tool use. The model generates entirely from patterns it learned during training. This makes it fast and versatile, but also means it can confidently produce information that sounds right but is factually wrong (hallucination), and it has no access to information after its training cutoff date or to any private data.
1 · User Input
The human types a prompt — a question, instruction, or creative request. This is the only input the model receives.
2 · Tokenization
The text is split into tokens (sub-word units). The word “photosynthesis” might become [“photo”, “syn”, “thesis”]. Each token is mapped to a numerical ID the model can process.
3 · Transformer Processing
Tokens pass through the transformer’s attention layers. Each layer computes relationships between every token and every other token, building a rich understanding of context, meaning, and intent.
4 · Token-by-Token Generation
The model predicts the most probable next token, appends it to the sequence, then predicts the next, and the next — auto-regressively building the full response one token at a time.
5 · Output Delivered
The completed text is returned to the user. The model has no memory of this interaction — the next conversation starts from scratch unless the full history is sent again.
RAG Workflow
Retrieve, then generate — grounding answers in real documents
What’s happening under the hood
Retrieval-Augmented Generation inserts a knowledge retrieval step before the LLM generates its answer. Instead of relying solely on what the model memorized during training, RAG searches your actual documents, finds the most relevant passages, and feeds them into the LLM’s prompt as context. The LLM then generates a response grounded in that real information.
This is the most effective way to reduce hallucination and give the model access to private, current, or domain-specific data without retraining. The trade-off: answer quality depends heavily on whether the retrieval step found the right documents. If it retrieves irrelevant chunks, the answer will be grounded in the wrong information.
Pre-step · Index Your Documents
Before any queries happen, your documents are prepared: split into chunks, converted to vector embeddings by an embedding model, and stored in a vector database. This only happens once (plus updates as documents change).
1 · User Query
The user asks a question. Unlike a plain LLM, this question will first be used to search for relevant information before any text is generated.
2 · Embed the Query
The question is converted into a vector embedding using the same embedding model that was used to index the documents. This creates a mathematical representation of the question’s meaning.
3 · Vector Search & Retrieval
The query vector is compared against all document vectors in the database using similarity search (typically cosine similarity). The top-K most relevant document chunks are retrieved. Advanced systems add re-ranking to further prioritize the best matches.
4 · Augment the Prompt
The retrieved document chunks are inserted into the LLM’s prompt as context, typically with an instruction like “Answer the question based only on the following context.” The original question is appended after the context.
5 · LLM Generates Grounded Response
The LLM generates its answer using the retrieved context. Because it’s working from actual documents rather than just training memory, the answer is more accurate and can include source citations pointing back to specific documents.
6 · Deliver with Citations
The user receives an answer grounded in their actual documents, with references to exactly which sources informed the response. Trust is higher because the provenance is traceable.
AI Agent Workflow
Plan, act, observe, reflect — an autonomous reasoning loop
What’s happening under the hood
An AI Agent wraps an LLM with autonomy. Instead of answering a question in one shot, the agent receives a goal, then enters a loop: it reasons about what to do next, selects and uses a tool (web search, code execution, API call, file access), observes the result, and decides whether the task is complete or needs more steps.
This is the key shift: the LLM is no longer just generating text — it’s making decisions and taking actions. The most common pattern is ReAct (Reason + Act), where the model alternates between thinking (“I need to search for the latest quarterly data”) and acting (calling a search tool). The agent can handle errors, retry with different approaches, and self-correct — capabilities no static LLM or RAG system has.
1 · Receive Goal
The user provides a high-level objective, not just a question. The agent needs to figure out how to accomplish it.
2 · Plan
The LLM breaks the goal into a sequence of steps. It identifies what tools it will need, what information it must gather, and in what order to proceed. The plan may be explicit (written out) or implicit (decided step-by-step).
3 · Select Tool & Act
The agent chooses the most appropriate tool for the current step and executes it. This might be a web search, API call, code execution, database query, file read/write, or email send — whatever the task requires.
4 · Observe Result
The tool returns a result. The agent reads and interprets the output — was the search helpful? Did the code run without errors? Is the data complete or does it need more?
5 · Reflect & Decide
The agent evaluates: Is the goal complete? Do I need more information? Did something fail that I need to retry? Should I adjust my approach? This self-reflection is what makes agents adaptive.
6 · Deliver Final Output
Once the agent determines the task is complete, it assembles and delivers the final result — which might be a file, a report, a sent email, completed code, or a summary of actions taken.
Agentic AI Workflow
A coordinated team of specialized agents tackling complex projects
What’s happening under the hood
Agentic AI is the orchestration layer — multiple specialized AI agents working together like a team. Each agent has a defined role (researcher, writer, reviewer, coder), its own set of tools, and access to shared memory. An orchestrator agent (or manager) decomposes the overall goal into subtasks and assigns them to the right specialist.
The key advantage is specialization and parallel execution. A researcher agent can gather data while a writer agent starts drafting based on earlier findings. A reviewer agent checks quality. If something fails, the orchestrator can reassign the task to a different agent or have the original agent retry with new instructions. This mirrors how real human teams work — and can handle projects that would overwhelm any single agent.
1 · Complex Goal Received
A high-level objective that requires multiple distinct capabilities — research, analysis, writing, coding, review — arrives from the user or a triggering system.
2 · Orchestrator Decomposes & Plans
The manager/orchestrator agent breaks the goal into a task graph — identifying which subtasks exist, their dependencies, which can run in parallel, and which agent role is best suited for each.
3 · Agents Execute in Parallel / Sequence
Each specialist agent works on its assigned subtask using its own tools, LLM, and domain-specific instructions. Agents that don’t depend on each other can run simultaneously. Each agent follows its own internal Reason → Act → Observe loop.
Sequential: Writer waits for research to finish before drafting
4 · Shared Memory & Communication
As agents complete subtasks, their outputs are written to shared memory — a common state store all agents can read. The research agent’s findings become available to the writer; the writer’s draft becomes available to the reviewer. Agents can also send messages to request clarification from each other.
5 · Review & Validation
Dedicated reviewer agents evaluate the work of other agents — checking for accuracy, completeness, consistency, and quality. If an output doesn’t meet standards, it’s sent back to the responsible agent with specific feedback for revision.
6 · Assemble & Deliver
Once all subtasks pass review, the orchestrator assembles the final deliverable — combining outputs from all agents into a cohesive package. The user receives the completed project.
The Layer Map
How these four paradigms stack on top of each other
They’re not alternatives — they’re layers
The biggest misconception is that you have to choose one. In practice, these paradigms stack. An Agentic AI system contains multiple AI Agents. Each Agent uses an LLM as its reasoning engine. Many of those Agents use RAG to access knowledge bases. The LLM is the foundation everything else is built on.
Think of it like building construction: the LLM is the foundation (structural intelligence). RAG adds plumbing (connecting to external data sources). An Agent adds a worker who can use tools and make decisions. Agentic AI is the whole construction crew — multiple specialized workers coordinating to build the complete structure.
Orchestration
Agentic AI
Multi-agent orchestration. Decomposes complex goals across teams of specialized agents. Adds shared memory, inter-agent communication, parallel execution, review cycles, and project-level coordination.
Autonomy
AI Agent
Wraps the LLM with agency: planning, tool use, observation loops, self-reflection, and error recovery. Transforms a text generator into an autonomous worker that can interact with the world.
Knowledge
RAG
Connects the LLM to external knowledge. Adds a retrieval step that searches vector databases, document stores, or live data sources to ground responses in real, current, verifiable information.
Intelligence
LLM
The core reasoning engine. Understands language, generates text, follows instructions, and performs inference. Every layer above depends on this foundation. Without the LLM, nothing else works.
Common real-world stacking patterns
LLM alone → ChatGPT answering general questions, GitHub Copilot suggesting code, a chatbot writing marketing copy. Fast, cheap, creative — but no access to your data and prone to hallucination.
LLM + RAG → Enterprise help desk that answers questions using your company’s internal documentation. Perplexity AI searching the web and citing sources. A legal research tool that retrieves relevant case law before generating analysis.
LLM + RAG + Agent → A coding assistant that reads your codebase (RAG), plans a fix (planning), writes the code (generation), runs the tests (tool use), and iterates until all tests pass (reflection loop). Claude with computer use is an example — it can browse, search, write files, and execute code.
LLM + RAG + Multi-Agent → A research team where one agent searches academic papers (RAG), another agent searches financial data (RAG + API tools), a writer agent synthesizes findings into a report, and a reviewer agent fact-checks everything before delivery. CrewAI and AutoGen enable this pattern.
Same Task, Four Approaches
See how each paradigm handles the same real-world scenario — and which is the right fit
Scenario: “What’s our refund policy for enterprise clients?”
Generates a plausible-sounding answer from training data
The LLM has never seen your company’s internal documents. It will generate a generic refund policy that sounds reasonable but may be completely wrong for your specific company. There’s no way for it to know your actual terms.
Searches your policy documents and returns the exact answer with citations
RAG embeds the question, searches your vector database of company policies, retrieves the specific refund policy section, and generates a precise answer citing the exact document, section number, and effective date.
Overkill — would work but adds unnecessary complexity and cost
An agent could search your docs, but for a single Q&A lookup, the planning and tool-selection overhead adds latency and cost without benefit. There are no multi-step actions needed here.
Massively overkill — deploying a team for a single lookup
Spinning up multiple specialized agents for a simple document lookup would be like convening a board meeting to check a fact. Expensive, slow, and completely unnecessary.
Scenario: “Research our top 5 competitors and create a pricing comparison spreadsheet”
Generates a comparison from training data — likely outdated and unverifiable
Can produce a comparison table based on what it learned during training, but pricing data changes constantly. The output might reflect last year’s pricing or entirely fabricated numbers. Cannot actually create a spreadsheet file.
Could answer if competitor pricing was already in your knowledge base
If you’ve already collected and indexed competitor pricing data, RAG can retrieve and present it. But it can’t go find new data, visit competitor websites, or create spreadsheet files. It only works with what’s already indexed.
Researches competitors, gathers pricing, and builds the spreadsheet autonomously
The agent plans the task: (1) search the web for each competitor’s pricing page, (2) extract pricing tiers and features, (3) organize the data into a structured comparison, (4) generate a spreadsheet file. It handles errors (page not found, data format changes) and iterates until complete.
Would work well but may be more than needed for 5 competitors
A multi-agent team could parallelize the research (one agent per competitor), but for just 5 competitors, the orchestration overhead likely doesn’t pay off. Better suited if the scope expanded to 50 competitors or added deeper analysis.
Scenario: “Launch a full marketing campaign for our new product — research, copy, images, ads, and scheduling”
Can draft individual pieces of copy but can’t execute anything
A plain LLM can brainstorm campaign ideas, write ad copy, suggest headlines — but it can’t research the market, generate images, schedule posts, or coordinate across channels. You’d need to manually prompt it dozens of times and handle all execution yourself.
Could inform the campaign with brand guidelines and past campaign data
RAG could retrieve your brand voice guidelines, past successful campaigns, audience research, and product specifications to ensure consistency — but it’s a knowledge source, not an executor. It can’t write, design, or schedule.
Could handle pieces sequentially but would struggle with the full scope
A single agent could research the market, then write copy, then try to generate images — but it would work through everything sequentially. The task is broad enough that a single agent’s context window and focus would be stretched thin. Quality degrades on later steps.
Deploys a team: researcher, copywriter, designer, scheduler, reviewer
The orchestrator decomposes the campaign into parallel workstreams: a researcher agent analyzes the market and audience, a copywriter agent creates messaging (using RAG to reference brand guidelines), a designer agent generates visual assets, a scheduler agent plans the content calendar, and a reviewer agent checks everything for consistency and quality before launch.
The decision rule is simple
If you need a quick answer or creative text → LLM. It’s the fastest and cheapest option for anything that doesn’t require verified facts or actions.
If the answer must come from specific documents → RAG. Whenever accuracy matters and the information exists in a knowable corpus, retrieval-augmented generation is the right pattern.
If you need actions taken, not just text produced → AI Agent. Any task that requires multiple steps, tool use, web browsing, file creation, or API calls needs an agent.
If the project needs a team, not just one worker → Agentic AI. When the scope involves multiple distinct skill sets, parallel workstreams, and quality review — that’s when you bring in the multi-agent system.
And remember: in production, these stack together. The best agentic systems use RAG for knowledge retrieval, agents for autonomous execution, and the LLM as the brain powering everything.
<!DOCTYPE html> <html lang="en"> <head> <meta charset="UTF-8"> <meta name="viewport" content="width=device-width, initial-scale=1.0"> <title>AI Workflows — How LLM, RAG, Agent & Agentic AI Actually Work</title> <link href="https://fonts.googleapis.com/css2?family=IBM+Plex+Sans:ital,wght@0,300;0,400;0,500;0,600;0,700&family=IBM+Plex+Mono:wght@400;500;600&family=Outfit:wght@700;800;900&display=swap" rel="stylesheet"> <style> :root { --bg: #0c1117; --surface: #151b24; --surface2: #1a222d; --border: #252e3b; --border-light: #2d3848; --text: #c8d1dc; --text-bright: #e8edf4; --text-dim: #6b7a8d; --llm-color: #3b82f6; --llm-bg: rgba(59,130,246,0.08); --llm-border: rgba(59,130,246,0.25); --rag-color: #10b981; --rag-bg: rgba(16,185,129,0.08); --rag-border: rgba(16,185,129,0.25); --agent-color: #f59e0b; --agent-bg: rgba(245,158,11,0.08); --agent-border: rgba(245,158,11,0.25); --agentic-color: #ec4899; --agentic-bg: rgba(236,72,153,0.08); --agentic-border: rgba(236,72,153,0.25); --accent: #6366f1; } * { margin: 0; padding: 0; box-sizing: border-box; } body { background: var(--bg); color: var(--text); font-family: 'IBM Plex Sans', sans-serif; line-height: 1.65; min-height: 100vh; } /* ── HEADER ── */ .hero { text-align: center; padding: 60px 24px 40px; background: linear-gradient(180deg, #0f1620 0%, var(--bg) 100%); border-bottom: 1px solid var(--border); } .hero h1 { font-family: 'Outfit', sans-serif; font-size: clamp(28px, 4.5vw, 48px); font-weight: 900; color: var(--text-bright); letter-spacing: -1px; margin-bottom: 12px; } .hero h1 span { color: var(--accent); } .hero p { font-size: 16px; color: var(--text-dim); max-width: 680px; margin: 0 auto; } /* ── NAV TABS ── */ .nav { display: flex; justify-content: center; gap: 6px; padding: 20px 16px; position: sticky; top: 0; background: rgba(12,17,23,0.92); backdrop-filter: blur(12px); -webkit-backdrop-filter: blur(12px); z-index: 100; border-bottom: 1px solid var(--border); flex-wrap: wrap; } .nav button { font-family: 'IBM Plex Sans', sans-serif; font-size: 13px; font-weight: 600; padding: 8px 18px; border-radius: 8px; border: 1px solid var(--border); background: var(--surface); color: var(--text-dim); cursor: pointer; transition: all 0.2s; } .nav button:hover { border-color: var(--border-light); color: var(--text); } .nav button.active-llm { background: var(--llm-bg); border-color: var(--llm-border); color: var(--llm-color); } .nav button.active-rag { background: var(--rag-bg); border-color: var(--rag-border); color: var(--rag-color); } .nav button.active-agent { background: var(--agent-bg); border-color: var(--agent-border); color: var(--agent-color); } .nav button.active-agentic { background: var(--agentic-bg); border-color: var(--agentic-border); color: var(--agentic-color); } .nav button.active-layers { background: rgba(99,102,241,0.1); border-color: rgba(99,102,241,0.3); color: var(--accent); } .nav button.active-scenario { background: rgba(99,102,241,0.1); border-color: rgba(99,102,241,0.3); color: var(--accent); } /* ── SECTIONS ── */ .section { display: none; max-width: 1100px; margin: 0 auto; padding: 40px 24px 60px; } .section.active { display: block; } .section-head { display: flex; align-items: center; gap: 14px; margin-bottom: 28px; } .section-badge { display: inline-flex; align-items: center; justify-content: center; width: 44px; height: 44px; border-radius: 12px; font-family: 'Outfit', sans-serif; font-weight: 800; font-size: 18px; color: #fff; flex-shrink: 0; } .section-head h2 { font-family: 'Outfit', sans-serif; font-size: 28px; font-weight: 800; color: var(--text-bright); } .section-head p { font-size: 14px; color: var(--text-dim); margin-top: 2px; } /* ── FLOW DIAGRAM ── */ .flow { display: flex; flex-direction: column; gap: 0; margin: 32px 0; position: relative; } .flow-step { display: flex; align-items: stretch; gap: 0; opacity: 0; transform: translateY(16px); animation: fadeUp 0.4s ease forwards; } .flow-step:nth-child(1) { animation-delay: 0.1s; } .flow-step:nth-child(2) { animation-delay: 0.2s; } .flow-step:nth-child(3) { animation-delay: 0.3s; } .flow-step:nth-child(4) { animation-delay: 0.4s; } .flow-step:nth-child(5) { animation-delay: 0.5s; } .flow-step:nth-child(6) { animation-delay: 0.6s; } .flow-step:nth-child(7) { animation-delay: 0.7s; } .flow-step:nth-child(8) { animation-delay: 0.8s; } .flow-step:nth-child(9) { animation-delay: 0.9s; } @keyframes fadeUp { to { opacity: 1; transform: translateY(0); } } .flow-rail { width: 56px; display: flex; flex-direction: column; align-items: center; flex-shrink: 0; position: relative; } .flow-dot { width: 14px; height: 14px; border-radius: 50%; border: 2.5px solid; background: var(--bg); z-index: 2; flex-shrink: 0; margin-top: 18px; } .flow-line { width: 2px; flex: 1; min-height: 10px; } .flow-card { flex: 1; background: var(--surface); border: 1px solid var(--border); border-radius: 12px; padding: 18px 22px; margin: 6px 0; transition: border-color 0.2s, background 0.2s; } .flow-card:hover { border-color: var(--border-light); background: var(--surface2); } .flow-card h3 { font-family: 'IBM Plex Mono', monospace; font-size: 13px; font-weight: 600; text-transform: uppercase; letter-spacing: 0.5px; margin-bottom: 6px; } .flow-card p { font-size: 14px; color: var(--text); line-height: 1.6; } .flow-card .detail { margin-top: 8px; padding-top: 8px; border-top: 1px solid var(--border); font-size: 12.5px; color: var(--text-dim); font-family: 'IBM Plex Mono', monospace; } .flow-card .detail span { font-weight: 600; } /* Color themes for flow */ .flow-llm .flow-dot { border-color: var(--llm-color); } .flow-llm .flow-line { background: var(--llm-border); } .flow-llm .flow-card h3 { color: var(--llm-color); } .flow-rag .flow-dot { border-color: var(--rag-color); } .flow-rag .flow-line { background: var(--rag-border); } .flow-rag .flow-card h3 { color: var(--rag-color); } .flow-agent .flow-dot { border-color: var(--agent-color); } .flow-agent .flow-line { background: var(--agent-border); } .flow-agent .flow-card h3 { color: var(--agent-color); } .flow-agentic .flow-dot { border-color: var(--agentic-color); } .flow-agentic .flow-line { background: var(--agentic-border); } .flow-agentic .flow-card h3 { color: var(--agentic-color); } /* ── LAYER DIAGRAM ── */ .layers-stack { display: flex; flex-direction: column; gap: 0; margin: 32px 0; } .layer-row { display: flex; align-items: stretch; gap: 20px; opacity: 0; animation: fadeUp 0.4s ease forwards; } .layer-row:nth-child(1) { animation-delay: 0.15s; } .layer-row:nth-child(2) { animation-delay: 0.3s; } .layer-row:nth-child(3) { animation-delay: 0.45s; } .layer-row:nth-child(4) { animation-delay: 0.6s; } .layer-row:nth-child(5) { animation-delay: 0.75s; } .layer-label { width: 130px; flex-shrink: 0; display: flex; align-items: center; justify-content: flex-end; padding-right: 16px; font-family: 'IBM Plex Mono', monospace; font-size: 11px; font-weight: 600; text-transform: uppercase; letter-spacing: 1px; color: var(--text-dim); } .layer-block { flex: 1; border-radius: 12px; padding: 20px 24px; border: 1.5px solid; margin: 4px 0; position: relative; } .layer-block h3 { font-family: 'Outfit', sans-serif; font-size: 18px; font-weight: 700; margin-bottom: 6px; } .layer-block p { font-size: 13.5px; line-height: 1.6; } .layer-block .layer-tech { margin-top: 10px; font-family: 'IBM Plex Mono', monospace; font-size: 11.5px; opacity: 0.7; } .layer-connector { width: 130px; flex-shrink: 0; } .layer-connector-line { width: 2px; height: 16px; margin: 0 auto; } .layer-4 { background: var(--agentic-bg); border-color: var(--agentic-border); } .layer-4 h3 { color: var(--agentic-color); } .layer-3 { background: var(--agent-bg); border-color: var(--agent-border); } .layer-3 h3 { color: var(--agent-color); } .layer-2 { background: var(--rag-bg); border-color: var(--rag-border); } .layer-2 h3 { color: var(--rag-color); } .layer-1 { background: var(--llm-bg); border-color: var(--llm-border); } .layer-1 h3 { color: var(--llm-color); } .connector-bar { display: flex; justify-content: center; padding: 0; } .connector-bar .cbar { width: 2px; height: 20px; background: var(--border-light); } .connector-bar .clabel { display: none; } /* ── SCENARIO WALKTHROUGHS ── */ .scenario-grid { display: grid; grid-template-columns: 1fr; gap: 28px; margin-top: 28px; } .scenario-card { background: var(--surface); border: 1px solid var(--border); border-radius: 14px; overflow: hidden; opacity: 0; animation: fadeUp 0.4s ease forwards; } .scenario-card:nth-child(1) { animation-delay: 0.1s; } .scenario-card:nth-child(2) { animation-delay: 0.2s; } .scenario-card:nth-child(3) { animation-delay: 0.3s; } .scenario-header { padding: 20px 24px; border-bottom: 1px solid var(--border); display: flex; align-items: center; gap: 14px; } .scenario-icon { width: 40px; height: 40px; border-radius: 10px; display: flex; align-items: center; justify-content: center; font-size: 20px; flex-shrink: 0; } .scenario-header h3 { font-family: 'Outfit', sans-serif; font-size: 18px; font-weight: 700; color: var(--text-bright); } .scenario-header .scenario-subtitle { font-size: 13px; color: var(--text-dim); margin-top: 2px; } .scenario-body { padding: 0; } .scenario-approach { padding: 18px 24px; border-bottom: 1px solid var(--border); display: flex; gap: 16px; align-items: flex-start; } .scenario-approach:last-child { border-bottom: none; } .approach-badge { font-family: 'IBM Plex Mono', monospace; font-size: 10.5px; font-weight: 600; text-transform: uppercase; letter-spacing: 0.8px; padding: 4px 10px; border-radius: 6px; white-space: nowrap; flex-shrink: 0; margin-top: 2px; border: 1px solid; } .approach-badge.llm { color: var(--llm-color); background: var(--llm-bg); border-color: var(--llm-border); } .approach-badge.rag { color: var(--rag-color); background: var(--rag-bg); border-color: var(--rag-border); } .approach-badge.agent { color: var(--agent-color); background: var(--agent-bg); border-color: var(--agent-border); } .approach-badge.agentic { color: var(--agentic-color); background: var(--agentic-bg); border-color: var(--agentic-border); } .approach-content h4 { font-size: 14px; font-weight: 600; color: var(--text-bright); margin-bottom: 4px; } .approach-content p { font-size: 13px; color: var(--text); line-height: 1.6; } .approach-content .verdict { margin-top: 6px; font-size: 12px; font-family: 'IBM Plex Mono', monospace; color: var(--text-dim); } /* ── EXPLANATION TEXT BLOCKS ── */ .explanation { background: var(--surface); border: 1px solid var(--border); border-radius: 12px; padding: 24px 28px; margin: 24px 0; line-height: 1.75; } .explanation h3 { font-family: 'Outfit', sans-serif; font-size: 18px; font-weight: 700; color: var(--text-bright); margin-bottom: 12px; } .explanation p { font-size: 14.5px; color: var(--text); margin-bottom: 12px; } .explanation p:last-child { margin-bottom: 0; } .explanation strong { color: var(--text-bright); font-weight: 600; } /* ── LOOP INDICATOR ── */ .loop-indicator { display: flex; align-items: center; gap: 10px; padding: 12px 20px; background: rgba(245,158,11,0.06); border: 1px dashed var(--agent-border); border-radius: 10px; margin: 8px 0 8px 56px; font-family: 'IBM Plex Mono', monospace; font-size: 12px; color: var(--agent-color); } .loop-indicator.agentic-loop { background: rgba(236,72,153,0.06); border-color: var(--agentic-border); color: var(--agentic-color); } .loop-arrow { font-size: 16px; } /* ── FOOTER ── */ .page-footer { text-align: center; padding: 40px 24px; border-top: 1px solid var(--border); font-size: 12px; color: var(--text-dim); } @media (max-width: 700px) { .layer-label { width: 80px; font-size: 9px; } .layer-block { padding: 14px 16px; } .layer-block h3 { font-size: 15px; } .scenario-approach { flex-direction: column; gap: 8px; } .flow-rail { width: 40px; } .flow-card { padding: 14px 16px; } .loop-indicator { margin-left: 40px; } } </style> </head> <body> <!-- ══════ HERO ══════ --> <div class="hero"> <h1>How AI <span>Actually Works</span></h1> <p>Visual workflows showing how LLM, RAG, AI Agent, and Agentic AI process a request from start to finish — with real-world scenarios that show when each approach shines.</p> </div> <!-- ══════ NAV ══════ --> <div class="nav" id="nav"> <button onclick="show('llm')" id="btn-llm">① LLM</button> <button onclick="show('rag')" id="btn-rag">② RAG</button> <button onclick="show('agent')" id="btn-agent">③ AI Agent</button> <button onclick="show('agentic')" id="btn-agentic">④ Agentic AI</button> <button onclick="show('layers')" id="btn-layers">⑤ Layer Map</button> <button onclick="show('scenario')" id="btn-scenario">⑥ Scenarios</button> </div> <!-- ══════════════════════════════════════════════════ --> <!-- SECTION 1: LLM WORKFLOW --> <!-- ══════════════════════════════════════════════════ --> <div class="section active" id="sec-llm"> <div class="section-head"> <div class="section-badge" style="background:var(--llm-color)">1</div> <div> <h2>LLM Workflow</h2> <p>Single-pass text generation — question in, answer out</p> </div> </div> <div class="explanation"> <h3>What's happening under the hood</h3> <p>A Large Language Model is the <strong>foundation</strong> of every other AI pattern on this page. When you type a question, the LLM tokenizes your input (breaks it into sub-word pieces), runs it through dozens of transformer layers that compute attention across the entire context, then predicts the most likely next token — one at a time — until it finishes its response.</p> <p>There is <strong>no retrieval, no planning, no tool use</strong>. The model generates entirely from patterns it learned during training. This makes it fast and versatile, but also means it can confidently produce information that sounds right but is factually wrong (hallucination), and it has no access to information after its training cutoff date or to any private data.</p> </div> <div class="flow"> <div class="flow-step flow-llm"> <div class="flow-rail"> <div class="flow-dot"></div> <div class="flow-line"></div> </div> <div class="flow-card"> <h3>1 · User Input</h3> <p>The human types a prompt — a question, instruction, or creative request. This is the only input the model receives.</p> <div class="detail"><span>Example:</span> "Explain how photosynthesis works in simple terms"</div> </div> </div> <div class="flow-step flow-llm"> <div class="flow-rail"> <div class="flow-dot"></div> <div class="flow-line"></div> </div> <div class="flow-card"> <h3>2 · Tokenization</h3> <p>The text is split into tokens (sub-word units). The word "photosynthesis" might become ["photo", "syn", "thesis"]. Each token is mapped to a numerical ID the model can process.</p> <div class="detail"><span>Technical:</span> BPE or SentencePiece tokenizer · Typical vocab size: 32K–128K tokens</div> </div> </div> <div class="flow-step flow-llm"> <div class="flow-rail"> <div class="flow-dot"></div> <div class="flow-line"></div> </div> <div class="flow-card"> <h3>3 · Transformer Processing</h3> <p>Tokens pass through the transformer's attention layers. Each layer computes relationships between every token and every other token, building a rich understanding of context, meaning, and intent.</p> <div class="detail"><span>Technical:</span> Self-attention + feed-forward networks · 32–128+ layers · Billions of parameters</div> </div> </div> <div class="flow-step flow-llm"> <div class="flow-rail"> <div class="flow-dot"></div> <div class="flow-line"></div> </div> <div class="flow-card"> <h3>4 · Token-by-Token Generation</h3> <p>The model predicts the most probable next token, appends it to the sequence, then predicts the next, and the next — auto-regressively building the full response one token at a time.</p> <div class="detail"><span>Technical:</span> Softmax over vocabulary · Temperature/top-p sampling controls randomness</div> </div> </div> <div class="flow-step flow-llm"> <div class="flow-rail"> <div class="flow-dot"></div> <div class="flow-line" style="background:transparent"></div> </div> <div class="flow-card"> <h3>5 · Output Delivered</h3> <p>The completed text is returned to the user. The model has no memory of this interaction — the next conversation starts from scratch unless the full history is sent again.</p> <div class="detail"><span>Latency:</span> ~0.5–5 seconds · <span>Cost:</span> Lowest (single API call)</div> </div> </div> </div> </div> <!-- ══════════════════════════════════════════════════ --> <!-- SECTION 2: RAG WORKFLOW --> <!-- ══════════════════════════════════════════════════ --> <div class="section" id="sec-rag"> <div class="section-head"> <div class="section-badge" style="background:var(--rag-color)">2</div> <div> <h2>RAG Workflow</h2> <p>Retrieve, then generate — grounding answers in real documents</p> </div> </div> <div class="explanation"> <h3>What's happening under the hood</h3> <p>Retrieval-Augmented Generation inserts a <strong>knowledge retrieval step</strong> before the LLM generates its answer. Instead of relying solely on what the model memorized during training, RAG searches your actual documents, finds the most relevant passages, and feeds them into the LLM's prompt as context. The LLM then generates a response grounded in that real information.</p> <p>This is the most effective way to <strong>reduce hallucination</strong> and give the model access to private, current, or domain-specific data without retraining. The trade-off: answer quality depends heavily on whether the retrieval step found the right documents. If it retrieves irrelevant chunks, the answer will be grounded in the wrong information.</p> </div> <div class="flow"> <div class="flow-step flow-rag"> <div class="flow-rail"> <div class="flow-dot"></div> <div class="flow-line"></div> </div> <div class="flow-card"> <h3>Pre-step · Index Your Documents</h3> <p>Before any queries happen, your documents are prepared: split into chunks, converted to vector embeddings by an embedding model, and stored in a vector database. This only happens once (plus updates as documents change).</p> <div class="detail"><span>Tools:</span> LangChain / LlamaIndex for chunking · OpenAI / Cohere embeddings · Pinecone / Chroma / Weaviate for storage</div> </div> </div> <div class="flow-step flow-rag"> <div class="flow-rail"> <div class="flow-dot"></div> <div class="flow-line"></div> </div> <div class="flow-card"> <h3>1 · User Query</h3> <p>The user asks a question. Unlike a plain LLM, this question will first be used to search for relevant information before any text is generated.</p> <div class="detail"><span>Example:</span> "What is our company's parental leave policy for contractors?"</div> </div> </div> <div class="flow-step flow-rag"> <div class="flow-rail"> <div class="flow-dot"></div> <div class="flow-line"></div> </div> <div class="flow-card"> <h3>2 · Embed the Query</h3> <p>The question is converted into a vector embedding using the same embedding model that was used to index the documents. This creates a mathematical representation of the question's meaning.</p> <div class="detail"><span>Technical:</span> Same embedding model as indexing · Output: high-dimensional vector (e.g., 1536 dimensions)</div> </div> </div> <div class="flow-step flow-rag"> <div class="flow-rail"> <div class="flow-dot"></div> <div class="flow-line"></div> </div> <div class="flow-card"> <h3>3 · Vector Search & Retrieval</h3> <p>The query vector is compared against all document vectors in the database using similarity search (typically cosine similarity). The top-K most relevant document chunks are retrieved. Advanced systems add re-ranking to further prioritize the best matches.</p> <div class="detail"><span>Technical:</span> Approximate Nearest Neighbor search · Top-K (usually 3–10 chunks) · Optional re-ranking (Cohere Rerank, cross-encoder)</div> </div> </div> <div class="flow-step flow-rag"> <div class="flow-rail"> <div class="flow-dot"></div> <div class="flow-line"></div> </div> <div class="flow-card"> <h3>4 · Augment the Prompt</h3> <p>The retrieved document chunks are inserted into the LLM's prompt as context, typically with an instruction like "Answer the question based only on the following context." The original question is appended after the context.</p> <div class="detail"><span>Technical:</span> Context window management · Chunk compression if context exceeds limits</div> </div> </div> <div class="flow-step flow-rag"> <div class="flow-rail"> <div class="flow-dot"></div> <div class="flow-line"></div> </div> <div class="flow-card"> <h3>5 · LLM Generates Grounded Response</h3> <p>The LLM generates its answer using the retrieved context. Because it's working from actual documents rather than just training memory, the answer is more accurate and can include source citations pointing back to specific documents.</p> <div class="detail"><span>Latency:</span> ~1–8 seconds total · <span>Output:</span> Answer + source references</div> </div> </div> <div class="flow-step flow-rag"> <div class="flow-rail"> <div class="flow-dot"></div> <div class="flow-line" style="background:transparent"></div> </div> <div class="flow-card"> <h3>6 · Deliver with Citations</h3> <p>The user receives an answer grounded in their actual documents, with references to exactly which sources informed the response. Trust is higher because the provenance is traceable.</p> <div class="detail"><span>Quality gate:</span> If retrieval returns low-confidence matches, system can flag uncertainty instead of guessing</div> </div> </div> </div> </div> <!-- ══════════════════════════════════════════════════ --> <!-- SECTION 3: AI AGENT WORKFLOW --> <!-- ══════════════════════════════════════════════════ --> <div class="section" id="sec-agent"> <div class="section-head"> <div class="section-badge" style="background:var(--agent-color)">3</div> <div> <h2>AI Agent Workflow</h2> <p>Plan, act, observe, reflect — an autonomous reasoning loop</p> </div> </div> <div class="explanation"> <h3>What's happening under the hood</h3> <p>An AI Agent wraps an LLM with <strong>autonomy</strong>. Instead of answering a question in one shot, the agent receives a <strong>goal</strong>, then enters a loop: it reasons about what to do next, selects and uses a tool (web search, code execution, API call, file access), observes the result, and decides whether the task is complete or needs more steps.</p> <p>This is the key shift: the LLM is no longer just generating text — it's <strong>making decisions and taking actions</strong>. The most common pattern is <strong>ReAct</strong> (Reason + Act), where the model alternates between thinking ("I need to search for the latest quarterly data") and acting (calling a search tool). The agent can handle errors, retry with different approaches, and self-correct — capabilities no static LLM or RAG system has.</p> </div> <div class="flow"> <div class="flow-step flow-agent"> <div class="flow-rail"> <div class="flow-dot"></div> <div class="flow-line"></div> </div> <div class="flow-card"> <h3>1 · Receive Goal</h3> <p>The user provides a high-level objective, not just a question. The agent needs to figure out how to accomplish it.</p> <div class="detail"><span>Example:</span> "Research the top 5 competitors in our market, find their pricing, and create a comparison spreadsheet"</div> </div> </div> <div class="flow-step flow-agent"> <div class="flow-rail"> <div class="flow-dot"></div> <div class="flow-line"></div> </div> <div class="flow-card"> <h3>2 · Plan</h3> <p>The LLM breaks the goal into a sequence of steps. It identifies what tools it will need, what information it must gather, and in what order to proceed. The plan may be explicit (written out) or implicit (decided step-by-step).</p> <div class="detail"><span>Pattern:</span> Chain-of-thought → Task decomposition → Tool selection</div> </div> </div> <div class="flow-step flow-agent"> <div class="flow-rail"> <div class="flow-dot"></div> <div class="flow-line"></div> </div> <div class="flow-card"> <h3>3 · Select Tool & Act</h3> <p>The agent chooses the most appropriate tool for the current step and executes it. This might be a web search, API call, code execution, database query, file read/write, or email send — whatever the task requires.</p> <div class="detail"><span>Tools available:</span> Web search · Code interpreter · File system · APIs · Calculators · RAG retrieval · Email</div> </div> </div> <div class="flow-step flow-agent"> <div class="flow-rail"> <div class="flow-dot"></div> <div class="flow-line"></div> </div> <div class="flow-card"> <h3>4 · Observe Result</h3> <p>The tool returns a result. The agent reads and interprets the output — was the search helpful? Did the code run without errors? Is the data complete or does it need more?</p> <div class="detail"><span>Technical:</span> Tool output injected back into LLM context as "observation"</div> </div> </div> <div class="flow-step flow-agent"> <div class="flow-rail"> <div class="flow-dot"></div> <div class="flow-line"></div> </div> <div class="flow-card"> <h3>5 · Reflect & Decide</h3> <p>The agent evaluates: Is the goal complete? Do I need more information? Did something fail that I need to retry? Should I adjust my approach? This self-reflection is what makes agents adaptive.</p> <div class="detail"><span>Pattern:</span> ReAct (Reason + Act) · Self-critique · Error recovery logic</div> </div> </div> </div> <div class="loop-indicator"> <span class="loop-arrow">↻</span> <span>Steps 3–5 repeat in a loop until the agent decides the goal is complete (or hits a maximum iteration limit)</span> </div> <div class="flow"> <div class="flow-step flow-agent"> <div class="flow-rail"> <div class="flow-dot"></div> <div class="flow-line" style="background:transparent"></div> </div> <div class="flow-card"> <h3>6 · Deliver Final Output</h3> <p>Once the agent determines the task is complete, it assembles and delivers the final result — which might be a file, a report, a sent email, completed code, or a summary of actions taken.</p> <div class="detail"><span>Latency:</span> 30 sec – several minutes · <span>LLM calls:</span> 5–50+ per task · <span>Cost:</span> 5–20× a single LLM call</div> </div> </div> </div> </div> <!-- ══════════════════════════════════════════════════ --> <!-- SECTION 4: AGENTIC AI WORKFLOW --> <!-- ══════════════════════════════════════════════════ --> <div class="section" id="sec-agentic"> <div class="section-head"> <div class="section-badge" style="background:var(--agentic-color)">4</div> <div> <h2>Agentic AI Workflow</h2> <p>A coordinated team of specialized agents tackling complex projects</p> </div> </div> <div class="explanation"> <h3>What's happening under the hood</h3> <p>Agentic AI is the <strong>orchestration layer</strong> — multiple specialized AI agents working together like a team. Each agent has a defined role (researcher, writer, reviewer, coder), its own set of tools, and access to shared memory. An <strong>orchestrator agent</strong> (or manager) decomposes the overall goal into subtasks and assigns them to the right specialist.</p> <p>The key advantage is <strong>specialization and parallel execution</strong>. A researcher agent can gather data while a writer agent starts drafting based on earlier findings. A reviewer agent checks quality. If something fails, the orchestrator can reassign the task to a different agent or have the original agent retry with new instructions. This mirrors how real human teams work — and can handle projects that would overwhelm any single agent.</p> </div> <div class="flow"> <div class="flow-step flow-agentic"> <div class="flow-rail"> <div class="flow-dot"></div> <div class="flow-line"></div> </div> <div class="flow-card"> <h3>1 · Complex Goal Received</h3> <p>A high-level objective that requires multiple distinct capabilities — research, analysis, writing, coding, review — arrives from the user or a triggering system.</p> <div class="detail"><span>Example:</span> "Produce a full competitive analysis report: research 10 competitors, analyze their products, pricing and market positioning, write the report with charts, and prepare a slide deck"</div> </div> </div> <div class="flow-step flow-agentic"> <div class="flow-rail"> <div class="flow-dot"></div> <div class="flow-line"></div> </div> <div class="flow-card"> <h3>2 · Orchestrator Decomposes & Plans</h3> <p>The manager/orchestrator agent breaks the goal into a task graph — identifying which subtasks exist, their dependencies, which can run in parallel, and which agent role is best suited for each.</p> <div class="detail"><span>Output:</span> Task graph with assignments → Research Agent (gather data) → Analysis Agent (process data) → Writer Agent (draft report) → Chart Agent (create visuals) → Reviewer Agent (quality check)</div> </div> </div> <div class="flow-step flow-agentic"> <div class="flow-rail"> <div class="flow-dot"></div> <div class="flow-line"></div> </div> <div class="flow-card"> <h3>3 · Agents Execute in Parallel / Sequence</h3> <p>Each specialist agent works on its assigned subtask using its own tools, LLM, and domain-specific instructions. Agents that don't depend on each other can run simultaneously. Each agent follows its own internal Reason → Act → Observe loop.</p> <div class="detail"><span>Parallel:</span> Research agents searching different competitors simultaneously<br><span>Sequential:</span> Writer waits for research to finish before drafting</div> </div> </div> <div class="flow-step flow-agentic"> <div class="flow-rail"> <div class="flow-dot"></div> <div class="flow-line"></div> </div> <div class="flow-card"> <h3>4 · Shared Memory & Communication</h3> <p>As agents complete subtasks, their outputs are written to shared memory — a common state store all agents can read. The research agent's findings become available to the writer; the writer's draft becomes available to the reviewer. Agents can also send messages to request clarification from each other.</p> <div class="detail"><span>Technical:</span> Shared state store (Redis, SQLite, vector DB) · Message queues · Event-driven triggers</div> </div> </div> <div class="flow-step flow-agentic"> <div class="flow-rail"> <div class="flow-dot"></div> <div class="flow-line"></div> </div> <div class="flow-card"> <h3>5 · Review & Validation</h3> <p>Dedicated reviewer agents evaluate the work of other agents — checking for accuracy, completeness, consistency, and quality. If an output doesn't meet standards, it's sent back to the responsible agent with specific feedback for revision.</p> <div class="detail"><span>Pattern:</span> Peer review · Iterative debate · Consensus checking · Human-in-the-loop approval gates</div> </div> </div> </div> <div class="loop-indicator agentic-loop"> <span class="loop-arrow">↻</span> <span>Steps 3–5 repeat: orchestrator monitors progress, reassigns failed tasks, triggers dependent tasks when prerequisites complete</span> </div> <div class="flow"> <div class="flow-step flow-agentic"> <div class="flow-rail"> <div class="flow-dot"></div> <div class="flow-line" style="background:transparent"></div> </div> <div class="flow-card"> <h3>6 · Assemble & Deliver</h3> <p>Once all subtasks pass review, the orchestrator assembles the final deliverable — combining outputs from all agents into a cohesive package. The user receives the completed project.</p> <div class="detail"><span>Latency:</span> Minutes to hours · <span>LLM calls:</span> 50–500+ · <span>Cost:</span> 20–100× a single LLM call · <span>Output:</span> Full project deliverables</div> </div> </div> </div> </div> <!-- ══════════════════════════════════════════════════ --> <!-- SECTION 5: LAYER MAP --> <!-- ══════════════════════════════════════════════════ --> <div class="section" id="sec-layers"> <div class="section-head"> <div class="section-badge" style="background:var(--accent)">⧉</div> <div> <h2>The Layer Map</h2> <p>How these four paradigms stack on top of each other</p> </div> </div> <div class="explanation"> <h3>They're not alternatives — they're layers</h3> <p>The biggest misconception is that you have to <strong>choose one</strong>. In practice, these paradigms <strong>stack</strong>. An Agentic AI system contains multiple AI Agents. Each Agent uses an LLM as its reasoning engine. Many of those Agents use RAG to access knowledge bases. The LLM is the foundation everything else is built on.</p> <p>Think of it like building construction: the <strong>LLM is the foundation</strong> (structural intelligence). <strong>RAG adds plumbing</strong> (connecting to external data sources). An <strong>Agent adds a worker</strong> who can use tools and make decisions. <strong>Agentic AI is the whole construction crew</strong> — multiple specialized workers coordinating to build the complete structure.</p> </div> <div class="layers-stack"> <!-- Layer 4: Agentic AI --> <div class="layer-row"> <div class="layer-label" style="color:var(--agentic-color)">Layer 4<br>Orchestration</div> <div class="layer-block layer-4"> <h3>Agentic AI</h3> <p>Multi-agent orchestration. Decomposes complex goals across teams of specialized agents. Adds shared memory, inter-agent communication, parallel execution, review cycles, and project-level coordination.</p> <div class="layer-tech">CrewAI · AutoGen · LangGraph · MetaGPT · OpenAI Swarm</div> </div> </div> <div class="connector-bar"><div class="cbar"></div></div> <!-- Layer 3: Agent --> <div class="layer-row"> <div class="layer-label" style="color:var(--agent-color)">Layer 3<br>Autonomy</div> <div class="layer-block layer-3"> <h3>AI Agent</h3> <p>Wraps the LLM with agency: planning, tool use, observation loops, self-reflection, and error recovery. Transforms a text generator into an autonomous worker that can interact with the world.</p> <div class="layer-tech">ReAct · Function calling · MCP protocol · Code execution · API integration</div> </div> </div> <div class="connector-bar"><div class="cbar"></div></div> <!-- Layer 2: RAG --> <div class="layer-row"> <div class="layer-label" style="color:var(--rag-color)">Layer 2<br>Knowledge</div> <div class="layer-block layer-2"> <h3>RAG</h3> <p>Connects the LLM to external knowledge. Adds a retrieval step that searches vector databases, document stores, or live data sources to ground responses in real, current, verifiable information.</p> <div class="layer-tech">Pinecone · Chroma · Weaviate · FAISS · LlamaIndex · LangChain · Embedding models</div> </div> </div> <div class="connector-bar"><div class="cbar"></div></div> <!-- Layer 1: LLM --> <div class="layer-row"> <div class="layer-label" style="color:var(--llm-color)">Layer 1<br>Intelligence</div> <div class="layer-block layer-1"> <h3>LLM</h3> <p>The core reasoning engine. Understands language, generates text, follows instructions, and performs inference. Every layer above depends on this foundation. Without the LLM, nothing else works.</p> <div class="layer-tech">GPT-4 · Claude · Gemini · Llama · Mistral · Transformer architecture · Attention mechanism</div> </div> </div> </div> <div class="explanation" style="margin-top:32px;"> <h3>Common real-world stacking patterns</h3> <p><strong>LLM alone →</strong> ChatGPT answering general questions, GitHub Copilot suggesting code, a chatbot writing marketing copy. Fast, cheap, creative — but no access to your data and prone to hallucination.</p> <p><strong>LLM + RAG →</strong> Enterprise help desk that answers questions using your company's internal documentation. Perplexity AI searching the web and citing sources. A legal research tool that retrieves relevant case law before generating analysis.</p> <p><strong>LLM + RAG + Agent →</strong> A coding assistant that reads your codebase (RAG), plans a fix (planning), writes the code (generation), runs the tests (tool use), and iterates until all tests pass (reflection loop). Claude with computer use is an example — it can browse, search, write files, and execute code.</p> <p><strong>LLM + RAG + Multi-Agent →</strong> A research team where one agent searches academic papers (RAG), another agent searches financial data (RAG + API tools), a writer agent synthesizes findings into a report, and a reviewer agent fact-checks everything before delivery. CrewAI and AutoGen enable this pattern.</p> </div> </div> <!-- ══════════════════════════════════════════════════ --> <!-- SECTION 6: SCENARIOS --> <!-- ══════════════════════════════════════════════════ --> <div class="section" id="sec-scenario"> <div class="section-head"> <div class="section-badge" style="background:var(--accent)">?</div> <div> <h2>Same Task, Four Approaches</h2> <p>See how each paradigm handles the same real-world scenario — and which is the right fit</p> </div> </div> <div class="scenario-grid"> <!-- Scenario 1 --> <div class="scenario-card"> <div class="scenario-header"> <div class="scenario-icon" style="background:rgba(99,102,241,0.15)">📋</div> <div> <h3>Scenario: "What's our refund policy for enterprise clients?"</h3> <div class="scenario-subtitle">An employee needs an answer from internal company documentation</div> </div> </div> <div class="scenario-body"> <div class="scenario-approach"> <div class="approach-badge llm">LLM</div> <div class="approach-content"> <h4>Generates a plausible-sounding answer from training data</h4> <p>The LLM has never seen your company's internal documents. It will generate a generic refund policy that sounds reasonable but may be completely wrong for your specific company. There's no way for it to know your actual terms.</p> <div class="verdict">⚠ Risk: Hallucinated policy details could lead to contractual issues</div> </div> </div> <div class="scenario-approach"> <div class="approach-badge rag">RAG ✓</div> <div class="approach-content"> <h4>Searches your policy documents and returns the exact answer with citations</h4> <p>RAG embeds the question, searches your vector database of company policies, retrieves the specific refund policy section, and generates a precise answer citing the exact document, section number, and effective date.</p> <div class="verdict">✅ Best fit — this is a knowledge retrieval task. RAG is built for exactly this.</div> </div> </div> <div class="scenario-approach"> <div class="approach-badge agent">Agent</div> <div class="approach-content"> <h4>Overkill — would work but adds unnecessary complexity and cost</h4> <p>An agent could search your docs, but for a single Q&A lookup, the planning and tool-selection overhead adds latency and cost without benefit. There are no multi-step actions needed here.</p> <div class="verdict">⚡ Works but wasteful — like hiring a project manager to answer one question</div> </div> </div> <div class="scenario-approach"> <div class="approach-badge agentic">Agentic</div> <div class="approach-content"> <h4>Massively overkill — deploying a team for a single lookup</h4> <p>Spinning up multiple specialized agents for a simple document lookup would be like convening a board meeting to check a fact. Expensive, slow, and completely unnecessary.</p> <div class="verdict">🚫 Wrong tool for the job</div> </div> </div> </div> </div> <!-- Scenario 2 --> <div class="scenario-card"> <div class="scenario-header"> <div class="scenario-icon" style="background:rgba(99,102,241,0.15)">📊</div> <div> <h3>Scenario: "Research our top 5 competitors and create a pricing comparison spreadsheet"</h3> <div class="scenario-subtitle">A strategist needs multi-step research and file creation</div> </div> </div> <div class="scenario-body"> <div class="scenario-approach"> <div class="approach-badge llm">LLM</div> <div class="approach-content"> <h4>Generates a comparison from training data — likely outdated and unverifiable</h4> <p>Can produce a comparison table based on what it learned during training, but pricing data changes constantly. The output might reflect last year's pricing or entirely fabricated numbers. Cannot actually create a spreadsheet file.</p> <div class="verdict">⚠ Outdated data, no file creation, no ability to verify current pricing</div> </div> </div> <div class="scenario-approach"> <div class="approach-badge rag">RAG</div> <div class="approach-content"> <h4>Could answer if competitor pricing was already in your knowledge base</h4> <p>If you've already collected and indexed competitor pricing data, RAG can retrieve and present it. But it can't go find new data, visit competitor websites, or create spreadsheet files. It only works with what's already indexed.</p> <div class="verdict">⚠ Only works if the data is already collected — can't gather new information</div> </div> </div> <div class="scenario-approach"> <div class="approach-badge agent">Agent ✓</div> <div class="approach-content"> <h4>Researches competitors, gathers pricing, and builds the spreadsheet autonomously</h4> <p>The agent plans the task: (1) search the web for each competitor's pricing page, (2) extract pricing tiers and features, (3) organize the data into a structured comparison, (4) generate a spreadsheet file. It handles errors (page not found, data format changes) and iterates until complete.</p> <div class="verdict">✅ Best fit — multi-step task requiring research, tool use, and file creation</div> </div> </div> <div class="scenario-approach"> <div class="approach-badge agentic">Agentic</div> <div class="approach-content"> <h4>Would work well but may be more than needed for 5 competitors</h4> <p>A multi-agent team could parallelize the research (one agent per competitor), but for just 5 competitors, the orchestration overhead likely doesn't pay off. Better suited if the scope expanded to 50 competitors or added deeper analysis.</p> <div class="verdict">⚡ Viable if scope is large enough to justify the coordination overhead</div> </div> </div> </div> </div> <!-- Scenario 3 --> <div class="scenario-card"> <div class="scenario-header"> <div class="scenario-icon" style="background:rgba(99,102,241,0.15)">🚀</div> <div> <h3>Scenario: "Launch a full marketing campaign for our new product — research, copy, images, ads, and scheduling"</h3> <div class="scenario-subtitle">A marketing director needs a complete cross-functional campaign</div> </div> </div> <div class="scenario-body"> <div class="scenario-approach"> <div class="approach-badge llm">LLM</div> <div class="approach-content"> <h4>Can draft individual pieces of copy but can't execute anything</h4> <p>A plain LLM can brainstorm campaign ideas, write ad copy, suggest headlines — but it can't research the market, generate images, schedule posts, or coordinate across channels. You'd need to manually prompt it dozens of times and handle all execution yourself.</p> <div class="verdict">⚠ Useful as a writing assistant within a larger manual workflow</div> </div> </div> <div class="scenario-approach"> <div class="approach-badge rag">RAG</div> <div class="approach-content"> <h4>Could inform the campaign with brand guidelines and past campaign data</h4> <p>RAG could retrieve your brand voice guidelines, past successful campaigns, audience research, and product specifications to ensure consistency — but it's a knowledge source, not an executor. It can't write, design, or schedule.</p> <div class="verdict">⚠ Valuable as a component within an agent system, not as the primary approach</div> </div> </div> <div class="scenario-approach"> <div class="approach-badge agent">Agent</div> <div class="approach-content"> <h4>Could handle pieces sequentially but would struggle with the full scope</h4> <p>A single agent could research the market, then write copy, then try to generate images — but it would work through everything sequentially. The task is broad enough that a single agent's context window and focus would be stretched thin. Quality degrades on later steps.</p> <div class="verdict">⚠ Possible but slow, and quality drops as task complexity increases</div> </div> </div> <div class="scenario-approach"> <div class="approach-badge agentic">Agentic ✓</div> <div class="approach-content"> <h4>Deploys a team: researcher, copywriter, designer, scheduler, reviewer</h4> <p>The orchestrator decomposes the campaign into parallel workstreams: a researcher agent analyzes the market and audience, a copywriter agent creates messaging (using RAG to reference brand guidelines), a designer agent generates visual assets, a scheduler agent plans the content calendar, and a reviewer agent checks everything for consistency and quality before launch.</p> <div class="verdict">✅ Best fit — complex, cross-functional project requiring multiple specialized skills working in coordination</div> </div> </div> </div> </div> </div> <div class="explanation" style="margin-top:36px;"> <h3>The decision rule is simple</h3> <p><strong>If you need a quick answer or creative text →</strong> LLM. It's the fastest and cheapest option for anything that doesn't require verified facts or actions.</p> <p><strong>If the answer must come from specific documents →</strong> RAG. Whenever accuracy matters and the information exists in a knowable corpus, retrieval-augmented generation is the right pattern.</p> <p><strong>If you need actions taken, not just text produced →</strong> AI Agent. Any task that requires multiple steps, tool use, web browsing, file creation, or API calls needs an agent.</p> <p><strong>If the project needs a team, not just one worker →</strong> Agentic AI. When the scope involves multiple distinct skill sets, parallel workstreams, and quality review — that's when you bring in the multi-agent system.</p> <p>And remember: in production, these <strong>stack together</strong>. The best agentic systems use RAG for knowledge retrieval, agents for autonomous execution, and the LLM as the brain powering everything.</p> </div> </div> <!-- ══════ FOOTER ══════ --> <div class="page-footer"> AI Workflow Reference · February 2026 </div> <script> function show(id) { document.querySelectorAll('.section').forEach(s => s.classList.remove('active')); document.querySelectorAll('.nav button').forEach(b => { b.className = ''; }); document.getElementById('sec-' + id).classList.add('active'); document.getElementById('btn-' + id).classList.add('active-' + id); // Re-trigger animations const section = document.getElementById('sec-' + id); section.querySelectorAll('.flow-step, .layer-row, .scenario-card').forEach(el => { el.style.animation = 'none'; el.offsetHeight; // trigger reflow el.style.animation = ''; }); } // Initialize show('llm'); </script> </body> </html>