Many of us hear the term “AI agent” thrown around. Some hail it as the future, while others fear it, predicting it will replace jobs. In this series, we will demystify AI agents by exploring them through a very popular framework: LangGraph.
This article will trace the journey from the earliest forms of automation to the incredible AI agents we see today—agents capable of handling highly complex tasks. We will also clarify the distinctions between LLMs, agents, and workflows, and discuss the critical question of when to use an AI agent and, just as importantly, when not to. Finally, we’ll survey the most popular frameworks available for building your own AI agent solutions.
The Dawn of Automation: Rule-Based Systems
Before we dive into the world of AI agents, let’s look at how this story began. Initially, when we wanted to automate a task, we relied on rule-based state machines. A classic example is the early Telegram bots. If you recall, these bots didn’t truly understand you. Instead, they presented a list of predefined options, and you would choose from that list. The system was a straightforward combination of simple tools and algorithms following a fixed path.
The Second Wave: The Rise of Large Language Models (LLMs)
The landscape changed dramatically with the advent of Large Language Models (LLMs). This phase took off after Google researchers published the groundbreaking paper “Attention Is All You Need” in 2017. Companies like OpenAI invested heavily in this technology, creating increasingly powerful models such as GPT-2 and GPT-3.
These models proved to be exceptionally powerful. The great advantage of LLMs is their profound ability to understand language. They are, at their core, robust language models. However, they came with significant limitations:
- No Memory: An LLM wouldn’t remember who you are or recall past conversations.
- No Action Capability: Its function was limited to generating text. It couldn’t perform actions.
These limitations paved the way for the third and current phase: the AI agent.
The Third Wave: AI Agents Take the Stage
AI agents harness the language-understanding power of LLMs and elevate it. They use this capability to perform tasks and take actions in the real world. An agent doesn’t just respond with text; it can execute a specific action, whatever that action may be.
LLM vs. Agent vs. Workflow: What’s the Difference?
Let’s break down these core concepts.
1. The Large Language Model (LLM)
An LLM is an AI model that takes text as input and produces text as output. It is purely a text-to-text generation engine. As we’ve noted, it has no memory, cannot use external tools, and cannot take independent actions. It doesn’t inherently know what your ultimate goal is; it just processes input and generates a textual response. Early versions of ChatGPT were like this—a simple chatbot that couldn’t perform actions. Today, it has evolved to incorporate agent-like capabilities and tools.
2. The AI Agent
AI agents use an LLM as their core “brain” to take specific actions and interact with their surrounding environment.
So, what does this mean? While an LLM is limited to text generation, an agent uses the LLM’s reasoning to achieve a concrete outcome. If an LLM only generates text, how do we get it to do more? We don’t want to just generate text about turning on a light; we want the agent to actually turn it on. We don’t want to just talk about sending an email; we want the agent to send it.
For these tasks, the output is an action, not text. This is where agents shine. An agent uses an LLM to interact with tools and its environment to achieve a goal. The LLM is the mind of the agent.
An agent typically operates in a cycle:
- Think and Plan: The agent receives the user’s request and begins to think. What does the user want? Do they want me to turn on the light? Send an email? It analyzes the request and outlines the necessary steps to fulfill it.
- Act: After creating a plan, the agent moves to the execution phase. It carries out the steps it has outlined. For example, if the user’s request was to “turn on the light,” the agent would first plan: “The user wants the light on. I will use the open_light tool.” Then, in the “Act” phase, it executes that tool.
- Observe: Finally, the agent observes the result to ensure the user’s request was completed successfully.
The observation stage is crucial because LLMs are not deterministic. In traditional software, when we write an algorithm, we are certain of its output. LLMs are different. They can—and often do—“hallucinate” or produce incorrect output. The observation step is a vital feedback loop to overcome this non-deterministic nature and ensure the final outcome is correct and aligns with the user’s intent.
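The cycle above can be sketched in a few lines of Python. Everything here is a hypothetical stand-in: plan(), the TOOLS registry, and observe() are stubs standing in for real LLM calls and real tools, and the retry loop is one simple way to act on a failed observation.

```python
# A minimal sketch of the Think -> Act -> Observe cycle.
# plan(), TOOLS, and observe() are stubs: a real agent would
# call a model API and real tools at each of these points.

def plan(request: str) -> str:
    """Think and Plan: decide which tool satisfies the request."""
    if "light" in request:
        return "open_light"
    if "email" in request:
        return "send_email"
    return "reply_with_text"

TOOLS = {
    "open_light": lambda: {"light": "on"},
    "send_email": lambda: {"email": "sent"},
    "reply_with_text": lambda: {"text": "I can only chat about that."},
}

def observe(result: dict, request: str) -> bool:
    """Observe: check that the action actually satisfied the request."""
    if "light" in request:
        return result.get("light") == "on"
    if "email" in request:
        return result.get("email") == "sent"
    return "text" in result

def run_agent(request: str, max_retries: int = 2) -> dict:
    """One full cycle, retried when the observation fails."""
    for _ in range(max_retries + 1):
        tool_name = plan(request)     # Think and Plan
        result = TOOLS[tool_name]()   # Act
        if observe(result, request):  # Observe
            return result
    raise RuntimeError("Agent could not satisfy the request")

print(run_agent("turn on the light"))  # → {'light': 'on'}
```

The retry loop is the point: because the model is non-deterministic, the agent does not trust a single pass; it observes the outcome and, if the goal was not met, goes back and tries again.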
A famous example of an AI agent is Alexa. You can ask it to do almost anything: “Alexa, turn on the light,” “Alexa, open the door,” or “Alexa, show me my emails.” It thinks, plans, acts on that plan, and delivers the result. We also see many coding agents today, such as Manus and Bolt, which follow this same pattern: they think, create a plan, and then execute that plan step-by-step to write code.
In short, an AI agent uses an LLM to achieve a specific goal, and that goal is usually action-based, not just text generation. The agent doesn’t know what step it will take next; its path is not predetermined. It depends entirely on the user’s request. If the user wants to send an email, the agent thinks, “Okay, I’ll use the send_email tool.” If the user wants to turn on a light, it will use the open_light tool. The path is determined dynamically during the “Think and Plan” phase.
3. The Workflow
You can think of a workflow as a specialized type of agent. The key difference is that a workflow has a predefined path. When designing a workflow, the developer specifies a fixed sequence of steps that the system must follow.
This approach is typically used for tasks that are well-defined and not very flexible. The programmer wants to enforce a specific path to guarantee that the agent behaves in a predictable way and takes a certain set of actions.
So, the core distinction is:
- Agent: The LLM determines its own path dynamically.
- Workflow: The developer predefines the path for the agent to follow.
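The distinction fits in a few lines of Python. The step functions (fetch_data, summarize, send_report) are invented for the example; the point is who decides the order: the developer, or the model at run time.

```python
# Toy steps shared by both styles. In a real system each step
# might call an LLM, a tool, or an external API.

def fetch_data(state):
    state["data"] = "raw"
    return state

def summarize(state):
    state["summary"] = "short"
    return state

def send_report(state):
    state["sent"] = True
    return state

# Workflow: the developer hard-codes the sequence of steps.
def run_workflow(state):
    for step in (fetch_data, summarize, send_report):
        state = step(state)
    return state

# Agent: a (stubbed) decision function picks the next step from
# the current state, the way an LLM would during Think and Plan.
def choose_next_step(state):
    if "data" not in state:
        return fetch_data
    if "summary" not in state:
        return summarize
    if "sent" not in state:
        return send_report
    return None  # goal reached

def run_agent(state):
    while (step := choose_next_step(state)) is not None:
        state = step(state)
    return state
```

Both versions end in the same final state here, but only the workflow guarantees it: the agent’s path depends on whatever choose_next_step decides, which is exactly why workflows are preferred when predictability matters.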
For simplicity, we can consider both to fall under the umbrella of AI agents.
When to Use an Agent (and When Not To)
Unfortunately, agents have a significant drawback: they consume a large number of LLM requests.
If I use a standard LLM, I send one prompt (input) and receive one response (output). That’s a single request. With an agent, the process is much more involved.
- The Think and Plan step requires a request to the LLM. This often involves “deep thinking,” which consumes a large number of tokens.
- The Act step might require one or more additional requests to execute tools.
- The Observe step requires another request to verify the outcome.
In the simplest AI agent, you could easily consume four or more requests, compared to just one for a standard LLM call. This has a direct impact on cost, as API calls to services like GPT, Gemini, or Claude can be expensive. It also increases latency; a request, especially one involving deep thinking, takes time to process.
So, when should you not use AI agents? When you don’t need them.
If your task can be solved with a simple LLM call, you should absolutely use a simple LLM call. Don’t complicate things with an agent. It will add complexity, consume more requests, and cost you more money and time. If your task is simple and can be handled by a single LLM prompt, stick with that.
Popular Frameworks for Building AI Agents
Let’s look at some of the most popular frameworks for building agents.
n8n
This is one of the most visible frameworks, often seen in online tutorials. n8n is a no-code tool that allows you to connect blocks. You might have a WhatsApp node, an Email node, and a ChatGPT node. You connect these blocks, configuring each one with its respective API keys. It’s a visual, no-code tool for building your agent. As with any no-code tool, it’s primarily for hobbyists. If you’re a developer looking to build a serious system or a market-ready MVP, n8n won’t be a suitable choice.
CrewAI
CrewAI is very easy to use. As the name suggests, you build a “crew” of agents. The core idea is that for each agent (or node), you define its identity: its backstory, its role, and its specific task. For example, to create an agent that generates search queries, you might tell it: “You are a search query specialist with five years of experience.” After defining its persona, you assign it a task, such as: “Generate a search query for product X.”
Its advantages are its simplicity and its ability to handle edge cases automatically. For instance, if you ask the LLM for a JSON format and it fails to provide it, CrewAI will handle that error for you.
However, its main drawback is that it’s highly abstracted. You are essentially just writing prompts. You design your agent by describing what it does, what its task is, and how it passes its output to the next agent. This high level of abstraction makes it difficult to control. What if you want to add a node that doesn’t involve an LLM call? It’s difficult to do that in CrewAI. A major disadvantage is its high consumption of LLM requests. The input prompts can become very large, leading to high costs. If you use CrewAI, be sure to monitor your API usage and costs carefully.
LangGraph
This brings us to one of the most powerful and flexible frameworks: LangGraph. The idea behind LangGraph is that you build a graph of nodes. LangGraph itself simply provides the structure for building this graph. You define a graph with a starting node, specify what node comes next, and how nodes can branch.
You have complete freedom to implement the nodes however you wish. Inside a LangGraph node, you can do anything. You can make a request to GPT, use a local model, or not use an LLM at all. Your node could simply fetch data from an API. LangGraph’s flexibility is its greatest strength. It organizes the structure of your agent, but you control the implementation. This is the framework we will be exploring in this series.
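To make the graph-of-nodes idea concrete, here is a toy graph runner in plain Python. This is not the LangGraph API itself (real LangGraph code builds and compiles a StateGraph); it only mirrors the shape the framework organizes: nodes are arbitrary functions over a shared state, and edges decide what runs next. The node names and their logic are invented for the example.

```python
# A hand-rolled mini "graph" to illustrate the structure LangGraph
# provides. Nodes are plain functions over a shared state dict;
# edges name the next node to run. NOT the LangGraph API.

def classify(state):
    # A node may call an LLM here; this stub just inspects the text.
    state["intent"] = "light" if "light" in state["request"] else "chat"
    return state

def act(state):
    # A node may call a tool or an external API instead of an LLM.
    if state["intent"] == "light":
        state["result"] = "light turned on"
    else:
        state["result"] = "hello!"
    return state

NODES = {"classify": classify, "act": act}
EDGES = {"classify": "act", "act": None}  # None marks the end

def run_graph(entry, state):
    node = entry
    while node is not None:
        state = NODES[node](state)
        node = EDGES[node]
    return state

print(run_graph("classify", {"request": "turn on the light"}))
```

Notice that neither node is required to call an LLM at all; that freedom inside each node, with the graph supplying only the structure, is the flexibility described above.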
Other Frameworks
Other notable frameworks include Microsoft AutoGen, the OpenAI Assistants API, and Hugging Face’s smolagents. smolagents is a code-centric agent framework: its agents take a problem and solve it by writing and executing code.