Build AI Agents From Scratch (Without LangChain): Complete Step-by-Step Tutorial

Most developers who want to build AI agents reach for LangChain or a similar framework on day one. That is understandable — abstractions are convenient. But convenience has a cost. When something breaks inside a multi-step reasoning agent, and it will break, you need to know exactly what is happening under the hood. This tutorial shows you how to build AI agents from scratch using nothing but the OpenAI API and plain Python, so you understand every component before you ever reach for a framework.

TechnoSAi Team
🗓️ March 3, 2026
⏱️ 8 min read

Frameworks like LangChain abstract away the agent loop, tool routing, memory management, and error handling. For prototyping, that speed is valuable. For production systems, that opacity is dangerous.

Building a custom AI agent implementation forces you to make explicit decisions about how your agent reasons, what tools it can call, how it handles errors, and when it stops. Those decisions have enormous consequences for reliability and cost, and frameworks often make them for you silently.

There is also a practical learning argument. Developers who understand LLM agent architecture at the implementation level debug faster, optimize more effectively, and design better systems. The internal mechanics are not complicated once you see them clearly, and this tutorial is designed to make them visible.

Before writing code, it helps to build a mental model of how AI agents work internally. An agent is not a monolithic program. It is a loop.

At each iteration of the loop, the model receives a prompt that includes the user's goal, a list of available tools, and the history of what has happened so far. The model then either calls a tool to gather more information or produces a final answer. If it calls a tool, the result is added to the conversation history and the loop runs again. This continues until the model decides the task is complete.

This pattern is called the agent loop (sometimes the planning loop). It is the core of every autonomous AI agent, whether built with a framework or from scratch. Understanding it at this level of detail is what separates developers who can build reliable agents from those who can only configure them.

The model's ability to select and invoke tools is what gives the agent its power. This is done through a mechanism called tool calling in LLM systems, also referred to as function calling. The model does not execute code directly. It outputs a structured description of which function to call and with what arguments, and your code does the actual execution.

The first concrete step in building an AI agent without LangChain is defining the tools your agent can use. In OpenAI's API, tools are described using JSON schemas that specify the function name, a description of what it does, and the parameters it expects.

Write each tool as a regular Python function first. A weather lookup tool might accept a city name and return current conditions. A calculator tool might accept a mathematical expression and return the evaluated result. A web search tool might accept a query string and return a list of result snippets. Keep each function focused on a single responsibility.
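As a sketch of that first step, here are two of the tools described above written as plain Python functions. The calculator evaluates arithmetic safely via the ast module rather than eval; the weather function is a stub (a real version would call a weather API of your choosing), and all names here are illustrative, not part of any SDK.

```python
import ast
import operator

# Map AST operator node types to the arithmetic they perform
_OPS = {
    ast.Add: operator.add, ast.Sub: operator.sub, ast.Mult: operator.mul,
    ast.Div: operator.truediv, ast.Pow: operator.pow, ast.USub: operator.neg,
}

def calculate(expression: str) -> str:
    """Safely evaluate a basic arithmetic expression and return the result as a string."""
    def _eval(node):
        if isinstance(node, ast.Expression):
            return _eval(node.body)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](_eval(node.left), _eval(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](_eval(node.operand))
        raise ValueError(f"Unsupported expression: {expression!r}")
    return str(_eval(ast.parse(expression, mode="eval")))

def get_weather(city: str) -> str:
    """Return current conditions for a city. Stubbed here; a real tool would call a weather API."""
    return f"Weather lookup for {city} is not wired up in this offline sketch."
```

Note that both functions return strings: tool results ultimately go back into the conversation as text, so normalizing at the tool boundary keeps the loop simple.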

Once your Python functions are written, define the corresponding JSON schema for each one. The description field is critical — it is what the model reads to decide whether this tool is appropriate for a given step. Write descriptions that are precise and unambiguous. A vague description leads to incorrect tool selection, which is one of the most common sources of failure in multi-step reasoning agents.
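For the calculator and weather functions described above, the schemas might look like the following. The structure (type, function, name, description, parameters as a JSON Schema object) matches the OpenAI Chat Completions tools format; the names and description wording are our own choices.

```python
# Tool schemas passed to the model; the description fields are what the model
# reads when deciding whether to call each tool.
tools = [
    {
        "type": "function",
        "function": {
            "name": "calculate",
            "description": "Evaluate a basic arithmetic expression, e.g. '2 * (3 + 4)'. "
                           "Use this for any math instead of computing it yourself.",
            "parameters": {
                "type": "object",
                "properties": {
                    "expression": {
                        "type": "string",
                        "description": "The arithmetic expression to evaluate.",
                    },
                },
                "required": ["expression"],
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Look up current weather conditions for a named city.",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {"type": "string", "description": "City name, e.g. 'Berlin'."},
                },
                "required": ["city"],
            },
        },
    },
]
```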

With your tools defined, you can build the agent loop. The loop is a while construct that runs until the model signals it is done. Each iteration follows the same sequence: send the current conversation to the model, inspect the response, and act on it.

With the OpenAI API, you pass the tool schemas in the tools parameter of your chat completions request. When the model decides to call a tool, the response will contain a tool call object rather than a plain text message. Your loop detects this, extracts the function name and arguments, executes the corresponding Python function, and appends the result to the conversation as a tool result message.

If the response contains a plain text message with no tool call, your loop treats that as the final answer and exits. This is the complete agent loop in its simplest form. It is roughly thirty lines of Python and has no dependencies beyond the OpenAI SDK.
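A minimal sketch of that loop follows. The function and parameter names (run_agent, tool_functions) are ours, and "gpt-4o" is a placeholder model name; the client calls themselves follow the OpenAI Python SDK's chat completions interface, which accepts the returned message object back into the messages list.

```python
import json

def run_agent(client, user_question, tools, tool_functions,
              model="gpt-4o", max_iterations=10):
    """A minimal agent loop: call the model, run any requested tools, repeat."""
    messages = [{"role": "user", "content": user_question}]
    for _ in range(max_iterations):
        response = client.chat.completions.create(
            model=model, messages=messages, tools=tools
        )
        message = response.choices[0].message
        if not message.tool_calls:          # plain text: treat it as the final answer
            return message.content
        messages.append(message)            # keep the assistant turn in history
        for call in message.tool_calls:
            fn = tool_functions.get(call.function.name)
            try:
                args = json.loads(call.function.arguments)
                result = fn(**args) if fn else f"Unknown tool: {call.function.name}"
            except Exception as exc:        # return errors as data, never crash the loop
                result = f"Tool error: {exc}"
            messages.append(
                {"role": "tool", "tool_call_id": call.id, "content": str(result)}
            )
    return "Stopped after reaching the iteration limit without a final answer."
```

The tool_functions argument is a plain dict mapping tool names to the Python functions that implement them, which keeps dispatch explicit and easy to log.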

The conversation history is the agent's working memory. Every tool call, every result, and every intermediate reasoning step accumulates in this list. The model can reference anything in that history at any point in the loop, which is what enables genuine multi-step reasoning rather than isolated single-turn responses.

A production agent needs error recovery logic. Tool calls fail. APIs return unexpected formats. Arguments that looked valid turn out to be out of range. Without error handling, a single failure terminates the entire agent run.

Error recovery in AI agents takes two forms. The first is technical error handling in your tool execution code: wrap each function call in a try-except block and return a structured error message rather than raising an exception. The agent then sees that the tool returned an error and can decide how to respond, whether by retrying with different arguments, trying an alternative tool, or informing the user that the step could not be completed.
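The first form can be sketched as a small wrapper around tool execution. The name safe_call and the ok/error shape of the result are our conventions, not a standard; the point is that failures come back as structured data the model can reason about instead of an exception that kills the run.

```python
import json

def safe_call(fn, raw_arguments: str) -> str:
    """Run one tool call; any failure is returned as structured data, not raised."""
    try:
        args = json.loads(raw_arguments)
        result = fn(**args)
        return json.dumps({"ok": True, "result": result})
    except Exception as exc:
        # Include the exception type so the model can distinguish bad arguments
        # from genuine tool failures when deciding how to retry.
        return json.dumps({"ok": False, "error": f"{type(exc).__name__}: {exc}"})
```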

The second form is prompt-level guidance. Include instructions in your system prompt that tell the model how to behave when a tool returns an error. Explicit instructions like "if a tool call fails, try an alternative approach before giving up" produce significantly more resilient behavior than leaving the model to interpret errors on its own.
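As one illustrative wording (the exact phrasing is a starting point to tune, not a canonical recipe), the prompt-level guidance might look like this:

```python
# Illustrative system prompt; adjust the wording to your agent's domain.
SYSTEM_PROMPT = (
    "You are a research assistant with access to tools. "
    "If a tool call fails, retry with corrected arguments or try an alternative "
    "tool before giving up, and tell the user plainly when a step cannot be completed."
)

def initial_messages(user_question: str) -> list:
    """Seed the conversation with the system prompt and the user's goal."""
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": user_question},
    ]
```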

You should also implement a maximum iteration limit. An agent stuck in a reasoning loop that never resolves will continue calling tools and accumulating cost indefinitely. A hard cap of ten to twenty iterations, combined with a fallback message to the user, is a simple safeguard with significant practical value.

To make this concrete, consider building a simple research assistant. The agent has three tools: a web search function, a page fetch function that retrieves the full text of a URL, and a summarization function that condenses a long text to key points.

The user asks: "What are the main arguments for and against universal basic income, based on recent research?" The agent calls web search with a relevant query, receives a list of URLs, fetches the most relevant page, summarizes it, then searches again for the opposing perspective, fetches and summarizes that result, and finally synthesizes both summaries into a balanced answer.

This entire flow runs through the same thirty-line agent loop. No framework is required. The logic is explicit, the tool calls are traceable, and the error handling is entirely under your control. That transparency is exactly what makes this approach preferable for production AI orchestration logic.

The most immediate benefit is debuggability. When you implement the agent loop yourself, every variable is accessible, every decision is logged, and every failure has a traceable cause. That makes the difference between a thirty-minute debugging session and a three-day one.

The second benefit is cost control. Frameworks often make conservative choices about context window usage and retry behavior that result in unnecessary API calls. A custom implementation lets you optimize the conversation history aggressively, truncating or summarizing older turns to reduce token consumption without sacrificing reasoning quality.
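A deliberately naive sketch of that truncation: keep the system prompt plus only the most recent turns. A production version would summarize dropped turns instead of discarding them, and would take care to keep tool result messages paired with the assistant turn that requested them.

```python
def truncate_history(messages, keep_last=8):
    """Bound token usage by keeping the system prompt plus the most recent turns."""
    system = [m for m in messages if m.get("role") == "system"]
    recent = [m for m in messages if m.get("role") != "system"][-keep_last:]
    return system + recent
```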

The third benefit is portability. An agent built against the OpenAI function calling API can be adapted to any model that supports the same interface, including Anthropic's Claude, Google's Gemini, or locally hosted models via compatible APIs. Framework lock-in is a real migration cost that bare-metal implementations avoid entirely.

Building from scratch does require more upfront investment than reaching for a framework. You will write boilerplate that LangChain would have provided automatically, and you will need to implement things like conversation summarization and tool result formatting yourself if your use case requires them.

Complexity also scales with capability. A single-agent loop is simple. A system with multiple specialized agents that delegate tasks to each other, share memory, and coordinate through a shared state requires significantly more careful design. For those use cases, a lightweight framework or a well-structured internal library may be worth the abstraction cost.

The goal is not to avoid frameworks forever. It is to understand the mechanics well enough that you can evaluate framework choices critically and debug them confidently when they behave unexpectedly.

Building AI agents from scratch is one of the highest-leverage skills a developer working with large language models can develop. The agent loop is not complex, the tool calling mechanism is well-documented, and the debugging advantages of a transparent implementation pay dividends immediately.

Start with a single tool and a single-purpose agent. Get the loop working, instrument the conversation history, and add error handling before expanding capability. Once you have built one agent from scratch, the internal architecture of every framework you encounter will make immediate sense, and you will be far better equipped to use them well or replace them when necessary.

The best way to understand how AI agents work internally is to build one yourself. Thirty lines of Python is all it takes to get started.
