Why do we need AI Agents?
In this post I provide some reasons for why AI Agents are a useful paradigm for developing LLM-based applications that can tackle increasingly complex problems. Read on to learn more!
2024 has been a prolific year for AI Agents. Many open source projects centered on AI Agents have exploded in popularity: LangChain has released LangGraph and LangGraph Cloud, while CrewAI is going full steam ahead with CrewAI Plus.
Many startups (e.g. Wordware, Lyndi, etc.) that employ AI Agents natively as part of their product offering have raised solid funding rounds to keep building exciting applications that could upend entire industries.
Amidst this exuberance, I feel it is important that we ignore the noise for a moment, take a couple of steps back, and try to answer some fundamental questions about AI Agents. That is what this article aims to do.
In this brief post, we will first provide an explanation for what AI Agents are. Then we will dive into why AI Agents are critical to unlock more capabilities from LLMs. Finally, we will briefly touch on how AI Agents are concretely implemented.
What is an AI Agent?
Defining “AI Agent” requires us to also provide a definition for the term “Agentic System”. We will start with the latter, and then expand on the former.
Agentic systems are software architectures that use multiple AI agents to solve complex problems. Those problems are initially broken down into subtasks by the Planning module within an Agentic system. From there, AI Agents work together by individually executing tasks they are assigned. The system overall tracks progress on all these tasks, determines if additional tasks must be completed, and decides whether the original problem has been solved.
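The control loop described above can be sketched in a few lines. This is a minimal, hypothetical illustration: the `plan` and `execute` functions are stand-ins for real LLM calls, not any framework's actual API.

```python
# Minimal sketch of an agentic control loop. `plan` and `execute` are
# hypothetical stand-ins: a real system would call a model API in both.

def plan(problem: str) -> list[str]:
    # A real Planning module would prompt an LLM to decompose the problem.
    return [f"research: {problem}", f"draft: {problem}", f"review: {problem}"]

def execute(task: str) -> str:
    # A real agent would run the task with its own model, prompt, and tools.
    return f"done: {task}"

def solve(problem: str) -> list[str]:
    tasks = plan(problem)            # break the problem into subtasks
    results = []
    for task in tasks:               # dispatch each subtask to an agent
        results.append(execute(task))
    return results                   # the orchestrator then judges completion

results = solve("summarise Q3 sales")
```

In a real system, the final step would also decide whether new subtasks are needed before declaring the original problem solved.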
AI Agents are the core components that make up an Agentic system. There are many definitions out there for what an (AI) Agent is, but I’ll still hazard my own definition of the term.
To me, an AI Agent is an instance of a Large Language Model that has been given agency. This means that this instance of an LLM has been seeded with a specific role, clear instructions, a backstory, and goals. Those goals represent the tasks the agent must complete. Completing those tasks, within the context of an Agentic system, contributes towards the attainment of the overarching goal.
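One common way to implement this "seeding" is to render the role, backstory, instructions, and goals into the model's system prompt. The sketch below is a hypothetical illustration of that idea, not any particular framework's `Agent` class.

```python
from dataclasses import dataclass, field

# Hypothetical sketch: an agent is an LLM instance seeded with a role,
# instructions, a backstory, and goals, typically via its system prompt.

@dataclass
class Agent:
    role: str
    instructions: str
    backstory: str
    goals: list[str] = field(default_factory=list)

    def system_prompt(self) -> str:
        goals = "\n".join(f"- {g}" for g in self.goals)
        return (f"You are a {self.role}. {self.backstory}\n"
                f"{self.instructions}\nYour goals:\n{goals}")

researcher = Agent(
    role="research analyst",
    instructions="Cite sources for every claim.",
    backstory="You have a decade of market-research experience.",
    goals=["Summarise competitor pricing"],
)
prompt = researcher.system_prompt()
```

The resulting prompt would then be passed as the system message on every call this agent makes to the underlying model.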
Why do we need AI Agents?
A valid and important question we should all ask ourselves is “Why are AI Agents needed?”. To answer this question I will use the principle of Inversion and instead tackle the question “Why not always use a single instance of an LLM, à la ChatGPT?”. There are a couple of issues with using a single instance of an LLM when it comes to tackling certain problems.
Weaknesses of single non-agentic LLM systems
The first issue revolves around the limited context window that an instance of an LLM affords you. Granted, some LLMs like Gemini offer a 1M+ token context window, but most other providers limit their context windows to 100K-200K tokens. This limitation means that you may run out of tokens when trying to solve a complex task that requires extensive chat exchanges with an LLM.
Additionally, high input token usage can lead to degraded performance from the LLM in a given exchange. Degraded performance could mean slower response times from a given LLM API or incorrect information returned due to a higher likelihood of hallucinations. While this can occur in both multi-agent and single-LLM setups, you are likely to use more tokens in a single conversation with a single LLM than with a multi-agent system, where the tasks are distributed across multiple agents.
One other glaring weakness of using a single instance of an LLM is the lack of specialisation available from the AI. Specialisation is important because it enables us to employ the most suitable LLM or agent to complete a given task. To me, specialisation can be achieved by using a different underlying model - from the same provider or another - that is optimised for tackling the task at hand. Specialisation can also be more simply achieved through prompt tuning (system and user prompts) when we configure an agent.
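In code, specialisation can be as simple as a mapping from task type to a (model, system prompt) pair, with a router picking the right configuration. This is a hedged sketch: the model names below are placeholders, not real model identifiers.

```python
# Hypothetical sketch of specialisation via routing. Model names are
# placeholders, not real provider model IDs.

SPECIALISTS = {
    "code":    {"model": "code-model-v1",  "prompt": "You are an expert programmer."},
    "writing": {"model": "prose-model-v1", "prompt": "You are a technical writer."},
}

GENERALIST = {"model": "general-model-v1", "prompt": "You are a helpful assistant."}

def pick_specialist(task_type: str) -> dict:
    # Fall back to a general-purpose configuration for unknown task types.
    return SPECIALISTS.get(task_type, GENERALIST)
```

A multi-agent framework does essentially this under the hood: each agent carries its own model choice and tuned prompts, so the system can match the task to the most capable configuration.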
Finally, custom tool use (also referred to as function calling) is a key capability that is missing from online LLM chat interfaces like Claude or ChatGPT. Tool use enables agents to move beyond pure text generation to actually performing actions. Concretely, tools extend the capabilities of a given LLM by giving it the ability to perform API calls, read/write files, browse the web, and much more. Some tasks cannot be successfully completed without the LLM having access to tools.
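The mechanics of tool use are simple at their core: the model emits a structured "tool call", and the host program dispatches it to a real function and feeds the result back. The sketch below illustrates the idea with a hypothetical call format and placeholder tools; it is not any provider's actual function-calling API.

```python
import json

# Hedged sketch of function calling: the model emits a JSON tool call,
# and the host program dispatches it to a real function. The tool names
# and call format here are illustrative, not a real provider's schema.

def read_file(path: str) -> str:
    with open(path) as f:
        return f.read()

def get_weather(city: str) -> str:
    return f"Sunny in {city}"        # placeholder for a real weather API call

TOOLS = {"read_file": read_file, "get_weather": get_weather}

def dispatch(tool_call_json: str) -> str:
    call = json.loads(tool_call_json)    # e.g. emitted by the model
    fn = TOOLS[call["name"]]
    return fn(**call["arguments"])       # result is returned to the model

result = dispatch('{"name": "get_weather", "arguments": {"city": "Paris"}}')
```

The tool's return value would then be appended to the conversation so the model can use it in its next generation step.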
Beyond a model's raw intelligence, tools determine how powerful and capable an AI Agent really is. Even the most "intelligent" model out there would be severely limited in what it can achieve without access to the right tools.
Analogies from the real world
Another mental model I invoke to answer the question "Why do we need AI Agents?" is to ask myself why we need specialised roles in companies. For instance, in a software company, why do we need developers, designers, product managers, marketers, salespeople, and so on? Why could a single person not perform all these functions? Well, such unicorns may exist out there... but they are rare.
In most cases, a single individual would not be able to handle so much load across so many different functions. This lets us draw an interesting parallel with a single LLM whose context window would become saturated by tasks from so many different functions.
Moreover, no individual has the same depth of knowledge and experience across so many fields combined (e.g. engineering, design, sales, marketing, etc.). Put another way, most people are more proficient in one domain and less so in others. This is the result of specialisation. And it is normal. A company needs the best individuals in each of these functions in order to maximise its chances of succeeding. Similarly, AI Agents configured with different underlying models and input prompts will do well in some domains and less so in others. Yet for complex tasks, combining these agents maximises our chances of completing those tasks.
Limitations of AI Agents
Despite all these benefits, AI Agents are far from a panacea. I touch on some points in my other posts titled More Speed is what AI Agents Need and AI Agents Software 2.0. Additionally, AI Agents haven't been adopted across the enterprise due to a plethora of issues.
Firstly, AI Agents are currently not reliable enough for mission-critical work. For instance, enterprises require ~99.9% reliability from their software. Agents do not always successfully complete the tasks they have been assigned. In some cases, they can fail catastrophically and silently.
This lack of reliability is due to an agent's unpredictable behaviour. Unlike traditional software, an agent's behaviour isn't deterministic. This means that we can't easily identify possible execution paths and build consistent test suites. For instance, subtle changes in the input or hyper-parameters (e.g. the temperature setting) can lead to significant output differences. Evaluation frameworks for agentic systems are still in their early days. Additionally, few people have yet acquired comprehensive experience building robust evals with LLMs. Most of us are beginners in this field.
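One common workaround for this non-determinism is to evaluate an agent statistically: run the same task many times and report a pass rate instead of a single pass/fail. The sketch below illustrates the pattern; `run_agent` is a hypothetical stand-in that simulates sampling variance rather than calling a real model.

```python
import random

# Because agent output is non-deterministic, evals often run a task many
# times and report a pass rate. `run_agent` is a hypothetical stand-in
# that simulates an agent answering correctly about 90% of the time.

def run_agent(task: str, seed: int) -> str:
    rng = random.Random(seed)              # simulate sampling variance
    return "42" if rng.random() < 0.9 else "I'm not sure"

def pass_rate(task: str, expected: str, trials: int = 100) -> float:
    passes = sum(run_agent(task, seed) == expected for seed in range(trials))
    return passes / trials

rate = pass_rate("What is 6 * 7?", "42")
```

A real eval harness would additionally log each failing transcript, since diagnosing why an agent failed matters as much as the aggregate score.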
High latency is another big issue that AI Agents are currently facing. With traditional software, users may consider anything above 100 milliseconds as already slow. Meanwhile, LLM calls in agents can take more than 30 seconds. This slowness creates a significant gap between user expectations and actual performance.
User experience considerably suffers as a result of the high latency associated with Agentic workflows. End users are used to fast software; they have developed high standards in this area. But most AI Agent powered applications currently run too slowly. One solution some applications have adopted is to execute workflows asynchronously, for instance notifying users via email once their AI Agents have completed the task at hand.
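The asynchronous pattern looks roughly like this: submit the slow agent run to a background worker, return to the user immediately, and notify them when it finishes. In this hedged sketch, `run_agents` and `send_email` are hypothetical placeholders for a long-running agent workflow and a real email API.

```python
from concurrent.futures import ThreadPoolExecutor

# Sketch of the async-notification pattern. `run_agents` stands in for a
# minutes-long agent workflow; `send_email` stands in for an email API.

def run_agents(task: str) -> str:
    return f"report for {task}"          # placeholder for the slow agent run

notifications = []

def send_email(user: str, body: str) -> None:
    notifications.append((user, body))   # a real app would call an email service

def submit(task: str, user: str) -> None:
    with ThreadPoolExecutor() as pool:
        future = pool.submit(run_agents, task)
        # Fire the notification as soon as the background run completes.
        future.add_done_callback(lambda f: send_email(user, f.result()))

submit("quarterly summary", "alice@example.com")
```

In production this would typically be a job queue rather than an in-process thread pool, so the work survives restarts, but the user-facing contract is the same: acknowledge immediately, deliver results later.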
There are many other issues that arise from the use of AI Agents (and LLMs in general). Those cover areas such as data privacy and application security. Prompt engineering is another murky discipline. It is still considered a dark art by many, simply because it is more art than science. It lacks the precision engineers are accustomed to.
Prompt engineering requires lots of trial and error; it is an iterative process that needs continuous refinement. Additionally, models can refuse to conform to supplied schemas (e.g. returning malformed JSON), invent data structures, and use inconsistent formatting. In sum, prompt engineering is erratic and immature.
The subset of limitations above explains why many companies - with enterprises leading the charge - are reluctant to fully adopt AI Agents, particularly for critical business operations.
Wrapping up
AI Agents are an exciting paradigm that leverages existing LLMs to achieve superior performance on complex problems. Unlike systems that use a single instance of an LLM, multi-agent systems help us mitigate issues around limited context length and degraded performance as more tokens fill up the context window.
Additionally, the ability to employ more specialised models in an agentic system makes it more apt at tackling a wide array of problems that a system with a single instance of an LLM (e.g. ChatGPT) could not handle. Finally, tool use is the icing on the cake for multi-agent systems, enabling them to extend their capabilities to perform API calls and a plethora of other actions beyond just answering with text.
But AI Agents are not without their fair share of challenges. While the underlying models keep improving and the tooling space around AI Agents keeps expanding, they are still very unreliable in many use cases. Moreover, latency in multi-agent systems is another issue: agentic calls may take seconds or minutes, thereby negatively affecting user experience.
Testing agentic systems also remains a challenge since their output is not deterministic. While excitement for Generative AI and AI Agents hasn’t waned, companies remain cautious about using this new technology due to the reliability, latency, and privacy concerns we touched on in this post.
We’re still early in this new technological wave, and I am confident that we will keep chipping away at these hurdles to enable the ubiquitous adoption of AI Agents in the business world.