The Promise of AI Agents

June 7, 2024


AI agents have been in vogue since the emergence of ChatGPT, especially those based on large language models (LLMs). The idea is to use an LLM as a reasoning engine, powering a system that has access to certain tools (APIs, a web browser) and can act autonomously or semi-autonomously. This concept has captivated many. Here is my analysis of the state of the art and future prospects.

A New Form of Automation

The classical form of automation generally involves a system making decisions based on predefined rules. Two conditions are necessary for automation to be effective: a set of clear rules and a stable environment. On Zapier, for example, you can automate simple tasks by creating "zaps" that trigger actions based on predefined conditions. It's simple and effective but only works for simple and repetitive tasks. With VBA or other scripting languages, you can also automate tasks, and those who have done it know how much it depends on the form of the input data. A poorly entered cell, and the whole script stops working.
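The brittleness described above can be sketched in a few lines. This is a hypothetical example (the `route_expense` function and its rules are invented for illustration): a fixed-rule script works only while the input data keeps its expected form.

```python
# A minimal sketch of rule-based automation, in the spirit of a VBA macro
# or a Zapier zap: fixed rules over rows of input data (hypothetical example).

def route_expense(row: dict) -> str:
    """Classify an expense row with predefined rules."""
    amount = float(row["amount"])  # a poorly entered cell raises here
    if amount > 1000:
        return "needs-approval"
    if row["category"] == "travel":
        return "travel-desk"
    return "auto-approved"

print(route_expense({"amount": "42.50", "category": "office"}))  # auto-approved
print(route_expense({"amount": "1200", "category": "travel"}))   # needs-approval
# One malformed cell and the whole script stops working:
# route_expense({"amount": "N/A", "category": "office"})  # raises ValueError
```

The rules are clear and the environment is assumed stable; as soon as either condition fails, the automation fails with it.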

In the classic form of automation, reasoning is externalized into rules. In the case of AI agents, reasoning is internalized within a language model. This is a fundamental difference. The problem is that language models are not designed to reason. They are designed to predict the next most likely word. They mimic human skills but possess no general intelligence. They are not ready to function without a human in the loop. So, what are the implications of this new form of automation that AI agents represent?

AI Agents and Stochastic Automation

As previously explained, AI agents have access to tools such as APIs, files, and web browsers. The idea is to use them to make decisions, autonomously or semi-autonomously. Language models post impressive scores on reasoning and common-sense benchmarks such as HellaSwag, but those scores are not sufficient to guarantee reliable decision-making. Language models are stochastic models, meaning they do not always give the same answer to the same question. This can be problematic for tasks requiring reliable decision-making. For example, if you use an AI agent to trade stocks, you do not want the agent to make random decisions. You want it to make decisions based on facts, data, and rules.
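This stochasticity is easy to illustrate with a toy model. The distribution below is invented (it is not a real LLM), but the mechanism is the same as sampled decoding: the "next-token" probabilities are fixed, yet repeated sampling still yields different answers to the same question.

```python
import random

# Hypothetical next-token distribution for the question "buy, sell, or hold?"
NEXT_TOKEN_PROBS = {"buy": 0.45, "sell": 0.35, "hold": 0.20}

def sample_decision(rng: random.Random) -> str:
    """Draw one decision from the model's output distribution."""
    tokens, weights = zip(*NEXT_TOKEN_PROBS.items())
    return rng.choices(tokens, weights=weights, k=1)[0]

# Same question, different random seeds: the set of answers is not a singleton.
answers = {sample_decision(random.Random(seed)) for seed in range(20)}
print(answers)
```

Greedy decoding or a temperature of zero reduces this variance, but it does not turn a statistical text predictor into a deterministic rule engine.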

Moreover, LLMs are deep learning models, known for being difficult to interpret. This poses a problem for tasks requiring an explanation of the decision made. For example, if you use an AI agent to evaluate candidates for a position, you want to be able to explain why the agent made a particular decision. Language models are not designed for this. They are made to generate text, not to explain their decisions, a gap that can have legal implications in some cases.

The possibility of combining an LLM with tools is interesting in itself. It opens up new use cases and invites us to rethink automation. On ChatGPT, it is practical to use web navigation or APIs to get more reliable answers through Retrieval Augmented Generation (RAG). Therefore, semi-autonomous uses are preferable for AI agents, while autonomous uses should be avoided unless you have blind trust in the language model.
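The RAG pattern mentioned above can be sketched minimally. Everything here is a stand-in: the document store, the naive keyword retriever, and the `call_llm` placeholder (which would be an API call to a real model) are all hypothetical, but the shape is the one RAG systems follow: retrieve grounding context, then generate from it.

```python
# A minimal Retrieval Augmented Generation sketch (hypothetical throughout).

DOCUMENTS = [
    "Zapier triggers zaps from predefined conditions.",
    "DSPy is a framework for programming language models.",
    "Mistral AI publishes open-weight language models.",
]

def retrieve(query: str, docs: list[str]) -> str:
    """Naive keyword retrieval: return the doc sharing the most words."""
    q = set(query.lower().split())
    return max(docs, key=lambda d: len(q & set(d.lower().split())))

def call_llm(prompt: str) -> str:
    """Placeholder for a real language model API call."""
    return f"[answer grounded in: {prompt}]"

def answer(query: str) -> str:
    context = retrieve(query, DOCUMENTS)
    return call_llm(f"Context: {context}\nQuestion: {query}")

print(answer("What is DSPy?"))
```

Grounding the generation in retrieved text is precisely what makes the answers more reliable than free-form generation; the model is asked to summarize evidence rather than recall it.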

Delegating actions to a biased system whose internal rules we do not control only makes sense where the risk is low. For example, an AI agent that autonomously prepares a research dossier for a brainstorming session. But for use cases requiring interpretability and reliability, it is preferable to keep a human in the loop.
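That division of labor can be expressed as a simple gate. The action names and the low-risk allowlist below are invented for illustration; the point is the structure: low-risk actions run autonomously, everything else waits for explicit human approval.

```python
# A minimal human-in-the-loop sketch (action names are hypothetical).

LOW_RISK_ACTIONS = {"collect_sources", "draft_summary"}

def execute(action: str, approved_by_human: bool = False) -> str:
    """Run low-risk actions autonomously; gate everything else."""
    if action in LOW_RISK_ACTIONS:
        return f"{action}: executed autonomously"
    if approved_by_human:
        return f"{action}: executed after human approval"
    return f"{action}: blocked, awaiting human review"

print(execute("collect_sources"))
print(execute("send_offer_letter"))
print(execute("send_offer_letter", approved_by_human=True))
```

The hard part in practice is not the gate itself but deciding which actions belong on the low-risk list; that judgment, too, should stay with a human.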


Generative AIs are fantastic tools that enhance productivity. I am personally convinced of this after having developed web applications based on these models. I am also a contributor to Stanford University's DSPy framework and have created a Ruby client to interact with Mistral AI models. However, as with everything, moderation is key and they should be used wisely.