Building a Decision-Making AI Agent with DSPy and Finite State Machines for Complex Queries

Recent research suggests that multi-agent systems can significantly improve the performance of large language models (LLMs). In the paper "More Agents Is All You Need", Junyou Li and colleagues show that a simple sampling-and-voting method works well: several agents answer the same query, and the most common answer wins. LLM performance improves as the number of agents grows. The method is also orthogonal to existing enhancement techniques, so it can be stacked on top of them to amplify their gains. Across a range of benchmarks, the improvement tends to be larger for harder tasks.
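
The core of sampling-and-voting fits in a few lines: sample several answers to the same query and keep the most frequent one. The sketch below is a minimal illustration, not the paper's code. It uses exact-match majority voting, which suits tasks with short, checkable answers. `majority_vote` and `sample_answer` are hypothetical names, and `sample_answer` is a stand-in for a real LLM call that cycles through canned answers so the example runs offline.

```python
from collections import Counter
from itertools import cycle
from typing import Callable, Iterator


def majority_vote(sample: Callable[[str], str], query: str, n_agents: int = 5) -> str:
    """Sampling-and-voting: draw n_agents independent answers, return the most frequent."""
    answers = [sample(query) for _ in range(n_agents)]
    # most_common(1) returns [(answer, count)]; ties go to the answer seen first.
    winner, _count = Counter(answers).most_common(1)[0]
    return winner


# Hypothetical stand-in for an LLM: canned answers, some of them wrong,
# to mimic the variability of repeated sampling.
_canned: Iterator[str] = cycle(["51", "48", "51", "54", "51"])


def sample_answer(query: str) -> str:
    return next(_canned)


print(majority_vote(sample_answer, "What is 17 * 3?"))  # prints 51
```

With a real model, `sample_answer` would call the LLM with a non-zero temperature so that the samples actually differ.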

Building these agents, however, is not straightforward. One option is to construct them from scratch, which takes considerable effort to make both internal and external communication reliable. The other is to adopt a pre-built solution, which may not offer the transparency or control you need and can lead to inefficiencies and high operational costs.


To address these challenges, I explored using finite state machines (FSMs) as a framework for constructing decision-making agents. An FSM gives the agent an explicit set of states and allowed transitions, so its behavior is predictable and state changes are easy to manage. This lets the language model focus on its strengths, serving as a communication interface and a knowledge base, while the FSM handles the control flow and state management.
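
To make this division of labor concrete, here is a dependency-free sketch of the control loop. A transition table decides which step may come next, and `fake_llm` is a hypothetical stand-in for the model that only produces text. The state names match the ones used in the agent code later in this post. The `conclude`/`retry` triggers and the rule that the answer is "found" in round 2 are assumptions made for illustration.

```python
from typing import Dict, List, Tuple

# Transition table: (current_state, trigger) -> next_state.
# The FSM, not the LLM, decides which step is allowed next.
TRANSITIONS: Dict[Tuple[str, str], str] = {
    ("start", "think"): "thought",
    ("thought", "act"): "acted",
    ("acted", "observe"): "observed",
    ("observed", "conclude"): "concluded",
    ("observed", "retry"): "start",
}


def fire(state: str, trigger: str) -> str:
    """Return the next state, or raise if the trigger is not allowed in this state."""
    try:
        return TRANSITIONS[(state, trigger)]
    except KeyError:
        raise ValueError(f"trigger {trigger!r} is invalid in state {state!r}") from None


def fake_llm(step: str, round_no: int) -> str:
    # Hypothetical model stand-in: it only generates text, never picks the next state.
    return f"round {round_no}: output of the '{step}' step"


def run(max_rounds: int = 5) -> Tuple[str, List[str]]:
    state: str = "start"
    memory: List[str] = []
    for round_no in range(1, max_rounds + 1):
        for trigger in ("think", "act", "observe"):
            memory.append(fake_llm(trigger, round_no))
            state = fire(state, trigger)
        # Hypothetical decision rule: pretend the answer is found in round 2.
        state = fire(state, "conclude" if round_no == 2 else "retry")
        if state == "concluded":
            break
    return state, memory


final_state, notes = run()
print(final_state, len(notes))  # prints: concluded 6
```

Because invalid triggers raise an error, a model that "wants" to skip a step cannot do so. The sequence of steps is enforced by the table, not by the prompt.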

To explain my approach and the final code I developed, I’ve recorded a video. It’s available on Lycee AI, my learning platform focused on AI education. Visit www.lycee.ai, create an account, and enroll in the DSPy course to access the full explanation and the code.


For those interested in just the code without the detailed explanation, here it is:

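```python
import os

import dspy
from dspy.functional import TypedPredictor
from dotenv import load_dotenv
from transitions import Machine

# Written against the DSPy 2.4-era API (dspy.OpenAI, dspy.functional.TypedPredictor);
# newer DSPy releases replace dspy.OpenAI with dspy.LM.
load_dotenv()  # reads OPENAI_API_KEY from a local .env file

llm = dspy.OpenAI(
    model='gpt-3.5-turbo',
    api_key=os.environ['OPENAI_API_KEY'],
    max_tokens=100
)

dspy.settings.configure(lm=llm)


class DecisionSignature(dspy.Signature):
    """Decide whether the text already contains the final answer to the objective."""
    input_text: str = dspy.InputField(desc="The input text to be processed")
    rationale: str = dspy.OutputField(desc="The rationale for the decision")
    decision: bool = dspy.OutputField(desc="True if the input text contains the final answer, False otherwise")


class Agent(Machine):
    def __init__(self, llm, objective=None, max_iterations=5):
        self.llm = llm
        self.objective = objective
        self.max_iterations = max_iterations  # assumed safety cap on think/act/observe cycles
        self.memory = []
        states = ['start', 'thought', 'acted', 'observed', 'concluded']
        Machine.__init__(self, states=states, initial='start')
        # Each trigger advances the FSM; its `after` callback receives the
        # trigger's arguments, queries the LLM and stores the reply in memory.
        self.add_transition('think', 'start', 'thought', after='query_llm')
        self.add_transition('act', 'thought', 'acted', after='query_llm')
        self.add_transition('observe', 'acted', 'observed', after='query_llm')
        # A transition has exactly one destination, so the two possible
        # outcomes of a decision are modelled as two separate triggers.
        self.add_transition('conclude', 'observed', 'concluded')
        self.add_transition('retry', 'observed', 'start')

    def query_llm(self, prompt):
        response = self.llm(prompt)[0]  # dspy.OpenAI returns a list of completions
        self.memory.append(response)

    def decide(self, prompt):
        decision_maker = TypedPredictor(DecisionSignature)
        response = decision_maker(input_text=prompt)
        if response.decision:
            str_memory = "\n".join(self.memory)
            final_answer = self.llm(f"What is the final answer to this: {self.objective}, given this: {str_memory}")[0]
            self.conclude()
            return final_answer
        self.memory.append("Decision not reached because " + response.rationale)
        self.retry()
        return None

    def execute(self):
        # One possible driver loop (assumed design): think -> act -> observe -> decide,
        # returning to 'start' until a decision concludes or the cap is reached.
        for _ in range(self.max_iterations):
            notes = "\n".join(self.memory)
            self.think(f"Objective: {self.objective}\nNotes so far:\n{notes}\nWhat should be done next?")
            self.act(f"Objective: {self.objective}\nPlan: {self.memory[-1]}\nCarry out this step and give the result.")
            self.observe(f"Objective: {self.objective}\nResult: {self.memory[-1]}\nWhat does this result tell us?")
            answer = self.decide(f"Objective: {self.objective}\nNotes:\n" + "\n".join(self.memory))
            if self.state == 'concluded':
                return answer
        return None


agent = Agent(llm, objective="What is double the sum of Barack Obama's age and his wife's age in April 2024?")
print(agent.execute())
```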
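
Because the example objective has a checkable answer, it is worth computing it independently before judging the agent's output. The following minimal sketch assumes the publicly listed birth dates (Barack Obama: 4 August 1961; Michelle Obama: 17 January 1964). It takes 15 April 2024 as the reference day, although any day in April 2024 yields the same ages. `age_on` is a hypothetical helper, not part of the agent.

```python
from datetime import date


def age_on(birth: date, on: date) -> int:
    """Number of whole years completed on the given day."""
    # Subtract one year if this year's birthday has not happened yet.
    had_birthday = (on.month, on.day) >= (birth.month, birth.day)
    return on.year - birth.year - (0 if had_birthday else 1)


reference_day = date(2024, 4, 15)  # assumption: any day in April 2024 works
barack = age_on(date(1961, 8, 4), reference_day)
michelle = age_on(date(1964, 1, 17), reference_day)
print(barack, michelle, 2 * (barack + michelle))  # prints: 62 60 244
```

If the agent returns anything other than 244, inspect its `memory` list to see which step went wrong before trusting the result.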