Building AI Multi-agent Workflows
Apr 8, 2025
It's amazing how fast large language models (LLMs) are evolving. Every few months we get new models with enhanced capabilities at lower costs. ChatGPT, Gemini, Claude, Grok, DeepSeek—the list keeps growing. In The Coming Wave, Mustafa Suleyman offers a good analogy: he likens these rapid advancements in AI to the Cambrian explosion, the period millions of years ago marked by a swift expansion of complex life on Earth.
Thanks to these enhanced models and new software tools, we can easily build multi-agent workflows. In a multi-agent workflow (or application), several AI agents work together to complete a given task. The agents participating in the workflow are AI-powered software capable of performing tasks autonomously. You can supply them with "tools" to perform tasks involving interactions with outside apps or the internet. Each agent can be specialized in a particular area. The result is an AI application capable of executing complex tasks with minimal or no human guidance.
Next, we'll take a look at how to build such an application using the multi-agent framework AutoGen. But first, I'd like to briefly touch on what distinguishes an "AI multi-agent application" from traditional software automation.
Table of contents
- AI or just automation?
- Multi-agent frameworks
- Multi-agent architecture design patterns
- Building the workflow
- Final considerations
AI or just automation?
It's important to point out what makes AI applications different from the rest. As I mentioned earlier, an AI agent is a piece of software that uses an LLM to complete tasks autonomously. If you think about it, this may not sound that special. After all, you can accomplish a wide variety of automated tasks with scripts, such as Bash or Python scripts. However, AI applications have characteristics that can't be replicated with a script easily, if at all.
Software automation via scripts is well suited for repetitive and predictable tasks. The key point is that the inputs and outputs of such tasks are deterministic: we automate them because they're repetitive and the desired outcomes are known beforehand. In fact, this predictability is often desired; any deviation might raise an exception, log an error, or even crash the system. Certainly, you can use an AI agent to automate tasks for which a script would work just as well, but that's not the best use of an AI agent.
A few attributes make AI applications different. When working with LLMs, we can use natural language to prompt the model. A prompt can describe not just the task to be completed, but also how the AI is supposed to behave; the same model can behave differently based on how we prompt it. Often, the task description is loosely defined or open-ended. The output of an AI agent can also involve a degree of "creativity"—think of tasks involving image generation, writing, or coding. Another characteristic is that LLMs are good at working with semantic context; that is, they're able to extract the meaning of text, images, or sound to perform their tasks. AI agents are superior to automation scripts in these types of tasks. In fact, it would be quite challenging to automate some of them with a script.
So, when deciding whether to use AI agents for a given workflow, first ask yourself if they're really necessary. Remember that—although getting better—AI models are still computationally expensive to run.
Let's continue and see how to build a multi-agent workflow.
Multi-agent frameworks
As of today, there are a few multi-agent frameworks out there, including AutoGen, LlamaIndex, LangGraph, OpenAI's Agents SDK, and others. In this post, I'll show you an example workflow using the AutoGen framework. I've been experimenting with it and find it quite flexible. If you've used OpenAI's Agents SDK, you'll find AutoGen quite familiar too.
AutoGen is an open-source multi-agent framework developed by researchers at Microsoft Research. The project has seen a few changes since last year, and many new features were added starting with the v0.4 release. Let's take a look at these features in more detail next.1
In a multi-agent application, agents communicate and interact with each other via messages, and they perform actions based on the messages they receive. The AutoGen framework provides this communication infrastructure. Additionally, the framework manages each agent's lifecycle and enforces security and privacy boundaries. This infrastructure—in which the agents interact—is called the agent runtime. AutoGen supports two types of runtimes: a standalone runtime (which can run locally) and a distributed agent runtime (which can host agents running on different processes or machines).
Both agent runtimes support two types of agent communication: direct messaging and broadcast. The latter is essentially a publish-subscribe messaging model. Agents publish messages to a specific topic, while the agents subscribed to that topic consume those messages.
Another feature is that agents can be created using any of the supported LLM model clients (e.g. OpenAI, Gemini, or models hosted on Ollama, Anthropic, Azure OpenAI, etc.). AutoGen provides an abstraction called RoutedAgent, which we can instantiate with the model client of our choosing. We could use different LLM models for different agents, depending on the task the agent performs. This, combined with the publish-subscribe model, allows us to build highly flexible and scalable multi-agent applications. Other capabilities include adding logging and telemetry to our multi-agent application.
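As a quick illustration, here's a rough sketch of what mixing model clients could look like (the client classes come from the autogen_ext extension package; the deployment and endpoint values are placeholders, not working credentials):

```python
# A sketch of mixing model clients per agent (client classes from the
# autogen_ext package; deployment/endpoint values are placeholders).
from autogen_ext.models.openai import (
    AzureOpenAIChatCompletionClient,
    OpenAIChatCompletionClient,
)

# One agent could use an OpenAI-hosted model well suited for analysis...
analyst_client = OpenAIChatCompletionClient(model="gpt-4o")

# ...while another uses a cheaper Azure-hosted model for summarization.
summarizer_client = AzureOpenAIChatCompletionClient(
    azure_deployment="gpt-4o-mini",  # placeholder deployment name
    model="gpt-4o-mini",
    api_version="2024-06-01",
    azure_endpoint="https://<your-resource>.openai.azure.com",
)
```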
We can orchestrate our workflows by taking advantage of these framework features. Also, a few multi-agent design patterns exist to help tackle complexity. As we'll see next, these are general patterns that may be useful in common use cases.
Multi-agent architecture design patterns
When it comes to multi-agent applications2, these design patterns are based on how agents communicate and interact with each other. For instance, in the group chat pattern, a group of agents take turns sending messages to each other in a common message thread. A group chat manager agent determines which agent "speaks" at each turn. By acting on these messages, each agent performs a specific task. It's possible to include an agent representing the human user (i.e. human-in-the-loop). This way, the user can interact with and provide feedback to the agents.
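To give you a taste of the group chat pattern, here's a rough sketch using AutoGen's higher-level AgentChat layer (class names from autogen_agentchat v0.4+; the agent names and prompts are my own, and exact signatures may vary by version):

```python
# A rough sketch of the group chat pattern using AutoGen's AgentChat layer
# (class names from autogen_agentchat v0.4+; prompts and agent names are
# illustrative). A selector model decides which agent "speaks" each turn.
from autogen_agentchat.agents import AssistantAgent
from autogen_agentchat.conditions import MaxMessageTermination
from autogen_agentchat.teams import SelectorGroupChat
from autogen_ext.models.openai import OpenAIChatCompletionClient

async def run_group_chat() -> None:
    model_client = OpenAIChatCompletionClient(model="gpt-4o")
    writer = AssistantAgent(
        "writer", model_client=model_client,
        system_message="You draft short answers.")
    critic = AssistantAgent(
        "critic", model_client=model_client,
        system_message="You critique drafts and suggest improvements.")
    # The group chat manager (backed by the model client) picks the next speaker.
    team = SelectorGroupChat(
        [writer, critic], model_client=model_client,
        termination_condition=MaxMessageTermination(6))
    result = await team.run(task="Draft a two-sentence summary of what AutoGen is.")
    print(result.messages[-1].content)
```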
Another multi-agent pattern is handoff, introduced by OpenAI. Here, the idea is to let an agent delegate tasks to other agents using a tool (or function) call.
In the following example workflow, I used the sequential workflow pattern. As we'll see, each agent in this pattern performs a specific task—processing a message, creating a response, and passing it as a message to the next agent. It fits our case because the workflow involves executing a series of steps in a specific order.
There are a few other interesting patterns, which I encourage you to take a look at. But at this point we're ready to see an implementation of these concepts.
Building the workflow
The following is only a toy example, but it will help illustrate the concepts discussed earlier. It'll also help to motivate our discussion about agent tool usage and the Model Context Protocol (MCP)—which we'll talk about at the end.
Our multi-agent workflow provides a full report with a recommendation on whether to invest in a particular stock. Let's say I ask if it's a good idea to buy a particular public company's stock. The agents in the workflow will search and analyze the public company's financial records. Then based on an analysis of this data, a final recommendation is made. This can be a complex and involved process, but I wanted to keep it simple this time. The workflow consists of three major steps:
1. Retrieving the latest financial statements filed by public companies with regulatory agencies. In this case, one of the agents retrieves these financial records from the SEC (Securities and Exchange Commission) for a given US company.
2. Once the relevant financial data is obtained and extracted, another agent analyzes the information. This agent is prompted as a financial analyst, and it assesses the price of the company's stock based on its financial fundamentals. It tries to evaluate potential gains (or losses) if we were to invest.
3. A third agent writes a report with a recommendation for the user. The final output is a detailed report explaining the reasoning behind the financial recommendation.
Disclaimer: do not take any of this content as financial advice. For illustrative purposes only.
And now that you have an overview of the workflow, let's look at our architecture and communication strategy.
Workflow architecture
Because each step in our workflow needs to be completed in a specific order, we'll use the sequential workflow design pattern. We have three different agents, each in charge of one of the tasks listed above. For the agents' underlying LLM, we use an Azure OpenAI client with the gpt-4o model. In this case, all agents use the same LLM model, but recall that AutoGen allows us to use a mix of model clients. This is quite powerful, as we could pick a different LLM model based on its capabilities. For instance, we know that some models are better at coding, others at writing, etc. We can use the best model for the task at hand—enhancing our overall application's performance.
The following diagram shows our workflow's architecture. The agents in the sequence are the DataExtractorAgent, the FinancialAnalystAgent, and the RecommenderAgent. Notice that the DataExtractorAgent is able to use tools to retrieve financial information from the public internet.
Workflow architecture diagram: there are three agents, one of which is able to use tools to retrieve information from the public internet.
Next, let's talk about agent communication. For our learning purposes, I chose to use the broadcast model supported by AutoGen3. As we saw earlier, this is a publish-subscribe communication model: the agents use topics and subscriptions to publish and retrieve messages. Each agent is subscribed to a topic, from which it fetches messages. After an agent completes a task, it publishes its output message to the corresponding topic, where the next agent in the workflow will read it. The last agent in the workflow (RecommenderAgent) doesn't publish its output to a topic; instead, it writes the final report to the application's output.
Agent communication diagram: agents subscribe to topics and publish messages after completing a task to continue with the workflow execution.
In the diagram above, you can see that we have three topics: data_extractor_topic, financial_analyst_topic, and recommender_topic. Each agent is subscribed to its corresponding topic, from which it retrieves and processes messages.
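Under the hood, the topic types are just string constants shared between publishers and subscribers. They could be defined as simply as this (the exact values are illustrative):

```python
# Topic types are plain strings; each agent subscribes to one and publishes
# its output to the next agent's topic. (Values here are illustrative.)
data_extractor_topic_type = "DataExtractorAgent"
financial_analyst_topic_type = "FinancialAnalystAgent"
recommender_topic_type = "RecommenderAgent"
```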
Now that we have a clear picture of our multi-agent workflow, let's continue with its implementation using AutoGen.
Workflow implementation
In this section, we'll take a look at the implementation of our workflow. Not all of the code is shown here, but you can find the full Python source code in my GitHub repository here. Keep in mind that this implementation is specific to the AutoGen framework; others may differ slightly.
For our use case, we'll be using AutoGen's Core package. In general, to create an agent we need to:
- Define an agent class that derives from AutoGen's RoutedAgent class.
- Implement a `message_handler` method for each message type that the agent will handle.
In our workflow, the DataExtractorAgent is the only agent equipped with a tool (used to get financial data from the internet), so let's take a look at it first. I'll break it down into two parts: first the class constructor, then the `message_handler` implementation.
```python
# Imports assumed from AutoGen's Core API (autogen_core, v0.4+).
# data_extractor_topic_type and get_financial_data_tool are defined elsewhere.
from typing import List

from autogen_core import MessageContext, RoutedAgent, message_handler, type_subscription
from autogen_core.models import ChatCompletionClient, LLMMessage, SystemMessage
from autogen_core.tools import Tool

@type_subscription(topic_type=data_extractor_topic_type)
class DataExtractorAgent(RoutedAgent):
    def __init__(self, model_client: ChatCompletionClient) -> None:
        super().__init__("A financial data extractor agent.")
        self._system_messages: List[LLMMessage] = [SystemMessage(
            content=(
                "You are a corporate financial data analyst.\n"
                "You will be given the ticker symbol of a company, and you are to obtain the most recent cash flow statement in its 10-K filing from the SEC for that company.\n"
                "Use any tool available to you to obtain the data.\n"
                "The data you obtain might be in JSON format, you are to identify and extract the most important financial metrics from the filing.\n"
                "Once you have extracted this information, provide a detailed report including the name of the company, the ticker symbol, and the extracted financial metrics."
            )
        )]
        self._model_client = model_client
        self._tools: List[Tool] = [get_financial_data_tool]
```
Constructor method for the DataExtractorAgent class
I'd like to point out a few things here. The `_system_messages` class member is initialized with a prompt that sets the agent's expected behavior. Then we initialize `_model_client` with an LLM model client passed as an argument to the constructor. Lastly, we provide a list of tools to the `_tools` class member. This is a list containing Python functions, which have been wrapped in AutoGen's FunctionTool class and assigned to the `get_financial_data_tool` variable (not shown here). I'll discuss tool usage later on, but this tool is a Python script that I created to make a request to the SEC API and obtain financial information for a given company. One last thing to note is the `type_subscription()` decorator. This lets us specify the topic type to which the agent subscribes—in this case, `data_extractor_topic_type` (see the agent communication diagram in the previous section for more details).
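For reference, wrapping a plain Python function as an AutoGen tool looks roughly like this (a simplified sketch; the real implementation in the repository calls the SEC's API):

```python
# A sketch of how the SEC data tool could be wrapped. The function body is
# elided; the real one queries the SEC's API for the given ticker.
from autogen_core.tools import FunctionTool

async def get_financial_data(ticker: str) -> str:
    """Return the latest 10-K cash flow statement for `ticker` as JSON text."""
    ...  # call the SEC API and return the relevant financial data

get_financial_data_tool = FunctionTool(
    get_financial_data,
    description="Fetch the most recent 10-K cash flow statement for a ticker.",
)
```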
As I mentioned, this is the only agent equipped with a tool, which is why we needed to initialize the `_tools` class member. The other two agents in our workflow are instantiated the same way, except we don't initialize the `_tools` member. Also, the `message_handler` implementation requires a few extra steps in our tool-equipped DataExtractorAgent. Below is the `message_handler` method implementation.
```python
    # Continuation of the DataExtractorAgent class. Also assumes these imports:
    # asyncio; TopicId from autogen_core; UserMessage, AssistantMessage, and
    # FunctionExecutionResultMessage from autogen_core.models.
    @message_handler
    async def handle_outside_message(self, message: Message, context: MessageContext) -> None:
        # Create a session of messages
        session: List[LLMMessage] = self._system_messages + [UserMessage(content=message.content, source="user")]
        # Run chat completion with the tool
        llm_result = await self._model_client.create(
            messages=session,
            tools=self._tools,
            cancellation_token=context.cancellation_token
        )
        # Add the result to the session (assumes the model requested tool calls)
        session.append(AssistantMessage(content=llm_result.content, source=self.id.key))
        # Execute the tool calls
        results = await asyncio.gather(*[self._execute_tool_call(call, context.cancellation_token) for call in llm_result.content])
        # Add the function execution results to the session
        session.append(FunctionExecutionResultMessage(content=results))
        # Run the chat completion again to reflect on the history and function execution results.
        llm_result = await self._model_client.create(
            messages=session,
            cancellation_token=context.cancellation_token
        )
        response = llm_result.content
        assert isinstance(response, str)
        print(f"{'='*80}\n[{self.id.type}]:\n{response}\n")
        await self.publish_message(Message(response), topic_id=TopicId(financial_analyst_topic_type, source=self.id.key))
```
message_handler implementation for the DataExtractorAgent
This method is implemented using the `message_handler` decorator provided by the RoutedAgent class. In this case, we implement the method just once, since we're handling only one message type. Notice that the method takes a `message` argument of type `Message`. This is a Python dataclass which we defined as our message type. As shown below, it contains a single field named `content`.
```python
from dataclasses import dataclass

@dataclass
class Message:
    content: str
```
Any message of type `Message` will be routed to our message handler method, `handle_outside_message()`. Since we're using a tool, our agent's underlying LLM model first processes the user's initial message and determines whether a tool call is needed. If so, it calls the tool and processes the tool's response. This is also called reflection: the agent ponders the tool execution result before generating a response. Finally, it generates the response and publishes it to the corresponding topic (again, refer to the agent communication diagram in the previous section). To handle these multiple messages, notice that we created a list of elements of type LLMMessage called `session`. The first message we added to the session was the UserMessage, followed by any subsequent messages in the process.
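The `_execute_tool_call()` helper used by the handler isn't shown above. A minimal version, following the common pattern in AutoGen's documentation, could look like the sketch below (written as a free-standing function for clarity; in the agent itself, `tools` would be `self._tools`, and the exact `FunctionExecutionResult` fields may vary slightly by autogen_core version):

```python
# A sketch of the tool-execution helper: look up the requested tool by name,
# run it with the JSON-decoded arguments, and wrap the outcome (or the error)
# so it can be appended to the session as a function execution result.
import json
from typing import List

from autogen_core import CancellationToken, FunctionCall
from autogen_core.models import FunctionExecutionResult
from autogen_core.tools import Tool

async def execute_tool_call(
    tools: List[Tool], call: FunctionCall, cancellation_token: CancellationToken
) -> FunctionExecutionResult:
    # Find the requested tool among the agent's registered tools.
    tool = next((t for t in tools if t.name == call.name), None)
    assert tool is not None, f"The tool '{call.name}' is not registered."
    try:
        arguments = json.loads(call.arguments)  # arguments arrive as a JSON string
        result = await tool.run_json(arguments, cancellation_token)
        return FunctionExecutionResult(
            call_id=call.id,
            content=tool.return_value_as_string(result),
            is_error=False,
            name=tool.name,
        )
    except Exception as e:
        return FunctionExecutionResult(
            call_id=call.id, content=str(e), is_error=True, name=tool.name
        )
```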
As I mentioned earlier, the other two agents in our workflow are implemented similarly. In fact, their implementation is somewhat simpler, since no tool use is needed. The next thing I want to show you is how to register our agents and initialize the runtime to start the workflow.
Once we have defined our agent classes, we need to make them available in AutoGen's runtime. To do this, we use the `register()` method provided by the RoutedAgent class. In our workflow, I defined a method that registers the agents and, once they're registered, starts the runtime.
```python
async def start_workflow(message: str) -> None:
    model_client = get_azure_openai_chat_completion_client()
    runtime = SingleThreadedAgentRuntime()
    await DataExtractorAgent.register(
        runtime,
        type=data_extractor_topic_type,
        factory=lambda: DataExtractorAgent(model_client=model_client)
    )
    await FinancialAnalystAgent.register(
        runtime,
        type=financial_analyst_topic_type,
        factory=lambda: FinancialAnalystAgent(model_client=model_client)
    )
    await RecommenderAgent.register(
        runtime,
        type=recommender_topic_type,
        factory=lambda: RecommenderAgent(model_client=model_client)
    )
    runtime.start()
    await runtime.publish_message(
        Message(content=message),
        topic_id=TopicId(data_extractor_topic_type, source="default")
    )
    await runtime.stop_when_idle()
```
The start_workflow() method: registers the agents and starts the runtime.
Notice that we first create our model client, in this case using an Azure OpenAI model (code not shown here). But remember that AutoGen supports other model clients, such as Gemini, or models hosted on Anthropic or Ollama, etc. We then proceed to register each of the agents using the `register()` method. As you can see, this method takes as parameters the previously created runtime, the topic type to which the agent subscribes, and a lambda function used to instantiate the agent. This function is passed to the factory parameter to create a factory method, which the runtime will use to return instances of the agent class on which `register()` is invoked.
We can now start the runtime. To kick off our workflow, we publish the first message, obtained from user input, to the `data_extractor_topic_type`. The runtime will stop once all messages have been processed by the agents.
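For completeness, the application entry point can be as simple as the following sketch (the question string is just an example):

```python
# A minimal entry point: kick off the workflow with the user's question.
import asyncio

if __name__ == "__main__":
    asyncio.run(start_workflow("Is it a good idea to buy AAPL stock right now?"))
```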
Results
Finally, I'd like to show you some of the results I get when running the workflow. In the screenshot below, you can see the start of the workflow. I asked if it was a good idea to buy Apple stock.
And the following screenshot shows the intermediate response from the DataExtractorAgent. You can see it obtained the financial information for Apple Inc. and summarized it. This is the same information that then gets passed to the FinancialAnalystAgent.
One interesting thing to notice is that gpt-4o's response above is formatted as markdown. This makes it quite convenient to work with when adding a UI to the application. Lastly, here's the final recommendation report provided by the RecommenderAgent.
I hope this multi-agent workflow example—although seemingly simple—helped to illustrate the concepts discussed in the previous sections. There are certainly many improvements we can make here. Let's take a look at some of the things we can do to improve our workflow. These are more general and can be used in any multi-agent workflow or application.
Potential improvements
The first improvement—and probably the most obvious one—we can make is adding a UI. An easy way to do this is by using a UI library like Streamlit. It lets you quickly add a ChatGPT-like interface to interact with the agents.
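Here's a rough sketch of what that could look like (`run_workflow_and_collect_report` is a hypothetical helper that runs the agents and returns the final markdown report):

```python
# A minimal Streamlit chat front end (sketch). run_workflow_and_collect_report
# is a hypothetical wrapper around the multi-agent workflow that returns the
# final recommendation report as a markdown string.
import asyncio

import streamlit as st

st.title("Stock Recommendation Agents")

if prompt := st.chat_input("Which stock are you considering?"):
    with st.chat_message("user"):
        st.markdown(prompt)
    with st.chat_message("assistant"):
        report = asyncio.run(run_workflow_and_collect_report(prompt))
        st.markdown(report)  # gpt-4o's markdown output renders nicely here
```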
Regarding agents' performance, we can focus on each agent in isolation and improve its system prompt. These are the instructions that tell the agent what it's supposed to do. It's well known that a good prompt can make a massive difference in how an agent performs. Additionally—since AutoGen allows us to use multiple LLM models—we can choose the best model for each agent, based on the task it performs.
Finally, I mentioned earlier that our toy example would help motivate a discussion about agent tool usage and the Model Context Protocol. I'll start that discussion in the final sections of this post. Integrating MCP with our workflow also constitutes an important potential improvement.
Final considerations
AutoGen makes it easy to build multi-agent workflows—creating the agents was relatively easy. Surprisingly, the most time-consuming part was creating the financial data retrieval tool. Although it's a simple call to the SEC's API, the response data format is not the typical JSON. The SEC's API returns its response as XBRL, a special format based on XML, which makes it more complicated to parse. If you take a look at the source code, you'll notice I had to use a Python library called sec-api for this.
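For reference, here's roughly how sec-api turns a filing's XBRL data into JSON (a sketch; the API key, the filing URL, and the extracted section are placeholders):

```python
# A sketch of XBRL-to-JSON conversion with the sec-api library; the API key
# and the filing URL below are placeholders, not real values.
from sec_api import XbrlApi

xbrl_api = XbrlApi("YOUR_SEC_API_KEY")

# Convert a 10-K filing's XBRL data to JSON, given the filing's URL.
xbrl_json = xbrl_api.xbrl_to_json(
    htm_url="https://www.sec.gov/Archives/.../aapl-10k.htm"  # placeholder URL
)

# The statements come back as JSON sections, e.g. the cash flow statement.
cash_flow = xbrl_json["StatementsOfCashFlows"]
```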
By now, you know that a tool gives an agent the ability to interact with external data sources or other applications. Just as with the SEC's API, other data sources or applications may need special handling as well. This means that each tool we create for an agent (in the form of a function) may need to be customized. To address this and provide some standardization, Anthropic created the Model Context Protocol or MCP. AutoGen provides support to integrate MCP-based tools, which we could take advantage of in our workflow.
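For example, loading MCP-based tools into our workflow could look roughly like the sketch below (module paths assumed from autogen_ext's MCP extras; the filesystem server is just an example MCP server):

```python
# A sketch of loading MCP tools via autogen_ext's MCP integration (assumed
# for v0.4+). The server here is the reference filesystem MCP server.
from autogen_ext.tools.mcp import StdioServerParams, mcp_server_tools

async def load_mcp_tools():
    server_params = StdioServerParams(
        command="npx",
        args=["-y", "@modelcontextprotocol/server-filesystem", "/tmp/data"],
    )
    # Discover the tools the MCP server exposes; these can then be handed to
    # an agent just like our FunctionTool-based get_financial_data_tool.
    return await mcp_server_tools(server_params)
```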
Now, covering MCP in detail would require its own blog post, so I'll just give you a brief overview next.
The MCP (Model Context Protocol)
MCP is a protocol that standardizes the way LLM applications interact with external systems—other applications, APIs, data sources, etc. The information obtained from these sources provides context to the LLM. MCP follows a client-server architecture: a host (e.g., our tool-equipped LLM agent) connects to MCP servers through MCP clients, with one client per server connection. Each server exposes an external source, and a host can maintain connections to multiple servers.
You can see how our multi-agent application, acting as an MCP host, can access multiple data sources this way. This simplifies the implementation of our workflows while providing a nice decoupling between the tool implementation and the data source.
Over the past few months, MCP adoption has soared, and many new MCP servers are available for a wide variety of applications, such as PostgreSQL, GitHub, YouTube, Google Drive, etc. It seems that MCP is indeed becoming the standard way to supply LLM agents with tools.
For a deeper dive into the Model Context Protocol, I recommend reading this excellent blog post and the official documentation here.
$$\cdot \cdot \cdot$$
I believe we're starting to see an acceleration in the adoption of AI agents in our applications and workflows. LLM base models continue to get better and cheaper to use. Better software tools make it easier to develop our AI applications. All these factors continue to fuel this acceleration. Maybe we're indeed undergoing an AI Cambrian explosion. Or is it more of an AI Big Bang?
Notes
1. Keep in mind that these features are specific to AutoGen; they might or might not be the same in other multi-agent frameworks. Also, features found in other frameworks might not be available in AutoGen. ↩
2. There are also single-agent application design patterns. A popular single-agent architecture is the ReAct pattern, in which an agent running in a loop decides which tools to use or actions to take, based on the task at hand. ↩
3. You may have noticed that in this workflow, a direct messaging strategy would've been enough—since an agent can just pass a message directly to the next one in the workflow. But I wanted to illustrate how AutoGen makes it quite simple to implement the broadcast model. Also, using the broadcast communication model, it would be easy to scale the application to handle analyzing multiple stocks: we can just instantiate several worker agents subscribed to each of the topics, each one processing information in parallel. ↩