The Problem with Monolithic Agents
Imagine you need an agent that researches a topic, writes an article, edits for grammar, fact-checks claims, and formats for publication.
You could put all of this into one agent with a huge instruction set. But that creates problems:
- Hard to debug (which step failed?)
- Hard to improve (change one thing, break another)
- Unreliable (too many responsibilities)
- Expensive (huge context = more tokens)
- Inflexible (can't reuse components)
The Multi-Agent Solution
Instead of one agent doing everything, create a team of specialists. Each agent has one job and does it well.
Benefits:
- Each agent has ONE clear job
- Easy to test individually
- Easy to improve one without breaking others
- Cheaper (smaller contexts per agent)
- Reusable (use same editor in different pipelines)
This is the same principle as microservices vs monoliths in software engineering.
Agent Delegation Pattern
The most common pattern: one agent calls another as a tool. The "main" agent coordinates, and "specialist" agents do specific tasks.
In the code below, notice:
- Main agent uses expensive model (gpt-4o) for coordination
- Calculator uses cheap model (gpt-4o-mini) for simple task
- This saves money while maintaining quality
from pydantic_ai import Agent, RunContext

# Specialist agent - does ONE thing well
calculator_agent = Agent(
    'openai:gpt-4o-mini',  # Cheaper model for simple task
    instructions='You are a calculator. Return only the numeric result.',
)

# Main agent - coordinates everything
main_agent = Agent(
    'openai:gpt-4o',
    instructions='You help with various tasks. Use the calculator for math.',
)

@main_agent.tool
async def calculate(ctx: RunContext[None], expression: str) -> str:
    """Calculate a mathematical expression."""
    result = await calculator_agent.run(
        f'Calculate: {expression}',
        usage=ctx.usage,  # Track combined token usage
    )
    return result.output

# Use the system
result = main_agent.run_sync('What is 15% of 250?')
print(result.output)
# Output: 15% of 250 is 37.5

Research + Summarization System
A more practical example with two specialists:
from pydantic_ai import Agent, RunContext

# Agent 1: Researcher - finds information
research_agent = Agent(
    'openai:gpt-4o',
    instructions='''You are a research specialist.
Find 2-3 key facts about the given topic.
Be thorough but concise.''',
)

@research_agent.tool_plain
def web_search(query: str) -> str:
    """Search the web for information."""
    return f"Search results for '{query}': [relevant information here]"

# Agent 2: Summarizer - creates summaries
summarizer_agent = Agent(
    'openai:gpt-4o-mini',
    instructions='''You create concise summaries.
Format as 3-5 bullet points.
Keep it simple and clear.''',
)

# Coordinator agent
coordinator = Agent(
    'openai:gpt-4o',
    instructions='Coordinate research and summarization tasks.',
)

@coordinator.tool
async def research_topic(ctx: RunContext[None], topic: str) -> str:
    """Research a topic thoroughly."""
    result = await research_agent.run(f'Research: {topic}', usage=ctx.usage)
    return result.output

@coordinator.tool
async def summarize_text(ctx: RunContext[None], text: str) -> str:
    """Summarize the given text."""
    result = await summarizer_agent.run(f'Summarize:\n{text}', usage=ctx.usage)
    return result.output

# Use it
result = coordinator.run_sync('Research AI agents and give me a summary')
print(result.output)

Pattern 1: Sequential (Pipeline)
When to use: Steps depend on each other. Output of step 1 is input to step 2.
Example: Content creation pipeline where you cannot write without research, and cannot edit without a draft.
Sequential Pipeline Code
from pydantic_ai import Agent

# Three specialist agents
researcher = Agent(
    'openai:gpt-4o',
    instructions='Research the topic. Provide key facts.',
)
writer = Agent(
    'openai:gpt-4o',
    instructions='Write a clear article based on research.',
)
editor = Agent(
    'openai:gpt-4o',
    instructions='Edit for clarity and grammar. Return improved version.',
)

async def content_pipeline(topic: str) -> str:
    # Step 1: Research
    research_result = await researcher.run(f'Research: {topic}')
    research = research_result.output

    # Step 2: Write (needs research)
    article_result = await writer.run(
        f'Write an article based on this research:\n{research}'
    )
    article = article_result.output

    # Step 3: Edit (needs draft)
    final_result = await editor.run(f'Edit this article:\n{article}')
    return final_result.output

# Use it
import asyncio
article = asyncio.run(content_pipeline('The future of AI'))
print(article)

Pattern 2: Parallel
When to use: Steps are independent. They do not need each other's output.
Example: Company analysis where market, technical, and risk analyses are independent.
Why parallel is faster: If each analysis takes 5 seconds:
- Sequential: 5 + 5 + 5 = 15 seconds
- Parallel: max(5, 5, 5) = 5 seconds
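The timing arithmetic above is easy to verify without any LLM calls. This minimal sketch uses `asyncio.sleep` as a stand-in for each analysis (scaled down to 0.1 seconds so it runs quickly); the names `analysis`, `run_sequential`, and `run_parallel` are illustrative placeholders, not part of any library:

```python
import asyncio
import time

async def analysis(name: str, delay: float) -> str:
    # Stand-in for one analyst agent; sleeps instead of calling an LLM
    await asyncio.sleep(delay)
    return f'{name} analysis done'

async def run_sequential(delay: float) -> float:
    # One after another: total time is the SUM of the three delays
    start = time.perf_counter()
    for name in ('market', 'tech', 'risk'):
        await analysis(name, delay)
    return time.perf_counter() - start

async def run_parallel(delay: float) -> float:
    # All at once with asyncio.gather: total time is roughly the MAX delay
    start = time.perf_counter()
    await asyncio.gather(*(analysis(n, delay) for n in ('market', 'tech', 'risk')))
    return time.perf_counter() - start

sequential_time = asyncio.run(run_sequential(0.1))  # ~0.3s: 0.1 + 0.1 + 0.1
parallel_time = asyncio.run(run_parallel(0.1))      # ~0.1s: max(0.1, 0.1, 0.1)
print(f'sequential: {sequential_time:.2f}s, parallel: {parallel_time:.2f}s')
```

The same ratio holds when the awaited work is a real agent call, since the time is spent waiting on network I/O.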
Parallel Pattern Code
from pydantic_ai import Agent
import asyncio

# Three analysts
market_analyst = Agent(
    'openai:gpt-4o',
    instructions='Analyze market trends. Be specific with numbers.',
)
tech_analyst = Agent(
    'openai:gpt-4o',
    instructions='Analyze technical aspects. Focus on innovation.',
)
risk_analyst = Agent(
    'openai:gpt-4o',
    instructions='Identify potential risks. Be thorough.',
)

async def parallel_analysis(company: str) -> dict:
    # Run all three IN PARALLEL using asyncio.gather
    results = await asyncio.gather(
        market_analyst.run(f'Market analysis for {company}'),
        tech_analyst.run(f'Technical analysis for {company}'),
        risk_analyst.run(f'Risk analysis for {company}'),
    )
    return {
        'market': results[0].output,
        'technology': results[1].output,
        'risks': results[2].output,
    }

# Much faster than running one by one
analysis = asyncio.run(parallel_analysis('Tesla'))
print(analysis)

Pattern 3: Loop (Iterative Refinement)
When to use: You need to improve output until it meets a quality threshold.
Example: Writing that needs multiple drafts, code that needs to pass tests, designs that need approval.
Loop Pattern Code
from pydantic import BaseModel
from pydantic_ai import Agent

class Review(BaseModel):
    approved: bool
    feedback: str

writer = Agent(
    'openai:gpt-4o',
    instructions='Write or improve content based on feedback.',
)
critic = Agent(
    'openai:gpt-4o',
    output_type=Review,
    instructions='''Review the content critically.
Set approved=True ONLY if it is excellent.
Otherwise, give specific feedback for improvement.''',
)

async def iterative_writing(topic: str, max_rounds: int = 3) -> str:
    # Initial draft
    result = await writer.run(f'Write about: {topic}')
    content = result.output

    for round_num in range(max_rounds):
        # Get critique
        review_result = await critic.run(f'Review this:\n{content}')
        review = review_result.output

        if review.approved:
            print(f'Approved after {round_num + 1} round(s)')
            return content

        print(f'Round {round_num + 1}: {review.feedback}')

        # Improve based on feedback
        result = await writer.run(
            f'Improve this based on feedback.\n\n'
            f'Content: {content}\n\n'
            f'Feedback: {review.feedback}'
        )
        content = result.output

    print('Max rounds reached')
    return content

# Use it
import asyncio
final_content = asyncio.run(iterative_writing('Benefits of meditation'))

Choosing the Right Pattern
Use this decision tree to pick the right pattern:
- Fixed Pipeline (A -> B -> C) - Steps depend on each other → Use SEQUENTIAL
- Concurrent Tasks (Run A, B, C at once) - Steps are independent → Use PARALLEL
- Iterative Refinement (A <-> B) - Need to improve until good enough → Use LOOP
- Dynamic Decisions (Let LLM decide) - Don't know in advance which agents to call → Use DELEGATION
Combining Patterns: Real systems often combine patterns. For example:
Sequential[
Parallel[Research1, Research2, Research3],
Loop[Writer, Critic],
Editor
]
This researches 3 topics in parallel, writes and refines in a loop, then does a final edit.
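The combined structure can be sketched with plain async functions standing in for the real agents, so the control flow is visible without any API calls. Everything here (`research`, `write`, `critique`, `edit`, the approval rule) is an illustrative stub, not a real agent:

```python
import asyncio

# Placeholder 'agents': plain async functions in place of LLM calls
async def research(topic: str) -> str:
    return f'facts about {topic}'

async def write(notes: str) -> str:
    return f'draft based on: {notes}'

async def critique(draft: str) -> tuple[bool, str]:
    # Toy approval rule: accept any draft longer than 20 characters
    return len(draft) > 20, 'add more detail'

async def edit(draft: str) -> str:
    return f'edited: {draft}'

async def combined_pipeline(topics: list[str], max_rounds: int = 3) -> str:
    # Parallel: research all topics at once
    notes = await asyncio.gather(*(research(t) for t in topics))

    # Loop: write and refine until the critic approves
    draft = await write('; '.join(notes))
    for _ in range(max_rounds):
        approved, feedback = await critique(draft)
        if approved:
            break
        draft = await write(f'{draft} ({feedback})')

    # Sequential: final edit after the loop
    return await edit(draft)

result = asyncio.run(combined_pipeline(['agents', 'tools', 'patterns']))
print(result)
```

Swapping each stub for an `Agent.run` call turns this skeleton into the real system; the pattern structure stays the same.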
1. One job per agent. Keep agents simple and focused. A "research agent" only researches.
2. Choose the pattern by dependency: Steps depend on each other? Sequential. Steps are independent? Parallel. Need quality improvement? Loop. Don't know in advance? Delegation.
3. Parallel = faster. Independent tasks should always run in parallel.
4. Loops = quality. When output quality matters, use a critic to iteratively improve.
5. Combine patterns. Real systems use multiple patterns together.