The Problem with Monolithic Agents
Imagine you need an agent that researches a topic, writes an article, edits for grammar, fact-checks claims, and formats for publication.
You could put all of this into one agent with a huge instruction set. But that creates problems:
- Hard to debug (which step failed?)
- Hard to improve (change one thing, break another)
- Unreliable (too many responsibilities)
- Expensive (huge context = more tokens)
- Inflexible (can't reuse components)
The Multi-Agent Solution
Instead of one agent doing everything, create a team of specialists. Each agent has one job and does it well.
Benefits:
- Each agent has ONE clear job
- Easy to test individually
- Easy to improve one without breaking others
- Cheaper (smaller contexts per agent)
- Reusable (use same editor in different pipelines)
This is the same principle as microservices vs monoliths in software engineering.
Agent Delegation Pattern
The most common pattern: one agent calls another as a tool. The "main" agent coordinates, and "specialist" agents do specific tasks.
In the code below, notice:
- Main agent uses expensive model (gpt-4o) for coordination
- Calculator uses cheap model (gpt-4o-mini) for simple task
- This saves money while maintaining quality
from pydantic_ai import Agent, RunContext

# Specialist agent - does ONE thing well
calculator_agent = Agent(
    'openai:gpt-4o-mini',  # Cheaper model for simple task
    instructions='You are a calculator. Return only the numeric result.',
)

# Main agent - coordinates everything
main_agent = Agent(
    'openai:gpt-4o',
    instructions='You help with various tasks. Use the calculator for math.',
)

@main_agent.tool
async def calculate(ctx: RunContext[None], expression: str) -> str:
    """Calculate a mathematical expression."""
    result = await calculator_agent.run(
        f'Calculate: {expression}',
        usage=ctx.usage,  # Track combined token usage
    )
    return result.output

# Use the system
result = main_agent.run_sync('What is 15% of 250?')
print(result.output)
# Output: 15% of 250 is 37.5

Research + Summarization System
A more practical example with two specialists:
from pydantic_ai import Agent, RunContext

# Agent 1: Researcher - finds information
research_agent = Agent(
    'openai:gpt-4o',
    instructions='''You are a research specialist.
Find 2-3 key facts about the given topic.
Be thorough but concise.''',
)

@research_agent.tool_plain
def web_search(query: str) -> str:
    """Search the web for information."""
    return f"Search results for '{query}': [relevant information here]"

# Agent 2: Summarizer - creates summaries
summarizer_agent = Agent(
    'openai:gpt-4o-mini',
    instructions='''You create concise summaries.
Format as 3-5 bullet points.
Keep it simple and clear.''',
)

# Coordinator agent
coordinator = Agent(
    'openai:gpt-4o',
    instructions='Coordinate research and summarization tasks.',
)

@coordinator.tool
async def research_topic(ctx: RunContext[None], topic: str) -> str:
    """Research a topic thoroughly."""
    result = await research_agent.run(f'Research: {topic}', usage=ctx.usage)
    return result.output

@coordinator.tool
async def summarize_text(ctx: RunContext[None], text: str) -> str:
    """Summarize the given text."""
    result = await summarizer_agent.run(f'Summarize:\n{text}', usage=ctx.usage)
    return result.output

# Use it
result = coordinator.run_sync('Research AI agents and give me a summary')
print(result.output)

Pattern 1: Sequential (Pipeline)
When to use: Steps depend on each other. Output of step 1 is input to step 2.
Example: Content creation pipeline where you cannot write without research, and cannot edit without a draft.
Sequential Pipeline Code
from pydantic_ai import Agent

# Three specialist agents
researcher = Agent(
    'openai:gpt-4o',
    instructions='Research the topic. Provide key facts.',
)
writer = Agent(
    'openai:gpt-4o',
    instructions='Write a clear article based on research.',
)
editor = Agent(
    'openai:gpt-4o',
    instructions='Edit for clarity and grammar. Return improved version.',
)

async def content_pipeline(topic: str) -> str:
    # Step 1: Research
    research_result = await researcher.run(f'Research: {topic}')
    research = research_result.output

    # Step 2: Write (needs research)
    article_result = await writer.run(
        f'Write an article based on this research:\n{research}'
    )
    article = article_result.output

    # Step 3: Edit (needs draft)
    final_result = await editor.run(f'Edit this article:\n{article}')
    return final_result.output

# Use it
import asyncio
article = asyncio.run(content_pipeline('The future of AI'))
print(article)

Pattern 2: Parallel
When to use: Steps are independent. They do not need each other's output.
Example: Company analysis where market, technical, and risk analyses are independent.
Why parallel is faster: If each analysis takes 5 seconds:
- Sequential: 5 + 5 + 5 = 15 seconds
- Parallel: max(5, 5, 5) = 5 seconds
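The timing arithmetic above is easy to verify without any LLM calls. This minimal sketch uses `asyncio.sleep` as a stand-in for each analysis (scaled down to 0.1 seconds so it runs quickly); the names `analysis`, `run_sequential`, and `run_parallel` are illustrative placeholders, not part of any library:

```python
import asyncio
import time

async def analysis(name: str, delay: float) -> str:
    # Stand-in for one analyst agent; sleeps instead of calling an LLM
    await asyncio.sleep(delay)
    return f'{name} analysis done'

async def run_sequential(delay: float) -> float:
    # One after another: total time is the SUM of the three delays
    start = time.perf_counter()
    for name in ('market', 'tech', 'risk'):
        await analysis(name, delay)
    return time.perf_counter() - start

async def run_parallel(delay: float) -> float:
    # All at once with asyncio.gather: total time is roughly the MAX delay
    start = time.perf_counter()
    await asyncio.gather(*(analysis(n, delay) for n in ('market', 'tech', 'risk')))
    return time.perf_counter() - start

sequential_time = asyncio.run(run_sequential(0.1))  # ~0.3s: 0.1 + 0.1 + 0.1
parallel_time = asyncio.run(run_parallel(0.1))      # ~0.1s: max(0.1, 0.1, 0.1)
print(f'sequential: {sequential_time:.2f}s, parallel: {parallel_time:.2f}s')
```

The same ratio holds when the awaited work is a real agent call, since the time is spent waiting on network I/O.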
Parallel Pattern Code
from pydantic_ai import Agent
import asyncio

# Three analysts
market_analyst = Agent(
    'openai:gpt-4o',
    instructions='Analyze market trends. Be specific with numbers.',
)
tech_analyst = Agent(
    'openai:gpt-4o',
    instructions='Analyze technical aspects. Focus on innovation.',
)
risk_analyst = Agent(
    'openai:gpt-4o',
    instructions='Identify potential risks. Be thorough.',
)

async def parallel_analysis(company: str) -> dict:
    # Run all three IN PARALLEL using asyncio.gather
    results = await asyncio.gather(
        market_analyst.run(f'Market analysis for {company}'),
        tech_analyst.run(f'Technical analysis for {company}'),
        risk_analyst.run(f'Risk analysis for {company}'),
    )
    return {
        'market': results[0].output,
        'technology': results[1].output,
        'risks': results[2].output,
    }

# Much faster than running one by one
analysis = asyncio.run(parallel_analysis('Tesla'))
print(analysis)

Pattern 3: Loop (Iterative Refinement)
When to use: You need to improve output until it meets a quality threshold.
Example: Writing that needs multiple drafts, code that needs to pass tests, designs that need approval.
Loop Pattern Code
from pydantic import BaseModel
from pydantic_ai import Agent

class Review(BaseModel):
    approved: bool
    feedback: str

writer = Agent(
    'openai:gpt-4o',
    instructions='Write or improve content based on feedback.',
)
critic = Agent(
    'openai:gpt-4o',
    output_type=Review,
    instructions='''Review the content critically.
Set approved=True ONLY if it is excellent.
Otherwise, give specific feedback for improvement.''',
)

async def iterative_writing(topic: str, max_rounds: int = 3) -> str:
    # Initial draft
    result = await writer.run(f'Write about: {topic}')
    content = result.output

    for round_num in range(max_rounds):
        # Get critique
        review_result = await critic.run(f'Review this:\n{content}')
        review = review_result.output

        if review.approved:
            print(f'Approved after {round_num + 1} round(s)')
            return content

        print(f'Round {round_num + 1}: {review.feedback}')

        # Improve based on feedback
        result = await writer.run(
            f'Improve this based on feedback.\n\n'
            f'Content: {content}\n\n'
            f'Feedback: {review.feedback}'
        )
        content = result.output

    print('Max rounds reached')
    return content

# Use it
import asyncio
final_content = asyncio.run(iterative_writing('Benefits of meditation'))

Choosing the Right Pattern
Use this decision tree to pick the right pattern:
- Fixed Pipeline (A -> B -> C) - Steps depend on each other → Use SEQUENTIAL
- Concurrent Tasks (Run A, B, C at once) - Steps are independent → Use PARALLEL
- Iterative Refinement (A <-> B) - Need to improve until good enough → Use LOOP
- Dynamic Decisions (Let LLM decide) - Don't know in advance which agents to call → Use DELEGATION
Combining Patterns: Real systems often combine patterns. For example:
Sequential[
Parallel[Research1, Research2, Research3],
Loop[Writer, Critic],
Editor
]
This researches 3 topics in parallel, writes and refines in a loop, then does a final edit.
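The combined structure can be sketched with plain async functions standing in for the real agents, so the control flow is visible without any API calls. Everything here (`research`, `write`, `critique`, `edit`, the approval rule) is an illustrative stub, not a real agent:

```python
import asyncio

# Placeholder 'agents': plain async functions in place of LLM calls
async def research(topic: str) -> str:
    return f'facts about {topic}'

async def write(notes: str) -> str:
    return f'draft based on: {notes}'

async def critique(draft: str) -> tuple[bool, str]:
    # Toy approval rule: accept any draft longer than 20 characters
    return len(draft) > 20, 'add more detail'

async def edit(draft: str) -> str:
    return f'edited: {draft}'

async def combined_pipeline(topics: list[str], max_rounds: int = 3) -> str:
    # Parallel: research all topics at once
    notes = await asyncio.gather(*(research(t) for t in topics))

    # Loop: write and refine until the critic approves
    draft = await write('; '.join(notes))
    for _ in range(max_rounds):
        approved, feedback = await critique(draft)
        if approved:
            break
        draft = await write(f'{draft} ({feedback})')

    # Sequential: final edit after the loop
    return await edit(draft)

result = asyncio.run(combined_pipeline(['agents', 'tools', 'patterns']))
print(result)
```

Swapping each stub for an `Agent.run` call turns this skeleton into the real system; the pattern structure stays the same.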
1. One job per agent. Keep agents simple and focused. A "research agent" only researches.
2. Choose the pattern by dependency: Steps depend on each other? Sequential. Steps are independent? Parallel. Need quality improvement? Loop. Don't know in advance? Delegation.
3. Parallel = faster. Independent tasks should always run in parallel.
4. Loops = quality. When output quality matters, use a critic to iteratively improve.
5. Combine patterns. Real systems use multiple patterns together.