Lesson 5
45 min

Long-Term Memory

Store and retrieve memories across conversations

Sessions vs Memory

| Aspect | Session | Memory |
|--------|---------|--------|
| Scope | Single conversation | ALL conversations |
| Duration | While chatting | Forever |
| Content | Raw messages | Extracted facts |
| Search | Sequential | Semantic |
| Token cost | Grows linearly | Fixed per query |

Analogy:

  • Session = What you talked about in the current meeting
  • Memory = Your notes from all past meetings

Why Memory Matters

Sessions have limits:

  • One conversation only
  • Token costs grow with conversation length
  • Cannot reference past conversations

Memory solves this:

  • Persists across all conversations
  • Fixed cost to retrieve relevant memories (see the sketch after this list)
  • Agent can reference anything from the past
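The cost difference shows up directly in how the prompt is assembled. A minimal sketch of the two approaches (the helper names are illustrative, not from any library):

def prompt_from_session(messages: list[str], query: str) -> str:
    # Every prior message rides along: cost grows linearly with length
    return "\n".join(messages) + f"\nUser: {query}"

def prompt_from_memory(relevant_facts: list[str], query: str) -> str:
    # Only the retrieved facts ride along: cost is fixed per query
    return "Known facts:\n" + "\n".join(relevant_facts) + f"\nUser: {query}"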

Simple Memory Store

A basic memory system with keyword search:

memory_store.py
from dataclasses import dataclass, field
from datetime import datetime

@dataclass
class Memory:
    content: str
    timestamp: datetime
    category: str
    importance: float = 0.5  # 0 to 1

@dataclass
class MemoryStore:
    memories: list[Memory] = field(default_factory=list)

    def add(self, content: str, category: str, importance: float = 0.5):
        """Add a new memory."""
        self.memories.append(Memory(
            content=content,
            timestamp=datetime.now(),
            category=category,
            importance=importance
        ))

    def search(self, query: str, limit: int = 5) -> list[Memory]:
        """Search memories by keyword."""
        query_words = set(query.lower().split())
        scored = []

        for mem in self.memories:
            mem_words = set(mem.content.lower().split())
            overlap = len(query_words & mem_words)
            if overlap > 0:
                score = overlap * mem.importance
                scored.append((score, mem))

        scored.sort(reverse=True, key=lambda x: x[0])
        return [m for _, m in scored[:limit]]

    def get_by_category(self, category: str) -> list[Memory]:
        """Get all memories in a category."""
        return [m for m in self.memories if m.category == category]
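A quick usage sketch (the example facts are illustrative):

store = MemoryStore()
store.add("Pablo works in fintech in Dubai", category="work", importance=0.8)
store.add("Pablo prefers dark mode", category="preference")

for mem in store.search("what does Pablo do"):
    print(f"[{mem.category}] {mem.content}")
# [work] Pablo works in fintech in Dubai
# [preference] Pablo prefers dark mode

Note the limitation of exact word overlap: a query containing "work" would not match a memory containing "works". Semantic search, covered later in this lesson, fixes this.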

Agent with Memory Tools

Give the agent tools to remember and recall across sessions:

agent_memory.py
from pydantic_ai import Agent, RunContext
from dataclasses import dataclass, field
from memory_store import MemoryStore  # the store defined in memory_store.py

@dataclass
class AgentContext:
    user_id: str
    memory: MemoryStore = field(default_factory=MemoryStore)

memory_agent = Agent(
    'openai:gpt-4o',
    deps_type=AgentContext,
    instructions='''You have long-term memory.
    Use remember_fact to store important information about the user.
    Use recall to find relevant memories before answering.
    Personalize responses based on what you remember.'''
)

@memory_agent.tool
def remember_fact(
    ctx: RunContext[AgentContext],
    fact: str,
    category: str,
    importance: float = 0.5
) -> str:
    """Store an important fact in long-term memory.

    Args:
        fact: The fact to remember
        category: Category (personal, preference, work, relationship)
        importance: How important (0.0 to 1.0)
    """
    ctx.deps.memory.add(fact, category, importance)
    return f"Remembered: {fact}"

@memory_agent.tool
def recall(ctx: RunContext[AgentContext], query: str) -> list[str]:
    """Search memories for relevant information.

    Args:
        query: What to search for
    """
    memories = ctx.deps.memory.search(query)
    if not memories:
        return ["No relevant memories found."]
    return [f"[{m.category}] {m.content}" for m in memories]

# Usage across multiple sessions
context = AgentContext(user_id="pablo")

# Session 1
result1 = memory_agent.run_sync(
    "I'm Pablo, I work in fintech in Dubai. I prefer dark mode.",
    deps=context
)

# Session 2 (later, new conversation)
result2 = memory_agent.run_sync(
    "What do you remember about me?",
    deps=context
    # No message_history - this is a NEW conversation
)
print(result2.output)
# Output: I remember you are Pablo, you work in fintech in Dubai,
#         and you prefer dark mode.
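Note that `context.memory` lives only in process memory, so it disappears when the program exits. A minimal persistence sketch using JSON (the file path and helper names are assumptions, not part of any library):

import json
from datetime import datetime
from memory_store import Memory, MemoryStore

def save_memories(store: MemoryStore, path: str = "memories.json") -> None:
    # Serialize each Memory dataclass into a JSON-friendly dict
    data = [
        {"content": m.content, "timestamp": m.timestamp.isoformat(),
         "category": m.category, "importance": m.importance}
        for m in store.memories
    ]
    with open(path, "w") as f:
        json.dump(data, f)

def load_memories(path: str = "memories.json") -> MemoryStore:
    # Rebuild the store at the start of the next session
    store = MemoryStore()
    with open(path) as f:
        for item in json.load(f):
            store.memories.append(Memory(
                content=item["content"],
                timestamp=datetime.fromisoformat(item["timestamp"]),
                category=item["category"],
                importance=item["importance"],
            ))
    return store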

Automatic Memory Extraction

Instead of relying on the agent to save memories, automatically extract facts after each conversation:

auto_extract.py
from pydantic import BaseModel
from pydantic_ai import Agent
from memory_store import MemoryStore  # the store defined in memory_store.py

class ExtractedFacts(BaseModel):
    facts: list[str]
    categories: list[str]

extractor = Agent(
    'openai:gpt-4o-mini',
    output_type=ExtractedFacts,
    instructions='''Extract important facts from the conversation.
    Focus on: names, preferences, jobs, locations, relationships.
    Return each fact separately with its category.'''
)

async def auto_extract_memories(conversation: str, memory_store: MemoryStore):
    """Run after each conversation to extract and store memories."""
    result = await extractor.run(f'Extract facts from:\n{conversation}')

    # facts and categories are parallel lists returned by the extractor
    for fact, category in zip(result.output.facts, result.output.categories):
        memory_store.add(fact, category, importance=0.7)
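A sketch of wiring this in after a conversation (the transcript string is illustrative; this makes a real model call, so it assumes an OpenAI API key is configured):

import asyncio

async def main():
    store = MemoryStore()
    conversation = (
        "User: I'm Pablo, I work in fintech in Dubai.\n"
        "Assistant: Nice to meet you, Pablo! I'll keep that in mind."
    )
    await auto_extract_memories(conversation, store)
    for m in store.memories:
        print(f"[{m.category}] {m.content} (importance={m.importance})")

asyncio.run(main())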

Vector Database for Production

For production, use a vector database for semantic search. Popular options:

  • Pinecone (managed)
  • Weaviate (self-hosted or managed)
  • ChromaDB (local, good for development)
  • pgvector (PostgreSQL extension)

vector_memory.py
# Conceptual example - actual implementation depends on your vector DB
class VectorMemoryStore:
    def __init__(self, embedding_model, vector_db):
        self.embedding_model = embedding_model
        self.vector_db = vector_db

    def add(self, content: str, metadata: dict):
        embedding = self.embedding_model.embed(content)
        self.vector_db.insert(embedding, content, metadata)

    def search(self, query: str, limit: int = 5) -> list[dict]:
        query_embedding = self.embedding_model.embed(query)
        return self.vector_db.search(query_embedding, limit=limit)
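As a concrete sketch using ChromaDB (the collection name and example data are illustrative; ChromaDB's default embedding function embeds documents automatically):

import chromadb

client = chromadb.Client()  # in-memory; PersistentClient stores to disk
collection = client.create_collection("memories")

# Documents are embedded automatically on insert
collection.add(
    ids=["mem-1", "mem-2"],
    documents=["Pablo works in fintech in Dubai", "Pablo prefers dark mode"],
    metadatas=[{"category": "work"}, {"category": "preference"}],
)

# Semantic search matches on meaning, not exact keywords
results = collection.query(query_texts=["what is his job?"], n_results=2)
print(results["documents"][0])  # ranked matches for the first query

Unlike the keyword store earlier in this lesson, this would match "what is his job?" to the fintech memory even though the two share no words.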
Key Takeaways

  1. Memory is not the same as session. Sessions are one conversation. Memory spans all conversations.
  2. Store facts, not raw messages. Extract important information, not entire chat logs.
  3. Categories help organization. Personal, work, preferences, etc.
  4. Search by relevance. When recalling, find the most relevant memories for the current context.
  5. Use vector DBs in production. Semantic search finds relevant memories even with different wording.