Lesson 5
45 min

Long-Term Memory

Store and retrieve memories across conversations

Sessions vs Memory

| Aspect | Session | Memory |
|--------|---------|--------|
| Scope | Single conversation | ALL conversations |
| Duration | While chatting | Forever |
| Content | Raw messages | Extracted facts |
| Search | Sequential | Semantic |
| Token cost | Grows linearly | Fixed per query |

Analogy:

  • Session = What you talked about in the current meeting
  • Memory = Your notes from all past meetings

Why Memory Matters

Sessions have limits:

  • One conversation only
  • Token costs grow with conversation length
  • Cannot reference past conversations

Memory solves this:

  • Persists across all conversations
  • Fixed cost to retrieve relevant memories (see the sketch after this list)
  • Agent can reference anything from the past
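The cost difference shows up directly in how the prompt is assembled. A minimal sketch of the two approaches (the helper names are illustrative, not from any library):

def prompt_from_session(messages: list[str], query: str) -> str:
    # Every prior message rides along: cost grows linearly with length
    return "\n".join(messages) + f"\nUser: {query}"

def prompt_from_memory(relevant_facts: list[str], query: str) -> str:
    # Only the retrieved facts ride along: cost is fixed per query
    return "Known facts:\n" + "\n".join(relevant_facts) + f"\nUser: {query}"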

Simple Memory Store

A basic memory system with keyword search:

memory_store.py
from dataclasses import dataclass, field
from datetime import datetime

@dataclass
class Memory:
    content: str
    timestamp: datetime
    category: str
    importance: float = 0.5  # 0 to 1

@dataclass
class MemoryStore:
    memories: list[Memory] = field(default_factory=list)

    def add(self, content: str, category: str, importance: float = 0.5):
        """Add a new memory."""
        self.memories.append(Memory(
            content=content,
            timestamp=datetime.now(),
            category=category,
            importance=importance
        ))

    def search(self, query: str, limit: int = 5) -> list[Memory]:
        """Search memories by keyword."""
        query_words = set(query.lower().split())
        scored = []

        for mem in self.memories:
            mem_words = set(mem.content.lower().split())
            overlap = len(query_words & mem_words)
            if overlap > 0:
                score = overlap * mem.importance
                scored.append((score, mem))

        scored.sort(reverse=True, key=lambda x: x[0])
        return [m for _, m in scored[:limit]]

    def get_by_category(self, category: str) -> list[Memory]:
        """Get all memories in a category."""
        return [m for m in self.memories if m.category == category]
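A quick usage sketch (the example facts are illustrative):

store = MemoryStore()
store.add("Pablo works in fintech in Dubai", category="work", importance=0.8)
store.add("Pablo prefers dark mode", category="preference")

for mem in store.search("what does Pablo do"):
    print(f"[{mem.category}] {mem.content}")
# [work] Pablo works in fintech in Dubai
# [preference] Pablo prefers dark mode

Note the limitation of exact word overlap: a query containing "work" would not match a memory containing "works". Semantic search, covered later in this lesson, fixes this.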

Agent with Memory Tools

Give the agent tools to remember and recall across sessions:

agent_memory.py
from pydantic_ai import Agent, RunContext
from dataclasses import dataclass, field
from memory_store import MemoryStore  # the store defined in memory_store.py

@dataclass
class AgentContext:
    user_id: str
    memory: MemoryStore = field(default_factory=MemoryStore)

memory_agent = Agent(
    'openai:gpt-4o',
    deps_type=AgentContext,
    instructions='''You have long-term memory.
    Use remember_fact to store important information about the user.
    Use recall to find relevant memories before answering.
    Personalize responses based on what you remember.'''
)

@memory_agent.tool
def remember_fact(
    ctx: RunContext[AgentContext],
    fact: str,
    category: str,
    importance: float = 0.5
) -> str:
    """Store an important fact in long-term memory.

    Args:
        fact: The fact to remember
        category: Category (personal, preference, work, relationship)
        importance: How important (0.0 to 1.0)
    """
    ctx.deps.memory.add(fact, category, importance)
    return f"Remembered: {fact}"

@memory_agent.tool
def recall(ctx: RunContext[AgentContext], query: str) -> list[str]:
    """Search memories for relevant information.

    Args:
        query: What to search for
    """
    memories = ctx.deps.memory.search(query)
    if not memories:
        return ["No relevant memories found."]
    return [f"[{m.category}] {m.content}" for m in memories]

# Usage across multiple sessions
context = AgentContext(user_id="pablo")

# Session 1
result1 = memory_agent.run_sync(
    "I'm Pablo, I work in fintech in Dubai. I prefer dark mode.",
    deps=context
)

# Session 2 (later, new conversation)
result2 = memory_agent.run_sync(
    "What do you remember about me?",
    deps=context
    # No message_history - this is a NEW conversation
)
print(result2.output)
# Output: I remember you are Pablo, you work in fintech in Dubai,
#         and you prefer dark mode.
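Note that `context.memory` lives only in process memory, so it disappears when the program exits. A minimal persistence sketch using JSON (the file path and helper names are assumptions, not part of any library):

import json
from datetime import datetime
from memory_store import Memory, MemoryStore

def save_memories(store: MemoryStore, path: str = "memories.json") -> None:
    # Serialize each Memory dataclass into a JSON-friendly dict
    data = [
        {"content": m.content, "timestamp": m.timestamp.isoformat(),
         "category": m.category, "importance": m.importance}
        for m in store.memories
    ]
    with open(path, "w") as f:
        json.dump(data, f)

def load_memories(path: str = "memories.json") -> MemoryStore:
    # Rebuild the store at the start of the next session
    store = MemoryStore()
    with open(path) as f:
        for item in json.load(f):
            store.memories.append(Memory(
                content=item["content"],
                timestamp=datetime.fromisoformat(item["timestamp"]),
                category=item["category"],
                importance=item["importance"],
            ))
    return store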

Automatic Memory Extraction

Instead of relying on the agent to save memories, automatically extract facts after each conversation:

auto_extract.py
from pydantic import BaseModel
from pydantic_ai import Agent
from memory_store import MemoryStore  # the store defined in memory_store.py

class ExtractedFacts(BaseModel):
    facts: list[str]
    categories: list[str]

extractor = Agent(
    'openai:gpt-4o-mini',
    output_type=ExtractedFacts,
    instructions='''Extract important facts from the conversation.
    Focus on: names, preferences, jobs, locations, relationships.
    Return each fact separately with its category.'''
)

async def auto_extract_memories(conversation: str, memory_store: MemoryStore):
    """Run after each conversation to extract and store memories."""
    result = await extractor.run(f'Extract facts from:\n{conversation}')

    # facts and categories are parallel lists returned by the extractor
    for fact, category in zip(result.output.facts, result.output.categories):
        memory_store.add(fact, category, importance=0.7)
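A sketch of wiring this in after a conversation (the transcript string is illustrative; this makes a real model call, so it assumes an OpenAI API key is configured):

import asyncio

async def main():
    store = MemoryStore()
    conversation = (
        "User: I'm Pablo, I work in fintech in Dubai.\n"
        "Assistant: Nice to meet you, Pablo! I'll keep that in mind."
    )
    await auto_extract_memories(conversation, store)
    for m in store.memories:
        print(f"[{m.category}] {m.content} (importance={m.importance})")

asyncio.run(main())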

Vector Database for Production

For production, use a vector database for semantic search. Popular options:

  • Pinecone (managed)
  • Weaviate (self-hosted or managed)
  • ChromaDB (local, good for development)
  • pgvector (PostgreSQL extension)

vector_memory.py
# Conceptual example - actual implementation depends on your vector DB
class VectorMemoryStore:
    def __init__(self, embedding_model, vector_db):
        self.embedding_model = embedding_model
        self.vector_db = vector_db

    def add(self, content: str, metadata: dict):
        embedding = self.embedding_model.embed(content)
        self.vector_db.insert(embedding, content, metadata)

    def search(self, query: str, limit: int = 5) -> list[dict]:
        query_embedding = self.embedding_model.embed(query)
        return self.vector_db.search(query_embedding, limit=limit)
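As a concrete sketch using ChromaDB (the collection name and example data are illustrative; ChromaDB's default embedding function embeds documents automatically):

import chromadb

client = chromadb.Client()  # in-memory; PersistentClient stores to disk
collection = client.create_collection("memories")

# Documents are embedded automatically on insert
collection.add(
    ids=["mem-1", "mem-2"],
    documents=["Pablo works in fintech in Dubai", "Pablo prefers dark mode"],
    metadatas=[{"category": "work"}, {"category": "preference"}],
)

# Semantic search matches on meaning, not exact keywords
results = collection.query(query_texts=["what is his job?"], n_results=2)
print(results["documents"][0])  # ranked matches for the first query

Unlike the keyword store earlier in this lesson, this would match "what is his job?" to the fintech memory even though the two share no words.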
Key Takeaways

  1. Memory is not the same as session. Sessions are one conversation. Memory spans all conversations.
  2. Store facts, not raw messages. Extract important information, not entire chat logs.
  3. Categories help organization. Personal, work, preferences, etc.
  4. Search by relevance. When recalling, find the most relevant memories for the current context.
  5. Use vector DBs in production. Semantic search finds relevant memories even with different wording.