Memory

Memory allows agents to remember conversation history and context across multiple interactions.


What is Memory?

Without memory, each agent call is independent:

agent = build_agent('react', llm='openai:gpt-4o-mini', tools=[...])

agent.run("My name is Alice")
# Agent: "Nice to meet you, Alice!"

agent.run("What is my name?")
# Agent: "I don't know your name."  No memory

With memory, agents remember context:

from tinygent.memory import BufferChatMemory

agent = build_agent(
    'react',
    llm='openai:gpt-4o-mini',
    tools=[...],
    memory=BufferChatMemory()
)

agent.run("My name is Alice")
# Agent: "Nice to meet you, Alice!"

agent.run("What is my name?")
# Agent: "Your name is Alice!"  Remembers

Memory Types

Tinygent provides 4 built-in memory types:

1. BufferChatMemory

Best for: Short conversations, full history needed

Stores all messages in a list:

from tinygent.memory import BufferChatMemory

memory = BufferChatMemory()

agent = build_agent(
    'react',
    llm='openai:gpt-4o-mini',
    tools=[...],
    memory=memory
)

agent.run("Hello")
agent.run("My name is Bob")
agent.run("What's my name?")

# View history
print(memory.load_variables())
# [
#   TinyHumanMessage("Hello"),
#   TinyChatMessage("Hi there!"),
#   TinyHumanMessage("My name is Bob"),
#   TinyChatMessage("Nice to meet you, Bob!"),
#   TinyHumanMessage("What's my name?"),
#   TinyChatMessage("Your name is Bob!"),
# ]

Pros:

  • Simple and reliable
  • Complete conversation history
  • No information loss

Cons:

  • Grows unbounded
  • Can exceed token limits
  • Expensive for long conversations

2. SummaryBufferMemory

Best for: Long conversations, summarization acceptable

Summarizes old messages to save tokens:

from tinygent.memory import SummaryBufferMemory

memory = SummaryBufferMemory(
    llm=build_llm('openai:gpt-4o-mini'),
    max_token_limit=500,  # Summarize when exceeded
)

agent = build_agent(
    'react',
    llm='openai:gpt-4o-mini',
    tools=[...],
    memory=memory
)

# After many messages, old ones get summarized
agent.run("Tell me about AI")  # 200 tokens
agent.run("What about ML?")    # 200 tokens
agent.run("And deep learning?") # 200 tokens
# Now at 600 tokens → triggers summary

# Old messages condensed to summary
print(memory.load_variables())
# [
#   TinySystemMessage("Summary: User asked about AI and ML..."),
#   TinyHumanMessage("And deep learning?"),
#   TinyChatMessage("Deep learning is..."),
# ]

Pros:

  • Handles long conversations
  • Prevents token limit issues
  • Maintains key information

Cons:

  • Loses details in summary
  • Extra LLM calls for summarization
  • May miss nuances
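
Conceptually, the summarization step folds the oldest messages into a running summary once the buffer is over budget. The sketch below is illustrative only: count_tokens and summarize are stand-ins, not Tinygent APIs:

def compact(messages: list[str], summary: str, max_tokens: int) -> tuple[list[str], str]:
    """Fold the oldest messages into a running summary once over budget."""

    def count_tokens(text: str) -> int:
        return len(text) // 4  # rough heuristic: ~4 characters per token

    def summarize(summary: str, message: str) -> str:
        return f"{summary} {message}"  # stand-in for an LLM summarization call

    # Move the oldest message into the summary until the buffer fits
    while messages and sum(count_tokens(m) for m in messages) > max_tokens:
        summary = summarize(summary, messages.pop(0))
    return messages, summary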

3. WindowBufferMemory

Best for: Recent context only, sliding window

Keeps only the last N messages:

from tinygent.memory import WindowBufferMemory

memory = WindowBufferMemory(window_size=4)  # Keep last 4 messages

agent = build_agent(
    'react',
    llm='openai:gpt-4o-mini',
    tools=[...],
    memory=memory
)

agent.run("Message 1")
agent.run("Message 2")
agent.run("Message 3")
agent.run("Message 4")

# Window is full: [User1, AI1, User2, AI2]

agent.run("Message 5")

# Oldest message dropped: [User2, AI2, User3, AI3]
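
Under the hood, a sliding window behaves like a fixed-size deque. A minimal sketch of the mechanism (not Tinygent's actual implementation):

from collections import deque

window = deque(maxlen=4)  # evicts from the left once full

for i in range(1, 4):
    window.append(f"User{i}")  # each turn appends a user message...
    window.append(f"AI{i}")    # ...and the AI reply

print(list(window))  # ['User2', 'AI2', 'User3', 'AI3']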

Pros:

  • Predictable memory usage
  • Fast and simple
  • Good for recent context

Cons:

  • Forgets old information
  • No long-term memory
  • May lose important context

4. CombinedMemory

Best for: Multiple memory strategies simultaneously

Combine different memory types:

from tinygent.memory import BufferChatMemory, CombinedMemory, WindowBufferMemory

# Full history + recent window
combined = CombinedMemory(
    memories={
        'full_history': BufferChatMemory(),
        'recent': WindowBufferMemory(window_size=6),
    }
)

agent = build_agent(
    'react',
    llm='openai:gpt-4o-mini',
    tools=[...],
    memory=combined
)

# Both memories updated simultaneously
agent.run("Important information from the start")
# ... many messages ...
agent.run("Recent question")

# Access specific memory
full = combined.memories['full_history'].load_variables()
recent = combined.memories['recent'].load_variables()

Pros:

  • Flexible combinations
  • Multiple access patterns
  • Customizable strategies

Cons:

  • More complex setup
  • Higher memory usage
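
Internally, the combined pattern amounts to fanning every operation out to each child memory. An illustrative sketch of the idea (not the library's actual source):

class FanOutMemory:
    """Toy version of the combined-memory pattern."""

    def __init__(self, memories: dict):
        self.memories = memories

    def save_context(self, message):
        for mem in self.memories.values():  # every child sees every message
            mem.save_context(message)

    def clear(self):
        for mem in self.memories.values():
            mem.clear()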

Memory Operations

Saving Context

Manually save messages:

from tinygent.core.datamodels.messages import TinyHumanMessage, TinyChatMessage

memory = BufferChatMemory()

# Save user message
user_msg = TinyHumanMessage(content="Hello")
memory.save_context(user_msg)

# Save AI response
ai_msg = TinyChatMessage(content="Hi there!")
memory.save_context(ai_msg)

Loading Variables

Retrieve conversation history:

# Get all messages
messages = memory.load_variables()

for msg in messages:
    print(f"{msg.role}: {msg.content}")
# human: Hello
# assistant: Hi there!

Clearing Memory

Reset conversation:

memory.clear()

# Memory is now empty
print(memory.load_variables())  # []

Message Types

Tinygent supports multiple message types:

from tinygent.core.datamodels.messages import (
    TinyHumanMessage,      # User messages
    TinyChatMessage,       # AI responses
    TinySystemMessage,     # System prompts
    TinyPlanMessage,       # Planning messages
    TinyToolMessage,       # Tool results
)

memory = BufferChatMemory()

memory.save_context(TinySystemMessage(content="You are a helpful assistant"))
memory.save_context(TinyHumanMessage(content="Hello"))
memory.save_context(TinyChatMessage(content="Hi there!"))
memory.save_context(TinyPlanMessage(content="Plan: 1. Greet user 2. Ask how to help"))

Memory Filtering

Filter messages by type:

from tinygent.core.datamodels.messages import TinyHumanMessage, TinyChatMessage

memory = BufferChatMemory()

# Add various messages
memory.save_context(TinyHumanMessage(content="User message 1"))
memory.save_context(TinyChatMessage(content="AI response 1"))
memory.save_context(TinyHumanMessage(content="User message 2"))
memory.save_context(TinyChatMessage(content="AI response 2"))

# Add filter: only human messages
memory._chat_history.add_filter(
    'only_human',
    lambda m: isinstance(m, TinyHumanMessage)
)

print(memory._chat_history)
# Only shows:
# - User message 1
# - User message 2

# Remove filter
memory._chat_history.remove_filter('only_human')
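
Note that add_filter mutates the private _chat_history. If you prefer to leave the memory object untouched, you can filter at read time instead:

human_msgs = [
    msg for msg in memory.load_variables()
    if isinstance(msg, TinyHumanMessage)
]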

Advanced Patterns

Custom Memory

Create custom memory classes:

from tinygent.memory import BaseMemory

class KeywordMemory(BaseMemory):
    """Memory that only saves messages containing keywords."""

    def __init__(self, keywords: list[str]):
        super().__init__()
        self.keywords = keywords
        self.messages = []

    def save_context(self, message):
        # Case-insensitive match so "Urgent: ..." matches the keyword 'urgent'
        if any(kw in message.content.lower() for kw in self.keywords):
            self.messages.append(message)

    def load_variables(self):
        return self.messages

    def clear(self):
        self.messages = []

# Use it
memory = KeywordMemory(keywords=['important', 'urgent', 'critical'])

agent = build_agent(
    'react',
    llm='openai:gpt-4o-mini',
    tools=[...],
    memory=memory
)

agent.run("This is important information")  # Saved
agent.run("Just casual chat")                # Not saved
agent.run("Urgent: respond ASAP")           # Saved

Persistent Memory

Save memory to disk:

import json
from pathlib import Path

from tinygent.core.datamodels.messages import TinyChatMessage, TinyHumanMessage
from tinygent.memory import BufferChatMemory

def save_memory(memory, filepath: str):
    """Save memory to JSON file."""
    messages = [
        {'role': msg.role, 'content': msg.content}
        for msg in memory.load_variables()
    ]
    Path(filepath).write_text(json.dumps(messages, indent=2))

def load_memory(filepath: str) -> BufferChatMemory:
    """Load memory from JSON file."""
    memory = BufferChatMemory()
    messages = json.loads(Path(filepath).read_text())

    for msg in messages:
        if msg['role'] == 'human':
            memory.save_context(TinyHumanMessage(content=msg['content']))
        elif msg['role'] == 'assistant':
            memory.save_context(TinyChatMessage(content=msg['content']))

    return memory

# Usage
memory = BufferChatMemory()
agent = build_agent('react', llm='openai:gpt-4o-mini', memory=memory)

agent.run("Remember this")
save_memory(memory, 'conversation.json')

# Later...
memory = load_memory('conversation.json')
agent = build_agent('react', llm='openai:gpt-4o-mini', memory=memory)
agent.run("What did I say earlier?")  # Remembers from disk

Memory with MultiStep Agent

MultiStep agents benefit from memory:

from tinygent.agents.multi_step_agent import TinyMultiStepAgent
from tinygent.memory import BufferChatMemory

agent = TinyMultiStepAgent(
    llm=build_llm('openai:gpt-4o'),
    tools=[...],
    memory=BufferChatMemory(),
)

# First task
agent.run("Plan a trip to Prague")
# Agent creates plan, executes steps, remembers results

# Second task - can reference previous context
agent.run("Update the plan based on weather")
# Agent remembers previous plan and updates it

Choosing the Right Memory

Use Case                   Memory Type           Why
Chatbot (short sessions)   BufferChatMemory      Full history, simple
Long conversations         SummaryBufferMemory   Prevents token overflow
Recent context only        WindowBufferMemory    Fast, bounded
Complex workflows          CombinedMemory        Multiple strategies
Debugging                  BufferChatMemory      Full visibility
Production chatbot         SummaryBufferMemory   Scalable
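
The table can be encoded as a small factory. This is illustrative only; the use-case keys and token limits below are example values, not Tinygent defaults:

from tinygent.memory import (
    BufferChatMemory,
    SummaryBufferMemory,
    WindowBufferMemory,
)

def pick_memory(use_case: str):
    """Map a use case to a reasonable default memory (example values)."""
    if use_case in ('chatbot', 'debugging'):
        return BufferChatMemory()
    if use_case in ('long_conversation', 'production_chatbot'):
        return SummaryBufferMemory(
            llm=build_llm('openai:gpt-4o-mini'),
            max_token_limit=1000,
        )
    if use_case == 'recent_only':
        return WindowBufferMemory(window_size=6)
    raise ValueError(f'Unknown use case: {use_case}')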

Best Practices

1. Clear Memory When Needed

# Start fresh conversation
if user_says_reset:
    memory.clear()

2. Monitor Memory Size

messages = memory.load_variables()
if len(messages) > 50:
    print("Warning: Memory getting large")

3. Use Summaries for Long Chats

# For customer support (long sessions)
memory = SummaryBufferMemory(
    llm=build_llm('openai:gpt-4o-mini'),
    max_token_limit=1000,
)

4. Window for Short Context

# For quick Q&A (no long-term memory needed)
memory = WindowBufferMemory(window_size=4)

Memory and Middleware

Track memory changes with middleware:

from tinygent.agents.middleware import TinyBaseMiddleware

# Pass the memory in explicitly so the middleware doesn't rely on a global agent
class MemoryMonitorMiddleware(TinyBaseMiddleware):
    def __init__(self, memory):
        super().__init__()
        self.memory = memory

    def on_answer(self, *, run_id: str, answer: str) -> None:
        # Check memory size after each answer
        size = len(str(self.memory.load_variables()))
        print(f"Memory size: {size} characters")

memory = BufferChatMemory()

agent = build_agent(
    'react',
    llm='openai:gpt-4o-mini',
    tools=[...],
    memory=memory,
    middleware=[MemoryMonitorMiddleware(memory)]
)

Examples

Check out:

  • examples/memory/basic-chat-memory/main.py - Buffer memory
  • examples/memory/buffer-summary-memory/main.py - Summary memory
  • examples/memory/buffer-window-chat-memory/main.py - Window memory
  • examples/memory/combined-memory/main.py - Combined memory