stream() / generate() call gets the
messages the client sends, and nothing else. Because nothing is shared between
calls, you can safely reuse one agent instance across concurrent runs. Fanning
out per-item reviews or classifications over a shared instance keeps every run
isolated. Configure memory on the agent to persist history across calls, and
use createAgUiHandler to stream the response back.
Memory configuration is independent of model selection, so these examples omit
model and use openai/gpt-5.4-nano.
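To illustrate the stateless fan-out described above, here is a minimal sketch. `StatelessAgent`, `fakeAgent`, and `classifyAll` are hypothetical names used only for illustration; the real agent's `generate()` signature and return type may differ.

```typescript
type ChatMessage = { role: "user" | "assistant"; content: string };

// Hypothetical interface; not the library's actual agent type.
interface StatelessAgent {
  generate(messages: ChatMessage[]): Promise<string>;
}

// Fake agent for the sketch: it replies based only on the messages passed in.
const fakeAgent: StatelessAgent = {
  async generate(messages) {
    const last = messages[messages.length - 1];
    return `classified: ${last ? last.content : ""}`;
  },
};

// Fan out one run per item over the same instance. Each call receives only
// its own message list, so runs cannot observe each other.
async function classifyAll(agent: StatelessAgent, items: string[]): Promise<string[]> {
  return Promise.all(
    items.map((item) => agent.generate([{ role: "user", content: item }])),
  );
}
```

Because no history lives on the instance in this model, the order in which the concurrent calls resolve does not affect any individual result.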
Prerequisites
- An agent in agents/ (see Agents).
- An AG-UI route (see API routes for the createAgUiHandler("assistant") pattern).
- A storage backend if you choose conversation memory; the default in-memory driver is fine while developing.
Choose a memory mode
Configure memory on your agent to persist messages across requests. A configured agent accumulates one shared conversation on the instance, so reuse it sequentially (a single chat thread) rather than across concurrent independent runs. For per-item fan-out, create a fresh agent per run instead. To keep the stateless default explicitly (for a single-shot agent that should never persist history), set enabled: false:
Buffer memory
Keeps the last N messages. Simple and predictable:
Conversation memory
Sliding window based on token count. Drops the oldest messages when the limit is reached:
Summary memory
Automatically summarizes older messages to fit more context into fewer tokens:
Redis memory
For production deployments where multiple server instances share state:
Memory operations
Access memory programmatically in API routes:
getMemoryStats() returns:
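As a mental model for the buffer and token-window modes above, and for the kind of numbers a stats call can report, here is a standalone sketch. It does not use the library's API: `keepLastN`, `fitTokenBudget`, `sketchStats`, and the `messageCount` / `estimatedTokens` fields are hypothetical, and the 4-characters-per-token estimate is a crude assumption.

```typescript
// Standalone sketch: none of these names come from the library.
type Message = { role: "user" | "assistant" | "system"; content: string };

// Crude token estimate (about 4 characters per token); real tokenizers differ.
const estimateTokens = (m: Message): number => Math.ceil(m.content.length / 4);

// Buffer-style trimming: keep only the last `maxMessages` messages.
function keepLastN(history: Message[], maxMessages: number): Message[] {
  return maxMessages <= 0 ? [] : history.slice(-maxMessages);
}

// Token-window trimming: drop the oldest messages until the rest fits `maxTokens`.
function fitTokenBudget(history: Message[], maxTokens: number): Message[] {
  const kept: Message[] = [];
  let total = 0;
  // Walk from newest to oldest so the most recent context survives.
  for (const message of [...history].reverse()) {
    const cost = estimateTokens(message);
    if (total + cost > maxTokens) break;
    kept.unshift(message);
    total += cost;
  }
  return kept;
}

// Hypothetical stats shape, for illustration only.
function sketchStats(history: Message[]): { messageCount: number; estimatedTokens: number } {
  return {
    messageCount: history.length,
    estimatedTokens: history.reduce((sum, m) => sum + estimateTokens(m), 0),
  };
}
```

The buffer variant is predictable in message count but not in size, while the token-window variant bounds the size and lets the message count vary.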
Streaming
Server-side streaming
Use createAgUiHandler() for chat UI routes. It validates the request, invokes
the agent, and returns AG-UI SSE:
Use agent.stream() directly only when you are building a custom transport or
non-chat streaming surface.
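For the custom-transport case, it helps to know what Server-Sent Events look like on the wire: each event is a `data:` line followed by a blank line. The sketch below shows only that framing. `SketchEvent`, the `"text-delta"` and `"done"` type names, and both functions are hypothetical; real AG-UI events use their own type names and carry more fields.

```typescript
// Hypothetical event shape for illustration; not the AG-UI event schema.
type SketchEvent = { type: string; delta?: string };

// One SSE frame: a "data:" line with the JSON payload, terminated by a blank line.
function toSseFrame(event: SketchEvent): string {
  return `data: ${JSON.stringify(event)}\n\n`;
}

// Turn a source of text chunks into SSE frames, ending with a completion event.
async function* toSseFrames(chunks: AsyncIterable<string>): AsyncGenerator<string> {
  for await (const delta of chunks) {
    yield toSseFrame({ type: "text-delta", delta });
  }
  yield toSseFrame({ type: "done" });
}
```

A custom route would write these frames to a response whose `Content-Type` is `text/event-stream`.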
Client-side consumption
The useChat hook handles the streaming protocol automatically:
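To give a sense of the work a client hook takes off your hands, here is a rough sketch of parsing buffered SSE text into events. `parseSse` is a hypothetical helper, not part of the hook's API; it assumes JSON payloads and `\n` line endings, and ignores other SSE fields such as `event:` and `id:`.

```typescript
// Split buffered SSE text into parsed events plus the unfinished remainder.
function parseSse(buffer: string): { events: unknown[]; rest: string } {
  // Frames are separated by a blank line; the last piece may be incomplete.
  const parts = buffer.split("\n\n");
  const rest = parts.pop() ?? "";
  const events: unknown[] = [];
  for (const frame of parts) {
    // Join all "data:" lines in the frame; other field lines are skipped.
    const data = frame
      .split("\n")
      .filter((line) => line.startsWith("data:"))
      .map((line) => line.slice(5).trimStart())
      .join("\n");
    if (data) events.push(JSON.parse(data));
  }
  return { events, rest };
}
```

A caller would append each network chunk to `rest`, call `parseSse` again, and handle the returned events in order.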
Non-streaming generation
Use generate() when you need the complete response at once:
Verify it worked
Send two messages on the same threadId (with conversation memory) and
confirm the second response references the first message. With curl:
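The logic of this check can also be scripted. The sketch below is an outline under stated assumptions: the `TurnRequest` shape is loosely modeled on an AG-UI run input, and its field names, the `buildTurn` and `remembersFact` helpers, and the placeholder URL in the comment are all hypothetical. Adjust them to whatever your route actually accepts.

```typescript
// Assumed request shape; field names are assumptions, not a documented contract.
type TurnRequest = {
  threadId: string;
  runId: string;
  messages: { id: string; role: "user"; content: string }[];
};

// Build the body for one turn. Both turns must share the same threadId.
function buildTurn(threadId: string, turn: number, text: string): TurnRequest {
  return {
    threadId,
    runId: `run-${turn}`,
    messages: [{ id: `msg-${turn}`, role: "user", content: text }],
  };
}

// The check itself: the second reply should mention a fact stated only in turn one.
function remembersFact(secondReply: string, fact: string): boolean {
  return secondReply.toLowerCase().includes(fact.toLowerCase());
}

// Hypothetical usage (the URL is a placeholder, not a documented route):
//   const body = JSON.stringify(buildTurn("thread-1", 1, "My name is Ada."));
//   await fetch("<your-ag-ui-route-url>", { method: "POST", body });
```

If the second turn asks "What is my name?" and `remembersFact(reply, "Ada")` is true, the history was persisted across requests.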