AI/Voice
January 25, 2026 · 6 min

Giving AI Memory: Conversation History and Context

The first version of the AI was amnesiac. "Add bread" worked, but "Put it for tomorrow" failed: the AI had no idea which task "it" referred to. It needed memory.

The context problem

LLMs have no native memory: every request is stateless, so the full conversation history must be sent back each time. That creates two problems: the context window fills up, and token costs grow.
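To make the statelessness concrete, here is a minimal sketch (the roles and the `ask` helper are illustrative, not the post's actual code): the client keeps a message list and resends all of it on every turn, because the model only ever sees what is in the current payload.

```python
# The whole "memory" lives client-side in this list; the model is stateless.
history = [{"role": "system", "content": "You are a task assistant."}]

def ask(user_text: str) -> list[dict]:
    """Append the user turn and return the full payload sent to the model."""
    history.append({"role": "user", "content": user_text})
    # In a real client this list would be POSTed to a chat completions API.
    return list(history)

payload = ask("Add bread")
history.append({"role": "assistant", "content": "Added 'bread'."})
# This follow-up only works because turn 1 is resent along with it.
payload = ask("Put it for tomorrow")
```

The payload for the second turn already contains four messages; that growth on every request is exactly what the next two sections deal with.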

The context window

The solution is a sliding window with priorities: the system prompt is always included, the last two exchanges are always kept, and function calls are prioritized over plain chat. Older exchanges are summarized or dropped depending on the remaining token budget.
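A hedged sketch of that windowing logic, under some assumptions not stated in the post: each exchange is a dict with `text`, `is_function_call`, and an optional precomputed `summary`, and tokens are estimated at roughly four characters each.

```python
def estimate_tokens(text: str) -> int:
    # Crude heuristic: ~4 characters per token.
    return max(1, len(text) // 4)

def build_context(system_prompt: str, exchanges: list[dict], budget: int) -> list[str]:
    """Assemble the context: system prompt + pinned recent exchanges always,
    then older exchanges (function calls first) while the budget allows,
    falling back to their summaries, else dropping them."""
    context = [system_prompt]
    spent = estimate_tokens(system_prompt)

    pinned, older = exchanges[-2:], exchanges[:-2]
    for ex in pinned:  # last 2 exchanges: always included
        context.append(ex["text"])
        spent += estimate_tokens(ex["text"])

    # Function calls first, then the rest in their original order.
    for ex in sorted(older, key=lambda e: not e["is_function_call"]):
        cost = estimate_tokens(ex["text"])
        if spent + cost <= budget:
            context.insert(1, ex["text"])       # full text fits
            spent += cost
            continue
        summary = ex.get("summary")
        if summary and spent + estimate_tokens(summary) <= budget:
            context.insert(1, summary)          # summarized version fits
            spent += estimate_tokens(summary)
        # else: dropped entirely
    return context
```

The `insert(1, ...)` keeps the system prompt in front and the pinned exchanges at the end, which mirrors the priority order described above.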

Impact on costs

Because the history is resent on every request, a 10-exchange conversation can cost 5x more than an isolated exchange. Three mitigations: a hard limit of 20 exchanges per conversation, an estimated token counter, and model selection based on request complexity.
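A back-of-the-envelope sketch of where that multiplier comes from. The token counts below are illustrative, not real pricing: when every request resends the system prompt plus all previous exchanges, billed input tokens grow roughly quadratically with conversation length.

```python
def conversation_input_tokens(n_exchanges: int,
                              tokens_per_exchange: int = 400,
                              system_tokens: int = 50) -> int:
    """Total input tokens billed across a conversation where request N
    resends the system prompt plus all N exchanges so far."""
    total = 0
    for turn in range(1, n_exchanges + 1):
        total += system_tokens + turn * tokens_per_exchange
    return total

single = conversation_input_tokens(1)    # one isolated exchange
ten = conversation_input_tokens(10)      # a 10-exchange conversation
ratio = ten / (10 * single)              # average cost multiplier per exchange
```

With these (made-up) numbers, each exchange in the 10-turn conversation costs 5x what an isolated exchange does, which is why capping conversations at 20 exchanges matters.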

The OpenRouter fallback

If the main model returns a 429 or 503, the backend retries the request with OPENROUTER_FALLBACK_MODEL. The switch is transparent to the user, and an AlertService emails the admin. In production, this fires 2-3 times per week.
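A sketch of that fallback path, with several assumptions: the client interface, the `ApiError` exception, and the `notify_admin` method are all illustrative stand-ins, not the actual backend's API. Only the OPENROUTER_FALLBACK_MODEL variable comes from the post.

```python
import os

FALLBACK_STATUSES = {429, 503}  # rate-limited / service unavailable

class ApiError(Exception):
    def __init__(self, status: int):
        super().__init__(f"HTTP {status}")
        self.status = status

def complete(client, messages, alert_service):
    """Try the primary model; on 429/503, alert the admin and retry once
    with the fallback model. Any other error propagates unchanged."""
    primary = os.environ.get("OPENROUTER_MODEL", "primary-model")
    fallback = os.environ.get("OPENROUTER_FALLBACK_MODEL", "fallback-model")
    try:
        return client.chat(model=primary, messages=messages)
    except ApiError as err:
        if err.status not in FALLBACK_STATUSES:
            raise
        alert_service.notify_admin(f"Fallback to {fallback}: HTTP {err.status}")
        return client.chat(model=fallback, messages=messages)
```

Retrying only on 429/503 keeps genuine request errors (bad input, auth failures) visible instead of silently rerouting them.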

What memory changes

"Modify the priority of the task we just created", "Actually, put that on Friday". The AI understands pronouns, references, corrections. The difference between a tool and an assistant.