Giving AI Memory: Conversation History and Context
The first version of the AI was amnesiac. "Add bread" worked. "Put it for tomorrow" failed — the AI didn't know which task was being referred to. It needed memory.
The context problem
LLMs have no native memory: the full conversation history must be resent with every request. That creates two problems: the limited context window and token cost.
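A minimal sketch of what "resending the history" means in practice (Python, with illustrative names; the real backend's payload format may differ):

```python
# The model sees only what we send, so every request must carry the
# whole conversation so far. All names here are illustrative.
system_prompt = {"role": "system", "content": "You are a task assistant."}
history = []  # grows with every exchange

def build_request(user_message: str) -> list[dict]:
    """Assemble the full payload: system prompt + all prior turns + new turn."""
    history.append({"role": "user", "content": user_message})
    return [system_prompt, *history]

payload = build_request("Add bread")
payload = build_request("Put it for tomorrow")
# The second payload contains both turns, which is what lets the
# model resolve "it" back to the bread task.
```

Because the payload grows with every turn, both problems follow directly: it eventually overflows the context window, and every token in it is billed again on each request.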
The context window
Sliding window with priorities: the system prompt is always included, the last two exchanges are always included, and function calls are prioritized. Older exchanges are summarized or dropped depending on the token budget.
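The trimming logic can be sketched roughly like this (Python, illustrative names; this sketch only drops older exchanges and omits the summarization step, and the 4-characters-per-token heuristic is an assumption, not an exact count):

```python
def estimate_tokens(text: str) -> int:
    # Rough heuristic: ~4 characters per token. An approximation only.
    return max(1, len(text) // 4)

def trim_history(system_prompt: str, exchanges: list[dict], budget: int) -> list[dict]:
    """Sliding window with priorities:
    - the system prompt is always charged against the budget,
    - the last 2 exchanges are always kept,
    - older function-call exchanges are kept next (newest first),
    - remaining older exchanges fill whatever budget is left.
    Leftovers are dropped (the real system may summarize them instead).
    """
    kept = {id(e) for e in exchanges[-2:]}  # last 2 always in
    budget -= estimate_tokens(system_prompt)
    budget -= sum(estimate_tokens(e["text"]) for e in exchanges[-2:])

    older = exchanges[:-2]
    # Function calls first, then the rest, both newest first.
    candidates = [e for e in reversed(older) if e.get("function_call")] + \
                 [e for e in reversed(older) if not e.get("function_call")]
    for e in candidates:
        cost = estimate_tokens(e["text"])
        if cost <= budget:
            kept.add(id(e))
            budget -= cost

    # Re-emit in chronological order so the model sees a coherent transcript.
    return [e for e in exchanges if id(e) in kept]
```

The chronological re-emit at the end matters: keeping by priority but sending out of order would confuse the model more than dropping turns does.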
Impact on costs
A 10-exchange conversation can cost 5x more than an isolated exchange, because each request resends the growing history. Mitigations: a limit of 20 exchanges per conversation, an estimated token counter, and model selection based on request complexity.
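The 5x figure falls out of simple arithmetic: if each exchange adds a fixed number of tokens and nothing is trimmed, request k resends k exchanges. A small sketch (illustrative names, uniform exchange size assumed):

```python
MAX_EXCHANGES = 20  # hard cap per conversation, from the text

def conversation_cost(per_exchange_tokens: int, n_exchanges: int) -> int:
    """Total prompt tokens across a conversation with no trimming:
    request k carries all k exchanges sent so far."""
    n_exchanges = min(n_exchanges, MAX_EXCHANGES)
    return sum(k * per_exchange_tokens for k in range(1, n_exchanges + 1))

isolated = conversation_cost(100, 1)    # 100 tokens
ten_turns = conversation_cost(100, 10)  # 5500 tokens in total
# Average per request: 550 tokens, i.e. ~5.5x the isolated exchange,
# which is where a "5x more" figure comes from.
```

This is also why the token counter is only "estimated": an exact count would require running the tokenizer on every candidate payload, and a cheap character-based heuristic is close enough for budgeting.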
The OpenRouter fallback
If the main model returns a 429 or 503, the backend retries the request with OPENROUTER_FALLBACK_MODEL. The switch is transparent to the user; an AlertService emails the admin. In production this happens 2-3 times per week.
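The fallback path might look like this (Python sketch; `call_model` and `ModelError` stand in for the real OpenRouter client and are assumptions, and the AlertService is stubbed to print rather than email):

```python
import os

# Fallback model name, read from the env var mentioned in the text.
FALLBACK_MODEL = os.environ.get("OPENROUTER_FALLBACK_MODEL", "fallback-model")

class ModelError(Exception):
    """Hypothetical error carrying the provider's HTTP status code."""
    def __init__(self, status: int):
        super().__init__(f"model error {status}")
        self.status = status

class AlertService:
    """Stub: the real service emails the admin instead of printing."""
    def notify(self, message: str) -> None:
        print(f"[alert] {message}")

def complete(call_model, prompt: str, main_model: str, alerts: AlertService) -> str:
    """Try the main model; on 429 (rate limit) or 503 (unavailable),
    retry once with the fallback model. Transparent to the caller."""
    try:
        return call_model(main_model, prompt)
    except ModelError as err:
        if err.status not in (429, 503):
            raise  # other errors are not retried
        alerts.notify(f"{main_model} returned {err.status}; retrying with {FALLBACK_MODEL}")
        return call_model(FALLBACK_MODEL, prompt)
```

Alerting on every fallback, rather than logging silently, is what makes the 2-3 occurrences per week visible enough to act on.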
What memory changes
"Modify the priority of the task we just created", "Actually, put that on Friday". The AI understands pronouns, references, corrections. The difference between a tool and an assistant.