Large Language Models (LLMs) are fundamentally stateless; they process each query independently. The ability to recall past conversations is not innate but created by sophisticated external memory systems.
Short-Term Memory: The Current Session
During a single chat, the application provides the LLM with a “context window”—a sliding buffer of the most recent conversation. This allows the model to maintain context, but as the dialogue grows, older exchanges are pushed out and forgotten.
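The sliding-buffer behavior described above can be sketched in a few lines. This is a minimal illustration, not any provider's actual implementation: the `max_tokens` budget and the whitespace "tokenizer" are simplifying assumptions.

```python
from collections import deque

class ContextWindow:
    """Keeps only the most recent turns that fit a fixed token budget."""

    def __init__(self, max_tokens: int = 50):
        self.max_tokens = max_tokens
        self.turns: deque[str] = deque()

    def add(self, turn: str) -> None:
        self.turns.append(turn)
        # Evict the oldest turns once the budget is exceeded --
        # this is why early exchanges are "forgotten".
        while sum(len(t.split()) for t in self.turns) > self.max_tokens:
            self.turns.popleft()

    def prompt(self) -> str:
        return "\n".join(self.turns)
```

With a budget of 5 tokens, adding two 3-token turns evicts the first one: the model only ever sees what still fits in the window.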
Long-Term Memory: Cross-Session Recall
For remembering details across weeks or months, external systems are used:
- Vector Databases: Store past conversations as numerical embeddings that can be searched by semantic similarity.
- Structured Storage: Keep key facts (names, preferences) in formats like JSON.
- Memory Systems: Decide what to save, summarize, and retrieve so recall stays fast and relevant.
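To make the vector-database idea concrete, here is a toy in-memory store. As an assumption for illustration, it uses bag-of-words count vectors and cosine similarity in place of the learned dense embeddings a real system would get from a neural encoder:

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Toy "embedding": a bag-of-words count vector.
    # Real systems use dense vectors from a trained embedding model.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

class VectorStore:
    def __init__(self):
        self.items: list[tuple[Counter, str]] = []

    def add(self, text: str) -> None:
        self.items.append((embed(text), text))

    def search(self, query: str, k: int = 1) -> list[str]:
        # Rank stored memories by similarity to the query.
        q = embed(query)
        ranked = sorted(self.items, key=lambda it: cosine(q, it[0]),
                        reverse=True)
        return [text for _, text in ranked[:k]]
```

The key design point survives the simplification: memories are retrieved by similarity to the current query, not by exact keyword match or chronological position.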
How It Works in Practice
When you ask about a previous topic, the system:
- Searches its external databases for relevant information.
- Injects that retrieved context into your current query.
- Feeds the enhanced prompt to the LLM, which generates a response that appears informed by memory.
This hybrid approach creates the seamless illusion of a continuous relationship. The real magic is in the clever engineering that surrounds the model.