Large Language Models (LLMs) are fundamentally stateless; they process each query independently. The ability to recall past conversations is not innate but created by sophisticated external memory systems.

Short-Term Memory: The Current Session

During a single chat, the application provides the LLM with a “context window”—a sliding buffer of the most recent conversation. This allows the model to maintain context, but as the dialogue grows, older exchanges are pushed out and forgotten.
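The sliding-buffer behavior can be sketched in a few lines. This is a minimal illustration, not any vendor's implementation: token counting is approximated by word count, and the class names are hypothetical.

```python
from collections import deque

class ContextWindow:
    """Keep only the most recent messages that fit a token budget.
    Toy sketch: tokens are approximated by whitespace-split words."""

    def __init__(self, max_tokens=100):
        self.max_tokens = max_tokens
        self.messages = deque()

    def add(self, role, text):
        self.messages.append((role, text))
        # Evict the oldest exchanges once the budget is exceeded --
        # this is exactly how early conversation turns get "forgotten".
        while self._total_tokens() > self.max_tokens:
            self.messages.popleft()

    def _total_tokens(self):
        return sum(len(text.split()) for _, text in self.messages)

    def render(self):
        # What actually gets sent to the stateless model on each turn.
        return "\n".join(f"{role}: {text}" for role, text in self.messages)
```

With a budget of 5 tokens, adding a 3-word user message and then a 3-word reply evicts the user message: only the newest turn survives in the rendered prompt.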

Long-Term Memory: Cross-Session Recall

For remembering details across weeks or months, external systems are used:

  • Vector Databases: Store conversations as embeddings (numerical vectors) that can be searched by semantic similarity.
  • Structured Storage: Keep key facts (names, preferences, project details) in formats like JSON.
  • Memory Systems: Dedicated layers that decide what to store, how to index it, and what to retrieve for each query.
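The vector-database idea reduces to "embed, store, rank by similarity." The sketch below uses word counts as a stand-in embedding so it runs without dependencies; production systems use dense vectors from a learned embedding model, and the `VectorStore` name here is illustrative.

```python
from collections import Counter
import math

def embed(text):
    # Toy embedding: a bag-of-words count vector. Real systems use a
    # learned embedding model; this only illustrates the mechanics.
    return Counter(text.lower().replace(".", "").split())

def cosine(a, b):
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[w] * b[w] for w in set(a) & set(b))
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

class VectorStore:
    """Store past exchanges as vectors; retrieve by similarity."""

    def __init__(self):
        self.items = []  # list of (vector, original text)

    def add(self, text):
        self.items.append((embed(text), text))

    def search(self, query, k=1):
        q = embed(query)
        ranked = sorted(self.items, key=lambda it: cosine(q, it[0]),
                        reverse=True)
        return [text for _, text in ranked[:k]]
```

A query like "what is the dog called" scores highest against a stored sentence mentioning a dog, even though the wording differs, which is the core retrieval behavior a real vector database provides at scale.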

How It Works in Practice

When you ask about a previous topic, the system:

  1. Searches its external databases for relevant information.
  2. Injects that retrieved context into your current query.
  3. Feeds the enhanced prompt to the LLM, which generates a response that appears informed by memory.
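The three steps above can be sketched end to end. This is a hedged illustration: the memory list, the word-overlap retrieval (a stand-in for vector search), and the prompt template are all assumptions, not any particular product's pipeline.

```python
# Hypothetical long-term memory recovered from earlier sessions.
MEMORIES = [
    "User's name is Alex.",
    "User prefers Python over Java.",
    "User is building a chatbot.",
]

def retrieve(query, memories, k=2):
    # Step 1: rank stored memories by word overlap with the query
    # (a simple stand-in for vector-similarity search).
    qwords = set(query.lower().split())
    scored = sorted(
        memories,
        key=lambda m: len(qwords & set(m.lower().replace(".", "").split())),
        reverse=True,
    )
    return scored[:k]

def build_prompt(query, memories=MEMORIES):
    # Step 2: inject the retrieved context ahead of the current query.
    context = "\n".join(f"- {m}" for m in retrieve(query, memories))
    # Step 3: this enhanced prompt is what the stateless LLM actually sees.
    return f"Relevant facts from past sessions:\n{context}\n\nUser: {query}"
```

Calling `build_prompt("what is my name")` yields a prompt whose context section includes the stored name fact, so the model can answer as if it remembered.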

This hybrid approach creates the seamless illusion of a continuous relationship. The real magic is in the clever engineering that surrounds the model.

💡 Learn and Share

For more useful content, follow me.

haricodehunter@gmail.com

DevSecOps Engineer, AI/ML enthusiast, and technology blogger.