Large Language Models (LLMs) with Long-Term Memory: Advancements and Opportunities in GenAI Applications

Michelle Yi (Yulle)
9 min read · Jul 7, 2023

Introduction

Large language models (LLMs) are advanced machine learning models with many parameters trained on massive libraries of text. From a technical standpoint, language modeling (LM) is a crucial technique for advancing machine intelligence with language. The primary objective of LM is to develop models that accurately estimate the probability of word sequences, thereby enabling the prediction of future or missing tokens.
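As a toy illustration of this objective, a bigram model estimates the probability of the next word purely from counts in a tiny corpus. This is a minimal sketch of the probability-estimation idea only; real LLMs learn these probabilities with neural networks over vastly larger datasets.

```python
from collections import Counter, defaultdict

# Tiny toy corpus; a real LLM trains on billions of tokens.
corpus = "the whale swims . the whale eats . the fish swims .".split()

# Count bigrams to estimate P(next word | previous word).
counts = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    counts[prev][nxt] += 1

def next_token_probs(prev):
    """Return the estimated distribution over the next token."""
    total = sum(counts[prev].values())
    return {tok: c / total for tok, c in counts[prev].items()}

print(next_token_probs("whale"))  # {'swims': 0.5, 'eats': 0.5}
```

Modern LLMs replace these raw counts with learned neural representations, but the training objective (assigning probabilities to token sequences) is the same in spirit.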

Recent developments in natural language processing and related disciplines have been driven by large language models (LLMs) such as ChatGPT. Research has demonstrated that when these models reach a certain scale, they exhibit emergent behaviors, such as the capacity to engage in “reasoning”. By prompting these models with “chain-of-thought” methods, i.e., reasoning examples or a simple cue such as “Let’s think step by step,” they can answer questions with explicit logical steps.
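Both chain-of-thought styles can be sketched as simple prompt assembly. The helper below is a hypothetical illustration (`build_cot_prompt` is not a real API, and the actual model call is omitted); it shows the zero-shot cue and the few-shot worked-example variant side by side.

```python
def build_cot_prompt(question, zero_shot=True):
    """Assemble a chain-of-thought prompt for an LLM (model call omitted)."""
    if zero_shot:
        # Zero-shot variant: append the "think step by step" cue.
        return f"Q: {question}\nA: Let's think step by step."
    # Few-shot variant: prepend a worked reasoning example.
    example = (
        "Q: All whales are mammals, and all mammals have kidneys. "
        "Do whales have kidneys?\n"
        "A: Whales are mammals. Mammals have kidneys. "
        "Therefore whales have kidneys. The answer is yes.\n\n"
    )
    return example + f"Q: {question}\nA:"

print(build_cot_prompt("Do all squares have four sides?"))
```

Either prompt is then sent to the model as-is; the reasoning steps appear in the model's completion, not in the prompt-building code.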

One example of the kind of reasoning LLMs can handle is: “All whales are mammals, and all mammals have kidneys; therefore, all whales have kidneys.” This particular example has generated significant excitement within the research community, because the ability to reason logically is considered a fundamental characteristic of human intelligence and is often seen as lacking in current artificial intelligence systems.

While large language models (LLMs) have demonstrated remarkable performance on specific reasoning tasks, their generalizable reasoning capabilities and the extent to which they can truly engage in reasoning remain open questions, prompting ongoing debate and differing perspectives among researchers. For instance, some argue that LLMs are “adequate zero-shot reasoners”, whereas other authors have concluded that “LLMs are still a long way from achieving adequate results on everyday planning/reasoning tasks that pose no challenge for humans.”

In this article, we begin to explore the role of memory in advancing LLMs, some emerging architectures, and opportunities for their application. Subsequent articles will cover more of the implementation details and specifics of different approaches and design patterns.

LLMs with Long-Term Memory

LLMs are founded on a deep learning architecture called the Transformer. The Transformer employs self-attention mechanisms to determine the relationships between words or elements in a text sequence. This and related foundational developments enable these models to comprehend and generate text with greater context and coherence.
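The core of self-attention can be sketched in a few lines of plain Python, assuming queries, keys, and values are already given as lists of vectors. This is the scaled dot-product form (softmax(QKᵀ/√d)V) with a single head and no learned projections, so it is an illustration of the mechanism rather than a full Transformer layer.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def self_attention(Q, K, V):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d)) V."""
    d = len(Q[0])
    out = []
    for q in Q:
        # Compare this query against every key (the quadratic step).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in K]
        weights = softmax(scores)
        # Output is a weighted mix of the value vectors.
        out.append([sum(w * v[j] for w, v in zip(weights, V))
                    for j in range(len(V[0]))])
    return out

# Three tokens with 2-d embeddings; queries = keys = values for simplicity.
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
print(self_attention(X, X, X))
```

Each output row is a convex combination of the value vectors, weighted by how strongly that token's query matches every key; this all-pairs comparison is also the source of the quadratic cost discussed later in the article.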

To improve the capabilities of LLMs, researchers have considered incorporating mechanisms for long-term memory into these models. Long-term memory allows a model to store information for an extended period and utilize it when developing responses. There are currently two primary ways in which long-term memory can be incorporated.

One way that LLMs can incorporate long-term memory is by utilizing additional memory components, such as third-party databases including vector databases or (knowledge) graph databases. These components store factual data or embeddings that the model can access and use to generate text. The LLM can provide more precise and accurate responses by accessing this external memory.
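A minimal sketch of this external-memory pattern follows, with a toy bag-of-characters "embedding" standing in for a real embedding model and a plain Python list standing in for a vector database. All names here (`embed`, `retrieve`) are illustrative, not a real library API.

```python
import math

def embed(text):
    """Toy stand-in embedding: letter counts. A real system would
    call an embedding model here instead."""
    vec = [0.0] * 26
    for ch in text.lower():
        if ch.isalpha():
            vec[ord(ch) - ord('a')] += 1.0
    return vec

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

# External "vector database": facts stored alongside their embeddings.
memory = ["Whales are mammals", "Paris is the capital of France"]
index = [(embed(doc), doc) for doc in memory]

def retrieve(query, k=1):
    """Return the k stored facts most similar to the query."""
    ranked = sorted(index,
                    key=lambda p: cosine(p[0], embed(query)),
                    reverse=True)
    return [doc for _, doc in ranked[:k]]

print(retrieve("capital of France"))  # ['Paris is the capital of France']
```

In a real retrieval-augmented setup, the retrieved facts would be prepended to the LLM's prompt so the model can ground its response in them.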

Alternatively, memory can be incorporated into the model itself. For instance, the model may include an internal memory module (e.g., an index, a modified attention mechanism, etc.) that stores pertinent data during text processing. This internal memory can be modified and accessed to affect the model’s behavior. LLMs can demonstrate more consistent behavior and create appropriate responses for the given context by employing long-term memory in the actual model architecture.

Architectural Design Considerations

Several architectural factors must be accounted for when developing LLMs with long-term memory. A few crucial considerations include:

  • Storage: The model’s memory should be large enough to store and retrieve pertinent information. The memory component’s size and architecture must be designed to accommodate the intended quantity of long-term data.
  • Storage update: The model should include mechanisms for revising long-term memory in response to new input or data. This makes sure the memory is up-to-date and pertinent to the current context.
  • Memory retrieval: The model requires efficient access and retrieval mechanisms for information stored in long-term memory. This retrieval process must be quick and precise so that the model can utilize the stored information during text generation without causing excess latency. This is especially true of methods utilizing external memory components, such as a third-party database.
  • Incorporation with attention mechanisms: The model’s long-term memory should be incorporated with self-attention mechanisms. This enables the model to pay attention to pertinent portions of memory during text generation and make informed decisions based on the stored data.
  • Training and Selection: Training LLMs with long-term memory demands a thorough understanding of how the memory is updated and utilized during training. To ensure efficient learning and utilization of stored data, robust selection techniques must account for the memory component.
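The storage, storage-update, and memory-retrieval considerations above can be sketched together as a simple fixed-capacity key-value memory. This is a hypothetical design for illustration, not any particular published architecture; real internal-memory modules differ substantially in how they score, evict, and integrate entries with attention.

```python
class InternalMemory:
    """Sketch of a key-value memory module with a fixed storage budget,
    a simple update (eviction) policy, and dot-product retrieval."""

    def __init__(self, capacity=4):
        self.capacity = capacity
        self.slots = []  # list of (key_vector, value_vector) pairs

    def write(self, key, value):
        # Storage update: drop the oldest entry once capacity is reached.
        if len(self.slots) >= self.capacity:
            self.slots.pop(0)
        self.slots.append((key, value))

    def read(self, query, k=1):
        # Memory retrieval: return the k values whose keys best match
        # the query by dot product (attention-style scoring).
        scored = sorted(
            self.slots,
            key=lambda kv: sum(q * x for q, x in zip(query, kv[0])),
            reverse=True)
        return [value for _, value in scored[:k]]

mem = InternalMemory(capacity=4)
mem.write([1.0, 0.0], [0.5, 0.5])
mem.write([0.0, 1.0], [0.9, 0.1])
print(mem.read([0.0, 1.0]))  # [[0.9, 0.1]]
```

Even this toy version makes the trade-offs above concrete: `capacity` bounds storage, the eviction rule in `write` is the update policy, and the sort in `read` is the retrieval step whose latency would matter at scale.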

To that end, some new architectures emerging from the research community try to account for these design considerations while also improving cost and long-context efficiency. One example, as seen in the following diagram, is adding a decoupled memory cache, kept fresh during training, to supplement a pre-trained model.

Example architecture with augmented memory mechanism from Wang et al. (2023)

Another example, shown below, is the “Hyena Hierarchy,” which aims to break the quadratic barrier of the attention mechanism (the source of its scaling cost and compute time) by replacing attention with subquadratic operators in a dense, attention-free architecture.

Example architecture using Hyena Hierarchy from Poli et al. (2023)
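A back-of-envelope comparison makes the quadratic barrier concrete: attention compares every token with every other token (roughly n² operations for sequence length n), while an FFT-based long convolution of the kind Hyena builds on scales roughly as n·log₂(n). These are rough operation counts for intuition only, not measured benchmarks of either architecture.

```python
import math

# Compare the growth of n^2 (attention) against n * log2(n)
# (FFT-based long convolution) as sequence length n increases.
for n in [1_024, 8_192, 65_536]:
    quadratic = n * n
    subquadratic = n * math.log2(n)
    print(f"n={n:>6}: attention ~{quadratic:.0e} ops, "
          f"FFT conv ~{subquadratic:.0e} ops, "
          f"ratio ~{quadratic / subquadratic:,.0f}x")
```

The gap widens rapidly with context length, which is why subquadratic operators are attractive precisely for the long-memory settings this article discusses.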

These new paradigms and others coming out of the research and developer communities will be explored in further detail in future articles.

Advancements and Opportunities in GenAI Applications

The development of LLM-based GenAI applications has created new opportunities in numerous fields. As models become more adept at comprehending and remembering extended context, they can be utilized in increasingly complex and dynamic situations.

Advancements in LLMs

Improving Context Understanding and Coherence in Text Generation

One of the most significant benefits of GenAI applications is their capability to comprehend and generate text with enhanced context understanding and coherence. Modern LLMs, such as GPT-3.5 and above, have been trained on enormous quantities of data and can produce more coherent and contextually appropriate responses. This enhancement enables more meaningful and precise interactions across multiple applications, and it is furthered by improvements in long-term memory mechanisms in areas such as those listed below.

  • Improved Accuracy and Specificity of Generated Responses:

The ability to fine-tune LLMs to acquire domain-specific expertise is one of the main benefits of GenAI applications. This goes beyond using only “preamble” or prompt engineering of an existing general LLM. By introducing the model to an enormous amount of data from a specific domain and tuning its parameters accordingly, we can train it to generate responses demonstrating a profound understanding of the topic. This specialization permits the model to provide more precise and contextually pertinent data.

On top of this, GenAI applications excel at information retrieval by employing these refined LLMs. By integrating mechanisms for long-term memory, these models can retrieve information with even greater precision and relevance, allowing them to return more accurate results for user queries.

  • Consistent Behavior and Reduction of Biases in LLMs:

Ensuring consistency, reducing hallucinations, and eliminating biases within LLMs present enormous challenges. By utilizing better long-term memory mechanisms in LLMs, we can potentially make progress in the following areas:

  • Bias Reduction: AI models acquire knowledge from data. The model will exhibit the same biases if the training data contains biases. Therefore, it is essential to curate a balanced and diverse training dataset. This requires precise data selection and evaluation to reduce implicit and explicit bias. Better long-term memory can improve the quality of outputs, reducing hallucinations and potentially biased outputs.
  • Fairness Evaluation: Implementing and monitoring impartiality metrics can assist in quantifying bias. These metrics may serve as guidelines for the LLM’s behavior. Part of this also requires looking at the input (prompts) and outputs (responses) of the LLM at the user interaction level. This is where having an external long-term memory mechanism can be helpful as well. Conducting regular audits and evaluations throughout the entire process is essential to identify and correct any biases in the AI system.
  • Algorithmic Transparency: Most LLMs are still very opaque in their decision-making. While there are explainable AI techniques that can help identify inherent biases by revealing how the model makes its decisions, there are also opportunities with emerging memory mechanisms within model architectures that could make models more transparent. For example, techniques that can freeze or update only certain parts of a model or its running long-term memory could keep models up-to-date but also make sure they adhere to regulations such as the “right to be forgotten” without requiring retraining.
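The interaction-level auditing idea above can be sketched as an external log of prompts and responses that later fairness reviews can query. This is a hypothetical illustration (`log_interaction` and `audit` are invented names, and a real deployment would use a durable store with access controls rather than an in-memory list).

```python
from datetime import datetime, timezone

# External long-term record of user interactions, kept outside the model.
audit_log = []

def log_interaction(prompt, response):
    """Record one prompt/response pair with a timestamp for later review."""
    audit_log.append({
        "time": datetime.now(timezone.utc).isoformat(),
        "prompt": prompt,
        "response": response,
    })

def audit(term):
    """Return past interactions mentioning a term under fairness review."""
    term = term.lower()
    return [rec for rec in audit_log
            if term in rec["prompt"].lower()
            or term in rec["response"].lower()]

log_interaction("Describe a typical nurse.", "A nurse cares for patients...")
print(len(audit("nurse")))  # 1
```

Keeping this record outside the model is what makes regular audits possible without retraining, and it pairs naturally with the "right to be forgotten" point above: deleting a user's entries from the external store is far cheaper than unlearning them from model weights.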

Opportunities

Generative AI is a subfield of artificial intelligence that concentrates on developing models and systems that generate new and unique content. In contrast to traditional AI systems, which are intended to identify and categorize existing data, generative AI seeks to produce new data that mimics the trends and features of the training data to which it has been exposed. Having more advanced memory in LLMs powering generative AI applications can significantly impact the following areas:

  • Content Creation: AI has significantly advanced in generating written content, including poetry, news articles, and technical papers. For instance, businesses can use AI to generate marketing copy or provide content suggestions, drastically reducing the time spent creating and modifying content. In addition, generative AI models use methods such as generative adversarial networks (GANs), variational autoencoders (VAEs), and recurrent neural networks (RNNs) to generate new content such as pictures, videos, text, and music. These models discover the fundamental patterns and structures of the training data and then use this information to produce new samples with comparable features. Longer-term memory can help evolve content over a longer context and refer back to key elements from previous statements or works.
  • Personalized Experiences: Whether it be through the generation of personalized purchasing experiences or the creation of unique learning paths in education, Generative AI has the potential to personalize numerous aspects of our everyday lives. Improved memory would provide better personalization capabilities that could account for a longer period of time of preferences.
  • Drug Discovery: Generative AI is already used to forecast the potential properties of molecules and generate new drug candidates in drug discovery, which has already proven to significantly accelerate the drug discovery and development process. Better long-term memory could increase the potential of these applications even further by incorporating more context about each molecule or drug candidate.

Conclusion

Despite the exciting developments and opportunities, substantial challenges lie ahead. It is computationally challenging to train these advanced LLMs, and their complexity makes them more difficult to comprehend and control. Researchers are still discovering how to manage these trade-offs effectively. A key part of this discussion is the role of long-term memory in LLMs and GenAI more broadly.

Concerns regarding the ethical implications of such advanced AI models continue to grow. There are still unresolved privacy concerns, particularly when considering models that can remember information for extended periods. Also of critical concern is whether these models will remember and perpetuate detrimental or biased behavior.

Researchers, developers, regulators, and end-users must collaborate to address these challenges. To responsibly unlock the full potential of LLM-based GenAI applications, it is necessary to develop novel methods for model comprehension, ethical standards for AI usage, and more effective privacy protection methods.

In conclusion, the rapid development of GenAI applications based on developments in memory and context represents a significant milestone in the evolution of AI. As we continue to stretch the limits of what AI can recall, reason over, and recognize, the distinction between human and artificial intelligence tasks becomes increasingly fuzzy. The developments in LLMs offer intriguing opportunities in various industries but also present new challenges. As we continue to evolve artificial intelligence, the road ahead is stimulating and full of potential.

References

Bahdanau, D., Cho, K., & Bengio, Y. (2014). Neural machine translation by jointly learning to align and translate. arXiv preprint arXiv:1409.0473. Link.

Kojima, T., Gu, S. S., Reid, M., Matsuo, Y., & Iwasawa, Y. (2022). Large language models are zero-shot reasoners. Advances in neural information processing systems, 35, 22199–22213. Link.

Marcus, G. (2020). The next decade in AI: Four steps towards robust artificial intelligence. arXiv preprint arXiv:2002.06177. Link.

Paananen, V., Oppenlaender, J., & Visuri, A. (2023). Using Text-to-Image Generation for Architectural Design Ideation. Link.

Sejnowski, T. J. (2023). Large Language Models and the Reverse Turing Test. Neural Computation, 35(3), 309–342. doi: https://doi.org/10.1162/neco_a_01563

Wang, W., Dong, L., Cheng, H., Liu, X., Yan, X., Gao, J., & Wei, F. (2023). Augmenting Language Models with Long-Term Memory. Link.

Wei, J., Wang, X., Schuurmans, D., Bosma, M., Xia, F., Chi, E., … & Zhou, D. (2022). Chain-of-thought prompting elicits reasoning in large language models. Advances in Neural Information Processing Systems, 35, 24824–24837. Link.

Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., … & Polosukhin, I. (2017). Attention is all you need. Advances in neural information processing systems, 30. Link.

Poli, M., Massaroli, S., Nguyen, E., Fu, D. Y., Dao, T., Baccus, S., Bengio, Y., Ermon, S., & Ré, C. (2023). Hyena Hierarchy: Towards Larger Convolutional Language Models. arXiv preprint arXiv:2302.10866. Link.
