New Method for Extending Context Length in Large Language Models

Large language models (LLMs) struggle with long inputs because of their constrained context window. Although fine-tuning can extend the window, it comes at a significant cost in training and inference time and risks degrading the LLM's core capabilities on shorter inputs.

To address this problem, a team of researchers from the Beijing Academy of Artificial Intelligence and the Gaoling School of Artificial Intelligence at Renmin University of China has proposed a new method called Activation Beacon. The method aims to extend the context length of pre-trained LLMs without compromising their existing capabilities.

Activation Beacon works by condensing the LLM's raw activations into a more compact form with minimal loss of information, allowing the model to perceive a much broader context within its short window. The condensation is carried out by special tokens called beacons, which can compress the activations at a chosen ratio. The beacons can attend to the raw activations under three different schemes, of which stepwise expansion proves the most effective. By processing the input in sliding windows and combining the condensed activations of past windows with the raw activations of the current one, Activation Beacon predicts the next token efficiently, letting the LLM handle long contexts without sacrificing its ability to process shorter ones.
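To make the idea concrete, here is a minimal, hypothetical sketch in PyTorch of how past windows might be condensed into a handful of beacon vectors and then combined with the raw activations of the current window. The function names (`condense_activations`, `sliding_window_context`), the attention-pooling stand-in for the learned beacon tokens, and all sizes are illustrative assumptions rather than the paper's actual implementation, which condenses the model's own key-value activations with trained beacon parameters.

```python
import torch
import torch.nn.functional as F


def condense_activations(activations: torch.Tensor, num_beacons: int) -> torch.Tensor:
    """Condense a window of per-token activations into a few beacon vectors.

    Attention pooling is used here as a simple stand-in for the learned
    beacon tokens described in the paper (illustrative only).
    """
    window_len, dim = activations.shape
    # Hypothetical beacon queries; in a real model these would be trained.
    beacon_queries = torch.randn(num_beacons, dim) / dim ** 0.5
    # Each beacon attends over the raw activations and pools them into
    # one condensed vector.
    attn = F.softmax(beacon_queries @ activations.T / dim ** 0.5, dim=-1)
    return attn @ activations  # shape: (num_beacons, dim)


def sliding_window_context(all_activations: torch.Tensor,
                           window_len: int,
                           condensing_ratio: int) -> torch.Tensor:
    """Build the context visible to the final window: condensed beacons
    for every past window plus the raw activations of the current one."""
    num_windows = all_activations.shape[0] // window_len
    condensed = []
    for w in range(num_windows - 1):  # all but the most recent window
        chunk = all_activations[w * window_len:(w + 1) * window_len]
        condensed.append(condense_activations(chunk, window_len // condensing_ratio))
    current = all_activations[(num_windows - 1) * window_len:]
    return torch.cat(condensed + [current], dim=0)


if __name__ == "__main__":
    torch.manual_seed(0)
    hidden = torch.randn(1024, 64)  # 1024 tokens, hidden size 64
    ctx = sliding_window_context(hidden, window_len=256, condensing_ratio=8)
    # 3 past windows x 32 beacons + 256 raw tokens = 352 positions
    print(ctx.shape)  # torch.Size([352, 64])
```

In this toy setup, 1,024 tokens are reduced to 352 positions visible to the final window, which illustrates the kind of compression that lets a short-window model cover a much longer input.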

In experiments, Activation Beacon outperforms existing methods for extending context length in LLMs, matching or exceeding fine-tuned full-attention baselines while remaining more efficient. It has been evaluated on a range of long-context tasks, demonstrating its effectiveness across diverse real-world applications.

Overall, Activation Beacon provides a low-cost and efficient solution for extending the context length of LLMs. This new method has the potential to greatly enhance the capabilities of large language models and enable them to handle longer contexts effectively. Further research and development in this area could lead to significant advancements in natural language processing and understanding.
