New Training Techniques Unleash the Power of Supercomputers

Summary: Researchers at Oak Ridge National Laboratory have harnessed Frontier, the world's most powerful supercomputer, to train a large language model (LLM) with one trillion parameters. By combining tensor parallelism, pipeline parallelism, and data parallelism, they achieved efficient training and high peak throughput across models of different sizes. However, the researchers have not yet disclosed the specific timescales for training the LLM.
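To illustrate how the three forms of parallelism compose, the sketch below maps each GPU rank onto a three-dimensional grid of data-, pipeline-, and tensor-parallel groups. This is a generic illustration, not ORNL's actual code, and the group sizes are made up for the example:

```python
# Generic sketch of 3D (data x pipeline x tensor) parallelism rank layout.
# The group sizes below are illustrative, not the configuration ORNL used.

DATA_PARALLEL = 4      # replicas of the whole model
PIPELINE_PARALLEL = 8  # layer stages per replica
TENSOR_PARALLEL = 8    # GPUs sharing each layer's weight matrices

world_size = DATA_PARALLEL * PIPELINE_PARALLEL * TENSOR_PARALLEL  # 256 GPUs

def rank_to_groups(rank: int) -> dict:
    """Map a flat GPU rank to its (data, pipeline, tensor) group indices."""
    tp = rank % TENSOR_PARALLEL
    pp = (rank // TENSOR_PARALLEL) % PIPELINE_PARALLEL
    dp = rank // (TENSOR_PARALLEL * PIPELINE_PARALLEL)
    return {"data": dp, "pipeline": pp, "tensor": tp}

if __name__ == "__main__":
    # GPUs in the same tensor-parallel group exchange activations at every
    # layer, so hybrid-parallel frameworks try to keep them on one node.
    for rank in (0, 7, 8, 63, 64, 255):
        print(rank, rank_to_groups(rank))
```

Ranks that share a tensor-parallel index communicate most frequently, which is why placement and communication optimization matter so much on a machine of Frontier's scale.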

LLMs are not typically trained on supercomputers; they are usually trained on dedicated clusters of specialized servers packed with GPUs. ChatGPT, a well-known example, was reportedly trained on more than 20,000 GPUs. The researchers at Oak Ridge National Laboratory nevertheless wanted to explore the potential of training LLMs on supercomputers and determine whether they could improve efficiency.

One of the challenges they faced was the limited memory (VRAM) of each individual GPU: a trillion-parameter model is far too large to fit on a single device. To overcome this, the researchers grouped multiple GPUs together so that each holds only a shard of the model, and optimized the parallel communication between those groups. This allowed resources to be used efficiently as the size of the LLM increased.
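Back-of-the-envelope arithmetic shows why a single GPU cannot hold the model and why grouping is necessary. The sketch below assumes mixed-precision Adam training (roughly 16 bytes of weights, gradients, and optimizer state per parameter) and 64 GB of HBM per MI250X compute die on Frontier; these are common rules of thumb and assumptions, not figures taken from the ORNL paper:

```python
# Rough memory estimate for a 1-trillion-parameter model under mixed-precision
# Adam training; the byte counts per parameter are standard estimates, not
# figures reported by ORNL.

PARAMS = 1e12                 # one trillion parameters
BYTES_PER_PARAM = 16          # fp16 weights + fp16 grads + fp32 Adam states
GPU_MEMORY_GB = 64            # HBM per MI250X compute die (assumed)

total_gb = PARAMS * BYTES_PER_PARAM / 1e9
print(f"Model state: {total_gb / 1e3:.1f} TB")            # ~16 TB

min_gpus = total_gb / GPU_MEMORY_GB
print(f"Minimum GPUs just to hold the model state: {min_gpus:.0f}")  # ~250

# Splitting the model across tensor- and pipeline-parallel groups brings the
# per-GPU share well under the HBM limit (activations are ignored here).
tensor_parallel, pipeline_parallel = 8, 64                 # illustrative sizes
per_gpu_gb = total_gb / (tensor_parallel * pipeline_parallel)
print(f"Per-GPU model state at TP=8, PP=64: {per_gpu_gb:.1f} GB")
```

Even before counting activations, hundreds of GPUs are needed just to store the model state, which is why the communication between those GPUs becomes the dominant engineering problem.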

With their new approach, the researchers achieved impressive results. Across the different parameter-scale models they trained, they reached between 31.96% and 38.38% of peak GPU throughput. They also demonstrated 100% weak scaling efficiency and strong scaling efficiencies of 87.05% and 89.93% for the larger models.
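For readers unfamiliar with these metrics: weak scaling keeps the work per GPU constant as GPUs are added, while strong scaling fixes the total problem size. A minimal sketch of how the two efficiencies are computed follows; the timing numbers are invented purely for illustration and are not from the paper:

```python
def weak_scaling_efficiency(t_base: float, t_scaled: float) -> float:
    """Work per GPU is fixed; ideally runtime stays constant as GPUs are added."""
    return t_base / t_scaled

def strong_scaling_efficiency(t_base: float, t_scaled: float,
                              base_gpus: int, scaled_gpus: int) -> float:
    """Total work is fixed; ideally runtime shrinks in proportion to GPU count."""
    speedup = t_base / t_scaled
    ideal_speedup = scaled_gpus / base_gpus
    return speedup / ideal_speedup

if __name__ == "__main__":
    # Hypothetical timings: 100 s per step on 1,024 GPUs vs. 28.6 s on 4,096.
    print(f"{strong_scaling_efficiency(100.0, 28.6, 1024, 4096):.2%}")  # ~87%
```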

While the researchers openly shared information about the computing resources used and the techniques employed, they have not yet provided specific details on the training timescales. This leaves a lingering question about how much faster training LLMs on supercomputers can be compared to traditional methods.

This research opens up new possibilities for training large language models more efficiently. The combination of specialized parallelization techniques and the immense computational power of supercomputers offers a promising avenue for further advances in natural language processing and AI. As more researchers explore these training techniques, we can expect even more impressive language models in the future.

