Researchers Found a Better Way to Teach Large Language Models New Skills

For Immediate Release

Researchers have developed a technique that significantly improves the performance of large language models without increasing the computational power necessary to fine-tune the models. The researchers demonstrated that their technique improves the performance of these models over previous techniques in tasks including commonsense reasoning, arithmetic reasoning, instruction following, code generation, and visual recognition.

Large language models are artificial intelligence systems that are pretrained on huge data sets. After pretraining, these models predict which words should follow each other in order to respond to user queries. However, the nonspecific nature of pretraining means that there is ample room for improvement when user queries focus on specific topics, such as when a user asks the model to answer a math question or to write computer code.

“In order to improve a model’s ability to perform more specific tasks, you need to fine-tune the model,” says Tianfu Wu, co-corresponding author of a paper on the work and an associate professor of computer engineering at North Carolina State University. “However, these models are so large that it is not feasible to re-train the entire model. Instead, you want to determine the smallest number of changes necessary to improve the model’s performance. We’ve developed a technique, called WeGeFT (pronounced wee-gift), that represents a significant advance for fine-tuning these large models.”

The major breakthrough in fine-tuning these large models was LoRA, which came out in 2022. LoRA works by using mathematical tools to identify a small subset of key parameters that are most likely to improve a model’s performance on a specific task. There have been many attempts to improve upon LoRA, but Wu and his collaborators found that these previous efforts either required significantly more computational power to improve performance, or used the same amount of computing power without improving performance.

“WeGeFT builds on LoRA, but incorporates additional mathematical tools that allow us to determine which of the key parameters the model is already familiar with and which parameters the model would need to ‘learn,’” says Wu. “By placing more weight on the truly novel parameters, we are able to improve model performance compared to LoRA without incorporating significant new computational demands.”

In proof-of-concept testing, the researchers found that WeGeFT performed as well as or better than LoRA and its many variants across a variety of downstream tasks: commonsense reasoning, arithmetic reasoning, instruction following, code generation, and visual recognition.

“We think this is a valuable step forward,” Wu says. “We are now exploring ways that WeGeFT could also be used to identify elements of the model that are responsible for harmful outputs, with the goal of improving AI alignment and ‘surgery’ to improve model safety and outputs. We expect that work to be forthcoming.”

The paper, “WeGeFT: Weight-Generative Fine-Tuning for Multi-Faceted Efficient Adaptation of Large Models,” will be presented July 17 at the International Conference on Machine Learning, which is being held in Vancouver, Canada. The paper’s other co-corresponding author is Chinmay Savadikar, a Ph.D. student at NC State. The paper was co-authored by Xi Song, an independent researcher.

This work was done with support from the National Science Foundation under grants 1909644, 2024688 and 2013451; and from the Army Research Office under grants W911NF1810295 and W911NF2210010.

-shipman-

Note to Editors: The study abstract follows.

“WeGeFT: Weight-Generative Fine-Tuning for Multi-Faceted Efficient Adaptation of Large Models”

Authors: Chinmay Savadikar and Tianfu Wu, North Carolina State University; Xi Song, independent researcher

Presented: July 13-19, International Conference on Machine Learning, Vancouver, Canada

Abstract: Fine-tuning large pretrained Transformer models can focus on either introducing a small number of new learnable parameters (parameter efficiency) or editing representations of a small number of tokens using lightweight modules (representation efficiency). While the pioneering method LoRA (Low-Rank Adaptation) inherently balances parameter, compute, and memory efficiency, many subsequent variants trade off compute and memory efficiency and/or performance to further reduce fine-tuning parameters. To address this limitation and unify parameter-efficient and representation-efficient fine-tuning, we propose Weight-Generative Fine-Tuning (WeGeFT, pronounced wee-gift), a novel approach that learns to generate fine-tuning weights directly from the pretrained weights. WeGeFT employs a simple low-rank formulation consisting of two linear layers, either shared across multiple layers of the pretrained model or individually learned for different layers. This design achieves multi-faceted efficiency in parameters, representations, compute, and memory, while maintaining or exceeding the performance of LoRA and its variants. Extensive experiments on commonsense reasoning, arithmetic reasoning, instruction following, code generation, and visual recognition verify the effectiveness of our proposed WeGeFT.
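
For readers who want a concrete picture of the idea in the abstract, the following is a minimal sketch and not the authors' implementation. It contrasts a LoRA-style update, where two small matrices are learned independently of the pretrained weights, with one plausible reading of "generating fine-tuning weights directly from the pretrained weights": passing the pretrained matrix W through two small linear maps. The names `phi` and `psi`, the exact form `W @ phi @ psi`, the zero initialization of `psi`, and the toy dimensions are all assumptions made for illustration.

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def add(a, b):
    """Element-wise sum of two matrices of the same shape."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def rand_matrix(rows, cols, rng, scale=0.1):
    """Small random matrix standing in for real weights."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


def zeros(rows, cols):
    return [[0.0] * cols for _ in range(rows)]


def lora_delta(B, A):
    """LoRA-style update: delta = B @ A, learned independently of W."""
    return matmul(B, A)


def generated_delta(W, phi, psi):
    """Assumed weight-generative update: delta = (W @ phi) @ psi.

    The update is a function of the pretrained weight W itself,
    produced by two low-rank linear maps (hypothetical names).
    """
    return matmul(matmul(W, phi), psi)


rng = random.Random(0)
d_out, d_in, r = 6, 4, 2          # toy sizes; r is the low rank

W = rand_matrix(d_out, d_in, rng)  # stand-in for a pretrained weight matrix
phi = rand_matrix(d_in, r, rng)    # first linear layer:  d_in -> r
psi = zeros(r, d_in)               # second linear layer: r -> d_in (zero init is an assumption)

delta = generated_delta(W, phi, psi)
W_adapted = add(W, delta)

# Trainable parameter counts for one weight matrix of this shape.
lora_params = r * (d_in + d_out)   # B is d_out x r, A is r x d_in
generated_params = 2 * r * d_in    # phi is d_in x r, psi is r x d_in

print("LoRA-style parameters:", lora_params)
print("Generator parameters:", generated_params)
```

Under these assumptions, the generator's size depends only on the input dimension and the rank, so the same `phi` and `psi` could in principle be reused for every layer whose weights share that input dimension. That corresponds to the abstract's option of sharing the two linear layers across multiple layers of the pretrained model. With `psi` initialized to zero, the adapted weights equal the pretrained weights before any training.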