New Method Forecasts Computation, Energy Costs for Sustainable AI Models
For Immediate Release
The process of updating deep learning AI models when they face new tasks or must accommodate changes in data can carry significant costs in computational resources and energy consumption. Researchers have developed a novel method that predicts those costs, allowing users to make informed decisions about when to update AI models and improve AI sustainability.
“There have been studies that focused on making deep learning model training more efficient,” says Jung-Eun Kim, corresponding author of a paper on the work and an assistant professor of computer science at North Carolina State University. “However, over a model’s life cycle, it will likely need to be updated many times. One reason is that, as our work here shows, retraining an existing model is much more cost effective than training a new model from scratch.
“If we want to address sustainability issues related to deep learning AI, we must look at computational and energy costs across a model’s entire life cycle – including the costs associated with updates. If you cannot predict what the costs will be ahead of time, it is impossible to engage in the type of planning that makes sustainability efforts possible. That makes our work here particularly valuable.”
Training a deep learning model is a computationally intensive process, and users want to go as long as possible without having to update the AI. However, two types of shifts can happen that make these updates inevitable. First, the task that the AI is performing may need to be modified. For example, if a model was initially tasked with only classifying digits and traffic symbols, you may need to modify the task to identify vehicles and humans as well. This is called a task shift.
Second, the data users provide to the model may change. For example, you may need to make use of a new kind of data, or perhaps the data you are working with is being coded in a different way. Either way, the AI needs to be updated to accommodate the change. This is called a distribution shift.
“Regardless of what is driving the need for an update, it is extremely useful for AI practitioners to have a realistic estimate of the computational demand that will be required for the update,” Kim says. “This can help them make informed decisions about when to conduct the update, as well as how much computational demand they will need to budget for the update.”
To forecast what the computational and energy costs will be, the researchers developed a new technique they call the REpresentation Shift QUantifying Estimator (RESQUE).
Essentially, RESQUE allows users to compare the dataset that a deep learning model was initially trained on to the new dataset that will be used to update the model. This comparison is done in a way that estimates the computational and energy costs associated with conducting the update.
Those costs are presented as a single index value, which correlates with five metrics: epochs, parameter change, gradient norm, energy, and carbon. Epochs, parameter change, and gradient norm are all ways of measuring the amount of computational effort necessary to retrain the model.
“However, to provide insight regarding what this means in a broader sustainability context, we also tell users how much energy, in kilowatt hours, will be needed to retrain the model,” Kim says. “And we predict how much carbon, in kilograms, will be released into the atmosphere in order to provide that energy.”
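The idea of collapsing a dataset comparison into one predictive number can be pictured with a toy sketch: extract feature representations of the original and new datasets, then measure how far apart they sit. Everything below (the function name, the stand-in random "features," and the use of a simple Euclidean distance between mean feature vectors) is an illustrative assumption for exposition, not the paper's actual RESQUE estimator.

```python
import numpy as np

def representation_shift_index(old_feats, new_feats):
    """Toy single-index shift score: Euclidean distance between the mean
    feature vectors of two datasets. Illustrative only -- the paper's
    RESQUE estimator is defined differently."""
    return float(np.linalg.norm(old_feats.mean(axis=0) - new_feats.mean(axis=0)))

# Hypothetical usage, with random arrays standing in for features a
# model might produce (e.g., penultimate-layer activations):
rng = np.random.default_rng(0)
original = rng.normal(0.0, 1.0, size=(1000, 64))  # original training data
similar  = rng.normal(0.0, 1.0, size=(1000, 64))  # same distribution
shifted  = rng.normal(0.5, 1.0, size=(1000, 64))  # distribution shift

print(representation_shift_index(original, similar))  # small
print(representation_shift_index(original, shifted))  # noticeably larger
```

In a scheme like this, a larger index would signal a bigger representation shift, and hence a costlier update; the real estimator is then validated by checking how well the index tracks measured retraining costs.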
The researchers conducted extensive experiments involving multiple datasets, many different distribution shifts, and many different task shifts to validate RESQUE's performance.
“We found that the RESQUE predictions aligned very closely with the real-world costs of conducting deep learning model updates,” Kim says. “Also, as I noted earlier, all of our experimental findings tell us that training a new model from scratch demands far more computational power and energy than retraining an existing model.”
In the short term, RESQUE is a useful methodology for anyone who needs to update a deep learning model.
“RESQUE can be used to help users budget computational resources for updates, allow them to predict how long the update will take, and so on,” Kim says.
“In the bigger picture, this work offers a deeper understanding of the costs associated with deep learning models across their entire life cycle, which can help us make informed decisions related to the sustainability of the models and how they are used. Because if we want AI to be viable and useful, these models must be not only dynamic but sustainable.”
The paper, “RESQUE: Quantifying Estimator to Task and Distribution Shift for Sustainable Model Reusability,” will be presented at the Thirty-Ninth Association for the Advancement of Artificial Intelligence (AAAI) Conference on Artificial Intelligence, which will be held Feb. 25-Mar. 4 in Philadelphia, Penn. The first author of the paper is Vishwesh Sangarya, a graduate student at NC State.
-shipman-
Note to Editors: The study abstract follows.
“RESQUE: Quantifying Estimator to Task and Distribution Shift for Sustainable Model Reusability”
Authors: Vishwesh Sangarya and Jung-Eun Kim, North Carolina State University
Presented: Thirty-Ninth Association for the Advancement of Artificial Intelligence (AAAI) Conference on Artificial Intelligence, Feb. 25-Mar. 4 in Philadelphia, Penn.
Abstract: As a strategy for sustainability of deep learning, reusing an existing model by retraining it rather than training a new model from scratch is critical. In this paper, we propose Representation Shift QUantifying Estimator (RESQUE), a predictive quantifier to estimate the retraining cost of a model to distributional shifts or change of tasks. It provides a single concise index for an estimate of resources required for retraining the model. Through extensive experiments, we show that RESQUE has a strong correlation with various retraining measures. Our results validate that RESQUE is an effective indicator in terms of epochs, gradient norms, changes of parameter magnitude, energy, and carbon emissions. These measures align well with RESQUE for new tasks, multiple noise types, and varying noise intensities. As a result, RESQUE enables users to make informed decisions for retraining to different tasks/distribution shifts and determine the most cost-effective and sustainable option, allowing for the reuse of a model with a much smaller footprint in the environment.