Researchers Demonstrate New Technique for Stealing AI Models

December 12, 2024 | Matt Shipman

For Immediate Release

Researchers have demonstrated the ability to steal an artificial intelligence (AI) model without hacking into the device where the model was running. The technique is novel in that it works even when the thief has no prior knowledge of the software or architecture that supports the AI.

“AI models are valuable, we don’t want people to steal them,” says Aydin Aysu, co-author of a paper on the work and an associate professor of electrical and computer engineering at North Carolina State University. “Building a model is expensive and requires significant computing resources. But just as importantly, when a model is leaked, or stolen, the model also becomes more vulnerable to attacks – because third parties can study the model and identify any weaknesses.”

“As we note in the paper, model stealing attacks on AI and machine learning devices undermine intellectual property rights, compromise the competitive advantage of the model’s developers, and can expose sensitive data embedded in the model’s behavior,” says Ashley Kurian, first author of the paper and a Ph.D. student at NC State.

In this work, the researchers stole the hyperparameters of an AI model that was running on a Google Edge Tensor Processing Unit (TPU).

“In practical terms, that means we were able to determine the architecture and specific characteristics – known as layer details – we would need to make a copy of the AI model,” says Kurian.

“Because we stole the architecture and layer details, we were able to recreate the high-level features of the AI,” Aysu says. “We then used that information to recreate the functional AI model, or a very close surrogate of that model.”

The researchers used the Google Edge TPU for this demonstration because it is a commercially available chip that is widely used to run AI models on edge devices – meaning devices used by end users in the field, as opposed to AI systems that are used for database applications.

“This technique could be used to steal AI models running on many different devices,” Kurian says. “As long as the attacker knows the device they want to steal from, can access the device while it is running an AI model, and has access to another device with the same specifications, this technique should work.”

The technique used in this demonstration relies on monitoring electromagnetic signals. Specifically, the researchers placed an electromagnetic probe on top of a TPU chip. The probe provides real-time data on changes in the electromagnetic field of the TPU during AI processing.

“The electromagnetic data from the sensor essentially gives us a ‘signature’ of the AI processing behavior,” Kurian says. “That’s the easy part.”

To determine the AI model’s architecture and layer details, the researchers compare the electromagnetic signature of the model to a database of other AI model signatures made on an identical device – meaning another Google Edge TPU, in this case.
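The release does not say how two signatures are judged to “match.” As a minimal sketch, assuming Pearson correlation as the similarity measure and short synthetic traces, the comparison against a signature database might look like the following. The functions `pearson` and `best_match`, the template labels, and every trace value are hypothetical illustrations, not data or code from the paper.

```python
import math
from typing import Dict, Sequence


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length traces."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("traces must have equal length of at least 2")
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # a flat trace carries no shape information
    return cov / math.sqrt(var_a * var_b)


def best_match(trace: Sequence[float], templates: Dict[str, Sequence[float]]) -> str:
    """Return the label of the template whose shape best matches the trace."""
    return max(templates, key=lambda label: pearson(trace, templates[label]))


# Hypothetical templates standing in for signatures recorded on an identical device.
templates = {
    "conv_3x3": [0.0, 1.0, 0.0, 1.0, 0.0],
    "dense_64": [1.0, 1.0, 0.0, 0.0, 1.0],
}
captured = [0.1, 0.9, 0.0, 1.1, 0.05]  # hypothetical noisy trace from the target
print(best_match(captured, templates))  # conv_3x3
```

Correlation is used here because it compares the shape of two traces while ignoring differences in overall amplitude and offset; a real attack would have to deal with alignment, noise, and far longer traces.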

How can the researchers “steal” an AI model for which they don’t already have a signature? That’s where things get tricky.

The researchers have a technique that allows them to estimate the number of layers in the targeted AI model. Layers are a series of sequential operations that the AI model performs, with the result of each operation informing the following operation. Most AI models have 50 to 242 layers.

“Rather than trying to recreate a model’s entire electromagnetic signature, which would be computationally overwhelming, we break it down by layer,” Kurian says. “We already have a collection of 5,000 first-layer signatures from other AI models. So we compare the stolen first-layer signature to the first-layer signatures in our database to see which one matches most closely.

“Once we’ve reverse-engineered the first layer, that informs which 5,000 signatures we select to compare with the second layer,” Kurian says. “And this process continues until we’ve reverse-engineered all of the layers and have effectively made a copy of the AI model.”
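The layer-by-layer search Kurian describes can be sketched as a greedy loop: for each layer, templates are generated for every candidate configuration on top of the layers already recovered, and the best-matching candidate is kept. This is a simplified illustration under stated assumptions, not the TPUXtract implementation. The `Config` tuple, the candidate list, and `toy_profile` are all hypothetical; `toy_profile` is a synthetic stand-in for running a layer on the attacker's own identical device and recording its trace, and Pearson correlation is assumed as the match score.

```python
import math
from typing import Callable, List, Sequence, Tuple

# Hypothetical hyperparameters: (layer type, width). The real attack recovers
# many more fields (kernel size, strides, padding, activation, ...).
Config = Tuple[str, int]
Profiler = Callable[[Tuple[Config, ...], Config], List[float]]


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length traces."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)


def extract_layers(layer_traces: List[Sequence[float]],
                   candidates: List[Config],
                   profile_layer: Profiler) -> List[Config]:
    """Recover one layer at a time.

    For each captured layer trace, build templates on the fly by profiling
    every candidate configuration on top of the layers recovered so far,
    then keep the candidate whose template correlates best with the trace.
    """
    recovered: List[Config] = []
    for trace in layer_traces:
        prefix = tuple(recovered)
        best = max(candidates,
                   key=lambda cand: pearson(trace, profile_layer(prefix, cand)))
        recovered.append(best)
    return recovered


def toy_profile(prefix: Tuple[Config, ...], config: Config) -> List[float]:
    """Hypothetical stand-in for recording one layer on an identical device.

    The synthetic trace depends on the layer type, its width and the width
    of the previous layer, so earlier recoveries change later templates.
    """
    kind, width = config
    prev_width = prefix[-1][1] if prefix else 1
    shape = {"conv": [0, 1, 0, 1, 0, 1], "dense": [1, 1, 1, 0, 0, 0]}[kind]
    return [float(s * width + i * prev_width) for i, s in enumerate(shape)]


secret = [("conv", 8), ("dense", 4), ("conv", 2)]
# Simulate the per-layer traces captured from the victim model.
captured = [toy_profile(tuple(secret[:k]), cfg) for k, cfg in enumerate(secret)]
candidates = [(kind, w) for kind in ("conv", "dense") for w in (2, 4, 8)]
recovered = extract_layers(captured, candidates, toy_profile)
print(recovered)  # [('conv', 8), ('dense', 4), ('conv', 2)]
```

The greedy structure is what keeps the search tractable. If each layer had about 5,000 candidates, a 50-layer model would need on the order of 50 × 5,000 = 250,000 template comparisons, whereas trying every combination of layers at once would mean 5,000^50 candidate models.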

In their demonstration, the researchers showed that this technique was able to recreate a stolen AI model with 99.91% accuracy.

“Now that we’ve defined and demonstrated this vulnerability, the next step is to develop and implement countermeasures to protect against it,” says Aysu.

The paper, “TPUXtract: An Exhaustive Hyperparameter Extraction Framework,” is published online by the Conference on Cryptographic Hardware and Embedded Systems. The paper was co-authored by Anuj Dubey, a former Ph.D. student at NC State, and Ferhat Yaman, a former graduate student at NC State. The work was done with support from the National Science Foundation, under grant number 1943245.

The researchers disclosed the vulnerability they identified to Google.

-shipman-

Note to Editors: The study abstract follows.

“TPUXtract: An Exhaustive Hyperparameter Extraction Framework”

Authors: Ashley Kurian, Anuj Dubey, Ferhat Yaman and Aydin Aysu, North Carolina State University

Published: Dec. 12, 2024, IACR Transactions on Cryptographic Hardware and Embedded Systems

DOI: 10.46586/tches.v2025.i1.78-103

Abstract: Model stealing attacks on AI/ML devices undermine intellectual property rights, compromise the competitive advantage of the original model developers, and potentially expose sensitive data embedded in the model’s behavior to unauthorized parties. While previous research works have demonstrated successful side-channel-based model recovery in embedded microcontrollers and FPGA-based accelerators, the exploration of attacks on commercial ML accelerators remains largely unexplored. Moreover, prior side-channel attacks fail when they encounter previously unknown models. This paper demonstrates the first successful model extraction attack on the Google Edge Tensor Processing Unit (TPU), an off-the-shelf ML accelerator. Specifically, we show a hyperparameter stealing attack that can extract all layer configurations including the layer type, number of nodes, kernel/filter sizes, number of filters, strides, padding, and activation function. Most notably, our attack is the first exhaustive attack that can extract previously unseen models. This is achieved through an online template-building approach instead of a pre-trained ML-based approach used in prior works. Our results on a black-box Google Edge TPU evaluation show that, through obtained electromagnetic traces, our proposed framework can achieve 99.91% accuracy, making it the most accurate one to date. Our findings indicate that attackers can successfully extract various types of models on a black-box commercial TPU with utmost detail and call for countermeasures.