Computer engineers at North Carolina State University have developed hardware that allows programs to operate more efficiently by significantly boosting the speed at which the “cores” on a computer chip communicate with each other.
The core, or central processing unit, is the brain of a computer chip; most chips currently contain between four and eight cores. In order to perform a task more quickly using multiple cores on a single chip, those cores need to communicate with each other. But there are no direct ways for cores to communicate. Instead, one core sends data to memory and another core retrieves it using software algorithms.
“Our technology is more efficient because it provides a single instruction to send data to another core, which is six times faster than the best state-of-the-art software we could find,” says Dr. James Tuck, an assistant professor of electrical and computer engineering at NC State and co-author of a paper describing the research. Tuck explains that the technology, called HAQu, is “not hardware designed to communicate data on its own, but is hardware that expedites data-sharing using existing data paths on a computer chip.” Because HAQu uses these existing data paths, the research team compared it to software communication tools – even though it is a piece of hardware.
HAQu is also more energy efficient. “It actually consumes more power when operating but, because it runs so much more quickly, the overall energy consumption of the chip actually decreases,” Tuck says.
The next step for the research team is to incorporate the hardware into a prototype system to demonstrate its utility in a complex software environment.
The paper, “HAQu: Hardware-Accelerated Queueing for Fine-Grained Threading on a Chip Multiprocessor,” is co-authored by Tuck, NC State Ph.D. students Sanghoon Lee and Devesh Tiwari, and Dr. Yan Solihin, an associate professor of electrical and computer engineering at NC State. The paper will be presented Feb. 14 at the International Symposium on High-Performance Computer Architecture in San Antonio, Texas. The research was funded, in part, by the National Science Foundation.
Note to Editors: The study abstract follows.
“HAQu: Hardware-Accelerated Queueing for Fine-Grained Threading on a Chip Multiprocessor”
Authors: Sanghoon Lee, Devesh Tiwari, Yan Solihin, James Tuck, North Carolina State University
Presented: Feb. 14, 2011, at the International Symposium on High-Performance Computer Architecture in San Antonio, Texas
Abstract: Queues are commonly used in multithreaded programs for synchronization and communication. However, because software queues tend to be too expensive to support finegrained parallelism, hardware queues have been proposed to reduce overhead of communication between cores. Hardware queues require modifications to the processor core and need a custom interconnect. They also pose difficulties for the operating system because their state must be preserved across context switches. To solve these problems, we propose a hardware-accelerated queue, or HAQu. HAQu adds hardware to a CMP that accelerates operations on software queues. Our design implements fast queueing through an application’s address space with operations that are compatible with a fully software queue. Our design provides accelerated and OS-transparent performance in three general ways: (1) it provides a single instruction for enqueueing and dequeueing which significantly reduces the overhead when used in fine-grained threading; (2) operations on the queue are designed to leverage low-level details of the coherence protocol ; and (3) hardware ensures that the full state of the queue is stored in the application’s address space, thereby ensuring virtualization. We have evaluated our design in the context of application domains: offloading fine-grained checks for improved software reliability, and automatic, fine-grained parallelization using decoupled software pipelining.