Computer Engineers Boost App Speeds by More Than 9 Percent
For Immediate Release
Researchers from North Carolina State University and Samsung Electronics have found a way to boost the speed of computer applications by more than 9 percent. The improvement results from techniques that allow computer processors to retrieve data more efficiently.
Computer processors have to retrieve data from memory to perform operations. All data is stored in off-chip “main” memory. But data that the processor will use a lot is also stored – temporarily – in a die-stacked dynamic random access memory (DRAM) cache that is located closer to the processor, where it can be retrieved more quickly.
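The two-level lookup described above can be sketched in a few lines of Python. This is an illustrative model only; the latencies, sizes, and names are invented for the example, not taken from the paper.

```python
# Hypothetical sketch of a two-level memory hierarchy: a small, fast
# die-stacked DRAM cache backed by larger, slower off-chip main memory.
# Latencies are illustrative placeholders, not measured values.

CACHE_LATENCY = 10    # cycles (illustrative)
MEMORY_LATENCY = 100  # cycles (illustrative)

main_memory = {addr: f"data@{addr}" for addr in range(1024)}
cache = {}  # temporarily holds a small, frequently used subset of main memory

def load(addr):
    """Return (data, cycles): serve from the cache if possible,
    otherwise fall back to main memory and fill the cache."""
    if addr in cache:
        return cache[addr], CACHE_LATENCY        # cache hit: fast path
    data = main_memory[addr]                     # cache miss: go off-chip
    cache[addr] = data                           # keep a copy for next time
    return data, CACHE_LATENCY + MEMORY_LATENCY

_, first = load(42)   # miss: must go all the way to main memory
_, second = load(42)  # hit: served from the nearby cache
print(first, second)  # the second access is much cheaper than the first
```

The point of the hierarchy is visible in the two access costs: once data has been pulled into the cache, subsequent accesses avoid the expensive trip to off-chip memory.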
The data in the cache is organized into large blocks, or macroblocks, so that the processor knows where to find whatever data it needs. However, for any given operation, the processor doesn’t need all of the data in a macroblock – and retrieving the unnecessary data takes time and energy.
To make the process more efficient, researchers have developed a technique in which the cache learns over time which data the processor needs from each macroblock. This allows the cache to do two things. First, the cache can compress the macroblock, retrieving only the relevant data, which enables it to send data to the processor more efficiently. Second, compressing the macroblock frees up space in the cache that can be used to store other data the processor is more likely to need.
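The learning-and-compacting idea above can be sketched as follows. The predictor, block counts, and class names here are hypothetical simplifications for illustration; they do not reproduce the paper's actual design.

```python
# Hypothetical sketch: the cache learns which blocks of each macroblock the
# processor actually uses, then fetches only those blocks and stores them
# contiguously so no cache space is wasted on unneeded data.
# The macroblock size and all names are illustrative, not from the paper.

BLOCKS_PER_MBLOCK = 8  # e.g., a macroblock made of 8 smaller blocks

class FootprintPredictor:
    def __init__(self):
        self.footprints = {}  # macroblock id -> set of block indices used

    def record_use(self, mblock, block_idx):
        """Remember that the processor touched this block of the macroblock."""
        self.footprints.setdefault(mblock, set()).add(block_idx)

    def predict(self, mblock):
        """Predict the useful blocks; with no history, fetch everything."""
        return self.footprints.get(mblock, set(range(BLOCKS_PER_MBLOCK)))

def fetch_compacted(predictor, mblock):
    """Fetch only the predicted-useful blocks, packed contiguously.

    Returns the fetched block indices and the number of cache slots
    freed for other data by not storing the unneeded blocks.
    """
    useful = sorted(predictor.predict(mblock))
    return useful, BLOCKS_PER_MBLOCK - len(useful)

p = FootprintPredictor()
# Suppose the processor previously touched only blocks 1 and 5 of macroblock 7.
p.record_use(7, 1)
p.record_use(7, 5)

fetched, freed = fetch_compacted(p, 7)
print(fetched, freed)  # only 2 of 8 blocks are fetched; 6 slots are freed
```

Storing the useful blocks contiguously is what distinguishes this scheme from simply skipping the fetch: the skipped blocks leave no empty holes, so the reclaimed space can hold other data the processor is likely to need.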
The researchers tested this approach, called Dense Footprint Cache, in a processor and memory simulator. After running 3 billion instructions from each tested application through the simulator, the researchers found that Dense Footprint Cache sped up applications by 9.5 percent compared with state-of-the-art competing methods for managing die-stacked DRAM. Dense Footprint Cache also used 4.3 percent less energy.
The researchers also found that Dense Footprint Cache led to a significant improvement in “last-level cache miss ratios.” Last-level cache misses occur when the processor tries to retrieve data from the cache, but the data isn’t there, forcing the processor to retrieve it from off-chip main memory. These cache misses make operations much less efficient – and Dense Footprint Cache reduced last-level cache miss ratios by 43 percent.
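The 43 percent figure is a relative reduction in the miss ratio, which a quick calculation makes concrete. The baseline numbers below are made up for illustration; only the 43 percent reduction comes from the paper.

```python
# Illustrative arithmetic for a 43% relative reduction in miss ratio.
# The baseline workload numbers here are invented, not from the study.

baseline_accesses = 1_000_000
baseline_misses = 100_000
baseline_ratio = baseline_misses / baseline_accesses  # 10% miss ratio

# A 43% relative reduction shrinks the miss ratio to 57% of its old value.
dfc_ratio = baseline_ratio * (1 - 0.43)
print(f"{baseline_ratio:.3f} -> {dfc_ratio:.3f}")
```

In this made-up example, roughly 43,000 of every 100,000 former misses would now be served from the cache instead of off-chip memory.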
The work is featured in a paper, “Dense Footprint Cache: Capacity-Efficient Die-Stacked DRAM Last Level Cache,” that will be presented at the International Symposium on Memory Systems, Oct. 3-6 in Washington, D.C.
Lead author of the paper is Seunghee Shin, a Ph.D. student at NC State. The paper was co-authored by Yan Solihin, a professor of electrical and computer engineering at NC State, and Sihong Kim of Samsung Electronics.
Note to Editors: The study abstract follows.
“Dense Footprint Cache: Capacity-Efficient Die-Stacked DRAM Last Level Cache”
Authors: Seunghee Shin and Yan Solihin, North Carolina State University; Sihong Kim, Samsung Electronics
Presented: International Symposium on Memory Systems, Oct. 3-6, Washington, D.C.
Abstract: Die-stacked DRAM technology enables a large Last Level Cache (LLC) that provides high bandwidth data access to the processor. However, it requires a large tag array that may take a significant portion of the on-chip SRAM budget. To reduce this SRAM overhead, systems like Intel Haswell rely on a large block (Mblock) size. One drawback of a large Mblock size is that many bytes of an Mblock are not needed by the processor but are fetched into the cache. A recent technique (Footprint cache) solves this problem by dividing the Mblock into smaller blocks, where only blocks predicted to be needed by the processor are brought into the LLC. While it helps to alleviate the excessive bandwidth consumption from fetching unneeded blocks, the capacity waste remains: only blocks that are predicted useful are fetched and allocated, and the remaining area of the Mblock is left empty, creating holes. Unfortunately, holes create significant capacity overhead, occupying space that could have been used for useful data and wasting refresh power on useless data. In this paper, we propose a new design, Dense Footprint Cache (DFC). Similar to Footprint cache, DFC uses a large Mblock and relies on useful-block prediction in order to reduce memory bandwidth consumption. However, when blocks of an Mblock are fetched, the blocks are placed contiguously in the cache, thereby eliminating holes, increasing capacity and power efficiency, and increasing performance. Mblocks in DFC have variable sizes and a cache set has variable associativity, which presents new challenges in designing its management policies (placement, replacement, and update). Through simulation of Big Data applications, we show that DFC reduces LLC miss ratios by about 43%, speeds up applications by 9.5%, and consumes 4.3% less energy on average.