ISTC-CC NEWSLETTER
RESEARCH HIGHLIGHTS
Ling Liu's SC13 paper "Large Graph Processing Without the Overhead" featured by HPCwire.
ISTC-CC provides a listing of useful benchmarks for cloud computing.
Another list highlighting Open Source Software Releases.
The second GraphLab workshop should be even bigger than the first! GraphLab is a new programming framework for graph-style data analytics.
ISTC-CC Abstract
The Dirty-Block Index
Proceedings of the 41st International Symposium on Computer Architecture (ISCA'14), June 2014.
Vivek Seshadri, Abhishek Bhowmick, Onur Mutlu, Phillip B. Gibbons*,
Michael A. Kozuch*, Todd C. Mowry
Carnegie Mellon University
*Intel Labs
On-chip caches maintain multiple pieces of metadata about each cached block, e.g., the dirty bit, coherence information, and ECC. Traditionally, such metadata for each block is stored in the corresponding tag entry in the tag store. While this approach is simple to implement and scalable, it necessitates a full tag store lookup for any metadata query, resulting in high latency and energy consumption. We find that this approach is inefficient and inhibits several cache optimizations. In this work, we propose a new way of organizing the dirty bit information that enables simpler and more efficient implementations of several optimizations. In our proposed approach, we remove the dirty bits from the tag store and organize them in a separate structure, which we call the Dirty-Block Index (DBI). The organization of the DBI is simple: it consists of multiple entries, each corresponding to a row in DRAM. A bit vector in each entry tracks whether or not each block in the corresponding DRAM row is dirty. We demonstrate the benefits of the DBI by using it to simultaneously and efficiently implement three optimizations proposed by prior work: 1) aggressive DRAM-aware writeback, 2) bypassing cache lookups, and 3) heterogeneous ECC for clean/dirty blocks. DBI, with all three optimizations enabled, improves performance by 31% compared to the baseline (by 6% compared to the best previous mechanism) while reducing overall cache area cost by 8% compared to prior approaches.
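To make the DBI organization described in the abstract concrete, the following is a minimal software model of the idea: one entry per DRAM row, each holding a bit vector with one dirty bit per block in that row, so dirty-bit queries never touch the tag store. All sizes and names here (64-byte blocks, 128 blocks per row, the class and method names) are illustrative assumptions, not the paper's hardware implementation.

```python
# Illustrative model of a Dirty-Block Index (DBI).
# Assumed parameters (not from the paper): 64-byte blocks, 8 KB DRAM rows.
BLOCK_SIZE = 64          # bytes per cache block (assumed)
BLOCKS_PER_ROW = 128     # cache blocks per DRAM row (assumed)

class DirtyBlockIndex:
    def __init__(self):
        # Maps a DRAM row number to a bit vector of per-block dirty bits.
        self.entries = {}

    def _locate(self, addr):
        # Split an address into (DRAM row, block offset within the row).
        block = addr // BLOCK_SIZE
        return block // BLOCKS_PER_ROW, block % BLOCKS_PER_ROW

    def mark_dirty(self, addr):
        row, offset = self._locate(addr)
        self.entries[row] = self.entries.get(row, 0) | (1 << offset)

    def mark_clean(self, addr):
        row, offset = self._locate(addr)
        if row in self.entries:
            self.entries[row] &= ~(1 << offset)
            if self.entries[row] == 0:
                del self.entries[row]  # drop rows with no dirty blocks

    def is_dirty(self, addr):
        # Answers a dirty-bit query without consulting the tag store.
        row, offset = self._locate(addr)
        return bool(self.entries.get(row, 0) & (1 << offset))

    def dirty_blocks_in_row(self, row):
        # Enables DRAM-aware writeback: enumerate all dirty blocks of a
        # row so they can be written back together while the row is open.
        bits = self.entries.get(row, 0)
        return [i for i in range(BLOCKS_PER_ROW) if bits & (1 << i)]
```

This sketch shows why the three optimizations become simple: `dirty_blocks_in_row` gives writeback logic all dirty blocks of an open DRAM row at once, and `is_dirty` lets a request decide to bypass a cache lookup (or select ECC strength) with a single small-structure probe instead of a full tag store access.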
FULL PAPER: pdf