SEARCH
ISTC-CC NEWSLETTER
RESEARCH HIGHLIGHTS
Ling Liu's SC13 paper "Large Graph Processing Without the Overhead" featured by HPCwire.
ISTC-CC provides a listing of useful benchmarks for cloud computing.
Another list highlighting Open Source Software Releases.
Second GraphLab workshop should be even bigger than the first! GraphLab is a new programming framework for graph-style data analytics.
ISTC-CC Abstract
Transparent Offloading and Mapping (TOM): Enabling Programmer-Transparent Near-Data Processing in GPU Systems
Proceedings of the 43rd International Symposium on Computer Architecture (ISCA), Seoul, South Korea, June 18 - 22, 2016.
Kevin Hsieh‡, Eiman Ebrahimi†, Gwangsun Kim*, Niladrish Chatterjee†,
Mike O'Connor†, Nandita Vijaykumar‡, Onur Mutlu‡§, Stephen W. Keckler†
‡ Carnegie Mellon University
† NVIDIA
* KAIST
§ ETH Zürich
Main memory bandwidth is a critical bottleneck for modern GPU systems due to limited off-chip pin bandwidth. 3D-stacked memory architectures provide a promising opportunity to significantly alleviate this bottleneck by directly connecting a logic layer to the DRAM layers with high bandwidth connections. Recent work has shown promising potential performance benefits from an architecture that connects multiple such 3D-stacked memories and offloads bandwidth-intensive computations to a GPU in each of the logic layers. An unsolved key challenge in such a system is how to enable computation ooading and data mapping to multiple 3D-stacked memories without burdening the programmer such that any application can transparently benefit from near-data processing capabilities in the logic layer.
Our paper develops two new mechanisms to address this key challenge. First, a compiler-based technique that automatically identifies code to ooad to a logic-layer GPU based on a simple cost-benefit analysis. Second, a software/hardware cooperative mechanism that predicts which memory pages will be accessed by ooaded code, and places those pages in the memory stack closest to the offloaded code, to minimize off-chip bandwidth consumption. We call the combination of these two programmer-transparent mechanisms TOM: Transparent Ooading and Mapping.
Our extensive evaluations across a variety of modern memoryintensive GPU workloads show that, without requiring any program modification, TOM significantly improves performance (by 30% on average, and up to 76%) compared to a baseline GPU system that cannot offload computation to 3D-stacked memories.
FULL PAPER: pdf