INTEL SCIENCE & TECHNOLOGY CENTER

CLOUD COMPUTING

ISTC-CC NEWSLETTER

The ISTC-CC Update 2016 - NEW!

The ISTC-CC Update 2015

The ISTC-CC Update 2014

RESEARCH HIGHLIGHTS

Ling Liu's SC13 paper "Large Graph Processing Without the Overhead" featured by HPCwire.

ISTC-CC provides a listing of useful benchmarks for cloud computing.

Another list highlighting Open Source Software Releases.

Second GraphLab workshop should be even bigger than the first! GraphLab is a new programming framework for graph-style data analytics.

Open-source Spark framework makes iterative and interactive data analytics FAST, both to run and to write.

ISTC-CC Abstract

Efficient Data Mapping and Buffering Techniques for Multi-Level Cell Phase-Change Memories

ACM Transactions on Architecture and Code Optimization (TACO), Vol. 11, No. 4, December 2014. Best (student) presentation award.

HanBin Yoon, Justin Meza, Naveen Muralimanohar*, Norman P. Jouppi†,
Onur Mutlu

Carnegie Mellon University
* Hewlett-Packard Labs
† Google Inc.

New phase-change memory (PCM) devices have low-access latencies (like DRAM) and high capacities (i.e., low cost per bit, like Flash). In addition to being able to scale to smaller cell sizes than DRAM, a PCM cell can also store multiple bits per cell (referred to as multilevel cell, or MLC), enabling even greater capacity per bit. However, reading and writing the different bits of data from and to an MLC PCM cell requires different amounts of time: one bit is read or written first, followed by another. Due to this asymmetric access process, the bits in an MLC PCM cell have different access latency and energy depending on which bit in the cell is being read or written.

We leverage this observation to design a new way to store and buffer data in MLC PCM devices. While traditional devices couple the bits in each cell next to one another in the address space, our key idea is to logically decouple the bits in each cell into two separate regions depending on their read/write characteristics: fast-read/slow-write bits and slow-read/fast-write bits. We propose a low-overhead hardware/software technique to predict and map data that would benefit from being in each region at runtime. In addition, we show how MLC bit decoupling provides more flexibility in the way data is buffered in the device, enabling more efficient use of existing device buffer space.

Our evaluations for a multicore system show that MLC bit decoupling improves system performance by 19.2%, memory energy efficiency by 14.4%, and thread fairness by 19.3% over a state-of-the-art MLC PCM system that couples the bits in its cells.We show that our results are consistent across a variety of workloads and system configurations.

FULL PAPER: pdf