ISTC-CC Abstract
Reliability Implications of Power/Thermal Constrained Operation in Asymmetric Multicore Processors
Dark Silicon Workshop, June 2012, Portland, Oregon USA. Co-located with ISCA 2012.
William J. Song, Saibal Mukhopadhyay, Sudhakar Yalamanchili
Georgia Institute of Technology
The emergence of the dark silicon era raises new issues in balancing performance, utilization, and reliability. Power and thermal constraints preclude core scaling according to the business-as-usual progression of Moore's Law [1]. These constraints give rise to the utilization wall [2]: the number of simultaneously active cores is limited, so much of the silicon on the die is underutilized and operates below its full switching capacity. This underutilized silicon is referred to as dark silicon [1, 2, 3, 4]. The consequence is slower performance growth for future processors [1, 2, 3].
A range of techniques has emerged to address dark silicon, including i) heterogeneous and/or asymmetric architectures, often with specialized cores that deliver optimized energy/area for specific functions [2, 3], ii) dynamic voltage-frequency scaling (DVFS) [4], and iii) systematic power gating techniques such as turbo-boost or computational sprinting [5]. While these and similar efforts have focused on managing energy/power/performance tradeoffs, little attention has been paid to how these management techniques affect processor reliability. DVFS and power gating have a complicated impact on device degradation, and hence on core and processor degradation. For example, devices that are power-gated off experience some degree of regeneration, enabling limited recovery from the thermal and electrical stresses behind wear-out mechanisms such as electromigration, negative bias temperature instability (NBTI), stress migration, time-dependent dielectric breakdown (TDDB), and thermal cycling [7].
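For intuition about why DVFS and power gating are coupled to wear-out, two standard first-order relations help. These are textbook models (the CMOS dynamic power equation and Black's equation for electromigration), not models taken from this work: here \alpha is the switching activity factor, C the switched capacitance, V_{dd} the supply voltage, f the clock frequency, A a technology-dependent constant, J the current density, n an empirical exponent (commonly taken near 2), E_a the activation energy, k_B Boltzmann's constant, and T the absolute temperature.

```latex
P_{\mathrm{dyn}} = \alpha \, C \, V_{dd}^{2} \, f
\qquad\qquad
\mathrm{MTTF}_{\mathrm{EM}} = A \, J^{-n} \exp\!\left( \frac{E_a}{k_B T} \right)
```

Raising V_{dd} and f during a sprint increases dynamic power, and therefore temperature, and also increases current density; both effects shorten the electromigration MTTF. Lowering them through DVFS or low-voltage operation pushes in the opposite direction, which is one reason these power management knobs are also reliability knobs.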
In this presentation, we focus on asymmetric multicore processors (AMPs) and the reliability impact of three management techniques: i) computational sprinting, ii) DVFS, and iii) low-voltage operation. Computational sprinting [5] (also known as race-to-idle or turbo-boost) accelerates execution by raising the voltage and frequency of the active cores, followed by an idle period in which the cores are turned off. The technique was originally motivated by the leakage power saved during the idle period; here, however, the power saved by idle cores is used to boost the active cores. Sprinting stresses different core types (e.g., out-of-order vs. in-order) in different ways, as do different workloads. As a result, cores degrade at different rates, which can reduce the lifetime reliability of the processor as a whole. Similarly, DVFS and sustained low-voltage operation each affect the degradation (and regenerative ability) of devices and cores differently. Consequently, management techniques such as computational sprinting are not just ways to extract performance under thermal and power constraints; they also represent a choice of tradeoffs between performance and lifetime reliability. We argue that time-multiplexed operation of cores (e.g., power gating) must be orchestrated with its reliability impact in mind.
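The argument that sprint scheduling is also a reliability decision can be illustrated with a toy wear-accumulation model. This is a minimal sketch under loudly hypothetical assumptions, not the authors' model: each core accrues integer "damage units" per hour at a made-up rate set by its type and state, power-gated cores recover slightly, and the chip is treated as failed as soon as its first core wears out. All names (`DAMAGE_PER_HOUR`, `lifetime_hours`, `sprint_core0_only`, `rotate_sprints`) are invented for illustration. Both schedules use the same sprint budget (one big core sprinting each hour); they differ only in which core does the sprinting.

```python
from dataclasses import dataclass

# HYPOTHETICAL wear rates in integer "damage units" per hour; a core is worn
# out at WEAR_LIMIT. Negative rates model partial recovery while a core is
# power-gated. Illustrative numbers only, not data from the paper.
WEAR_LIMIT = 1_000_000
DAMAGE_PER_HOUR = {
    ("big", "sprint"): 40,
    ("big", "nominal"): 10,
    ("big", "gated"): -2,
    ("little", "sprint"): 20,
    ("little", "nominal"): 5,
    ("little", "gated"): -1,
}


@dataclass
class Core:
    kind: str  # "big" (out-of-order) or "little" (in-order)
    damage: int = 0


def lifetime_hours(cores, schedule, max_hours=1_000_000):
    """Return the number of hours until the first core reaches WEAR_LIMIT.

    schedule(hour) returns one state per core for that hour. The chip is
    treated as failed when any core fails (a series-system assumption).
    """
    for hour in range(max_hours):
        for core, state in zip(cores, schedule(hour)):
            # Recovery can undo past damage but never below a fresh core.
            core.damage = max(0, core.damage + DAMAGE_PER_HOUR[(core.kind, state)])
            if core.damage >= WEAR_LIMIT:
                return hour + 1
    return max_hours


def new_chip():
    # Two out-of-order cores and one in-order core, all unworn.
    return [Core("big"), Core("big"), Core("little")]


def sprint_core0_only(hour):
    # The power budget lets one big core sprint; always choose core 0.
    return ("sprint", "gated", "nominal")


def rotate_sprints(hour):
    # Same budget, but alternate which big core sprints each hour.
    if hour % 2 == 0:
        return ("sprint", "gated", "nominal")
    return ("gated", "sprint", "nominal")


if __name__ == "__main__":
    print("fixed sprinter:", lifetime_hours(new_chip(), sprint_core0_only))
    print("rotated sprinter:", lifetime_hours(new_chip(), rotate_sprints))
```

Running it prints 25000 hours for the fixed sprinter and 52631 hours for the rotating one. Spreading the same sprint hours across both big cores, and crediting gated hours with a little recovery, more than doubles the time to first failure in this toy setting. Real degradation is nonlinear in voltage, temperature, and time, so the sketch only shows why the schedule matters, not realistic magnitudes.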
FULL EXTENDED ABSTRACT: pdf
PRESENTATION: pdf