ISTC-CC Abstract
Record Placement Based on Data Skew Using Solid State Drives
Proceedings of the Fifth Workshop on Big Data Benchmarks, Performance Optimization, and Emerging Hardware (BPOE’14) at VLDB, September 2014.
Jun Suzuki, Shivaram Venkataraman*, Sameer Agarwal*, Michael J. Franklin*, Ion Stoica*
Green Platform Research Laboratories, NEC, Kawasaki, Japan
* University of California, Berkeley
Integrating a solid state drive (SSD) into a data store is expected to improve its I/O performance. However, there is still a large difference between the price of an SSD and that of a hard-disk drive (HDD). One way to offset the increased device cost is to configure a hybrid system that uses both kinds of device. In such a system, a common method of deciding where to place data records is based on reference locality, i.e., placing the frequently accessed records on the faster SSD. In this paper, we propose an alternative that focuses on data skew: records with values that appear less often are stored on an SSD, while records with values that appear more often are stored on an HDD. As we will show, this enhances the performance of fetching records using multi-dimensional indices. When records are fetched using one of the indices targeted for optimization, records stored on the SSD are likely to be retrieved using random access, while those stored on the HDD are likely to be retrieved using sequential access. Because the method does not rely on reference locality, its performance is stable between first and second accesses, and it provides a performance gain even when host memory is large enough to contain the entire working set of the application. Our implementation and experiments show that storing just 20% of the records on an SSD achieves up to 76% of the maximum reduction that would otherwise be obtained when all the records are stored on an SSD.
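The placement idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `place_by_skew`, the `key` and `ssd_fraction` parameters, and the greedy rarest-first policy are assumptions made for illustration, and the sketch considers a single indexed attribute rather than the multi-dimensional indices the paper targets. It counts how often each attribute value occurs, then assigns records with the rarest values to the SSD until a capacity budget is reached; everything else goes to the HDD.

```python
from collections import Counter


def place_by_skew(records, key, ssd_fraction=0.2):
    """Split records into (ssd, hdd) lists by how often their key value occurs.

    Hypothetical helper: records whose key value is rare go to the SSD,
    records whose key value is common go to the HDD.
    """
    # Frequency of each value of the indexed attribute.
    counts = Counter(key(r) for r in records)

    # Number of records the SSD is allowed to hold (assumed budget).
    budget = int(len(records) * ssd_fraction)

    # Rarest values first; ties are broken by repr() for determinism.
    ordered = sorted(counts.items(), key=lambda kv: (kv[1], repr(kv[0])))

    # Greedily admit whole value groups until the next one would not fit.
    ssd_values = set()
    used = 0
    for value, n in ordered:
        if used + n > budget:
            break
        ssd_values.add(value)
        used += n

    ssd = [r for r in records if key(r) in ssd_values]
    hdd = [r for r in records if key(r) not in ssd_values]
    return ssd, hdd


# Example: value "a" is common, values "b" and "c" are rare.
records = [("a", i) for i in range(8)] + [("b", 8), ("c", 9)]
ssd, hdd = place_by_skew(records, key=lambda r: r[0], ssd_fraction=0.2)
```

In this toy input, the two records with rare values land on the SSD, where a lookup touches few scattered records, while the eight records sharing the common value stay together on the HDD, where they can be read in one sequential pass.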