ISTC-CC Abstract
Memory-Efficient GroupBy-Aggregate using Compressed Buffer Trees
4th ACM Symposium on Cloud Computing (SOCC’13), October 2013.
Hrishikesh Amur†, Wolfgang Richter, David G. Andersen, Michael Kaminsky‡, Karsten Schwan†, Athula Balachandran, and Erik Zawadzki
Carnegie Mellon University
†Georgia Institute of Technology
‡Intel Labs
The rapid growth of fast analytics systems that require data processing in memory makes memory capacity an increasingly precious resource. This paper introduces a new compressed data structure called a Compressed Buffer Tree (CBT). Using a combination of techniques including buffering, compression, and serialization, CBTs improve the memory efficiency and performance of the GroupBy-Aggregate abstraction that forms the basis not only of batch-processing models like MapReduce, but also of recent fast analytics systems. For streaming workloads, aggregation using the CBT uses 21-42% less memory than Google SparseHash, with up to 16% better throughput. The CBT is also compared to the batch-mode aggregators in the MapReduce runtimes Phoenix++ and Metis: it consumes 4× and 5× less memory while delivering 1.5-2× and 3-4× higher performance, respectively.
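To make the combination of buffering, serialization, and compression more concrete, the following is a minimal Python sketch of a GroupBy-Aggregate (sum per key) that buffers incoming pairs, pre-aggregates them when the buffer fills, and stores the partial results as compressed, serialized chunks. It is a toy illustration under stated assumptions, not the CBT from the paper: the class name, the single level of hash-partitioned leaves, the `fanout` and `root_capacity` parameters, and the use of `pickle` and `zlib` are all hypothetical choices made for the example.

```python
import pickle
import zlib
from collections import defaultdict


class ToyCompressedBufferAggregator:
    """Toy sum-per-key aggregator with a buffered root and compressed leaves.

    Hypothetical illustration only; it is not the paper's CBT.
    """

    def __init__(self, fanout=4, root_capacity=1024):
        self.fanout = fanout
        self.root_capacity = root_capacity
        self.root = []                             # uncompressed (key, value) pairs
        self.leaves = [[] for _ in range(fanout)]  # each leaf holds compressed chunks

    def insert(self, key, value):
        # Buffering: inserts only append; aggregation is deferred.
        self.root.append((key, value))
        if len(self.root) >= self.root_capacity:
            self._spill()

    def _spill(self):
        # Pre-aggregate the buffered pairs so duplicates collapse early.
        partial = defaultdict(int)
        for key, value in self.root:
            partial[key] += value
        self.root = []
        # Partition by key hash so a given key always lands in the same leaf.
        parts = [[] for _ in range(self.fanout)]
        for key, value in partial.items():
            parts[hash(key) % self.fanout].append((key, value))
        # Serialize and compress each non-empty partition before storing it.
        for leaf, part in zip(self.leaves, parts):
            if part:
                leaf.append(zlib.compress(pickle.dumps(part)))

    def finalize(self):
        # Flush what is still buffered, then merge every compressed chunk.
        self._spill()
        result = {}
        for leaf in self.leaves:
            for chunk in leaf:
                for key, value in pickle.loads(zlib.decompress(chunk)):
                    result[key] = result.get(key, 0) + value
        return result
```

As a usage example, inserting the words a, b, a, c, b, a with a value of 1 each (for instance with `fanout=2` and `root_capacity=3`) and then calling `finalize()` yields the counts a=3, b=2, c=1. The sketch trades CPU time for memory: partial aggregates stay compressed until the final merge, which is the general trade-off the abstract describes, although the real data structure is a multi-level tree with its own design details.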
FULL PAPER: pdf