A Top-Down Parallel Semisort
ACM Symposium on Parallelism in Algorithms and Architectures (SPAA 2015), June 13-15, 2015.
Yan Gu, Julian Shun, Yihan Sun, Guy Blelloch
Carnegie Mellon University
Semisorting is the problem of reordering an input array of keys such that equal keys are contiguous but different keys are not necessarily in sorted order. Semisorting is important for collecting equal values and is widely used in practice. For example, it is the core of the MapReduce paradigm, is a key component of the database join operation, and has many other applications.
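To make the definition concrete, here is a minimal sequential Python sketch that is not taken from the paper. It checks the semisort property (each distinct key forms exactly one contiguous run), produces a valid semisort by grouping records in first-seen order, and uses that grouping as the "shuffle" step of a MapReduce-style word count. The names `is_semisorted`, `semisort_by_grouping`, and `reduce_runs` are hypothetical.

```python
from collections import defaultdict

# Hypothetical helpers for illustration only; sequential, not the paper's algorithm.

def is_semisorted(keys):
    """Return True if every distinct key occupies exactly one contiguous run."""
    seen = set()
    prev = object()  # sentinel that compares unequal to every real key
    for k in keys:
        if k != prev:          # a new run starts here
            if k in seen:      # this key already had an earlier run
                return False
            seen.add(k)
            prev = k
    return True

def semisort_by_grouping(records, key=lambda r: r):
    """Group records by key; groups appear in first-seen order, not sorted order."""
    groups = defaultdict(list)  # dicts keep insertion order (Python 3.7+)
    for r in records:
        groups[key(r)].append(r)
    return [r for bucket in groups.values() for r in bucket]

def reduce_runs(pairs):
    """MapReduce-style reduce: sum the values of each contiguous run of equal keys."""
    out = []
    for k, v in pairs:
        if out and out[-1][0] == k:
            out[-1] = (k, out[-1][1] + v)
        else:
            out.append((k, v))
    return out

words = "the cat the dog the cat".split()
shuffled = semisort_by_grouping([(w, 1) for w in words], key=lambda p: p[0])
print(reduce_runs(shuffled))  # [('the', 3), ('cat', 2), ('dog', 1)]
```

Note that "the" comes before "cat" in the output: the keys are grouped but not sorted, which is all a reduce step needs.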
We describe a (randomized) parallel algorithm for the problem that is theoretically efficient (linear work and logarithmic depth) but is designed to be more practically efficient than previous algorithms. We use ideas from the parallel integer sorting algorithm of Rajasekaran and Reif, but instead of processing the bits of integers in a reduced range in a bottom-up fashion, we process the hashed values of keys directly, top-down. We implement the algorithm and show experimentally, on a variety of input distributions, that it outperforms a similarly optimized radix sort by a factor of about 1.7-1.9 on a modern 40-core machine with hyper-threading, and achieves a parallel speedup of up to 38x. We discuss the various optimizations used in our implementation and present an extensive experimental analysis of its performance.
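The idea of working top-down on hashed keys, rather than bottom-up on integer bits, can be illustrated with the simplified sequential sketch below. It is an assumption-laden illustration, not the authors' parallel algorithm. The sketch samples the input to find "heavy" keys that occur often and gives each its own bucket. It sends every light key to a bucket chosen by the top bits of its hash, scatters records with a count plus exclusive prefix sum, and finally groups each light bucket locally. The function `top_down_semisort`, its parameters (`num_light_buckets`, `sample_rate`, `heavy_threshold`, `seed`), and the Fibonacci-style mixing of Python's built-in `hash` are all hypothetical choices.

```python
import random
from collections import Counter, defaultdict

MASK64 = (1 << 64) - 1
GOLDEN = 0x9E3779B97F4A7C15  # Fibonacci-hashing multiplier (illustrative choice)

def top_down_semisort(keys, num_light_buckets=16, sample_rate=0.1,
                      heavy_threshold=3, seed=0):
    """Sequential sketch of a hash-based, top-down semisort (hypothetical parameters)."""
    n = len(keys)
    rng = random.Random(seed)

    # 1. Hash each key once to a 64-bit value; bucketing uses the hash, not key order.
    hashes = [(hash(k) * GOLDEN) & MASK64 for k in keys]

    # 2. Sample the input; keys seen often in the sample are treated as "heavy".
    sample = [keys[i] for i in range(n) if rng.random() < sample_rate]
    heavy = [k for k, c in Counter(sample).items() if c >= heavy_threshold]
    heavy_id = {k: i for i, k in enumerate(heavy)}
    num_buckets = len(heavy) + num_light_buckets

    # 3. Bucket ids: one bucket per heavy key; light keys are split by the
    #    top bits of their hash, so all copies of a key share one bucket.
    ids = [heavy_id[k] if k in heavy_id
           else len(heavy) + ((h * num_light_buckets) >> 64)
           for k, h in zip(keys, hashes)]

    # 4. Count bucket sizes, take an exclusive prefix sum for offsets, then scatter.
    sizes = [0] * num_buckets
    for b in ids:
        sizes[b] += 1
    offsets, total = [], 0
    for s in sizes:
        offsets.append(total)
        total += s
    out, pos = [None] * n, offsets[:]
    for k, b in zip(keys, ids):
        out[pos[b]] = k
        pos[b] += 1

    # 5. Heavy buckets already hold one distinct key; group each light bucket locally.
    for b in range(len(heavy), num_buckets):
        lo, hi = offsets[b], offsets[b] + sizes[b]
        groups = defaultdict(list)
        for k in out[lo:hi]:
            groups[k].append(k)
        out[lo:hi] = [k for g in groups.values() for k in g]
    return out
```

For example, `top_down_semisort([3, 1, 3, 2, 1, 3])` returns the three 3s, two 1s, and one 2 each as a contiguous run, in an order determined by the hash values. Because every copy of a key lands in the same bucket, buckets can be finished independently; a parallel version would presumably replace the sequential counting and scatter loops with parallel prefix sums and a parallel scatter.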