Ling Liu's SC13 paper "Large Graph Processing Without the Overhead" featured by HPCwire.
Another list highlighting Open Source Software Releases.
Second GraphLab workshop should be even bigger than the first! GraphLab is a new programming framework for graph-style data analytics.
Leveraging Endpoint Flexibility in Data-intensive Clusters
ACM SIGCOMM 2013 Conference (SIGCOMM’13), August 2013.
Mosharaf Chowdhury, Srikanth Kandula*, Ion Stoica
Many applications do not constrain the destinations of their network transfers. New opportunities emerge when such transfers contribute a large amount of network bytes. By choosing the endpoints to avoid congested links, completion times of these transfers, as well as those of others without similar flexibility, can be improved. In this paper, we focus on leveraging the flexibility in replica placement during writes to cluster file systems (CFSes), which account for almost half of all cross-rack traffic in data-intensive clusters. The replicas of a CFS write can be placed in any subset of machines as long as they are in multiple fault domains and ensure a balanced use of storage throughout the cluster.
We study CFS interactions with the cluster network, analyze optimizations for replica placement, and propose Sinbad -- a system that identifies imbalance and adapts replica destinations to navigate around congested links. Experiments on EC2 and trace-driven simulations show that block writes complete 1.3× (respectively, 1.58×) faster as the network becomes more balanced. As a collateral benefit, end-to-end completion times of data-intensive jobs improve as well. Sinbad does so with little impact on the long-term storage balance.
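
The placement constraint described in the abstract (replicas may go to any machines, provided they span multiple fault domains) can be illustrated with a small greedy sketch. This is not Sinbad's actual algorithm. It is a simplified illustration under stated assumptions: each rack is treated as one fault domain, link congestion is summarized by a single hypothetical `link_util` number per machine, and storage balance is reduced to a free-space check. The names `Machine` and `place_replicas` are invented for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Machine:
    name: str
    rack: str          # assumed: one rack == one fault domain
    link_util: float   # hypothetical estimate of downlink utilization, 0.0 to 1.0
    free_bytes: int    # remaining storage on this machine


def place_replicas(machines, block_size, num_replicas=3, min_racks=2):
    """Greedily pick the machines with the least-utilized links,
    while making sure the chosen set spans at least `min_racks` racks."""
    # Consider only machines with room for the block, least congested first.
    candidates = sorted(
        (m for m in machines if m.free_bytes >= block_size),
        key=lambda m: (m.link_util, m.name),
    )
    chosen, racks = [], set()
    for m in candidates:
        if len(chosen) == num_replicas:
            break
        remaining_slots = num_replicas - len(chosen)
        racks_still_needed = min_racks - len(racks)
        # Reserve the last slots for racks not used yet, if the
        # fault-domain requirement has not been met.
        if m.rack in racks and remaining_slots <= racks_still_needed:
            continue
        chosen.append(m)
        racks.add(m.rack)
    if len(chosen) < num_replicas or len(racks) < min_racks:
        raise ValueError("not enough eligible machines for this placement")
    return chosen


if __name__ == "__main__":
    cluster = [
        Machine("a1", "A", 0.1, 100),
        Machine("a2", "A", 0.2, 100),
        Machine("a3", "A", 0.3, 100),
        Machine("b1", "B", 0.9, 100),
    ]
    print([m.name for m in place_replicas(cluster, block_size=10)])
```

In this toy cluster the third replica goes to the congested machine in rack B rather than to the third machine in rack A, because the fault-domain requirement takes priority over link utilization. A real system would also need fresh utilization measurements and a longer-term view of storage balance, both of which this sketch leaves out.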
FULL PAPER: pdf