ISTC-CC Abstract
From TPC-C to Big Data Benchmarks: A Functional Workload Model
Electrical Engineering and Computer Sciences, University of California at Berkeley, Technical Report No. UCB/EECS-2012-174, July 2012.
Yanpei Chen, Francois Raab*, and Randy Katz^
Cloudera & UC Berkeley
* InfoSizing
^ UC Berkeley
Big data systems help organizations store, manipulate, and derive value from vast amounts of data. Relational database and MapReduce are two, arguably competing implementations of such systems. They are characterized by very large data volumes, diverse unconventional data types and complex data analysis functions. These properties make it challenging to develop big data benchmarks that reflect real life use cases and cover multiple types of implementation options. In this position paper, we combine experiences from the TPC-C benchmark with emerging insights from MapReduce application domains to argue for using a model based on functions of abstraction to construct future benchmarks for big data systems. In particular, this model describes several components of the targeted workloads: the functional goals that the system must achieve, the representative data access patterns, the scheduling and load variations over time, and the computation required to achieve the functional goals. We show that the TPC-C benchmark already applies such a model to benchmarking transactional systems. A similar model can be developed for other big data systems, such as MapReduce, once additional empirical studies are performed. Identifying the functions of abstraction for a big data application domain represents the first step towards building truly representative big data benchmarks.