INTEL SCIENCE & TECHNOLOGY CENTER

CLOUD COMPUTING

ISTC-CC Abstract

Scaling Queries over Big RDF Graphs with Semantic Hash Partitioning

Proceedings of the 40th International Conference on Very Large Data Bases (VLDB’14), Sept. 2014.

Kisung Lee, Ling Liu

Georgia Institute of Technology

Big RDF datasets are growing beyond the performance capacity of conventional RDF data management systems that run on a single node. Applications that use large RDF data need efficient data partitioning solutions to support RDF data access on a cluster of compute nodes. In this paper we present a novel semantic hash partitioning approach and implement a Semantic HAsh Partitioning-Enabled distributed RDF data management system, called Shape. This paper makes three original contributions. First, the semantic hash partitioning approach we propose extends the simple hash partitioning method with direction-based triple groups and direction-based triple replications. The latter enhances the former through controlled data replication that exploits data access locality, so that queries over big RDF graphs can be processed with zero or very little inter-machine communication. Second, we generate locality-optimized query execution plans that are more efficient than those of popular multi-node RDF data management systems, because they minimize the inter-machine communication cost of query processing. Third, we provide a suite of locality-aware optimization techniques that further reduce the partition size and cut the inter-machine communication cost during distributed query processing. Experimental results show that our system scales well and can process big RDF datasets more efficiently than existing approaches.
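The abstract does not define triple groups or the replication rule in detail, so the following is only a minimal sketch of the general idea, not the Shape implementation. It assumes that a triple group collects all triples sharing a subject (one possible "direction"), that groups are assigned to partitions by hashing the subject, and that replication copies the triple groups of objects reachable within a fixed number of extra hops. The function names, the `extra_hops` parameter, and the use of CRC32 as the hash are all illustrative assumptions.

```python
import zlib
from collections import defaultdict


def stable_hash(term, num_partitions):
    """Map an RDF term to a partition index (CRC32 is an arbitrary, stable choice)."""
    return zlib.crc32(term.encode("utf-8")) % num_partitions


def subject_triple_groups(triples):
    """Group triples by subject: an assumed, subject-direction form of triple group."""
    groups = defaultdict(list)
    for s, p, o in triples:
        groups[s].append((s, p, o))
    return groups


def partition_with_replication(triples, num_partitions, extra_hops=1):
    """Hash-partition whole triple groups, then replicate neighbouring groups.

    extra_hops=0 gives plain hash partitioning of triple groups.
    Each extra hop copies into a partition the triple groups of every
    object already present in that partition (hypothetical replication rule).
    """
    groups = subject_triple_groups(triples)
    partitions = [set() for _ in range(num_partitions)]

    # Baseline: every triple group lands on exactly one partition.
    for subject, group in groups.items():
        partitions[stable_hash(subject, num_partitions)].update(group)

    # Controlled replication: follow outgoing edges for a bounded number of hops.
    for _ in range(extra_hops):
        for part in partitions:
            frontier = {o for (_, _, o) in part if o in groups}
            for obj in frontier:
                part.update(groups[obj])

    return partitions


if __name__ == "__main__":
    data = [
        ("a", "knows", "b"),
        ("b", "worksAt", "c"),
        ("c", "locatedIn", "d"),
    ]
    for index, part in enumerate(partition_with_replication(data, 2, extra_hops=1)):
        print(index, sorted(part))
```

Under these assumptions, a two-step path query starting at `a` can be answered entirely inside the partition that owns `a`, because the triple group of `b` has been replicated there. The cost is extra storage, which is why the number of hops is kept as an explicit bound.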

FULL PAPER: pdf