Theses and Dissertations - UTB/UTPA
Distributed storage and querying techniques for a semantic web of scientific workflow provenance
Date of Award
Master of Science (MS)
Dr. Artem Chebotko
Dr. Richard Fowler
Dr. Zhixiang Chen
In scientific workflow environments, scientists depend on provenance, which records the history of an experiment. The Resource Description Framework (RDF) is frequently used to represent provenance based on vocabularies such as the Open Provenance Model (OPM). For complex scientific workflows that generate large numbers of RDF triples, single-machine provenance management becomes inadequate over time. In this thesis, we investigate how the capabilities of HBase can be leveraged for distributed storage and querying of provenance data represented in RDF. We architect the ProvBase system with an HBase/Hadoop backend, propose a storage schema to hold provenance triples, and design querying algorithms to evaluate SPARQL queries in the system. We conduct an experimental study to show the feasibility of our approach.
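The abstract does not spell out ProvBase's storage schema or query algorithms. As a rough illustration only, the Python sketch below simulates HBase-style tables in memory (row key → column qualifier → set of cell values) and writes each triple into three hypothetical index tables keyed by subject, predicate, and object, so that a triple pattern with any bound term can be answered with a row-key lookup. A SPARQL basic graph pattern is then evaluated with a naive nested-loop join. The table names (`t_s`, `t_p`, `t_o`), the `ex:`/`opm:` identifiers, and the join strategy are assumptions made for this example. They are not the schema or algorithms proposed in the thesis, and no HBase client API is used.

```python
from collections import defaultdict


def _table():
    # In-memory stand-in for an HBase table: row key -> column qualifier -> set of cell values.
    return defaultdict(lambda: defaultdict(set))


class TripleStore:
    """Hypothetical three-index layout for RDF triples; NOT the ProvBase schema."""

    def __init__(self):
        self.t_s = _table()  # row = subject,   column = predicate, values = objects
        self.t_p = _table()  # row = predicate, column = subject,   values = objects
        self.t_o = _table()  # row = object,    column = subject,   values = predicates

    def put(self, s, p, o):
        # Write the triple into every index so each bound position has a row-key lookup.
        self.t_s[s][p].add(o)
        self.t_p[p][s].add(o)
        self.t_o[o][s].add(p)

    def match(self, s=None, p=None, o=None):
        """Yield triples matching a pattern; None marks an unbound position."""
        if s is not None:
            for pred, objs in self.t_s.get(s, {}).items():
                if p in (None, pred):
                    for obj in objs:
                        if o in (None, obj):
                            yield (s, pred, obj)
        elif p is not None:
            for subj, objs in self.t_p.get(p, {}).items():
                for obj in objs:
                    if o in (None, obj):
                        yield (subj, p, obj)
        elif o is not None:
            for subj, preds in self.t_o.get(o, {}).items():
                for pred in preds:
                    yield (subj, pred, o)
        else:
            # No bound term: fall back to a full scan of the subject-keyed table.
            for subj, cols in self.t_s.items():
                for pred, objs in cols.items():
                    for obj in objs:
                        yield (subj, pred, obj)


def is_var(term):
    return isinstance(term, str) and term.startswith("?")


def evaluate_bgp(store, patterns):
    """Naive nested-loop evaluation of a basic graph pattern (a list of triple patterns)."""
    solutions = [{}]
    for pattern in patterns:
        next_solutions = []
        for binding in solutions:
            # Substitute already-bound variables so the lookup can use a row key.
            bound = [binding.get(t, t) if is_var(t) else t for t in pattern]
            query = [None if is_var(t) else t for t in bound]
            for triple in store.match(*query):
                new = dict(binding)
                ok = True
                for term, value in zip(bound, triple):
                    if is_var(term):
                        # Reject rows where a repeated variable would bind two values.
                        if new.get(term, value) != value:
                            ok = False
                            break
                        new[term] = value
                if ok:
                    next_solutions.append(new)
        solutions = next_solutions
    return solutions


if __name__ == "__main__":
    store = TripleStore()
    store.put("ex:proc1", "opm:used", "ex:artifact1")
    store.put("ex:artifact2", "opm:wasGeneratedBy", "ex:proc1")
    store.put("ex:proc2", "opm:used", "ex:artifact3")
    # Which artifacts were generated by a process that used ex:artifact1?
    query = [("?proc", "opm:used", "ex:artifact1"),
             ("?out", "opm:wasGeneratedBy", "?proc")]
    print(evaluate_bgp(store, query))
```

Writing each triple three times trades storage for the ability to serve any pattern with a bound term from a single table. A real distributed deployment would also have to address row-key design, region distribution, and join ordering, which this sketch ignores.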
University of Texas-Pan American
Copyright 2010 Jaime Alberto Navarro. All Rights Reserved.