Theses and Dissertations - UTB/UTPA
Date of Award
8-2010
Document Type
Thesis
Degree Name
Master of Science (MS)
Department
Computer Science
First Advisor
Dr. Artem Chebotko
Second Advisor
Dr. Richard Fowler
Third Advisor
Dr. Zhixiang Chen
Abstract
In scientific workflow environments, scientists depend on provenance, which records the history of an experiment. The Resource Description Framework (RDF) is frequently used to represent provenance based on vocabularies such as the Open Provenance Model. For complex scientific workflows that generate large numbers of RDF triples, single-machine provenance management becomes inadequate over time. In this thesis, we investigate how HBase capabilities can be leveraged for distributed storage and querying of provenance data represented in RDF. We architect the ProvBase system, which incorporates an HBase/Hadoop backend, propose a storage schema to hold provenance triples, and design querying algorithms to evaluate SPARQL queries in the system. We conduct an experimental study to show the feasibility of our approach.
Granting Institution
University of Texas-Pan American
Comments
Copyright 2010 Jaime Alberto Navarro. All Rights Reserved.
https://www.proquest.com/dissertations-theses/distributed-storage-queryng-techniques-semantic/docview/760048428/se-2?accountid=7119