Theses and Dissertations - UTB/UTPA

Date of Award

8-2010

Document Type

Thesis

Degree Name

Master of Science (MS)

Department

Computer Science

First Advisor

Dr. Artem Chebotko

Second Advisor

Dr. Richard Fowler

Third Advisor

Dr. Zhixiang Chen

Abstract

In scientific workflow environments, scientists depend on provenance, which records the history of an experiment. The Resource Description Framework (RDF) is frequently used to represent provenance, based on vocabularies such as the Open Provenance Model. For complex scientific workflows that generate large numbers of RDF triples, single-machine provenance management becomes inadequate over time. In this thesis, we investigate how HBase capabilities can be leveraged for distributed storage and querying of provenance data represented in RDF. We architect the ProvBase system, which incorporates an HBase/Hadoop backend, propose a storage schema to hold provenance triples, and design querying algorithms to evaluate SPARQL queries in the system. We conduct an experimental study to show the feasibility of our approach.

Comments

Copyright 2010 Jaime Alberto Navarro. All Rights Reserved.

https://www.proquest.com/dissertations-theses/distributed-storage-queryng-techniques-semantic/docview/760048428/se-2?accountid=7119

Granting Institution

University of Texas-Pan American
