Articles via Databases
Articles via Journals
Online Catalog
E-books
Research & Information Literacy
Interlibrary loan
Theses & Dissertations
Collections
Policies
Services
About / Contact Us
Administration
Littman Architecture Library
This site will be removed in January 2019, please change your bookmarks.
This page will redirect to https://digitalcommons.njit.edu/dissertations/922 in 5 seconds

The New Jersey Institute of Technology's
Electronic Theses & Dissertations Project

Title: Processing techniques for partial tree-pattern queries on XML data
Author: Placek, Pawel
View Online: njit-etd2009-074
(xiii, 116 pages ~ 5.2 MB pdf)
Department: Department of Computer Science
Degree: Doctor of Philosophy
Program: Computer Science
Document Type: Dissertation
Advisory Committee: Theodoratos, Dimitri (Committee chair)
Gehani, Narain (Committee member)
Geller, James (Committee member)
Halper, Michael (Committee member)
Oria, Vincent (Committee member)
Date: 2009-08
Keywords: Tree-structured data
Query minimization
Partial tree-pattern
Index graph
Query containment
Query processing
Availability: Unrestricted
Abstract:

In recent years, eXtensible Markup Language (XML) has become a de facto standard for exporting and exchanging data on the Web. XML structures data as trees. Querying capabilities are provided through patterns matched against the XML trees. Research on the processing of XML queries has focused mainly on tree-pattern queries. Tree-pattern queries are not appropriate for querying XML data sources whose structure is not fully known to the user, or for querying multiple data sources which structure information differently. Recently, a class of queries, called Partial Tree-Pattern Queries (PTPQs) was identified. A central feature of PTPQs is that the structure can be specified fully, partially, or not at all in a query. For this reason. PTPQs can be used for flexibly querying XML data sources.

This thesis deals with processing techniques for PTPQs. In particular, it addresses the satisfiability, containment and minimization problems for PTPQs. In order to cope with structural expression derivation issues and to compare PTPQs, a set of inference rules is suggested and a canonical form for PTPQs that comprises all derived structural expressions is defined. This canonical form is used for determining necessary and sufficient conditions for PTPQ satisfiability.

The containment problem is studied both in the absence and in the presence of structural summaries of data called dimension graphs. It is shown that this problem cannot be characterized by homomorphisms between PTPQs, even when PTPQs are put in canonical form. In both cases of the problem, necessary and sufficient conditions for PTPQ containment are provided in terms of homomorphisms between PTPQs and (a possibly exponential number of) tree-pattern queries. This result is used to identify a subclass of PTPQs that strictly contains tree-pattern queries for which the containment problem can be fully characterized through the existence of homomorphisms. To cope with the high complexity of PTPQ containment, heuristic approaches for this problem are designed that trade accuracy for speed. The heuristic approaches equivalently add structural expressions to PTPQs in order to increase the possibility for a homomorphism between two contained PTPQs to exist. An implementation and extensive experimental evaluation of these heuristics shows that they are useful in practice, and that they can be efficiently implemented in a query optimizer.

The goal of PTPQ minimization is to produce an equivalent PTPQ which is syntactically smaller in size. This problem is studied in the absence of structural summaries. It is shown that PTPQs cannot be minimized by removing redundant parts as is the case with certain classes of tree-pattern queries. It is also shown that, in general, a PTPQ does not have a unique minimal equivalent PTPQ. Finally, sound, but not complete, heuristic approaches for PTPQ minimization are presented. These approaches gradually trade execution time for accuracy.


If you have any questions please contact the ETD Team, libetd@njit.edu.

 
ETD Information
Digital Commons @ NJIT
Theses and DIssertations
ETD Policies & Procedures
ETD FAQ's
ETD home

Request a Scan
NDLTD

NJIT's ETD project was given an ACRL/NJ Technology Innovation Honorable Mention Award in spring 2003