XML data partitioning schemes for parallel holistic twig joins
International Journal of Web Information Systems
ISSN: 1744-0084
Article publication date: 19 June 2009
Abstract
Purpose
The purpose of this paper is to propose Extensible Markup Language (XML) data partitioning schemes that can cope with static and dynamic allocation for parallel holistic twig joins: grid metadata model for XML (GMX) and streams‐based partitioning method for XML (SPX).
Design/methodology/approach
GMX exploits the relationships between XML documents and query patterns to perform workload‐aware partitioning of XML data. Specifically, the paper constructs a two‐dimensional model with a document dimension and a query dimension in which each object in a dimension is composed from XML metadata related to the dimension. GMX provides a set of XML data partitioning methods that include document clustering, query clustering, document‐based refinement, query‐based refinement, and query‐path refinement, thereby enabling XML data partitioning based on the static information of XML metadata. In contrast, SPX explores the structural relationships of query elements and a range‐containment property of XML streams to generate partitions and allocate them to cluster nodes on‐the‐fly.
Findings
GMX provides several salient features: a set of partition granularities that balance workloads of query processing costs among cluster nodes statically; inter‐query parallelism as well as intra‐query parallelism at multiple extents; and better parallel query performance when all estimated queries are executed simultaneously to meet their probability of query occurrences in the system. SPX also offers the following features: minimal computation time to generate partitions; balancing skewed workloads dynamically on the system; producing higher intra‐query parallelism; and gaining better parallel query performance.
Research limitations/implications
The current status of the proposed XML data partitioning schemes does not take into account XML data updates, e.g. new XML documents and query pattern changes submitted by users on the system.
Practical implications
Note that effectiveness of the XML data partitioning schemes mainly relies on the accuracy of the cost model to estimate query processing costs. The cost model must be adjusted to reflect characteristics of a system platform used in the implementation.
Originality/value
This paper proposes novel schemes of conducting XML data partitioning to achieve both static and dynamic workload balance.
Keywords
Citation
Machdi, I., Amagasa, T. and Kitagawa, H. (2009), "XML data partitioning schemes for parallel holistic twig joins", International Journal of Web Information Systems, Vol. 5 No. 2, pp. 151-194. https://doi.org/10.1108/17440080910968445
Publisher
:Emerald Group Publishing Limited
Copyright © 2009, Emerald Group Publishing Limited