XML data partitioning schemes for parallel holistic twig joins

Imam Machdi, Toshiyuki Amagasa, Hiroyuki Kitagawa

International Journal of Web Information Systems

ISSN: 1744-0084

Article publication date: 19 June 2009

Abstract

Purpose

This paper proposes two Extensible Markup Language (XML) data partitioning schemes that can cope with static and dynamic allocation for parallel holistic twig joins: the grid metadata model for XML (GMX) and the streams‐based partitioning method for XML (SPX).

Design/methodology/approach

GMX exploits the relationships between XML documents and query patterns to perform workload‐aware partitioning of XML data. Specifically, the paper constructs a two‐dimensional model with a document dimension and a query dimension, in which each object in a dimension is composed of the XML metadata related to that dimension. GMX provides a set of XML data partitioning methods (document clustering, query clustering, document‐based refinement, query‐based refinement, and query‐path refinement) that enable XML data partitioning based on the static information in XML metadata. In contrast, SPX explores the structural relationships of query elements and the range‐containment property of XML streams to generate partitions and allocate them to cluster nodes on the fly.
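The range‐containment property mentioned above can be illustrated with the region encoding commonly used by holistic twig joins, in which each element carries a (start, end) interval and an ancestor's interval encloses those of its descendants. The following is a minimal sketch under that assumption, not the paper's SPX algorithm; the `Element` type, the `partition_by_range` helper, and the sample positions are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Element:
    tag: str
    start: int  # position of the opening tag in document order
    end: int    # position of the closing tag


def contains(anc: Element, desc: Element) -> bool:
    """Range containment: anc's interval strictly encloses desc's."""
    return anc.start < desc.start and desc.end < anc.end


def partition_by_range(
    root_stream: List[Element], other_streams: Dict[str, List[Element]]
) -> List[Tuple[Element, Dict[str, List[Element]]]]:
    """Give each root-stream element its own partition, holding the
    elements of every other stream that fall inside its range.
    Assumes (for simplicity) that root elements are not nested."""
    partitions = []
    for root in root_stream:
        part = {
            tag: [e for e in stream if contains(root, e)]
            for tag, stream in other_streams.items()
        }
        partitions.append((root, part))
    return partitions


# Hypothetical streams for a twig pattern such as //book[title]//author
books = [Element("book", 1, 10), Element("book", 11, 20)]
streams = {
    "title": [Element("title", 2, 3), Element("title", 12, 13)],
    "author": [
        Element("author", 4, 5),
        Element("author", 14, 15),
        Element("author", 16, 17),
    ],
}
parts = partition_by_range(books, streams)
```

Because every partition is self‐contained with respect to the root element's range, each one could in principle be joined on a different cluster node without consulting the others.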

Findings

GMX provides several salient features: a set of partition granularities that statically balance query processing workloads among cluster nodes; inter‐query parallelism as well as intra‐query parallelism at multiple extents; and better parallel query performance when all estimated queries are executed simultaneously in accordance with their probabilities of occurrence in the system. SPX offers the following features: minimal computation time to generate partitions; dynamic balancing of skewed workloads on the system; higher intra‐query parallelism; and better parallel query performance.
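Static workload balancing of this kind can be pictured as assigning partitions with estimated costs to cluster nodes. The sketch below uses a generic greedy longest‐processing‐time heuristic, swapped in purely for illustration; it is not the method GMX uses, and the `assign_partitions` helper, partition names, and cost figures are made up.

```python
import heapq
from typing import Dict, List, Tuple


def assign_partitions(
    costs: Dict[str, float], num_nodes: int
) -> List[Tuple[float, List[str]]]:
    """Greedy LPT: place each partition, heaviest first, on the
    currently least-loaded node. Returns (load, partitions) per node."""
    # Heap entries are (load, node_id, partition_names); node_id breaks ties.
    heap = [(0.0, node, []) for node in range(num_nodes)]
    heapq.heapify(heap)
    for name, cost in sorted(costs.items(), key=lambda kv: kv[1], reverse=True):
        load, node, names = heapq.heappop(heap)
        names.append(name)
        heapq.heappush(heap, (load + cost, node, names))
    # Report nodes in node_id order.
    return [(load, names) for load, _node, names in sorted(heap, key=lambda t: t[1])]


# Hypothetical estimated query processing costs per partition
estimated = {"p1": 8.0, "p2": 7.0, "p3": 6.0, "p4": 5.0, "p5": 4.0}
plan = assign_partitions(estimated, num_nodes=2)
```

The quality of such an assignment depends entirely on how well the estimated costs reflect actual query processing costs.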

Research limitations/implications

In their current form, the proposed XML data partitioning schemes do not take into account XML data updates, e.g. new XML documents and query pattern changes submitted by users on the system.

Practical implications

The effectiveness of the XML data partitioning schemes relies mainly on the accuracy of the cost model used to estimate query processing costs. The cost model must be adjusted to reflect the characteristics of the system platform used in the implementation.

Originality/value

This paper proposes novel schemes for XML data partitioning that achieve both static and dynamic workload balance.

Citation

Machdi, I., Amagasa, T. and Kitagawa, H. (2009), "XML data partitioning schemes for parallel holistic twig joins", International Journal of Web Information Systems, Vol. 5 No. 2, pp. 151-194. https://doi.org/10.1108/17440080910968445

Publisher

Emerald Group Publishing Limited
