Publication date: 31 December 2006

Hooman Homayounfar and Fangju Wang

Abstract

XML is becoming one of the most important structures for data exchange on the web. Despite its many advantages, the XML structure poses several major obstacles to processing large documents. The mismatch between the linear nature of the algorithms currently used in operating systems and databases (e.g. for caching and prefetching) and the non‐linear structure of XML data makes XML processing more costly. In addition to verbosity (e.g. tag redundancy), interpreting (i.e. parsing) the depth‐first (DF) structure of XML documents imposes a significant overhead on processing applications (e.g. query engines). Recent research on XML query processing has shown that sibling clustering can improve performance significantly. However, the existing clustering methods cannot avoid the parsing overhead, which limits them as documents grow larger. In this research, we have developed a better data organization for native XML databases, named the sibling‐first (SF) format, which improves query performance significantly. SF uses an embedded index for fast access to child nodes. It also compresses documents by eliminating extra information from the original DF format. The converted SF documents can be processed for XPath queries without being parsed. We have implemented SF storage in virtual memory as well as an on‐disk format. Experimental results with real data have shown that significantly higher performance can be achieved when XPath queries are conducted on very large SF documents.
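The abstract does not specify the byte layout of the SF format, so the following is only a rough illustration of the idea, not the authors' implementation. It assumes a breadth‐first node array in which each node records where its children start and how many there are (a stand‐in for the embedded child index). With that layout, a simple child‐axis XPath such as `/library/book/title` is answered by index arithmetic instead of re‐parsing tags. The names `SFNode`, `to_sibling_first` and `select` are hypothetical. The sketch ignores attributes, tail text, and all non‐child axes, and it parses the DF document once (with `xml.etree.ElementTree`) only during conversion.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass
class SFNode:
    tag: str          # element name, stored once (no closing tag is kept)
    text: str         # direct text content, whitespace-stripped
    first_child: int  # array index of the first child, or -1 if none
    child_count: int  # number of children, stored contiguously


def to_sibling_first(xml_text: str) -> list[SFNode]:
    """Convert a DF XML string into a breadth-first node array.

    Breadth-first order (an assumption here) places all children of a
    node in one contiguous block, so first_child/child_count act as an
    embedded index to that block.
    """
    root = ET.fromstring(xml_text)
    nodes = [SFNode(root.tag, (root.text or "").strip(), -1, 0)]
    elems = [root]  # elems[i] is the element stored at nodes[i]
    i = 0
    while i < len(elems):
        children = list(elems[i])
        if children:
            nodes[i].first_child = len(nodes)
            nodes[i].child_count = len(children)
            for child in children:
                nodes.append(SFNode(child.tag, (child.text or "").strip(), -1, 0))
                elems.append(child)
        i += 1
    return nodes


def child_step(nodes: list[SFNode], parents: list[int], tag: str) -> list[int]:
    """Evaluate one child step (parent/tag) by scanning each child block."""
    result = []
    for p in parents:
        start = nodes[p].first_child
        for c in range(start, start + nodes[p].child_count):
            if tag == "*" or nodes[c].tag == tag:
                result.append(c)
    return result


def select(nodes: list[SFNode], path: str) -> list[int]:
    """Evaluate a simple absolute child-only path such as '/a/b/c'."""
    steps = [s for s in path.split("/") if s]
    if not steps or steps[0] not in ("*", nodes[0].tag):
        return []
    current = [0]  # the root element is always at index 0
    for tag in steps[1:]:
        current = child_step(nodes, current, tag)
    return current


doc = ("<library><book><title>A</title><year>2006</year></book>"
       "<book><title>B</title></book></library>")
sf = to_sibling_first(doc)
print([sf[i].text for i in select(sf, "/library/book/title")])  # ['A', 'B']
```

Because every node's children occupy one contiguous block, each step touches only the blocks it needs, which is the locality that sibling clustering aims for. The actual SF format also targets on‐disk storage and compression, which this in‐memory sketch does not model.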

Details

International Journal of Web Information Systems, vol. 2 no. 3/4
Type: Research Article
ISSN: 1744-0084
