Search results
1 – 10 of 10
Taro Aso, Toshiyuki Amagasa and Hiroyuki Kitagawa
Abstract
Purpose
The purpose of this paper is to propose a scheme that allows users to interactively explore relations between entities in knowledge bases (KBs). KBs store a wide range of knowledge about real-world entities in a structured form as (subject, predicate, object). Although it is possible to query entities and relations among entities by specifying appropriate query expressions of SPARQL or keyword queries, the structure and the vocabulary are complicated, and it is hard for non-expert users to get the desired information. For this reason, many researchers have proposed faceted search interfaces for KBs. Nevertheless, existing ones are designed for finding entities and are insufficient for finding relations.
Design/methodology/approach
To address this problem, the authors propose a novel “relation facet” for finding relations between entities. To generate it, they apply clustering to predicates, grouping those predicates that are connected to common objects, and then generate a facet from the resulting clusters. Specifically, they propose using two clustering methods, namely, agglomerative hierarchical clustering (AHC) and CANDECOMP/PARAFAC (CP) tensor decomposition.
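The idea of grouping predicates that are connected to common objects can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the Jaccard similarity over object sets, the average-linkage merge rule, the `threshold` parameter and the toy triples are all hypothetical choices made for this example.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity of two sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_predicates(triples, threshold=0.3):
    """Agglomerative clustering of predicates by the objects they share.

    Assumption: similarity is the average pairwise Jaccard similarity of
    the predicates' object sets; merging stops below `threshold`.
    """
    # Map each predicate to the set of objects it points to.
    objects = {}
    for _subject, predicate, obj in triples:
        objects.setdefault(predicate, set()).add(obj)

    # Start with one singleton cluster per predicate.
    clusters = [frozenset([p]) for p in sorted(objects)]

    def similarity(c1, c2):
        pairs = [(p, q) for p in c1 for q in c2]
        return sum(jaccard(objects[p], objects[q]) for p, q in pairs) / len(pairs)

    # Repeatedly merge the most similar pair of clusters.
    while len(clusters) > 1:
        best = max(combinations(clusters, 2), key=lambda pair: similarity(*pair))
        if similarity(*best) < threshold:
            break
        clusters = [c for c in clusters if c not in best] + [best[0] | best[1]]
    return sorted(sorted(c) for c in clusters)


# Toy (subject, predicate, object) triples, invented for illustration.
example_triples = [
    ("Tokyo", "capitalOf", "Japan"),
    ("Kyoto", "locatedIn", "Japan"),
    ("Paris", "capitalOf", "France"),
    ("Lyon", "locatedIn", "France"),
    ("Kurosawa", "directed", "Rashomon"),
    ("Mifune", "starredIn", "Rashomon"),
    ("Mifune", "starredIn", "Yojimbo"),
]
relation_groups = cluster_predicates(example_triples)
print(relation_groups)
# [['capitalOf', 'locatedIn'], ['directed', 'starredIn']]
```

Each resulting group could then back one entry of a facet; how the groups are labelled and presented is outside the scope of this sketch.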
Findings
The authors experimentally tested the performance of the clustering methods and found that AHC performs better than tensor decomposition. In addition, the authors conducted a user study and showed that their proposed scheme performs better than existing ones in the task of searching for relations.
Originality/value
The authors propose a relation-oriented faceted search method for KBs that allows users to explore relations between entities. As far as the authors know, this is the first method to focus on the exploration of relations between entities.
Takahiro Komamizu, Toshiyuki Amagasa and Hiroyuki Kitagawa
Abstract
Purpose
Linked data (LD) has promoted the publishing of information and the linking of published information. An increasing number of LD data sets contain numerical data such as statistics. For this reason, analyzing numerical facts on LD has attracted attention from diverse domains. This paper aims to support analytical processing of LD.
Design/methodology/approach
This paper proposes a framework called H-SPOOL, which provides a series of SPARQL (SPARQL Protocol and RDF Query Language) queries that extract objects and attributes from LD data sets, converts them into star/snowflake schemas and materializes relevant triples as fact and dimension tables for online analytical processing (OLAP).
Findings
The applicability of H-SPOOL is evaluated using existing LD data sets on the Web, and H-SPOOL successfully performs ETL (extract, transform and load) on these data sets for OLAP. In addition, experiments show that H-SPOOL reduces the number of downloaded triples compared with an existing approach.
Originality/value
H-SPOOL is the first work for extracting OLAP-related information from SPARQL endpoints, and H-SPOOL drastically reduces the amount of downloaded triples.
Takahiro Komamizu, Toshiyuki Amagasa and Hiroyuki Kitagawa
Abstract
Purpose
XML has become a standard data format for many applications, and efficient retrieval methods are required. There are roughly two kinds of retrieval methods, namely path‐based methods (e.g. XPath and XQuery) and keyword search, but these methods do not work when users do not have a concrete information need. Expanding the feasibility of XML data retrieval is an important task, and it is the purpose of this paper.
Design/methodology/approach
The paper's strategy is to apply faceted navigation to XML data. Faceted navigation is a form of exploratory search that enables the exploration of data by means of attributes, called facets. General faceted navigation methods are applied to attributed objects, but XML data provide no criteria for telling objects from facets, because XML nodes can act as both. Thus, the paper's approach is to construct a framework that enables faceted navigation over XML data. It first extracts objects based on the occurrence of nodes and facets. Then it constructs a faceted navigation interface for the extracted objects and facets.
Findings
The framework achieves semi‐automatic construction of a faceted navigation interface from an XML database. In the experiments, the feasibility of the framework is shown by three faceted navigation interfaces built from existing real XML data. In addition, the user study shows that the retrieval method helps users to find the required information.
Originality/value
There are only a few works which apply faceted navigation for XML data and these works are based on predefined objects and facets which need human effort. In contrast, this framework needs human decision making only when choosing objects and facets to be used in the faceted navigation interface.
Imam Machdi, Toshiyuki Amagasa and Hiroyuki Kitagawa
Abstract
Purpose
The purpose of this paper is to propose Extensible Markup Language (XML) data partitioning schemes that can cope with static and dynamic allocation for parallel holistic twig joins: grid metadata model for XML (GMX) and streams‐based partitioning method for XML (SPX).
Design/methodology/approach
GMX exploits the relationships between XML documents and query patterns to perform workload‐aware partitioning of XML data. Specifically, the paper constructs a two‐dimensional model with a document dimension and a query dimension in which each object in a dimension is composed from XML metadata related to the dimension. GMX provides a set of XML data partitioning methods that include document clustering, query clustering, document‐based refinement, query‐based refinement, and query‐path refinement, thereby enabling XML data partitioning based on the static information of XML metadata. In contrast, SPX explores the structural relationships of query elements and a range‐containment property of XML streams to generate partitions and allocate them to cluster nodes on‐the‐fly.
Findings
GMX provides several salient features: a set of partition granularities that balance workloads of query processing costs among cluster nodes statically; inter‐query parallelism as well as intra‐query parallelism at multiple extents; and better parallel query performance when all estimated queries are executed simultaneously to meet their probability of query occurrences in the system. SPX also offers the following features: minimal computation time to generate partitions; balancing skewed workloads dynamically on the system; producing higher intra‐query parallelism; and gaining better parallel query performance.
Research limitations/implications
The current status of the proposed XML data partitioning schemes does not take into account XML data updates, e.g. new XML documents and query pattern changes submitted by users on the system.
Practical implications
Note that effectiveness of the XML data partitioning schemes mainly relies on the accuracy of the cost model to estimate query processing costs. The cost model must be adjusted to reflect characteristics of a system platform used in the implementation.
Originality/value
This paper proposes novel schemes of conducting XML data partitioning to achieve both static and dynamic workload balance.
Takahiro Komamizu, Toshiyuki Amagasa and Hiroyuki Kitagawa
Abstract
Purpose
The purpose of this paper is to extract appropriate terms to summarize the current results in terms of the contents of textual facets. Faceted search on XML data helps users find necessary information from XML data by giving attribute–content pairs (called facet-value pairs) about the current search results. However, if the contents of a facet are, on average, long texts (such facets are called textual facets), it is not easy to get an overview of the current results.
Design/methodology/approach
The proposed approach is based upon subsumption relationships of terms among the contents of a facet. The subsumption relationship can be extracted using co-occurrences of terms among a number of documents (in this paper, each content of a facet is considered a document). Subsumption relationships compose hierarchies, and the authors utilize the hierarchies to extract facet-values from textual facets. In the faceted search context, users have ambiguous search demands and therefore expect broader terms. Thus, the authors extract high-level terms in the hierarchies as facet-values.
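One way to picture co-occurrence-based subsumption is the sketch below. It is an illustration only: the abstract does not state the exact rule, so the test used here (term x subsumes term y when x appears in at least a fraction `threshold` of the documents containing y, while y does not appear in every document containing x) is an assumed, commonly used co-occurrence rule, and the function names and sample texts are hypothetical.

```python
def subsumption_pairs(documents, threshold=0.8):
    """Return (broader, narrower) term pairs found from co-occurrence.

    Assumed rule: x subsumes y when P(x | y) >= threshold and P(y | x) < 1,
    with probabilities estimated from document frequencies.
    """
    doc_terms = [set(text.lower().split()) for text in documents]

    # Document frequency of every term.
    doc_freq = {}
    for terms in doc_terms:
        for term in terms:
            doc_freq[term] = doc_freq.get(term, 0) + 1

    pairs = set()
    for x in doc_freq:
        for y in doc_freq:
            if x == y:
                continue
            both = sum(1 for terms in doc_terms if x in terms and y in terms)
            if both / doc_freq[y] >= threshold and both / doc_freq[x] < 1.0:
                pairs.add((x, y))
    return pairs


def top_level_terms(pairs):
    """Terms that subsume others but are not subsumed themselves."""
    broader = {x for x, _ in pairs}
    narrower = {y for _, y in pairs}
    return sorted(broader - narrower)


# Each string stands for the content of one textual facet (invented data).
facet_contents = [
    "database xml query",
    "database rdf sparql",
    "database xml index",
    "database rdf",
]
hierarchy = subsumption_pairs(facet_contents)
print(top_level_terms(hierarchy))
# ['database']
```

Here "database" co-occurs with every other term and so ends up at the top of the hierarchy, which matches the intuition that broad terms suit users with ambiguous search demands.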
Findings
The main finding of this paper is that the extracted terms improve users’ search experiences, especially in cases when the search demands are ambiguous.
Originality/value
One original contribution of this paper is the way it utilizes the textual contents of XML data to improve users’ search experiences in faceted search. The other is the design of tasks for evaluating exploratory search such as faceted search.
Imam Machdi, Toshiyuki Amagasa and Hiroyuki Kitagawa
Abstract
Purpose
The purpose of this paper is to propose general parallelism techniques for holistic twig join algorithms to process queries against Extensible Markup Language (XML) databases on a multi‐core system.
Design/methodology/approach
The parallelism techniques comprised data and task parallelism. As for data parallelism, the paper adopted the stream‐based partitioning for XML to partition XML data as the basis of parallelism on multiple CPU cores. The XML data partitioning was performed in two levels. The first level was to create buckets for creating data independence and balancing loads among CPU cores; each bucket was assigned to a CPU core. Within each bucket, the second level of XML data partitioning was performed to create finer partitions for providing finer parallelism. Each CPU core performed the holistic twig join algorithm on each finer partition of its own in parallel with other CPU cores. In task parallelism, the holistic twig join algorithm was decomposed into two main tasks, which were pipelined to create parallelism. The first task adopted the data parallelism technique and its outputs were transferred to the second task periodically. Since data transfers incurred overheads, the size of each data transfer needed to be estimated carefully for achieving optimal performance.
Findings
The data and task parallelism techniques contribute to good performance, especially for queries having complex structures and/or higher values of query selectivity. The performance of data parallelism can be further improved by task parallelism. Significant performance improvement is attained by queries having higher selectivity because more of the output computation in the second task is performed in parallel with the first task.
Research limitations/implications
The proposed parallelism techniques primarily deal with executing a single long‐running query for intra‐query parallelism, partitioning XML data on‐the‐fly, and allocating partitions on CPU cores statically. It is assumed that no dynamic XML data updates occur during the parallel execution.
Practical implications
The effectiveness of the proposed parallel holistic twig joins relies fundamentally on some system parameter values that can be obtained from a benchmark of the system platform.
Originality/value
The paper proposes novel techniques to increase parallelism by combining techniques of data and task parallelism for achieving high performance. To the best of the authors' knowledge, this is the first paper on parallelizing the holistic twig join algorithms on a multi‐core system.
Chantola Kit, Toshiyuki Amagasa and Hiroyuki Kitagawa
Abstract
Purpose
The purpose of this paper is to propose efficient algorithms for structural grouping over Extensible Markup Language (XML) data, called TOPOLOGICAL ROLLUP (T‐ROLLUP), which compute aggregation functions over XML data with multiple hierarchical levels. They play important roles in the online analytical processing of XML data, called XML‐OLAP, with which complex analysis over XML can be performed to discover valuable information from XML.
Design/methodology/approach
Several variations of algorithms are proposed for efficient T‐ROLLUP computation. First, two basic algorithms, top‐down algorithm (TDA) and bottom‐up algorithm (BUA), are presented in which the well‐known structural‐join algorithms are used. The paper then proposes more efficient algorithms, called single‐scan by preorder number and single‐scan by postorder number (SSC‐Pre/Post), which are also based on structural joins, but have been modified from the basic algorithms so that multiple levels of grouping are computed with a single scan over node lists. In addition, the paper attempts to adapt the algorithm for parallel execution in multi‐core environments.
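To give a feel for grouping over several hierarchical levels in one pass, the sketch below uses preorder/postorder numbers to test ancestry and a stack to roll leaf values up to every ancestor during a single scan in preorder. It is not the paper's TDA, BUA or SSC‐Pre/Post algorithm; the `Node` type, the assumption of unique node names and the sample tree are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str          # assumed unique, used as the grouping key
    pre: int           # preorder number
    post: int          # postorder number
    value: float = 0.0


def is_ancestor(a, d):
    """a is an ancestor of d iff a starts before d and ends after d."""
    return a.pre < d.pre and a.post > d.post


def rollup_sums(nodes):
    """Sum each node's value into itself and all of its ancestors."""
    totals = {}
    stack = []  # ancestors of the node currently being scanned
    for node in sorted(nodes, key=lambda n: n.pre):
        # Drop nodes whose subtree has already been left.
        while stack and not is_ancestor(stack[-1], node):
            stack.pop()
        totals[node.name] = node.value
        for ancestor in stack:
            totals[ancestor.name] += node.value
        stack.append(node)
    return totals


# world -> (asia -> (japan, china), europe -> (france)); values are invented.
tree = [
    Node("world", pre=1, post=6),
    Node("asia", pre=2, post=3),
    Node("japan", pre=3, post=1, value=5),
    Node("china", pre=4, post=2, value=7),
    Node("europe", pre=5, post=5),
    Node("france", pre=6, post=4, value=3),
]
level_totals = rollup_sums(tree)
print(level_totals["asia"], level_totals["europe"], level_totals["world"])
# 12 3 15
```

The inner loop over the stack costs time proportional to the tree depth per node; a production algorithm would avoid that, but the sketch keeps the ancestry test and the single scan easy to see.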
Findings
Several experiments are conducted with XMark and synthetic XML data to show the effectiveness of the proposed algorithms. The experiments show that proposed algorithms perform much better than the naïve implementation. In particular, the proposed SSC‐Pre and SSC‐Post perform better than TDA and BUA for all cases. Beyond that, the experiment using the parallel single scan algorithm also shows better performance than the ordinary basic algorithm.
Research limitations/implications
This paper focuses on the T‐ROLLUP operation for XML data analysis. Other operations related to XML‐OLAP, such as CUBE, WINDOWING, and RANKING, should also be investigated.
Originality/value
The paper presents an extended version of one of the award winning papers at iiWAS2008.
Shohei Ohsawa, Toshiyuki Amagasa and Hiroyuki Kitagawa
Abstract
Purpose
The purpose of this paper is to improve the performance of querying and reasoning over large‐scale Resource Description Framework (RDF) data. When processing RDF(S) data, RDFS entailment is performed, which often generates a large number of additional triples and causes poor performance. To deal with large‐scale RDF data, it is important to develop a scheme which enables the processing of large RDF data in an efficient manner.
Design/methodology/approach
The authors propose RDF packages, a space‐efficient format for RDF data. In an RDF package, a set of triples of the same class or triples having the same predicate are grouped into a dedicated node named Package. Any RDF data can be represented using RDF packages, and vice versa.
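The grouping idea can be pictured with the minimal sketch below, which groups triples that share a predicate and then restores the original triples. It does not use the actual RDF package vocabulary, which is not given here: a plain dictionary stands in for Package nodes, grouping by class is omitted, and the function names and sample triples are hypothetical.

```python
def pack_by_predicate(triples):
    """Group (subject, predicate, object) triples under their predicate.

    The predicate is stored once per group instead of once per triple.
    """
    packages = {}
    for subject, predicate, obj in triples:
        packages.setdefault(predicate, []).append((subject, obj))
    return packages


def unpack(packages):
    """Restore plain triples from the grouped form."""
    return [
        (subject, predicate, obj)
        for predicate, members in packages.items()
        for subject, obj in members
    ]


# Invented sample data.
sample_triples = [
    ("ex:alice", "rdf:type", "ex:Student"),
    ("ex:bob", "rdf:type", "ex:Student"),
    ("ex:alice", "ex:knows", "ex:bob"),
    ("ex:carol", "rdf:type", "ex:Teacher"),
]
packed = pack_by_predicate(sample_triples)
restored = unpack(packed)
print(len(packed), sorted(restored) == sorted(sample_triples))
# 2 True
```

The round trip mirrors the statement that data can be converted to the grouped form and back, although the real format is defined in RDF itself rather than in an in-memory dictionary.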
Findings
It is found that using RDF packages can significantly reduce the size of RDF data, even after RDFS entailment. The authors experimentally evaluate the performance of the proposed scheme in terms of triple size, reasoning speed, and querying speed.
Research limitations/implications
The proposed scheme is useful in processing RDF(S) data, but it needs further development to deal with an ontological language such as OWL.
Originality/value
An important feature of the RDF packages is that, when performing RDFS reasoning, there is no need to modify either the reasoning rules or the reasoning engine, whereas other related schemes require the reasoning rules or the reasoning engine to be modified.
Savong Bou, Toshiyuki Amagasa and Hiroyuki Kitagawa
Abstract
Purpose
The purpose of this paper is to propose a novel scheme to process XPath-based keyword search over Extensible Markup Language (XML) streams, where one can specify query keywords and XPath-based filtering conditions at the same time. Experimental results show that the proposed scheme can efficiently and practically process XPath-based keyword search over XML streams.
Design/methodology/approach
To allow XPath-based keyword search over XML streams, the authors integrate YFilter (Diao et al., 2003) with CKStream (Hummel et al., 2011). More precisely, the nondeterministic finite automaton (NFA) of YFilter is extended so that keyword matching at text nodes is supported. Next, the stack data structure is modified by integrating the set of NFA states in YFilter with bitmaps generated from the set of keyword queries in CKStream.
Findings
Extensive experiments were conducted using both synthetic and real data sets to show the effectiveness of the proposed method. The experimental results showed that the accuracy of the proposed method was better than the baseline method (CKStream), while it consumed less memory. Moreover, the proposed scheme showed good scalability with respect to the number of queries.
Originality/value
Due to the rapid diffusion of XML streams, the demand for querying such information is also growing. In such a situation, the ability to query by combining XPath and keyword search is important, because it is an easy-to-use but powerful means of querying XML streams. However, no existing work has addressed this issue. This work addresses the problem by combining an existing XPath-based YFilter and a keyword-search-based CKStream for XML streams to enable XPath-based keyword search.