Bruno C.N. Oliveira, Alexis Huf, Ivan Luiz Salvadori and Frank Siqueira
Abstract
Purpose
This paper describes a software architecture that automatically adds semantic capabilities to data services. The proposed architecture, called OntoGenesis, is able to semantically enrich data services, so that they can dynamically provide both semantic descriptions and data representations.
Design/methodology/approach
The enrichment approach is designed to intercept requests made to data services. A domain ontology is then constructed and evolved in accordance with the syntactic representations provided by these services in order to define the data concepts. In addition, a property matching mechanism is proposed to exploit the potential data intersection observed between data service representations and external data sources, so as to enhance the domain ontology with new equivalence triples. Finally, the enrichment approach is capable of deriving, on demand, a semantic description and data representations that link to the domain ontology concepts.
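The property matching idea described above can be illustrated with a minimal sketch. The paper's actual mechanism is not detailed in the abstract, so the following is only a hypothetical value-overlap matcher: two properties from different sources are considered equivalent when their value sets intersect above a threshold, and an `owl:equivalentProperty`-style triple is emitted for each match.

```python
# Hypothetical sketch of property matching by value intersection.
# source_a, source_b: dicts mapping property name -> set of observed values.
def match_properties(source_a, source_b, threshold=0.5):
    equivalences = []
    for prop_a, values_a in source_a.items():
        for prop_b, values_b in source_b.items():
            if not values_a or not values_b:
                continue
            # Overlap relative to the smaller value set.
            overlap = len(values_a & values_b) / min(len(values_a), len(values_b))
            if overlap >= threshold:
                equivalences.append((prop_a, "owl:equivalentProperty", prop_b))
    return equivalences

# Hypothetical data: a service property and two GeoNames-like properties.
service = {"cityName": {"Berlin", "Paris", "Rome"}}
external = {"name": {"Berlin", "Paris", "Madrid"}, "population": {"3645000"}}
print(match_properties(service, external))
# → [('cityName', 'owl:equivalentProperty', 'name')]
```

The threshold and the overlap measure are illustrative choices; any set-similarity measure over shared property values would fit the assumption stated in the limitations section.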
Findings
Experiments were performed using real-world datasets, such as DBpedia and GeoNames, as well as open government data. The results show the applicability of the proposed architecture and indicate that it can boost the development of semantic data services. Moreover, the matching approach achieved better performance than other existing approaches found in the literature.
Research limitations/implications
This work only considers services designed as data providers, i.e. services that provide an interface for accessing data sources. In addition, the approach assumes that both data services and the external sources used to enhance the domain ontology have some potential for data intersection. This assumption only requires that services and external sources share particular property values.
Originality/value
Unlike most of the approaches found in the literature, the architecture proposed in this paper is meant to semantically enrich data services in such a way that human intervention is minimal. Furthermore, an automata-based index is presented as a novel method that significantly improves the performance of the property matching mechanism.
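The abstract does not specify the construction of the automata-based index, but the core idea of indexing property values as automaton states can be sketched with a simple trie (a prefix automaton). Lookup time then depends on the length of the queried value, not on the number of indexed properties, which is what makes such an index attractive for large-scale matching.

```python
# Minimal trie (prefix automaton) over property values; an illustrative
# stand-in for the paper's automata-based index, whose details may differ.
class TrieIndex:
    def __init__(self):
        self.root = {}

    def add(self, value, prop):
        """Index a property value; states are created character by character."""
        node = self.root
        for ch in value:
            node = node.setdefault(ch, {})
        node.setdefault("$props", set()).add(prop)

    def lookup(self, value):
        """Return the set of properties whose values include `value`."""
        node = self.root
        for ch in value:
            node = node.get(ch)
            if node is None:
                return set()
        return node.get("$props", set())

idx = TrieIndex()
idx.add("Berlin", "geo:name")       # hypothetical property identifiers
idx.add("Berlin", "svc:cityName")
idx.add("Bern", "geo:name")
print(sorted(idx.lookup("Berlin")))
# → ['geo:name', 'svc:cityName']
```

A production index would typically use a minimized automaton (e.g. a DAFSA) rather than a plain trie, but the lookup behaviour is the same.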
Ivan Luiz Salvadori, Alexis Huf, Bruno C.N. Oliveira, Ronaldo dos Santos Mello and Frank Siqueira
Abstract
Purpose
This paper aims to propose a method based on Linked Data and Semantic Web principles for composing microservices through data integration. Two frameworks that provide support for the proposed composition method are also described in this paper: Linkedator, which is responsible for connecting entities managed by microservices, and Alignator, which aligns semantic concepts defined by heterogeneous ontologies.
Design/methodology/approach
The proposed method is based on entity linking principles and uses individual matching techniques grounded in a formal notion of identity. The method imposes two major constraints that must be taken into account by its implementation: architectural constraints and resource design constraints.
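The entity-linking principle behind the composition method can be sketched as follows. This is a hypothetical illustration (the property names, identifiers, and data are invented, and Linkedator's actual matching logic is richer): two resources exposed by different microservices are linked when they share the value of an identifying property, producing navigable `owl:sameAs`-style links between them.

```python
# Hedged sketch of entity linking by shared identifying property values.
# resources_*: lists of dicts with an "@id" and domain properties.
def link_entities(resources_a, resources_b, key_a, key_b):
    # Index service B's resources by their identifying value.
    index = {r[key_b]: r["@id"] for r in resources_b if key_b in r}
    # Link each resource in service A whose identifying value matches.
    return [(r["@id"], "owl:sameAs", index[r[key_a]])
            for r in resources_a if r.get(key_a) in index]

# Hypothetical government-data example: two microservices describing the
# same organization under different ontologies and property names.
agencies = [{"@id": "svcA/org/1", "cnpj": "00.000.000/0001-91"}]
budgets = [{"@id": "svcB/budget/7", "taxId": "00.000.000/0001-91"}]
print(link_entities(agencies, budgets, "cnpj", "taxId"))
# → [('svcA/org/1', 'owl:sameAs', 'svcB/budget/7')]
```

In this sketch the correspondence between `cnpj` and `taxId` is given; in the described architecture, such correspondences would come from the ontology alignments produced by Alignator.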
Findings
Experiments were performed in a real-world scenario using public government data. The results show the effectiveness of the proposed method and demonstrate that it preserves the development independence and composability of microservices. As a result, the data provided by microservices that adopt heterogeneous ontologies can now be linked together.
Research limitations/implications
This work only considers microservices designed as data providers. Microservices designed to execute functionalities in a given application domain are out of the scope of this work.
Originality/value
The proposed composition method exploits the potential data intersection observed in resource-oriented microservice descriptions, providing a navigable view of data provided by a set of interrelated microservices. Furthermore, this study explores the applicability of ontology alignments for composing microservices.
Bojan Bozic, Andre Rios and Sarah Jane Delany
Abstract
Purpose
This paper aims to investigate methods for the prediction of tags on a textual corpus that describes diverse data sets based on short messages; as examples, the authors demonstrate these methods on hotel staff inputs from a ticketing system as well as on the publicly available StackOverflow corpus. The aim is to improve the tagging process and find the most suitable method for suggesting tags for a new text entry.
Design/methodology/approach
The paper consists of two parts: exploration of existing sample data, which includes statistical analysis and visualisation of the data to provide an overview, and evaluation of tag prediction approaches. The authors have included different approaches from different research fields to cover a broad spectrum of possible solutions. As a result, the authors have tested a machine learning model for multi-label classification (using gradient boosting), a statistical approach (using frequency heuristics) and three similarity-based classification approaches (nearest centroid, k-nearest neighbours (k-NN) and naive Bayes). The experiment that compares the approaches uses recall to measure the quality of results. Finally, the authors provide a recommendation of the modelling approach that produces the best accuracy in terms of tag prediction on the sample data.
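The similarity-based approach that the findings single out can be illustrated with a minimal sketch. The data, tokenization, and similarity measure here are hypothetical (the paper's feature sets are not given in the abstract): a new message inherits the union of tags from its k nearest training messages under Jaccard token overlap, and predictions are scored with recall, as in the reported experiment.

```python
# Illustrative k-NN tag prediction on invented ticketing-style data.
def jaccard(a, b):
    """Token-overlap similarity between two short messages."""
    a, b = set(a.split()), set(b.split())
    return len(a & b) / len(a | b) if a | b else 0.0

def knn_tags(train, text, k=2):
    """train: list of (message, set_of_tags); returns union of neighbour tags."""
    neighbours = sorted(train, key=lambda item: jaccard(item[0], text),
                        reverse=True)[:k]
    tags = set()
    for _, neighbour_tags in neighbours:
        tags |= neighbour_tags
    return tags

def recall(predicted, actual):
    """Fraction of true tags that were predicted."""
    return len(predicted & actual) / len(actual) if actual else 1.0

train = [
    ("wifi not working in room", {"network", "maintenance"}),
    ("tv remote broken", {"maintenance"}),
    ("guest cannot connect to wifi", {"network"}),
]
pred = knn_tags(train, "wifi connection problem in guest room", k=2)
print(sorted(pred), recall(pred, {"network"}))
# → ['maintenance', 'network'] 1.0
```

Taking the union of neighbour tags favours recall over precision, which matches the paper's choice of recall as the quality measure; a production system would weight neighbours by similarity and use richer text features.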
Findings
The authors have calculated the performance of each method against the test data set by measuring recall. They report recall for each method with different features (except for frequency heuristics, which does not support additional features) for the dmbook pro and StackOverflow data sets. k-NN clearly provides the best recall. As k-NN turned out to provide the best results, the authors performed further experiments with values of k from 1 to 10. This helped them observe the impact of the number of neighbours on performance and identify the best value for k.
Originality/value
The value and originality of the paper lie in extensive experiments with several methods from different domains. The authors used probabilistic methods, such as naive Bayes; statistical methods, such as frequency heuristics; and similarity approaches, such as k-NN. Furthermore, the authors produced results on an industrial-scale data set that was provided by a company and used directly in its project, as well as on a community-based data set with a large amount of data and high dimensionality. The study results can be used to select a model for a specific use case based on diverse corpora, taking into account the advantages and disadvantages of applying each model to one's own data.