Search results

Article
Publication date: 11 May 2020

Bojan Bozic, Andre Rios and Sarah Jane Delany

Abstract

Purpose

This paper investigates methods for predicting tags on textual corpora made up of short messages. As examples, the authors apply the methods to hotel staff inputs in a ticketing system and to the publicly available StackOverflow corpus. The aim is to improve the tagging process and find the most suitable method for suggesting tags for a new text entry.

Design/methodology/approach

The paper consists of two parts: an exploration of existing sample data, which includes statistical analysis and visualisation to provide an overview, and an evaluation of tag prediction approaches. The authors have included approaches from different research fields to cover a broad spectrum of possible solutions. Specifically, they have tested a machine learning model for multi-label classification (using gradient boosting), a statistical approach (using frequency heuristics) and three similarity-based classification approaches (nearest centroid, k-nearest neighbours (k-NN) and naive Bayes). The experiment that compares the approaches uses recall to measure the quality of results. Finally, the authors recommend the modelling approach that achieves the best recall for tag prediction on the sample data.
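
The abstract does not spell out how recall is computed. A common reading for tag suggestion, assumed here, is the fraction of an entry's true tags that appear among the suggested tags, averaged over all test entries. The function names below are hypothetical and only illustrate that reading.

```python
from typing import Iterable, Sequence, Set


def recall(true_tags: Set[str], suggested: Iterable[str]) -> float:
    """Fraction of an entry's true tags that appear among the suggestions."""
    if not true_tags:
        # Assumption: an entry without true tags contributes zero recall.
        return 0.0
    return len(true_tags & set(suggested)) / len(true_tags)


def mean_recall(truth: Sequence[Set[str]],
                predictions: Sequence[Sequence[str]]) -> float:
    """Average per-entry recall over a test set."""
    if len(truth) != len(predictions):
        raise ValueError("truth and predictions must have the same length")
    if not truth:
        return 0.0
    return sum(recall(t, p) for t, p in zip(truth, predictions)) / len(truth)
```

With this definition, suggesting more tags per entry can only keep recall the same or raise it, so any comparison between methods should hold the number of suggested tags fixed.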

Findings

The authors have calculated the performance of each method against the test data set by measuring recall. They report recall for each method with different features (except for frequency heuristics, which does not provide the option to add features) for the dmbook pro and StackOverflow data sets. k-NN clearly provides the best recall. For this reason, the authors have performed further experiments with values of k from 1 to 10. This helped the authors observe the impact of the number of neighbours on performance and identify the best value for k.
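
A sweep over k of this kind can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes Jaccard similarity over lower-cased word sets as the text representation and majority voting over the neighbours' tags, neither of which is stated in the abstract, and all names and the toy messages are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Sequence, Set, Tuple

Example = Tuple[str, Set[str]]  # (message text, tags)


def tokens(text: str) -> Set[str]:
    """Lower-cased whitespace tokens (an assumed, very simple feature set)."""
    return set(text.lower().split())


def jaccard(a: Set[str], b: Set[str]) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def knn_suggest(text: str, train: Sequence[Example], k: int,
                n_tags: int = 3) -> List[str]:
    """Suggest the tags that occur most often among the k most similar messages."""
    query = tokens(text)
    neighbours = sorted(train, key=lambda ex: jaccard(query, tokens(ex[0])),
                        reverse=True)[:k]
    votes = Counter(tag for _, tags in neighbours for tag in sorted(tags))
    return [tag for tag, _ in votes.most_common(n_tags)]


def sweep_k(train: Sequence[Example], test: Sequence[Example],
            ks: Sequence[int] = tuple(range(1, 11))) -> Dict[int, float]:
    """Mean recall on the test set for each candidate k (1 to 10 by default)."""
    results = {}
    for k in ks:
        scores = []
        for text, true_tags in test:
            suggested = set(knn_suggest(text, train, k))
            scores.append(len(true_tags & suggested) / len(true_tags)
                          if true_tags else 0.0)
        results[k] = sum(scores) / len(scores) if scores else 0.0
    return results


# Toy data, invented for illustration only.
TRAIN = [
    ("wifi not working in room", {"wifi", "it"}),
    ("broken shower in bathroom", {"plumbing", "maintenance"}),
    ("wifi password request", {"wifi"}),
    ("leaking tap in bathroom", {"plumbing"}),
]
TEST = [("wifi down in room 12", {"wifi"})]
```

Calling `sweep_k(TRAIN, TEST)` returns a dictionary from k to mean recall, from which the best k can be read off with `max(results, key=results.get)`.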

Originality/value

The value and originality of the paper lie in extensive experiments with several methods from different domains. The authors have used probabilistic methods, such as naive Bayes, statistical methods, such as frequency heuristics, and similarity approaches, such as k-NN. Furthermore, the authors have produced results on an industrial-scale data set that has been provided by a company and used directly in their project, as well as on a community-based data set with a large volume of data and high dimensionality. The study results can be used to select a model based on diverse corpora for a specific use case, taking into account the advantages and disadvantages of applying the model to the data at hand.

Details

International Journal of Web Information Systems, vol. 16 no. 2
Type: Research Article
ISSN: 1744-0084

Article
Publication date: 14 June 2013

Bojan Božić and Werner Winiwarter

Abstract

Purpose

The purpose of this paper is to present a showcase of semantic time series processing which demonstrates how this technology can improve time series processing and community building by the use of a dedicated language.

Design/methodology/approach

The authors have developed a new semantic time series processing language and prepared showcases to demonstrate its functionality. The assumption is an environmental setting with data measurements from different sensors to be distributed to different groups of interest. The data are represented as time series for water and air quality, while the user groups are, among others, the environmental agency, companies from the industrial sector and legal authorities.

Findings

A language for time series processing and several tools for enriching time series with metadata and for community building have been implemented in Python and Java. A GUI for demonstration purposes has also been developed in PyQt4. In addition, an ontology for validation has been designed and a knowledge base for data storage and inference has been set up. Important features include dynamic integration of ontologies, time series annotation and semantic filtering.
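
The abstract does not show TSSL syntax, so the following is not TSSL. It is only a simplified Python sketch of what time series annotation and semantic filtering could look like, assuming annotations are plain concept labels attached to a series. The class, function and concept names are all hypothetical, and no ontology reasoning is performed.

```python
from dataclasses import dataclass, field
from typing import List, Sequence, Set, Tuple


@dataclass
class AnnotatedSeries:
    """A time series plus a set of concept labels (hypothetical structure)."""
    name: str
    points: List[Tuple[str, float]]  # (ISO 8601 timestamp, measured value)
    concepts: Set[str] = field(default_factory=set)

    def annotate(self, concept: str) -> None:
        """Attach a concept label, e.g. a term taken from an ontology."""
        self.concepts.add(concept)


def semantic_filter(series: Sequence[AnnotatedSeries],
                    required: Set[str]) -> List[AnnotatedSeries]:
    """Keep only the series annotated with every required concept."""
    return [s for s in series if required <= s.concepts]


# Invented example data for a water and an air quality sensor.
water = AnnotatedSeries("river_ph", [("2012-06-01T00:00:00", 7.1)])
water.annotate("WaterQuality")
water.annotate("PublicData")
air = AnnotatedSeries("city_no2", [("2012-06-01T00:00:00", 41.0)])
air.annotate("AirQuality")

for_agency = semantic_filter([water, air], {"WaterQuality"})
```

In this sketch a user group such as the environmental agency would be served by one filter call per concept set of interest.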

Research limitations/implications

This paper focuses on the showcases of time series semantic language (TSSL), but also covers technical aspects and user interface issues. The authors are planning to develop TSSL further and evaluate it within further research projects and validation scenarios.

Practical implications

The research has a high practical impact on time series processing and provides new data sources for semantic web applications. It can also be used in social web platforms (especially for researchers) to provide a time series centric tagging and processing framework.

Originality/value

The paper presents an extended version of the paper presented at iiWAS2012.

Details

International Journal of Web Information Systems, vol. 9 no. 2
Type: Research Article
ISSN: 1744-0084

Article
Publication date: 4 March 2021

Bojan Matkovski, Stanislav Zekić, Žana Jurjević and Danilo Đokić

Abstract

Purpose

The purpose of this paper is to determine whether the agribusiness sector can be an initiator of exports on emerging markets. To this end, the authors analyzed export opportunities for Vojvodina, the region in Serbia with the most potential for agribusiness.

Design/methodology/approach

This paper uses the Comparative Advantage Index and the Index of Intra-industrial Integration to determine the region's level of comparative advantage and the market's level of integration on the main emerging markets.
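
The abstract names the two indices without giving formulas. The sketch below assumes the standard textbook forms, namely Balassa's revealed comparative advantage index and the Grubel-Lloyd index of intra-industry trade. The paper may use different variants, and the function and parameter names are hypothetical.

```python
def balassa_rca(region_product_exports: float, region_total_exports: float,
                ref_product_exports: float, ref_total_exports: float) -> float:
    """Balassa index: the product's share in the region's exports divided by
    its share in the reference market's exports. Values above 1 indicate a
    revealed comparative advantage."""
    if region_total_exports <= 0 or ref_total_exports <= 0 or ref_product_exports <= 0:
        raise ValueError("export totals and reference product exports must be positive")
    region_share = region_product_exports / region_total_exports
    ref_share = ref_product_exports / ref_total_exports
    return region_share / ref_share


def grubel_lloyd(exports: float, imports: float) -> float:
    """Grubel-Lloyd index: 1 means fully balanced two-way (intra-industry)
    trade in the product group, 0 means purely one-way trade."""
    total = exports + imports
    if total <= 0:
        raise ValueError("exports plus imports must be positive")
    return 1 - abs(exports - imports) / total
```

For example, if cereals make up 50% of a region's exports but only 25% of the reference market's exports, `balassa_rca(50, 100, 25, 100)` gives 2.0, a revealed comparative advantage under this definition.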

Findings

The results show that this region has the most competitive advantages in crop production – primarily in cereals and industrial plants – but the situation is not favorable for livestock production. Because of this, comparative advantage should be used as a factor for the growth of competitiveness in the sectors for which crop products are the raw material base. At the same time, agricultural policy measures should encourage more intensive agricultural production, which could create a better foundation for progress in the food industry.

Research limitations/implications

Data collected on foreign trade at the level of statistical regions is not always reliable. Also, regional and local characteristics are specific to each country, so the ability to generalize conclusions is limited.

Practical implications

This paper provides a useful review of the agri-food sector's competitiveness and determines which agri-food segments have competitive advantages. It is essential for policymakers to identify what determinants improve or degrade the competitiveness of the region's agri-food sector.

Originality/value

Since there are a limited number of studies analyzing trends of competitiveness for the region's agri-food sector, the paper will contribute to filling this gap. Furthermore, the framework is conceptually innovative in identifying the determinants that create export opportunities for the region on the international market.

Details

International Journal of Emerging Markets, vol. 17 no. 10
Type: Research Article
ISSN: 1746-8809
