Martin Nečaský, Petr Škoda, David Bernhauer, Jakub Klímek and Tomáš Skopal
Abstract
Purpose
Semantic retrieval and discovery of datasets published as open data remains a challenging task. The datasets inherently originate in the globally distributed web jungle, lacking the luxury of centralized database administration, database schemas, shared attributes, vocabulary, structure and semantics. The existing dataset catalogs provide basic search functionality that relies on keyword search in brief, incomplete or misleading textual metadata attached to the datasets. The search results are thus often insufficient. However, dataset discovery can be improved in many ways, for example by employing content-based retrieval, machine learning tools, third-party (external) knowledge bases, and a wide range of feature extraction methods and description models.
Design/methodology/approach
In this paper, the authors propose a modular framework for rapid experimentation with methods for similarity-based dataset discovery. The framework consists of an extensible catalog of components prepared to form custom pipelines for dataset representation and discovery.
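The idea of composing components into a pipeline for similarity-based discovery can be illustrated with a minimal sketch. It is not the authors' framework: the function names (`tokenize`, `embed`, `cosine`, `build_pipeline`, `discover`) and the bag-of-words representation over dataset titles are assumptions chosen only to keep the example small.

```python
import math
from collections import Counter

def tokenize(text):
    """Representation component (assumed): lowercase alphabetic word tokens."""
    return [t for t in text.lower().split() if t.isalpha()]

def embed(tokens):
    """Representation component (assumed): bag-of-words term counts."""
    return Counter(tokens)

def cosine(a, b):
    """Similarity component: cosine similarity of two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def build_pipeline(*steps):
    """Chain components so that each step consumes the previous step's output."""
    def run(value):
        for step in steps:
            value = step(value)
        return value
    return run

def discover(query, catalog, represent, similarity, k=2):
    """Rank catalog entries by similarity to the query and return the top k."""
    q = represent(query)
    scored = [(similarity(q, represent(title)), title) for title in catalog]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [title for _, title in scored[:k]]

# Toy catalog of dataset titles (hypothetical data).
catalog = [
    "air quality measurements prague",
    "public transport timetables",
    "air pollution sensors",
]
represent = build_pipeline(tokenize, embed)
print(discover("air quality", catalog, represent, cosine))
```

Swapping `embed` or `cosine` for another component changes the pipeline without touching `discover`, which is the kind of experimentation a modular design is meant to make cheap.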
Findings
The study presents several proof-of-concept pipelines, together with an experimental evaluation, that showcase the usage of the framework.
Originality/value
To the best of the authors’ knowledge, there is no similar formal framework for experimentation with various similarity methods in the context of dataset discovery. The framework aims to establish a platform for reproducible and comparable research in the area of dataset discovery. The prototype implementation of the framework is available on GitHub.
Rosalina Rebucas Estacio and Rodolfo Callanta Raga Jr
Abstract
Purpose
The purpose of this paper is to describe a proposal for a data-driven investigation aimed at determining whether students’ learning behavior can be extracted and visualized from action logs recorded by Moodle. The paper also examines whether there is a correlation between the activity level of students in online environments and their academic performance, measured by final grade.
Design/methodology/approach
The analysis was carried out using log data obtained from various courses delivered at a university using a Moodle platform. The study also collected demographic profiles of students and compared them with their activity level to analyze how these attributes affect students’ level of activity in the online environment.
Findings
This work has shown that a data mining technique such as the vector space model can be used to aggregate students’ action logs and quantify them into a single numeric value, which can then be used to generate visualizations of students’ level of activity. The current investigation indicates that the correlation between activity level and final grade varies considerably.
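As a rough illustration of how action logs might be reduced to one number per student and then related to grades, consider the sketch below. The abstract does not say how the vector is collapsed to a scalar, so the choice of the Euclidean length of the action-count vector is an assumption, as are the action types in `ACTIONS` and all function names.

```python
import math
from collections import Counter

# Hypothetical action types; a real Moodle log has many more.
ACTIONS = ["view", "submit", "post"]

def activity_vector(log, student):
    """Count each action type for one student; log holds (student, action) pairs."""
    counts = Counter(action for s, action in log if s == student)
    return [counts[a] for a in ACTIONS]

def activity_level(vector):
    """Assumed aggregation: Euclidean length of the action-count vector."""
    return math.sqrt(sum(v * v for v in vector))

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)

# Toy log and grades (hypothetical data).
log = [("a", "view"), ("a", "view"), ("a", "submit"), ("b", "view"), ("c", "post")]
grades = {"a": 90, "b": 60, "c": 70}
levels = [activity_level(activity_vector(log, s)) for s in grades]
print(pearson(levels, list(grades.values())))
```

With real course data, the correlation would be computed per course, which is where the variability reported in the findings would show up.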
Practical implications
The value presented in the study can help instructors monitor course progression and enable them to rapidly identify which students are not performing well and adjust their pedagogical strategies accordingly.
Originality/value
A plan to continue the work by developing a complete dashboard style interface that instructors can use is already underway. More data need to be collected and more advanced processing tools are necessary in order to obtain a better perspective on this issue.
Xuwei Pan, Xuemei Zeng and Ling Ding
Abstract
Purpose
With the continuous increase of users, resources and tags, social tagging systems gradually present the characteristics of “big data”, such as large volume, fast growth, complexity and unreliable quality, which greatly increase the complexity of recommendation. The tension between the efficiency and effectiveness of recommendation services in social tagging is becoming increasingly prominent. The purpose of this study is to incorporate topic optimization into collaborative filtering to enhance both the effectiveness and the efficiency of personalized recommendations for social tagging.
Design/methodology/approach
Building on the idea of optimization before service, this paper presents an approach that incorporates topic optimization into collaborative recommendations for social tagging. In the proposed approach, the recommendation process is divided into two phases, offline topic optimization and online recommendation service, to achieve high-quality and efficient personalized recommendation services. In the offline phase, the tags' topic model is constructed and then used to optimize the latent preference of users and the latent affiliation of resources on topics.
Findings
Experimental evaluation shows that, compared with the three baseline approaches, the proposed approach improves both the precision and recall of recommendations and enhances the efficiency of online recommendations. The proposed topic optimization–incorporated collaborative recommendation approach can thus improve both the effectiveness and the efficiency of recommendation in social tagging.
Originality/value
With the support of the proposed approach, personalized recommendation in social tagging with high quality and efficiency can be achieved.
Joo Hun Yoo, Hyejun Jeong, Jaehyeok Lee and Tai-Myoung Chung
Abstract
Purpose
This study aims to summarize the critical issues in medical federated learning and applicable solutions. Also, detailed explanations of how federated learning techniques can be applied to the medical field are presented. About 80 reference studies described in the field were reviewed, and the federated learning framework currently being developed by the research team is provided. This paper will help researchers to build an actual medical federated learning environment.
Design/methodology/approach
Since machine learning techniques emerged, more efficient analysis of large amounts of data has become possible. However, data regulations have been tightened worldwide, and the use of centralized machine learning methods has become almost infeasible. Federated learning techniques have been introduced as a solution. Despite their structural advantages, unsolved challenges remain when federated learning is applied in a real medical data environment. This paper summarizes those challenges by category and presents possible solutions.
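To make the contrast with centralized training concrete, the sketch below shows federated averaging (FedAvg), a commonly used aggregation rule in which a server combines locally trained model weights without seeing the raw data. The abstract does not state which algorithm the authors' framework uses, so this is an illustrative assumption; `federated_average` and the toy hospital updates are hypothetical.

```python
def federated_average(client_updates):
    """Combine client models by a sample-count-weighted average of their weights.

    client_updates is a list of (weights, n_samples) pairs, where weights is a
    list of floats of the same length for every client.
    """
    total = sum(n for _, n in client_updates)
    dim = len(client_updates[0][0])
    return [
        sum(weights[i] * n for weights, n in client_updates) / total
        for i in range(dim)
    ]

# Two hypothetical hospitals: only model weights and sample counts leave the site.
updates = [([1.0, 2.0], 1), ([3.0, 4.0], 3)]
print(federated_average(updates))  # [2.5, 3.5]
```

The weighting by sample count means a site with more local data has proportionally more influence on the shared model, which is one reason heterogeneous data across sites becomes an issue in practice.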
Findings
This paper identifies four categories of critical issues to be aware of when applying federated learning to an actual medical data environment, then provides general guidelines for building a federated learning environment as a solution.
Originality/value
Existing studies have dealt with issues such as heterogeneity in the federated learning environment itself, but have not explained how these issues cause problems in practical tasks. Therefore, this paper helps researchers understand the federated learning issues through examples of actual medical machine learning environments.
Tingting Huang, Yilin Pan, Kai Zhu and Xinyuan Chen
Abstract
Purpose
This paper aims to study the impact of human resource heterogeneity on firms’ cash-holding policies.
Design/methodology/approach
The authors construct a proxy for human resource heterogeneity using the dissimilarity in employees’ skill structure between the firm and its peers in the same industry.
Findings
The authors report evidence that firms with heterogeneous human resources hold more cash than other firms. This effect is more pronounced in labor-intensive firms and firms more susceptible to hold-up by employees, i.e. firms located in regions with more labor disputes and firms surrounded by more external employment opportunities. In addition, the authors demonstrate that high cash holdings triggered by human resource heterogeneity reduce the scale and efficiency of firms’ capital investment.
Originality/value
This study highlights the role of human resource heterogeneity in determining firms’ cash policies. This paper adds to the understanding of labor adjustment costs within the firm and provides insights into firms’ cash-holding decisions.
Prabhat Pokharel, Roshan Pokhrel and Basanta Joshi
Abstract
Analysis of log messages is very important for the identification of suspicious system and network activity. This analysis requires the correct extraction of variable entities. The variable entities are extracted by comparing the log messages against the log patterns. Each of these log patterns can be represented in the form of a log signature. In this paper, we present a hybrid approach for log signature extraction. The approach consists of two modules. The first module identifies log patterns by generating log clusters. The second module uses Named Entity Recognition (NER) to extract signatures from the generated log clusters. Experiments were performed on event logs from Windows Operating System, Exchange and Unix, and the results were validated by comparing the signatures and the variable entities against the standard log documentation. The experiments showed that the extracted signatures were highly accurate and ready to be used.
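The two-module structure (cluster first, then derive a signature per cluster) can be sketched as follows. This is not the authors' method: clustering by token count and first token is an assumed heuristic, and a simple position-wise comparison stands in for the NER step. The names `cluster_logs`, `extract_signature` and the `<*>` placeholder are hypothetical.

```python
from collections import defaultdict

def cluster_logs(messages):
    """Module 1 (assumed heuristic): group messages by token count and first token."""
    clusters = defaultdict(list)
    for message in messages:
        tokens = message.split()
        if tokens:  # skip empty lines
            clusters[(len(tokens), tokens[0])].append(tokens)
    return clusters

def extract_signature(token_lists):
    """Module 2 (stand-in for NER): keep tokens that are constant across the
    cluster and replace positions that vary with the placeholder <*>."""
    signature = []
    for column in zip(*token_lists):
        signature.append(column[0] if len(set(column)) == 1 else "<*>")
    return " ".join(signature)

# Toy log messages (hypothetical data).
messages = [
    "Login failed for alice",
    "Login failed for bob",
    "Disk full on sda1",
]
for key, token_lists in cluster_logs(messages).items():
    print(key, "->", extract_signature(token_lists))
```

A cluster with a single message yields a signature with no variable positions, which is one reason a learned entity recognizer can do better than position-wise comparison on sparse logs.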
Abstract
Purpose
Many recommender systems are unable to provide accurate recommendations to users with limited interaction history, which is known as the cold-start problem. This issue can be addressed by trivial approaches that recommend random items or the most popular ones to new users. However, these methods perform poorly in many cases. This paper aims to explore how to make accurate recommendations for new users in cold-start scenarios.
Design/methodology/approach
In this paper, the authors propose an embedded-bandit method, inspired by the Word2Vec technique and the contextual bandit algorithm. The authors describe user contextual information with item embedding features constructed by Word2Vec. In addition, based on the intelligence measurement model in Crowd Science, the authors propose a new evaluation method to measure the utility of recommendations.
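A much-simplified sketch of the idea of using item embeddings as user context in a bandit-style recommender is given below. It is an assumption-laden stand-in rather than the authors' algorithm: the embeddings are hand-written toy vectors instead of Word2Vec output, the context is taken to be the mean embedding of clicked items, and exploration uses plain epsilon-greedy. All names are hypothetical.

```python
import random

def user_context(clicked, embeddings):
    """Assumed context: mean embedding of the items the user has clicked."""
    dim = len(next(iter(embeddings.values())))
    if not clicked:
        return [0.0] * dim
    return [
        sum(embeddings[item][d] for item in clicked) / len(clicked)
        for d in range(dim)
    ]

def recommend(context, embeddings, exclude, epsilon, rng):
    """Epsilon-greedy choice: explore a random item with probability epsilon,
    otherwise exploit the item whose embedding best matches the context."""
    candidates = [item for item in embeddings if item not in exclude]
    if rng.random() < epsilon:
        return rng.choice(candidates)
    return max(
        candidates,
        key=lambda item: sum(c * e for c, e in zip(context, embeddings[item])),
    )

# Toy two-dimensional item embeddings (hypothetical values).
embeddings = {"a": [1.0, 0.0], "b": [0.9, 0.1], "c": [0.0, 1.0]}
context = user_context(["a"], embeddings)
print(recommend(context, embeddings, exclude={"a"}, epsilon=0.1, rng=random.Random(0)))
```

Even one click gives a new user a non-trivial context vector, which is what lets this kind of approach improve on random or most-popular recommendations in cold-start settings.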
Findings
The authors introduce the Word2Vec technique for constructing user contextual features, which improves the accuracy of recommendations compared with the traditional multi-armed bandit approach. In addition, under this study’s intelligence measurement model, the proposed method also achieves higher utility.
Practical implications
Improving the accuracy of recommendations during the cold-start phase can greatly raise user stickiness and increase user favorability, which in turn contributes to the commercialization of the app.
Originality/value
The algorithm proposed in this paper shows that user contextual features can be represented by the embedding vectors of clicked items.
Xiaodong Lu, Jingjun Liu and Janus Jian Zhang
Abstract
Purpose
This study aims to take advantage of exporters’ product codes and examine the effects of government subsidization on corporate product strategies by focusing on the dimension of product differentiation.
Design/methodology/approach
This study uses harmonized system (HS) product codes to construct a novel measure of product differentiation among a sample of Chinese exporters during 2000–2012. It uses propensity score matching to construct a comparable sample of control firms for exporters receiving government subsidies, and then a difference-in-differences (DID) analysis is conducted.
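The combination of matching and difference-in-differences can be sketched in a few lines. This is a toy illustration under stated assumptions, not the study's estimation procedure: propensity scores are taken as given rather than estimated, matching is one-nearest-neighbor with replacement, and the DID estimate is a simple difference of mean changes. The function names and numbers are hypothetical.

```python
def match_controls(treated, controls):
    """For each treated firm, pick the control firm with the closest propensity
    score (one nearest neighbor, with replacement). Inputs map firm -> score."""
    return {
        firm: min(controls, key=lambda c: abs(controls[c] - score))
        for firm, score in treated.items()
    }

def mean(values):
    return sum(values) / len(values)

def did(treated_pre, treated_post, control_pre, control_post):
    """Difference-in-differences: change in the treated group minus the change
    in the matched control group."""
    return (mean(treated_post) - mean(treated_pre)) - (
        mean(control_post) - mean(control_pre)
    )

# Hypothetical propensity scores and product-differentiation outcomes.
print(match_controls({"t1": 0.6}, {"c1": 0.2, "c2": 0.55}))
print(did([0.5, 0.7], [0.4, 0.6], [0.5, 0.5], [0.5, 0.5]))
```

A negative estimate in this toy setup corresponds to differentiation falling among subsidized firms relative to their matched controls.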
Findings
This study finds that product differentiation decreases immediately upon receiving a government subsidy. This finding suggests that in an emerging market, firms use their subsidy to imitate competitors rather than increase innovation. Further analyses show that this effect is concentrated among wholly foreign-owned enterprises and firms that focus on general trade rather than processing trade. In addition, the authors find some evidence that government subsidization leads to an increase in the number of product lines and decreases in domestic value added and export product quality.
Originality/value
This study constructs a novel measure of product differentiation for a large sample of Chinese exporters and provides insights that government subsidization can affect corporate product strategies.
Elena Barbierato, Iacopo Bernetti and Irene Capecchi
Abstract
Purpose
Wine packaged tours, as a specific aspect of wine tourism, have so far been neglected in research. The purpose of this study is therefore to identify the key elements for the success of wine tours in Tuscany (Italy) and to evaluate their strengths and weaknesses.
Design/methodology/approach
The study combines approaches of text mining, sentiment analysis and natural language processing, drawing on data from the TripAdvisor platform, from which 9,616 reviews of 600 tours in the years 2010–2020 were collected through an automatic procedure.
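A minimal sketch of aspect-level sentiment scoring on review text is shown below. It is only an illustration of the general technique: the word lists, aspect keywords and sentence splitting on periods are assumptions and do not reproduce the study's text mining procedure.

```python
# Hypothetical sentiment lexicon and aspect keywords.
POSITIVE = {"great", "excellent", "beautiful", "friendly"}
NEGATIVE = {"late", "poor", "rushed", "crowded"}
ASPECTS = {
    "guide": {"guide"},
    "wine": {"wine", "wines"},
    "logistics": {"bus", "pickup", "schedule"},
}

def tokenize(text):
    """Lowercase tokens with surrounding punctuation stripped."""
    return [w.strip(".,!?;:").lower() for w in text.split()]

def sentiment(text):
    """Lexicon score: number of positive words minus number of negative words."""
    tokens = tokenize(text)
    return sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)

def aspect_sentiment(reviews):
    """Sum sentence-level sentiment for every aspect mentioned in the sentence."""
    totals = {aspect: 0 for aspect in ASPECTS}
    for review in reviews:
        for sentence in review.split("."):
            tokens = set(tokenize(sentence))
            for aspect, keywords in ASPECTS.items():
                if tokens & keywords:
                    totals[aspect] += sentiment(sentence)
    return totals

# Toy reviews (hypothetical data).
reviews = ["The guide was friendly. The pickup was late.", "Excellent wine!"]
print(aspect_sentiment(reviews))
```

Aggregating scores per aspect is what makes it possible to separate strengths (for example the guide or the wine) from weaknesses (for example logistics) across many reviews.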
Findings
The authors identified six elements of successful wine tours expressed by research subjects: tour guide; logistical aspects; the quality of the wine; the quality of the food; complementary tourist and recreational activities; the landscape and historic villages. The key strength associated with success was the integration of the leading wine product with food, landscape and historic villages, while the main criticisms were concerned with the organization and planning of the tour. Furthermore, the tour guide also plays a fundamental role in consumer satisfaction.
Research limitations/implications
The limitations of the method are linked to the origin of the data used. The main one is that TripAdvisor does not provide social and personal information about the tourists who wrote the reviews; the methods are therefore best seen as complementary to traditional questionnaire surveys.
Practical implications
The proposed model can be used both by professionals to improve the quality of their products and by policymakers to promote the territorial development of quality wine-growing areas.
Social implications
The proposed model can be useful for policymakers to promote the territorial development of quality wine-growing areas.
Originality/value
The methodology tested is easily transferable to many countries and, to the authors’ knowledge, is the first attempt to combine multidimensional scaling, sentiment analysis and natural language processing approaches.
Santo Raneri, Fabian Lecron, Julie Hermans and François Fouss
Abstract
Purpose
Artificial intelligence (AI) has started to receive attention in the field of digital entrepreneurship. However, few studies propose AI-based models aimed at assisting entrepreneurs in their day-to-day operations. In addition, extant models from the product design literature, while technically promising, fail to propose methods suitable for opportunity development under a high level of uncertainty. This study develops and tests a predictive model that provides entrepreneurs with a digital infrastructure for automated testing. Such an approach aims to harness AI-based predictive technologies while keeping the ability to respond to the unexpected.
Design/methodology/approach
Based on effectuation theory, this study identifies an AI-based, predictive phase in the “build-measure-learn” loop of Lean startup. The predictive component, based on recommendation algorithm techniques, is integrated into a framework that considers both prediction (causal) and controlled (effectual) logics of action. The performance of the so-called active learning build-measure-predict-learn algorithm is evaluated on a data set collected from a case study.
Findings
The results show that the algorithm can predict the desirability level of newly implemented product design decisions (PDDs) in the context of a digital product. The main advantages, in addition to the prediction performance, are the ability to detect cases where predictions are likely to be less precise and an easy-to-assess indicator for product design desirability. The model is found to deal with uncertainty in a threefold way: epistemological expansion through accelerated data gathering, ontological reduction of uncertainty by revealing prior “unknown unknowns” and methodological scaffolding, as the framework accommodates both predictive (causal) and controlled (effectual) practices.
Originality/value
Research about using AI in entrepreneurship is still in a nascent stage. This paper can serve as a starting point for new research on predictive techniques and AI-based infrastructures aiming to support digital entrepreneurs in their day-to-day operations. This work can also encourage theoretical developments, building on effectuation and causation, to better understand Lean startup practices, especially when supported by digital infrastructures accelerating the entrepreneurial process.