Dumitru Roman, Neal Reeves, Esteban Gonzalez, Irene Celino, Shady Abd El Kader, Philip Turk, Ahmet Soylu, Oscar Corcho, Raquel Cedazo, Gloria Re Calegari, Damiano Scandolari and Elena Simperl
Abstract
Purpose
Citizen Science – public participation in scientific projects – is becoming a global practice engaging volunteer participants, often non-scientists, with scientific research. Citizen Science faces major challenges, such as quality and consistency, in reaping the full potential of its outputs and outcomes, including data, software and results. The principles put forth by the Data Science and Open Science domains, where these challenges have been addressed at length, are essential for alleviating them. The purpose of this study is to explore the extent to which Citizen Science initiatives capitalise on Data Science and Open Science principles.
Design/methodology/approach
The authors analysed 48 Citizen Science projects related to pollution and its effects. They compared each project against a set of Data Science and Open Science indicators, exploring how each project defines, collects, analyses and exploits data to present results and contribute to knowledge.
Findings
The results indicate several shortcomings with respect to commonly accepted Data Science principles, including lack of a clear definition of research problems and limited description of data management and analysis processes, and Open Science principles, including lack of the necessary contextual information for reusing project outcomes.
Originality/value
In the light of this analysis, the authors provide a set of guidelines and recommendations for better adoption of Data Science and Open Science principles in Citizen Science projects, and introduce a software tool to support this adoption, with a focus on preparation of data management plans in Citizen Science projects.
Qiong Bu, Elena Simperl, Adriane Chapman and Eddy Maddalena
Abstract
Purpose
Ensuring quality is one of the most significant challenges in microtask crowdsourcing. Aggregating the data collected from the crowd is an important step in inferring the correct answer, but existing studies seem to be limited to single-step tasks. This study looks at multiple-step classification tasks and aims to understand aggregation in such cases, which is useful for assessing classification quality.
Design/methodology/approach
The authors present a model to capture the information of the workflow, questions and answers for both single- and multiple-question classification tasks. They propose an adapted approach on top of the classic approach so that the model can handle tasks with several multiple-choice questions in general instead of a specific domain or any specific hierarchical classifications. They evaluate their approach with three representative tasks from existing citizen science projects in which they have the gold standard created by experts.
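To make the idea of aggregating answers along a multi-step workflow concrete, here is a minimal sketch. It assumes the "classic approach" is plain majority voting, which the abstract does not confirm, and it is not the authors' adapted method. The function name `aggregate_path` and the sample answers are hypothetical.

```python
from collections import Counter


def aggregate_path(responses):
    """Aggregate multi-step answers by majority vote, one step at a time.

    `responses` is a list of tuples; each tuple holds one worker's answers
    in workflow order. At every step, only workers who agreed with the
    path chosen so far are allowed to vote (an assumption of this sketch).
    Ties go to the answer that was seen first.
    """
    path = []
    candidates = list(responses)
    step = 0
    while True:
        # Votes for this step from workers still on the chosen path.
        votes = [r[step] for r in candidates if len(r) > step]
        if not votes:
            break
        winner, _count = Counter(votes).most_common(1)[0]
        path.append(winner)
        # Keep only the workers who picked the winning answer.
        candidates = [r for r in candidates if len(r) > step and r[step] == winner]
        step += 1
    return tuple(path)


if __name__ == "__main__":
    sample = [
        ("animal", "bird", "owl"),
        ("animal", "bird", "hawk"),
        ("animal", "bird", "owl"),
        ("plant", "tree"),
        ("animal", "mammal", "fox"),
    ]
    print(aggregate_path(sample))
```

Voting step by step, rather than over whole answer sequences, lets partial agreement count: workers who disagree on the final label still contribute to the earlier steps they share.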
Findings
The results show that the approach can provide significant improvements to the overall classification accuracy. The authors’ analysis also demonstrates that all algorithms can achieve higher accuracy for the volunteer- versus paid-generated data sets for the same task. Furthermore, the authors observed interesting patterns in the relationship between the performance of different algorithms and workflow-specific factors including the number of steps and the number of available options in each step.
Originality/value
Due to the nature of crowdsourcing, aggregating the collected data is an important process to understand the quality of crowdsourcing results. Different inference algorithms have been studied for simple microtasks consisting of single questions with two or more answers. However, as classification tasks typically contain many questions, the proposed method can be applied to a wide range of tasks including both single- and multiple-question classification tasks.
Marçal Mora-Cantallops, Salvador Sánchez-Alonso and Elena García-Barriocanal
Abstract
Purpose
The purpose of this paper is to review the current status of research on Wikidata and, in particular, of articles that either describe applications of Wikidata or provide empirical evidence, in order to uncover the topics of interest, the fields that are benefiting from its applications and which researchers and institutions are leading the work.
Design/methodology/approach
A systematic literature review is conducted to identify and review how Wikidata is being dealt with in academic research articles and the applications that are proposed. A rigorous and systematic process is implemented, aiming not only to summarize existing studies and research on the topic, but also to include an element of analytical criticism and a perspective on gaps and future research.
Findings
Despite Wikidata’s potential and the notable rise in research activity, the field is still in the early stages of study. Most research is published in conference proceedings, which reflects this immaturity, and provides little empirical evidence of real use cases. Only a few disciplines currently benefit from Wikidata’s applications, and they do so with a significant gap between research and practice. Studies are dominated by European researchers, mirroring Wikidata’s content distribution and limiting its worldwide applications.
Originality/value
The results collect and summarize existing Wikidata research articles published in the major international journals and conferences, delivering a meticulous summary of all the available empirical research on the topic which is representative of the state of the art at this time, complemented by a discussion of identified gaps and future work.
Taro Aso, Toshiyuki Amagasa and Hiroyuki Kitagawa
Abstract
Purpose
The purpose of this paper is to propose a scheme that allows users to interactively explore relations between entities in knowledge bases (KBs). KBs store a wide range of knowledge about real-world entities in a structured form as (subject, predicate, object). Although it is possible to query entities and relations among entities by specifying appropriate query expressions of SPARQL or keyword queries, the structure and the vocabulary are complicated, and it is hard for non-expert users to get the desired information. For this reason, many researchers have proposed faceted search interfaces for KBs. Nevertheless, existing ones are designed for finding entities and are insufficient for finding relations.
Design/methodology/approach
To address this problem, the authors propose a novel “relation facet” for finding relations between entities. To generate it, they cluster predicates so that those connected to common objects are grouped together, and then build the facet from the resulting clusters. Specifically, they propose using two clustering approaches: agglomerative hierarchical clustering (AHC) and CANDECOMP/PARAFAC (CP) tensor decomposition.
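The grouping of predicates by shared objects can be sketched with a small agglomerative clustering routine. This is only an illustration under stated assumptions: the abstract does not name a distance measure or linkage criterion, so the Jaccard distance over object sets, the average linkage, the 0.7 threshold, the function names and the sample triples are all hypothetical choices rather than the authors' setup.

```python
from itertools import combinations


def jaccard_distance(a, b):
    """Return 1 minus the Jaccard similarity of two sets."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def cluster_predicates(triples, threshold=0.7):
    """Group predicates whose object sets overlap (average-linkage AHC).

    `triples` is an iterable of (subject, predicate, object) tuples.
    Merging stops once the closest pair of clusters is farther apart
    than `threshold` (an assumed stopping rule).
    """
    # Collect the set of objects each predicate points to.
    objects_of = {}
    for _subject, predicate, obj in triples:
        objects_of.setdefault(predicate, set()).add(obj)

    # Start with one singleton cluster per predicate.
    clusters = [[p] for p in sorted(objects_of)]

    def average_linkage(c1, c2):
        dists = [
            jaccard_distance(objects_of[p], objects_of[q])
            for p in c1
            for q in c2
        ]
        return sum(dists) / len(dists)

    while len(clusters) > 1:
        # Find the closest pair of clusters.
        (i, j), best = min(
            (
                ((i, j), average_linkage(clusters[i], clusters[j]))
                for i, j in combinations(range(len(clusters)), 2)
            ),
            key=lambda item: item[1],
        )
        if best > threshold:
            break
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)

    return sorted(sorted(c) for c in clusters)


if __name__ == "__main__":
    sample = [
        ("Alice", "bornIn", "Paris"),
        ("Bob", "livesIn", "Paris"),
        ("Bob", "bornIn", "Rome"),
        ("Carol", "livesIn", "Rome"),
        ("Alice", "worksFor", "Acme"),
        ("Carol", "employedBy", "Acme"),
        ("Bob", "employedBy", "Globex"),
        ("Dan", "worksFor", "Globex"),
    ]
    print(cluster_predicates(sample))
```

In this toy data the place-valued predicates end up in one cluster and the organisation-valued predicates in another, which is the kind of grouping a relation facet could present to a user.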
Findings
The authors experimentally tested the performance of the clustering methods and found that AHC performs better than tensor decomposition. In addition, they conducted a user study showing that their proposed scheme performs better than existing ones in the task of searching for relations.
Originality/value
The authors propose a relation-oriented faceted search method for KBs that allows users to explore relations between entities. As far as the authors know, this is the first method to focus on the exploration of relations between entities.