Jahna Otterbacher and Dragomir Radev
Abstract
Purpose
Automated sentence‐level relevance and novelty detection would be of direct benefit to many information retrieval systems. However, the low level of agreement between human judges performing the task is an issue of concern. In previous approaches, annotators were asked to identify sentences in a document set that are relevant to a given topic, and then to eliminate sentences that do not provide novel information. This paper aims to explore a new approach in which relevance and novelty judgments are made within the context of specific, factual information needs, rather than with respect to a broad topic.
Design/methodology/approach
An experiment is conducted in which annotators perform the novelty detection task in both the topic‐focused and fact‐focused settings.
Findings
Higher levels of agreement between judges are found on the task of identifying relevant sentences in the fact‐focused approach. However, the new approach does not improve agreement on novelty judgments.
Originality/value
The analysis confirms the intuition that making sentence‐level relevance judgments is likely to be the more difficult of the two tasks in the novelty detection framework.
Silvio Peroni, Alexander Dutton, Tanya Gray and David Shotton
Abstract
Purpose
Citation data needs to be recognised as a part of the Commons – those works that are freely and legally available for sharing – and placed in an open repository. The paper aims to discuss this issue.
Design/methodology/approach
The Open Citation Corpus is a new open repository of scholarly citation data, made available under a Creative Commons CC0 1.0 public domain dedication and encoded as Open Linked Data using the SPAR Ontologies.
Findings
The Open Citation Corpus presently provides open access (OA) to reference lists from 204,637 articles from the OA Subset of PubMed Central, containing 6,325,178 individual references to 3,373,961 unique papers.
Originality/value
Scholars, publishers and institutions may freely build upon, enhance and reuse the open citation data for any purpose, without restriction under copyright or database law.
BULGARIA: New defence minister will bolster premier