Abstract
Purpose
This paper proposes an entity-based scientific metadata schema, Scientific Knowledge Object (SKO) Types. During the past 50 years, many metadata schemas have been developed in a variety of disciplines. However, current scientific metadata schemas focus on describing data rather than entities. Most are descriptive; few are structural or administrative.
Design/methodology/approach
To describe entities in scientific knowledge, the theory of SKO Types is proposed. SKO Types is an entity-based theory for representing and linking SKOs. It defines entities, relationships between entities and attributes of each entity in the scientific domain.
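As an illustration only, the three ingredients named above (entities, relationships between entities and attributes of each entity) can be sketched as a small typed graph. This is not the SKO Types schema itself: the class names, the "Paper" and "Author" entity types and the "written_by" relation are all hypothetical, chosen only to make the idea concrete.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    """Hypothetical entity: a typed node that carries its own attributes."""
    entity_id: str
    entity_type: str          # e.g. "Paper" or "Author" (assumed types)
    attributes: dict = field(default_factory=dict)


@dataclass(frozen=True)
class Relationship:
    """Hypothetical typed, directed link between two entities."""
    source_id: str
    relation: str
    target_id: str


class EntityGraph:
    """Stores entities and the relationships that link them."""

    def __init__(self):
        self.entities = {}
        self.relationships = []

    def add_entity(self, entity):
        self.entities[entity.entity_id] = entity

    def relate(self, source_id, relation, target_id):
        # Only registered entities may be linked.
        if source_id not in self.entities or target_id not in self.entities:
            raise KeyError("both endpoints must be registered entities")
        self.relationships.append(Relationship(source_id, relation, target_id))

    def related(self, source_id, relation):
        # Follow one relation type outward from a given entity.
        return [
            self.entities[r.target_id]
            for r in self.relationships
            if r.source_id == source_id and r.relation == relation
        ]


graph = EntityGraph()
graph.add_entity(Entity("p1", "Paper", {"title": "An example paper"}))
graph.add_entity(Entity("a1", "Author", {"name": "An example author"}))
graph.relate("p1", "written_by", "a1")
authors = graph.related("p1", "written_by")
```

In contrast to a flat descriptive record, the link between the paper and its author is itself a first-class object that can be queried or extended.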
Findings
In scientific knowledge management, SKO Types serves as the basis for relating entities, entity components, aggregated entities, relationships and attributes to the various tasks, e.g. linked entity, rhetorical structuring, strategic reading and semantic annotating, that users may perform when consulting ubiquitous SKOs.
Originality/value
SKO Types can be widely applied in various digital libraries and scientific knowledge management systems, as well as to the existing legacy of scientific publications and their associated metadata schemas.
Alexander Ivanyukovich, Maurizio Marchese and Fausto Giunchiglia
Abstract
Purpose
The purpose of this paper is to provide support for automation of the annotation process of large corpora of digital content.
Design/methodology/approach
The paper presents and discusses an information extraction pipeline from digital document acquisition to information extraction, processing and management. An overall architecture that supports such an extraction pipeline is detailed and discussed.
Findings
The proposed pipeline is implemented in a working prototype of an autonomous digital library (A‐DL) system called ScienceTreks that: supports a broad range of methods for document acquisition; does not rely on any external information sources and is based solely on the information in the document itself and in the overall set of documents in a given digital archive; and provides application programming interfaces (APIs) to support easy integration of external systems and tools into the existing pipeline.
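A staged pipeline that accepts externally supplied steps can be sketched as follows. This is a minimal illustration of the general pattern, not the ScienceTreks architecture or its API: the `Pipeline` class, the stage functions and the toy "first line is the title" heuristic are all assumptions made for the example.

```python
from typing import Callable, Dict, List, Tuple

# A stage takes a document record and returns an enriched copy (assumed shape).
Stage = Callable[[Dict], Dict]


class Pipeline:
    """Hypothetical pipeline that runs registered stages in order."""

    def __init__(self) -> None:
        self.stages: List[Tuple[str, Stage]] = []

    def register(self, name: str, stage: Stage) -> None:
        # External tools plug in here by registering their own stage.
        self.stages.append((name, stage))

    def run(self, document: Dict) -> Dict:
        for _name, stage in self.stages:
            document = stage(document)
        return document


def acquire(doc: Dict) -> Dict:
    # Acquisition stand-in: normalise the raw input into plain text.
    return {**doc, "text": doc["raw"].strip()}


def extract_metadata(doc: Dict) -> Dict:
    # Toy heuristic using only the document itself: first line as the title.
    lines = doc["text"].splitlines()
    return {**doc, "metadata": {"title": lines[0] if lines else ""}}


def external_word_count(doc: Dict) -> Dict:
    # Stand-in for an externally supplied tool added through the registry.
    words = len(doc["text"].split())
    return {**doc, "metadata": {**doc["metadata"], "words": words}}


pipeline = Pipeline()
pipeline.register("acquire", acquire)
pipeline.register("extract", extract_metadata)
pipeline.register("external", external_word_count)
result = pipeline.run({"raw": "  A Sample Title\nSome body text here.  "})
```

Each stage reads only the record it is handed, which mirrors the idea of not depending on external information sources, while the registry is the point at which outside tools would be integrated.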
Practical implications
The proposed A‐DL system can be used in automating end‐to‐end information retrieval and processing, supporting the control and elimination of error‐prone human intervention in the process.
Originality/value
High quality automatic metadata extraction is a crucial step in the move from linguistic entities to logical entities, relation information and logical relations, and therefore to the semantic level of digital library usability. This in turn creates the opportunity for value‐added services within existing and future semantic‐enabled digital library systems.