Search results
James Faulkner, Liuxing Lu and Jiangping Chen
Abstract
Purpose
Archivists are charged with the preservation of their collections by reducing deterioration because of temperature, relative humidity, atmospheric pollutants and other factors. The methods archivists use to preserve their collections may have a negative impact on the environment. This paper aims to identify factors for building environmentally sustainable archives to help guide archival environmental sustainability practices.
Design/methodology/approach
This paper identifies factors through a literature review, and conducts a content analysis of the websites of seven national/state archives. The analysis focuses on the policy statements of these archives.
Findings
The authors found that the literature lists 31 factors under 7 categories: electricity, facilities, water, exhibitions, pollution, collection practices and education and outreach. The content analysis of the policy documents or statements demonstrated that archives applied and addressed mostly “resource-related” efforts to protect the environment, such as factors related to electricity, facilities, water and pollution. However, factors related to “work-related” efforts, such as exhibitions, collection practices and education and outreach, were ignored.
Practical implications
This study can provide insights to archivists on current implementation and help to guide their further environmental sustainability practices.
Originality/value
Little is known regarding archivists’ implementation of environmentally sustainable practices. This study focuses on identifying factors for environmental sustainability of archives addressed by literature and existing archives, trying to find the gap between literature and practice.
Jiangping Chen, Marie Bloechle, Beth Thomsett-Scott and Eileen Breen
Huyen Nguyen, Haihua Chen, Jiangping Chen, Kate Kargozari and Junhua Ding
Abstract
Purpose
This study aims to evaluate a method of building a biomedical knowledge graph (KG).
Design/methodology/approach
This research first constructs a COVID-19 KG on the COVID-19 Open Research Data Set, covering information over six categories (i.e. disease, drug, gene, species, therapy and symptom). The construction used open-source tools to extract entities, relations and triples. Then, the COVID-19 KG is evaluated on three data-quality dimensions: correctness, relatedness and comprehensiveness, using a semiautomatic approach. Finally, this study assesses the application of the KG by building a question answering (Q&A) system. Five queries regarding COVID-19 genomes, symptoms, transmissions and therapeutics were submitted to the system and the results were analyzed.
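The triple-based representation and the query step described above can be illustrated with a minimal sketch. It is not the authors' pipeline: the `TripleStore` class, the relation names and the three example triples are hypothetical stand-ins chosen for illustration, and a real system would populate the store from extraction tools rather than by hand.

```python
class TripleStore:
    """Toy in-memory store for (subject, relation, object) triples."""

    def __init__(self):
        self.triples = set()

    def add(self, subject, relation, obj):
        self.triples.add((subject, relation, obj))

    def query(self, subject=None, relation=None, obj=None):
        """Return triples matching the pattern; None acts as a wildcard."""
        return sorted(
            t for t in self.triples
            if (subject is None or t[0] == subject)
            and (relation is None or t[1] == relation)
            and (obj is None or t[2] == obj)
        )


# Hypothetical triples for illustration only (not taken from the paper's KG).
kg = TripleStore()
kg.add("COVID-19", "has_symptom", "fever")
kg.add("COVID-19", "has_symptom", "cough")
kg.add("SARS-CoV-2", "causes", "COVID-19")

# "What are the symptoms of COVID-19?" expressed as a triple pattern.
symptoms = [o for _, _, o in kg.query(subject="COVID-19", relation="has_symptom")]
print(symptoms)
```

A question answering layer on top of such a store would mainly translate a natural-language question into one or more of these patterns, which is why missing or incorrect triples directly limit the answers it can return.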
Findings
With current extraction tools, the quality of the KG is moderate and difficult to improve unless more effort is made to improve the tools for entity extraction, relation extraction and related tasks. This study finds that comprehensiveness and relatedness positively correlate with the data size. Furthermore, the results indicate that Q&A systems built on larger-scale KGs perform better than those built on smaller ones for most queries, demonstrating the importance of relatedness and comprehensiveness in ensuring the usefulness of the KG.
Originality/value
The KG construction process, data-quality-based and application-based evaluations discussed in this paper provide valuable references for KG researchers and practitioners to build high-quality domain-specific knowledge discovery systems.
Haihua Chen, Jeonghyun (Annie) Kim, Jiangping Chen and Aisa Sakata
Abstract
Purpose
This study aims to explore the applications of natural language processing (NLP) and data analytics in understanding large-scale digital collections in oral history archives.
Design/methodology/approach
NLP and data analytics were used to analyse the oral interview transcripts of 904 survivors of the Japanese American incarceration camps collected from Densho Digital Repository, relying specifically on descriptive analysis, keyword extraction, topic modelling and sentiment analysis (SA).
Findings
The researchers found multiple geographic areas of large residential communities of ethnic Japanese people and the place names of the concentration camps. The keywords and topics extracted reflect the deplorable conditions and militaristic nature of the camps and the forced labour of the internees. When remembering history, the main focus for the narrators remains the redress and reparation movement to obtain the restitution of their civil rights. SA further found that the forcible removal and incarceration of Japanese Americans during the Second World War negatively impacted and brought deep trauma to the narrators.
Originality/value
This case study demonstrated how NLP and data analytics could be applied to analyse oral history archives and open avenues for discovery. Archival researchers and the general public may benefit from this type of analysis in making connections between temporal, spatial and emotional elements, which will contribute to a holistic understanding of individuals and communities in terms of their collective memory.
Irhamni Ali, Lingzi Hong and Jiangping Chen
Abstract
Purpose
During the COVID-19 pandemic, in order to prevent the spread of disease, the National Library of Indonesia Cataloging Department adopted remote working. There is a need to examine the productivity of remote cataloging as this form of cataloging becomes more prevalent.
Design/methodology/approach
The study was conducted using a mixed methods approach. The authors analyzed data to assess cataloging librarians' productivity based on system logs. Then, the authors interviewed librarians to understand librarians' perspectives concerning productivity and remote cataloging, and also to seek insights into factors that may affect productivity while working remotely.
Findings
The analysis found higher productivity in terms of quantity of cataloging. Librarians' productivity during remote cataloging is not statistically related to individual factors of age, years of experience, or gender. The in-depth interviews found that other factors may hinder the quality and quantity of the remote cataloging, including the working environment, infrastructure, and lack of policies on remote working.
Research limitations/implications
The findings were based on a study conducted in the National Library of Indonesia, which may not apply to libraries with different infrastructures or existing policies in remote cataloging. However, the authors identified numerous factors that could be related to remote cataloging productivity. More work needs to be done to identify these factors that impact productivity by conducting further surveys.
Practical implications
The research provides evidence showing the productivity of cataloging can be higher in remote working mode. The study provides insights for library managers to decide whether to implement remote cataloging and what additional perspectives could be considered for the better implementation of remote cataloging.
Originality/value
The gap in the literature about remote cataloging and productivity has been bridged.
Abstract
Purpose
This paper aims to introduce the construction methods, image organization, collection use and access of benchmark image collections to the digital library (DL) community. It aims to connect two distinct communities: the DL community and image processing researchers so that future image collections could be better constructed, organized and managed for both human and computer use.
Design/methodology/approach
Image collections are first identified through an extensive literature review of published journal articles and a web search. Then, a coding scheme focusing on image collections’ creation, organization, access and use is developed. Next, three major benchmark image collections are analysed based on the proposed coding scheme. Finally, the characteristics of benchmark image collections are summarized and compared to DLs.
Findings
Most image collections in DLs are carefully curated and organized using various metadata schemas based on an image’s external features to facilitate human use. In contrast, the benchmark image collections created for promoting image processing algorithms are annotated on an image’s content down to the pixel level, which makes each image collection a more fine-grained, organized database appropriate for developing automatic techniques for classification, summarization, visualization and content-based retrieval.
Research limitations/implications
This paper overviews image collections by their application fields. The three most representative natural image collections in general areas are analysed in detail based on a homemade coding scheme, which could be further extended. Also, domain-specific image collections, such as medical image collections or collections for scientific purposes, are not covered.
Practical implications
This paper helps DLs with image collections to understand how benchmark image collections used by current image processing research are created, organized and managed. It informs multiple parties pertinent to image collections to collaborate on building, sustaining, enriching and providing access to image collections.
Originality/value
This paper is the first attempt to review and summarize benchmark image collections for DL managers and developers. The collection creation process and image organization used in these benchmark image collections open a new perspective to digital librarians for their future DL collection development.
Lingzi Hong, William Moen, Xinchen Yu and Jiangping Chen
Abstract
Purpose
This paper selects 59 journals that focus on data science research in 14 disciplines from the Ulrichsweb online repository. It analyzes their aim and scope statements using both quantitative and qualitative methods to identify the research types and the scope of research promoted by these journals.
Design/methodology/approach
Multiple disciplines are involved in data science research and publishing, but an overview of what those disciplines are and how they relate to data science is lacking. This study aims to understand the disciplinary characteristics of data science research. Two research questions are answered: What is the population of journals that focus on data science? What disciplinary landscape of data science is revealed in the aim and scope statements of these journals?
Findings
Theoretical research is mainly included in journals that belong to statistics, engineering and sciences. Almost all data science journals include applied research papers. Keyword analysis shows that data science research in computers, statistics, engineering and sciences appears to share characteristics, while in other disciplines such as biology, business and education, the keywords are indicative of the types of data to be used and the special problems in these disciplines.
Originality/value
This is the first study to use journals as the unit of analysis to identify the disciplines involved in data science research. The results provide an overview of how researchers and educators from different disciplinary backgrounds understand data science research.
Mingwei Tang, Jiangping Chen, Haihua Chen, Zhenyuan Xu, Yueyao Wang, Mengting Xie and Jiangwei Lin
Abstract
Purpose
The purpose of this paper is to provide an integrated semantic information retrieval (IR) solution based on an ontology-improved vector space model for situations where a digital collection is established or curated. It aims to create a retrieval approach which could return the results by meanings rather than by keywords.
Design/methodology/approach
In this paper, the authors propose a semantic term frequency algorithm to create a semantic vector space model (SeVSM) based on ontology. To support the calculation, a multi-branches tree model is created to represent the ontology and a set of algorithms is developed to operate it. Then, a semantic ontology-based IR system based on the SeVSM model is designed and developed to verify the effectiveness of the proposed model.
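The general idea of letting an ontology influence term weights can be sketched as follows. This is not the SeVSM algorithm, whose details the abstract does not give; it is a simple stand-in in which each term also credits its ontology ancestors with a decaying weight. The toy ontology `PARENT`, the function names and the `decay` parameter are all assumptions made for illustration.

```python
import math
from collections import Counter

# Hypothetical toy ontology as a child -> parent map (a tree, so no cycles).
PARENT = {"laptop": "computer", "desktop": "computer", "computer": "device"}


def semantic_tf(tokens, parent=PARENT, decay=0.5):
    """Count each token and credit its ancestors with a decaying weight."""
    vec = Counter()
    for tok in tokens:
        node, weight = tok, 1.0
        while node is not None:
            vec[node] += weight
            node = parent.get(node)  # None once the root is passed
            weight *= decay
    return vec


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


query, doc = ["laptop"], ["desktop"]
keyword_score = cosine(Counter(query), Counter(doc))
semantic_score = cosine(semantic_tf(query), semantic_tf(doc))
print(keyword_score, semantic_score)
```

With plain keyword vectors the query and document share no terms, so their similarity is zero. Once ancestors are credited, both vectors contain `computer` and `device`, which gives a non-zero score and illustrates retrieval by meaning rather than by exact keyword.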
Findings
The experimental study using 30 queries from 15 different domains confirms the effectiveness of the SeVSM and the usability of the proposed system. The results demonstrate that the proposed model and system can be a significant step toward enhancing IR in specific domains, such as digital libraries and e-commerce.
Originality/value
This research not only creates a semantic retrieval model, but also provides an application approach by designing and developing a semantic retrieval system based on the model. Compared with most current related research, the proposed research studies the whole process of realizing semantic retrieval.
Haihua Chen, Yunhan Yang, Wei Lu and Jiangping Chen
Abstract
Purpose
Citation contexts have been found useful in many scenarios. However, existing context-based recommendations ignore the importance of diversity in reducing redundancy and thus cannot cover the broad range of user interests. To address this gap, the paper aims to propose a novel task that can recommend a set of diverse citation contexts extracted from a list of citing articles. This will assist users in understanding how other scholars have cited an article and deciding which articles they should cite in their own writing.
Design/methodology/approach
This research combines three semantic distance algorithms and three diversification re-ranking algorithms for the diversifying recommendation based on the CiteSeerX data set and then evaluates the generated citation context lists by applying a user case study on 30 articles.
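To show what a diversification re-ranking step does, here is a minimal sketch using greedy Maximal Marginal Relevance (MMR). MMR is a standard technique chosen here for illustration; the abstract does not name the three re-ranking algorithms the authors used, so this should not be read as their method. Jaccard overlap on tokens stands in for a semantic distance, and the example contexts and relevance scores are invented.

```python
def jaccard_sim(a, b):
    """Token-set overlap between two strings (stand-in for semantic similarity)."""
    sa, sb = set(a.split()), set(b.split())
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def mmr_rerank(items, relevance, k, lam=0.5):
    """Greedily pick k items, trading relevance against similarity to earlier picks."""
    selected = []
    candidates = list(range(len(items)))
    while candidates and len(selected) < k:
        def score(i):
            max_sim = max(
                (jaccard_sim(items[i], items[j]) for j in selected),
                default=0.0,
            )
            return lam * relevance[i] - (1 - lam) * max_sim

        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return [items[i] for i in selected]


# Hypothetical citation contexts with made-up relevance scores.
contexts = [
    "uses the method for ranking",
    "uses the method for ranking tasks",
    "criticizes the evaluation design",
]
relevance = [0.9, 0.85, 0.6]
top_two = mmr_rerank(contexts, relevance, k=2)
print(top_two)
```

Sorting by relevance alone would return the two near-duplicate contexts. The similarity penalty pushes the second pick toward the context that says something different, which is the redundancy reduction the abstract describes.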
Findings
Results show that a diversification strategy that combines “word2vec” and “Integer Linear Programming” leads to a better reading experience for participants than other strategies, such as the CiteSeerX approach of using a list sorted by citation counts.
Practical implications
This diversifying recommendation task is valuable for developing better systems in information retrieval, automatic academic recommendations and summarization.
Originality/value
The originality of the research lies in the proposal of a novel task that can recommend a diversified context list describing how other scholars cited an article, thereby making citing decisions easier. A novel mixed approach is explored to generate the most efficient diversifying strategy. Besides, rather than traditional information retrieval evaluation, a user evaluation framework is introduced to reflect user information needs more objectively.
Chunxiu Qin, Yaxi Liu, Jian Mou and Jiangping Chen
Abstract
Purpose
Online knowledge communities make great contributions to global knowledge sharing and innovation. Resource tagging approaches have been widely adopted in such communities to describe, annotate and organize knowledge resources mainly through users’ participation. However, it is unclear what causes the adoption of a particular resource tagging approach. The purpose of this paper is to identify factors that drive users to use a hybrid social tagging approach.
Design/methodology/approach
The technology acceptance model and social cognitive theory are adopted to support the integrated model proposed in this paper. Zhihu, one of the most popular online knowledge communities in China, is taken as the survey context. A questionnaire survey was conducted, and the collected data were analyzed through structural equation modeling.
Findings
A new hybrid social resource tagging approach was refined and described. The empirical results revealed that self-efficacy, perceived usefulness (PU) and perceived ease of use exert a positive effect on users’ attitude. Moreover, social influence, PU and attitude significantly affect users’ intention to use a hybrid social resource tagging approach.
Originality/value
Theoretically, this study enriches the range of resource tagging approaches and identifies factors influencing users’ adoption of the hybrid approach. Practically, the results provide online information system providers and designers with reference strategies to improve the performance of current tagging approaches and promote them.