Kianoosh Rashidi, Hajar Sotudeh, Mahdieh Mirzabeigi and Alireza Nikseresht
Abstract
Purpose
Social comments are rich in information and useful in evaluating, ranking or retrieving different kinds of materials. However, their merits in representing scientific articles or adding value to them have not yet been studied. Therefore, the present study investigates the informativeness of open review reports as a kind of social comment in a scholarly setting.
Design/methodology/approach
A test collection was built consisting of 100 randomly selected queries, 1,962 reviewed documents and their reviewers' open reports from F1000Research. They were analyzed using natural language processing techniques. The comments' salient words were compared with those of the documents and with the salient words of the Medical Subject Headings (MeSH). The receiver operating characteristic (ROC) curve was used to test the accuracy of the comments in representing their related articles.
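The two measurements described above can be sketched as follows. This is a minimal illustration, not the authors' procedure: the abstract does not say how salient words were extracted or how overlap was scored, so the Jaccard measure and the function names `jaccard` and `roc_auc` are assumptions. The area under the ROC curve is computed from its rank interpretation: the probability that a truly related comment-paper pair receives a higher score than an unrelated one.

```python
def jaccard(words_a, words_b):
    """Share of distinct words that two term sets have in common (assumed overlap measure)."""
    a, b = set(words_a), set(words_b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def roc_auc(scores, labels):
    """Area under the ROC curve.

    Equals the probability that a randomly chosen positive (label 1)
    outscores a randomly chosen negative (label 0); ties count half.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

In such a setup, each score could be the overlap between a comment's salient words and a paper's salient words, with label 1 when the comment was actually written about that paper. An area close to 1 would indicate that comments identify their own papers reliably, while 0.5 corresponds to chance.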
Findings
The papers' contents and comments have a considerable number of salient words in common. The comments' salient words are also largely found in the MeSH, signifying their consistency with the knowledge tree and their potential to add some complementary features to their related items. The ROC curves confirm the accuracy of the comments in retrieving their related papers.
Originality/value
This research is the first to reveal the merits of open review reports on scientific papers in terms of their relatedness to their mother articles in particular and to the knowledge tree in general. The reports are found to be informative not only in representing the reviewed papers but also in adding value to their contents.
Kianoosh Rashidi, Hajar Sotudeh and Alireza Nikseresht
Abstract
Purpose
This study aimed to investigate how enriching medical documents' index terms with their comments improves the relevance and novelty of the top-ranked results retrieved by an NLP system.
Design/methodology/approach
A semi-experimental pre-test/post-test study was designed to compare NLP-based indexes before and after being expanded by the comment terms. The experiments were conducted on a test collection of 13,957 documents commented on by F1000-Prime reviewers. They were indexed at title, abstract, body and full-text levels. In total, 100 seed documents were randomly selected and served as queries. The textual similarity of the documents and queries was calculated using Lucene's MoreLikeThis function and evaluated by the semantic similarity of their MeSH. The results' novelty was measured using maximal marginal relevance and evaluated by their MeSH novelties. Normalized discounted cumulative gain was used to compare the precision of the basic and expanded indexes at the top 10, 20 and 50 ranks.
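The two ranking measures named above can be illustrated with a short sketch. It follows the textbook definitions of normalized discounted cumulative gain (nDCG) and maximal marginal relevance (MMR), not the study's own implementation: the trade-off weight `lam`, the function names and the input shapes are all assumptions made for illustration.

```python
import math


def dcg_at_k(gains, k):
    """Discounted cumulative gain of the first k graded relevance values."""
    return sum(g / math.log2(i + 2) for i, g in enumerate(gains[:k]))


def ndcg_at_k(gains, k):
    """DCG divided by the DCG of the same gains in ideal (descending) order."""
    ideal = dcg_at_k(sorted(gains, reverse=True), k)
    return dcg_at_k(gains, k) / ideal if ideal > 0 else 0.0


def mmr(query_sim, doc_sim, lam=0.5, k=10):
    """Greedy maximal marginal relevance re-ranking.

    query_sim: dict mapping document id -> similarity to the query.
    doc_sim:   function (id, id) -> similarity between two documents.
    lam:       assumed weight; 1.0 ranks by relevance only, 0.0 by novelty only.
    """
    selected = []
    candidates = set(query_sim)
    while candidates and len(selected) < k:
        def score(d):
            # Penalize a candidate by its closest already-selected document.
            redundancy = max((doc_sim(d, s) for s in selected), default=0.0)
            return lam * query_sim[d] - (1 - lam) * redundancy
        # Sorting makes tie-breaking deterministic.
        best = max(sorted(candidates), key=score)
        selected.append(best)
        candidates.remove(best)
    return selected
```

Under these assumptions, comparing `ndcg_at_k` for the basic and expanded indexes at k = 10, 20 and 50 mirrors the pre-test/post-test comparison, and `mmr` shows how a near-duplicate of an already selected result is pushed down in favor of a more novel one.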
Findings
The relevance and novelty of the results ranked at the top precision points were improved after expanding the indexes by the comment terms. The finding implies that meta-texts are effective in representing their mother documents, by adding dynamic elements to their rather static contents. It also provides further evidence about the merits of the application of social intelligence and collective wisdom reflected in the actions and reactions of users in tackling the challenges faced by NLP-based systems.
Originality/value
This is the first study to confirm that social comments on scientific papers improve the performance of information systems in terms of relevance and novelty.
Peer review
The peer review history for this article is available at: https://publons.com/publon/10.1108/OIR-05-2022-0283.
Maryam Yaghtin, Hajar Sotudeh, Alireza Nikseresht and Mahdieh Mirzabeigi
Abstract
Purpose
Co-citation frequency, defined as the number of documents co-citing two articles, is considered a quantitative, and thus efficient, proxy of the subject relatedness or prestige of the co-cited articles. Despite its quantitative nature, it is found effective in retrieving and evaluating documents, signifying its linkage with the related documents' contents. To better understand the dynamism of the citation network, the present study aims to investigate various content features giving rise to the measure.
Design/methodology/approach
The present study examined the interaction of different co-citation features in explaining the co-citation frequency. The features include the co-cited works' similarities in their full-texts, Medical Subject Headings (MeSH) terms, co-citation proximity, opinions and co-citances. A test collection was built using the CITREC dataset. The data were analyzed using natural language processing (NLP) and opinion mining techniques. A linear model was developed to regress the objective and subjective content-based co-citation measures against the natural log of the co-citation frequency.
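The regression step described above can be sketched as an ordinary least squares fit of ln(co-citation frequency) on a set of similarity features. This is a generic illustration under stated assumptions, not the authors' model: the abstract does not give the estimation procedure or the exact feature set, and the names `solve` and `fit_log_linear` are hypothetical.

```python
import math


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("singular design matrix")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x


def fit_log_linear(features, frequencies):
    """Least squares fit of ln(frequency) on the features, with an intercept.

    Returns (coefficients, r_squared); coefficients[0] is the intercept.
    """
    X = [[1.0] + list(row) for row in features]
    y = [math.log(f) for f in frequencies]
    p = len(X[0])
    # Normal equations: (X^T X) beta = X^T y
    XtX = [[sum(r[i] * r[j] for r in X) for j in range(p)] for i in range(p)]
    Xty = [sum(r[i] * t for r, t in zip(X, y)) for i in range(p)]
    beta = solve(XtX, Xty)
    pred = [sum(b * v for b, v in zip(beta, r)) for r in X]
    mean_y = sum(y) / len(y)
    ss_res = sum((t - q) ** 2 for t, q in zip(y, pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y)
    return beta, 1.0 - ss_res / ss_tot
```

Each row of `features` would hold one co-cited pair's similarity measures, and an interaction between two measures can be modeled by appending their product as an extra column. The returned R² is the share of variance in the log co-citation frequency that the fitted model accounts for.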
Findings
The dimensions of co-citation similarity, either subjective or objective, play significant roles in predicting co-citation frequency. The model can explain about half of the variance in co-citation frequency. The interaction of co-opinionatedness and non-co-opinionatedness is the strongest factor in the model.
Originality/value
This is the first study to reveal that both objective and subjective similarities can significantly predict the co-citation frequency. The findings re-confirm the citation analysis assumption that the cognitive layers of cited documents are connected to citation measures in general and to the co-citation frequency in particular.
Peer review
The peer review history for this article is available at https://publons.com/publon/10.1108/OIR-04-2020-0126.