Search results
Koraljka Golub, Osma Suominen, Ahmed Taiye Mohammed, Harriet Aagaard and Olof Osterman
Abstract
Purpose
In order to estimate the value of semi-automated subject indexing in operative library catalogues, the study aimed to investigate five different automated implementations of an open source software package on a large set of Swedish union catalogue metadata records, with Dewey Decimal Classification (DDC) as the target classification system. It also aimed to contribute to the body of research on aboutness and related challenges in automated subject indexing and evaluation.
Design/methodology/approach
On a sample of over 230,000 records with close to 12,000 distinct DDC classes, the open source tool Annif, developed by the National Library of Finland, was applied in the following implementations: lexical algorithm, support vector classifier, fastText, Omikuji Bonsai and an ensemble approach combining the former four. A qualitative study involving two senior catalogue librarians and three students of library and information studies was also conducted to investigate the value and inter-rater agreement of automatically assigned classes, on a sample of 60 records.
Findings
The best results came from the ensemble approach, which achieved 66.82% accuracy on the three-digit DDC classification task. The qualitative study confirmed earlier studies reporting low inter-rater agreement but also pointed to the potential value of automatically assigned classes as additional access points in information retrieval.
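As a rough illustration of what accuracy on the three-digit task means, the sketch below truncates DDC notations to their first three digits before comparing them, and combines per-backend scores by simple averaging. The function names, the toy scores and the unweighted averaging are assumptions made for illustration; they are not taken from the study and do not reproduce Annif's API.

```python
from collections import defaultdict


def three_digit(ddc: str) -> str:
    """Reduce a DDC notation such as '025.431' to its three-digit class '025'."""
    return ddc.split(".")[0][:3].zfill(3)


def ensemble(backend_scores: list[dict[str, float]]) -> str:
    """Average the scores per class over all backends (assumed unweighted)
    and return the class with the highest mean score."""
    totals: dict[str, float] = defaultdict(float)
    for scores in backend_scores:
        for ddc_class, score in scores.items():
            totals[ddc_class] += score / len(backend_scores)
    return max(totals, key=totals.get)


def accuracy(predicted: list[str], gold: list[str]) -> float:
    """Share of records whose predicted class matches the gold class
    at the three-digit level."""
    hits = sum(three_digit(p) == three_digit(g) for p, g in zip(predicted, gold))
    return hits / len(gold)


# Hypothetical scores from three backends for one record.
toy_scores = [
    {"025": 0.6, "020": 0.4},
    {"025": 0.3, "020": 0.5},
    {"025": 0.7, "020": 0.1},
]
best_class = ensemble(toy_scores)

# Hypothetical predictions and gold classes for three records.
toy_accuracy = accuracy(["025.4", "839.7", "610"], ["025.431", "839.73", "620"])
```

With these toy values, two of the three predictions agree with the gold class at the three-digit level, even though none of the full notations is identical.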
Originality/value
The paper presents an extensive study of automated classification in an operative library catalogue, accompanied by a qualitative study of automated classes. It demonstrates the value of applying semi-automated indexing in operative information retrieval systems.
Koraljka Golub, Pawel Michal Ziolkowski and Goran Zlodi
Abstract
Purpose
The study aims to paint a representative picture of the current state of search interfaces of Swedish online museum collections, focussing on search functionalities with particular reference to subject searching, as well as the use of controlled vocabularies, with the purpose of identifying which improvements of the search interfaces are needed to ensure high-quality information retrieval for the end user.
Design/methodology/approach
In the first step, a set of 21 search interface criteria was identified, based on related research and current standards in the domain of cultural heritage knowledge organization. Secondly, a complete set of Swedish museums that provide online access to their collections was identified, comprising nine cross-search services and 91 individual museums' websites. These 100 websites were each evaluated against the 21 criteria, between 1 July and 31 August 2020.
Findings
Although many standards and guidelines are in place to ensure quality-controlled subject indexing, which in turn supports information retrieval of relevant resources (as individual or full search results), the study shows that they are not broadly implemented, resulting in information retrieval failures for the end user. The study also demonstrates a strong need for the implementation of controlled vocabularies in these museums.
Originality/value
This study is a rare piece of research which examines subject searching in online museums; the 21 search criteria and their use in the analysis of the complete set of online collections of a country represent a considerable and unique contribution to the fields of knowledge organization and information retrieval of cultural heritage. Its particular value lies in showing how the needs of end users, many of which are documented and reflected in international standards and guidelines, should be taken into account in designing search tools for these museums; especially so in subject searching, which is the most complex and yet the most common type of search. Much effort has been invested into digitizing cultural heritage collections, but access to them is hindered by poor search functionality. This study identifies the most important aspects to improve.
Koraljka Golub, Xu Tan, Ying-Hsang Liu and Jukka Tyrkkö
Abstract
Purpose
This exploratory study aims to help contribute to the understanding of online information search behaviour of PhD students from different humanities fields, with a focus on subject searching.
Design/methodology/approach
The methodology is based on a semi-structured interview within which the participants are asked to conduct both a controlled search task and a free search task. The sample, from 2020, comprises eight PhD students in several humanities disciplines at Linnaeus University, a medium-sized Swedish university.
Findings
Most humanities PhD students in the study have received training in information searching, but it has been too basic. Most rely on web search engines like Google and Google Scholar to search for publications, and on the university's discovery system for known-item searching. As these systems do not rely on controlled vocabularies, the participants often struggle with too many retrieved documents that are not relevant. Most only rarely or never use disciplinary bibliographic databases. The controlled search task has shown some benefits of using controlled vocabularies in the disciplinary databases, but incomplete synonym or concept coverage and a user-unfriendly search interface present hindrances.
Originality/value
The paper illuminates an often-forgotten but pervasive challenge of subject searching, especially for humanities researchers. It demonstrates difficulties and shows how most PhD students have missed finding an important resource in their research. It calls for reconsidering training in information searching and for making use of controlled vocabularies implemented in various search systems with usable search and browse user interfaces.
Koraljka Golub, Jukka Tyrkkö, Joacim Hansson and Ida Ahlström
Abstract
Purpose
As the humanities develop in the realm of increasingly pronounced digital scholarship, it is important to provide quality subject access to a vast range of heterogeneous information objects in digital services. The study aims to paint a representative picture of the current state of affairs of the use of subject index terms in humanities journal articles with particular reference to the well-established subject access needs of humanities researchers, with the purpose of identifying which improvements are needed in this context.
Design/methodology/approach
Subject metadata for a sample of 649 peer-reviewed journal articles from across the humanities are compared between a university repository and Scopus, the former reflecting local and national policies and the latter being the most comprehensive international abstract and citation database of research output.
Findings
The study shows that established bibliographic objectives to ensure subject access for humanities journal articles are not supported in either the world's largest commercial abstract and citation database Scopus or the local repository of a public university in Sweden. The indexing policies in the two services do not seem to address the needs of humanities scholars for highly granular subject index terms with appropriate facets; no controlled vocabularies for any humanities discipline are used whatsoever.
Originality/value
In all, not much has changed since the 1990s, when indexing for the humanities was shown to lag behind the sciences. The community of researchers and information professionals, today working together on digital humanities projects, as well as interdisciplinary research teams, should demand that their subject access needs be fulfilled, especially in commercial services like Scopus and discovery services.
Koraljka Golub, Jenny Bergenmar and Siska Humlesjö
Abstract
Purpose
This article aims to help ensure high-quality subject access to Swedish lesbian, gay, bisexual, transgender, queer and intersexual (LGBTQI) fiction, and aims to identify challenges that librarians consider important to address, on behalf of themselves and end users.
Design/methodology/approach
A web-based questionnaire comprising 35 closed and open questions, 22 of which were required, was sent via online channels in January 2022. By the survey closing date, 20 March 2022, 82 responses had been received. The study was intended to complement an earlier study targeting end users.
Findings
Both this study of librarians and the previous study of end users have painted a dismal image of online search services when it comes to searching for LGBTQI fiction. The need to consult different channels (e.g. social media, library catalogues and friends), the inability to search more specifically than for the broad LGBTQI category and suboptimal search interfaces were among the commonly reported issues. The results of these studies are used to inform the development of a dedicated Swedish LGBTQI fiction database with an online search interface.
Originality/value
The subject searching of fiction via online services is usually limited to genre with facets for time and place, while users are often seeking characteristics such as pacing, characterization, storyline, frame/setting, tone and language/style. LGBTQI fiction is even more challenging to search because indexing practices are not really being standardized or disseminated worldwide. This study helps address this important gap, in both research and practical applications.
Koraljka Golub, Joacim Hansson and Lars Selden
Abstract
Purpose
The purpose of the paper is to analyse three Scandinavian iSchools in Denmark, Norway and Sweden with regard to their intentions of becoming iSchools and curriculum content in relation to these intentions. By doing so, a picture will be given of the international expansion of the iSchool concept in terms of organisational symbolism and practical educational content. In order to underline the approaches of the Scandinavian schools, comparisons are made to three American iSchools.
Design/methodology/approach
The study is framed through theory on organisational symbolism and the intentions of the iSchool movement as formulated in its vision statements. Empirically, the study consists of two parts: close readings of three documents outlining the considerations of three Scandinavian LIS schools before applying for the iSchool status, and statistical analysis of 427 syllabi from master level courses at three Scandinavian and three American iSchools.
Findings
All three Scandinavian schools analysed have recently become iSchools, and though some differences are visible, it is hard to distinguish anything in their syllabi as carriers of what can be described as an iSchool identity. In considering iSchool identity, it is instead benefits on a symbolic level that are most prominent, such as branding, social visibility and the possible attraction of new student groups. The traditionally strong relations to national library sectors are emphasised as important to maintain, specifically in Norway and Sweden.
Research limitations/implications
The study covers iSchools in Denmark, Norway and Sweden, with empirical comparison to three American schools. These comparisons face the challenge of accommodating the educational system and programme structure of each individual country. Despite this, the findings can serve as grounds for conclusions, although empirical generalisations concerning, for instance, other countries must be made with caution.
Practical implications
This study highlights the practical challenges met in international expansion of the iSchool movement, both on a practical and symbolic level. Both the iSchool Caucus and individual schools considering becoming iSchools may use these findings as a point of reference in development and decision making.
Originality/value
This is an original piece of research from which the results may contribute to the international development of the iSchool movement, and extend the theoretical understanding of the iSchool movement as an educational and organisational construct.
Koraljka Golub, Jenny Bergenmar and Siska Humlesjö
Abstract
Purpose
The purpose of this study is to investigate the needs of potential end-users of a database dedicated to Swedish lesbian, gay, bisexual, transgender, queer, and intersex (LGBTQI) literature (e.g. prose, poetry, drama, graphic novels/comics, and illustrated books), in order to inform the development of a database, search interface functionalities, and an LGBTQI thesaurus for fiction.
Design/methodology/approach
A web questionnaire was distributed in autumn 2021 to potential end-users. The questions covered people's reasons for reading LGBTQI fiction, ways of finding LGBTQI fiction, experience of searching for LGBTQI fiction, usual search elements applied, latest search for LGBTQI fiction, desired subjects to search for, and ideal search functionalities.
Findings
The 101 completed questionnaires showed that most respondents found relevant literature through social media or friends and that most obtained copies of literature from a library. Regarding desirable search functionalities, most respondents would like to see suggestions for related terms to support broader search results (i.e. higher recall). Many also wanted search support that would enable retrieving more specific results based on narrower terms when too many results are retrieved (i.e. higher precision). Over half would also appreciate the option to browse by hierarchically arranged subjects.
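The trade-off between recall and precision mentioned above can be sketched with a small worked example. The function name, document identifiers and result sets below are hypothetical and chosen only to show how broadening a search with related terms tends to raise recall, while narrowing it tends to raise precision.

```python
def precision_recall(retrieved: set[str], relevant: set[str]) -> tuple[float, float]:
    """Return (precision, recall) for one search result set."""
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical set of titles that are relevant to the reader's need.
relevant_titles = {"t1", "t2", "t3", "t4"}

# A narrow query: few results, all relevant, but half the relevant titles are missed.
narrow = precision_recall({"t1", "t2"}, relevant_titles)

# A query broadened with related terms: every relevant title is found,
# at the cost of four irrelevant ones.
broad = precision_recall({"t1", "t2", "t3", "t4", "x1", "x2", "x3", "x4"}, relevant_titles)
```

In this toy case the narrow query yields precision 1.0 and recall 0.5, and the broadened query yields precision 0.5 and recall 1.0.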
Originality/value
This study is the first to show how readers of LGBTQI fiction in Sweden search for and obtain relevant literature. The authors have identified end-user needs that can inform the development of a new database and a thesaurus dedicated to LGBTQI fiction.
Koraljka Golub and Marianne Lykke
Abstract
Purpose
The purpose of this study is twofold: to investigate whether it is meaningful to use the Engineering Index (Ei) classification scheme for browsing, and then, if proven useful, to investigate the performance of an automated classification algorithm based on the Ei classification scheme.
Design/methodology/approach
A user study was conducted in which users solved four controlled searching tasks. The users browsed the Ei classification scheme in order to examine the suitability of the classification systems for browsing. The classification algorithm was evaluated by the users who judged the correctness of the automatically assigned classes.
Findings
The study showed that the Ei classification scheme is suited for browsing. Automatically assigned classes were on average partly correct, with some classes working better than others. Browsing success proved to be correlated with, and dependent on, classification correctness.
Research limitations/implications
Further research should address problems of disparate evaluations of one and the same web page. Additional reasons behind browsing failures in the Ei classification scheme also need further investigation.
Practical implications
Improvements for browsing were identified: describing class captions and/or listing their subclasses from the start; allowing for searching for words from class captions with synonym search (easily provided for Ei since the classes are mapped to thesauri terms); when searching for class captions, returning the hierarchical tree expanded around the class in whose caption the search term is found. The need for improvements of classification schemes was also indicated.
Originality/value
A user‐based evaluation of automated subject classification in the context of browsing has not been conducted before; hence the study also presents new findings concerning methodology.
Koraljka Golub, Marianne Lykke and Douglas Tudhope
Abstract
Purpose
The purpose of this paper is to explore the potential of applying the Dewey Decimal Classification (DDC) as an established knowledge organization system (KOS) for enhancing social tagging, with the ultimate purpose of improving subject indexing and information retrieval.
Design/methodology/approach
Over 11,000 Intute metadata records in politics were used. In total, 28 politics students were each given four tasks, in which a total of 60 resources were tagged in two different configurations, one with uncontrolled social tags only and another with uncontrolled social tags as well as suggestions from a controlled vocabulary. The controlled vocabulary was DDC, which also comprised mappings from the Library of Congress Subject Headings.
Findings
The results demonstrate the importance of controlled vocabulary suggestions for indexing and retrieval: to help produce ideas of which tags to use, to make it easier to find focus for the tagging, to ensure consistency and to increase the number of access points in retrieval. The value and usefulness of the suggestions proved to be dependent on the quality of the suggestions, both as to conceptual relevance to the user and as to appropriateness of the terminology.
Originality/value
No research has investigated the enhancement of social tagging with suggestions from the DDC, an established KOS, in a user trial, comparing social tagging only and social tagging enhanced with the suggestions. This paper is a final reflection on all aspects of the study.
To provide an integrated perspective on similarities and differences between approaches to automated classification in different research communities (machine learning…
Abstract
Purpose
To provide an integrated perspective on similarities and differences between approaches to automated classification in different research communities (machine learning, information retrieval and library science), and point to problems with the approaches and automated classification as such.
Design/methodology/approach
A range of works dealing with automated classification of full‐text web documents are discussed. Explorations of individual approaches are given in the following sections: special features (description, differences, evaluation), application and characteristics of web pages.
Findings
Provides major similarities and differences between the three approaches: document pre‐processing and utilization of web‐specific document characteristics is common to all the approaches; major differences are in applied algorithms, employment or not of the vector space model and of controlled vocabularies. Problems of automated classification are recognized.
Research limitations/implications
The paper does not attempt to provide an exhaustive bibliography of related resources.
Practical implications
As an integrated overview of approaches from different research communities with application examples, it is very useful for students in library and information science and computer science, as well as for practitioners. Researchers from one community have the information on how similar tasks are conducted in different communities.
Originality/value
To the author's knowledge, no review paper on automated text classification has attempted to discuss more than one community's approach from an integrated perspective.