Images as data – modelling data interactions in social science and humanities research

Elina Late (Tampere University, Tampere, Finland)

Inés Matres (University of Helsinki, Helsinki, Finland)

Anna Sendra (Tampere University, Tampere, Finland)

Sanna Kumpulainen (Tampere University, Tampere, Finland)

Journal of Documentation

ISSN: 0022-0418

Article publication date: 31 October 2024

Abstract

Purpose

The expanded reuse of images as research data in the social sciences and humanities necessitates the understanding of scholars’ real-life interactions with the type of data. The aim of this study is to analyse activities constituting image data interactions in social science and humanities research and to provide a model describing the data interaction process.

Design/methodology/approach

The study is based on interviews with 21 scholars from various academic backgrounds utilising digital and print images collected from external sources as empirical research data. Qualitative content analyses were executed to analyse image data interactions throughout the research process in three task types: contemporary, historical and computational research.

Findings

The findings further develop the task-based information interaction model (Järvelin et al., 2015) originally created to explain the information interaction process. The enhanced model presents five main image data interaction activities: Data gathering, Forming dataset, Working with data, Synthesizing and reporting and Concluding, with various sub-activities. The findings show the variety of image data interactions in different task types.

Originality/value

The developed model contributes to understanding critical points in image data interactions and provides a model for future research analysing research data interactions. The model may also be used, for example, in designing better research services and infrastructures by identifying support needs throughout the research process.

Citation

Late, E., Matres, I., Sendra, A. and Kumpulainen, S. (2024), "Images as data – modelling data interactions in social science and humanities research", Journal of Documentation, Vol. 80 No. 7, pp. 325-345. https://doi.org/10.1108/JD-08-2024-0195

Publisher: Emerald Publishing Limited

License

Published by Emerald Publishing Limited. This article is published under the Creative Commons Attribution (CC BY 4.0) licence. Anyone may reproduce, distribute, translate and create derivative works of this article (for both commercial and non-commercial purposes), subject to full attribution to the original publication and authors. The full terms of this licence may be seen at http://creativecommons.org/licences/by/4.0/legalcode

Introduction

Visual data, such as images and photographs sourced from social media platforms and archives, serve as crucial empirical evidence for social sciences and humanities (SSH) scholars exploring social behaviour (Ball and Smith, 2017; Chassanoff, 2018; Chen et al., 2021; Rose, 2022). However, most research on data use in SSH has concentrated on textual materials, resulting in a gap in understanding of interactions with image data. Images convey information differently from text: as Chassanoff (2018, 140) describes, images are objects that “become information through the relationships and meanings we inscribe onto them”. Moreover, impediments, such as copyright issues, which often do not apply to textual data, may limit the utilization of images for research purposes (Rose, 2022).

The aim of this study is to analyse activities constituting image data interactions in SSH research and to provide a model describing the data interaction process. Images can be “found” or “created” for research purposes (Rose, 2014). Finding images refers to collecting already produced images from various sources and re-using them as research data. Creating images refers to images that are made specifically for research by the research team or by study participants (Rose, 2022). This study focuses on research tasks re-using found images from various sources.

We further develop a model originally created to grasp information interactions (Järvelin et al., 2015) to meet the realities of interacting with image data. Information interaction is described as a “process that people use in interacting with the content of an information system” (Toms, 2002, 1). The interaction process is a collection of activities taking place at different stages of research, such as collecting, analysing and reporting data. Research on information interaction, or human–information interaction, focuses on people’s cognitive actions and behaviours with information, rather than with technology or librarians (Fidel, 2012). Therefore, it does not focus only on specific activities such as information searching but covers the whole information interaction process and its various activities.

Information interactions do not occur as such but are rather triggered by some tasks, either related to leisure or work (Vakkari, 2001; Toms, 2011). Information needs and activities are derived from the underlying larger tasks. Therefore, our study analyses real-life research tasks utilising images collected from external sources as data. The study analyses in-depth interviews with SSH scholars and seeks to identify the activities and sub-activities constituting image data interactions in SSH research tasks.

The paper is structured as follows: First, we will discuss previous research focusing on research data interaction and the use of images as research data in SSH. Second, we will present the theoretical framework for task-based information interaction. Third, the research methods, including data collection and analyses, are explained. The results are presented in the following section. The article ends with discussion and conclusions.

Background

Research data can take different forms in different disciplines, and a particular combination of interests, abilities and accessibility determines what is identified as data in each instance (Leonelli, 2019). Borgman (2015, 24) defines research data as “entities used as evidence of phenomena for the purposes of research or scholarship”. Research data can be big or small, open or closed, produced or re-used, born digital, digitised or analogue. Data are not only by-products of research but can serve as valuable research outputs and public objects (Wilkinson et al., 2016). Indeed, data in SSH can remain relevant for analysis for a long time.

As SSH is a divergent group of disciplines with different epistemic practices, data interactions also take various forms. According to Borgman (2015), the heterogeneity of data types in SSH, ranging from quantitative datasets to qualitative interviews and multimedia data, necessitates tailored data management strategies. Data management practices in SSH are influenced by various factors, including ethical considerations, privacy concerns and disciplinary norms. The introduction of policies and mandates by funding agencies, research institutions and publishers has encouraged researchers to manage and share their data, although compliance remains uneven across disciplines (Gregory et al., 2020; Lilja, 2020). In recent years, data archives and infrastructures for SSH have been developed to better serve the needs of these fields (Sendra et al., 2023; Waters, 2022). However, data sharing is not a common practice in SSH research (Jeng and He, 2022; Zenk-Möltgen et al., 2018) and infrastructures do not adequately support different data types, such as images (Hansson and Dahlgren, 2022).

Although in many SSH fields, such as anthropology, human geography and art history, images have been used as data for a long time, the use of visual data has only expanded in recent years (Ball and Smith, 2017; Rose, 2022). The reason for this lies in the visual turn in society and the emergence of social media platforms and digital archives providing rich visual contents (Highfield and Leaver, 2016; Rejeb et al., 2022). Indeed, studies show the importance of image data as research data in SSH (e.g. Beaudoin, 2014; Chassanoff, 2018; Chen et al., 2021; Kamposiori, 2018). Capturing image data for research has been aided by the development of digital technologies that has made accessing image data easier. Although qualitative approaches are commonly used in SSH (Chen et al., 2021; Rose, 2014), new computer vision technologies allow analysing massive amounts of images (Berg and Nelimarkka, 2023).

Images can provide rich datasets, having been described as more emotional, affective and ineffable (Bagnoli, 2009) and as revealing hidden inner mechanisms (Knowles and Sweetman, 2004) compared to textual data such as interviews. Photographs can provide a valuable historical reference for verification, documentation or corroboration (Chassanoff, 2018). Indeed, images render the world in visual terms, but not by innocent means as they interpret the world by representing it in very particular ways (Rose, 2022). Images are always born and represented in a specific context that influences their reading (Jordanova, 2012). Therefore, it is usual in visual analysis to rely on text for interpreting the image and its context (Rose, 2014).

Recently, Rodrigues and Lopes (2023) and Fernandes et al. (2020) analysed image data management using the research life cycle model, which captures the following activities: planning, creation/compilation, quality assurance, processing/analysis, description, storage and sharing. These studies attest to the frequency of image data use in SSH but also point out that most scholars do not formally manage their image data. According to them, image data use includes producing images (e.g. using cameras), anonymisation, quality control, documentation and storage, but practices vary. Furthermore, Rodrigues and Lopes (2023) and Fernandes et al. (2020) show that image data is mostly stored on scholars’ own computers and is rarely shared openly. Late et al. (2024a) also found a lack of image data sharing in SSH. Sharing data was impeded by the qualities of the data, ownership of data, data stewardship and research integrity issues. However, supporting the scientific community, the open science agenda and fulfilling research funders’ requirements were found to be motivating factors for image data sharing, which occasionally took place by informal means.

Theoretical framework

Various data life cycle models (e.g. Higgins, 2008; Rhee, 2024) have visualised research activities related to data. However, as data life cycle models serve data curation and represent the process from a data perspective, they may be idealistic and mask various aspects of the complexities of real research work (Carlson, 2014; Cox and Tam, 2018). Rodrigues and Lopes (2023) argue that life cycle models do not emphasize data security and privacy issues relevant to image data that include information about places or persons, thus compromising confidentiality.

To fully cover the researchers’ perspective, we build our analysis upon the task-based information interaction (TBII) evaluation model (Järvelin et al., 2015), and further develop this model to cover activities in data interaction with image data. The TBII model incorporates the cognitive perspective of users into various activities, facilitating the analysis of the complete research process. Additionally, the TBII model helps evaluate how information interactions support achieving the task objectives. The theoretical framework encompasses five activities: task planning and monitoring, searching for information items, selecting information items, working with information items and synthesizing and reporting (see Figure 1).

Task planning and monitoring is an overarching activity that occurs at every phase of the interaction process. It involves the task-doer’s comprehension of the task and the necessary procedural knowledge. As the task is performed, this activity develops, leading to a more organised and clearer understanding of the task. Searching involves interactions with a search system to retrieve information items. Selecting focuses on deciding the usefulness of the discovered items. Working with items includes scanning and browsing, reading and annotating, as well as comparing and linking information. Finally, synthesizing and reporting are crucial for academics producing scholarly publications. Synthesizing involves integrating information from various sources to create new knowledge and generate new information items and objects.

We apply the model in our data collection and analysis, leaving planning and monitoring outside the scope of our analysis. We develop the model further to cover the activities specific to image data interaction in SSH research tasks.

Research setting

Data collection

For this study, 21 in-depth qualitative interviews were collected from scholars working in various SSH fields who use digital and print images as their primary research data. The interviews took place in both Finland and Denmark, either in person or online, between February and December 2023. The interviews were conducted by the first and second authors in either Finnish or English. By using interviewer triangulation, we were able to raise our research above the possible biases deriving from the personality of the researcher (Kumpulainen, 2017). Part of the interview data has been analysed for previous research (Late et al., 2024a).

The interviewees were contacted and selected by using personal contacts, through researcher networks where visual data was known to be used and by making web searches for publications applying image data. Also, a snowballing technique was employed, where interviewees were asked to suggest potential participants having ongoing projects with image data. Table 1 provides an overview of the interviewees' profiles with various academic backgrounds. The participants encompass diverse fields within the SSH and span a wide spectrum of work titles and seniority levels, ranging from doctoral students to full professors.

Before the interviews, informed consent was collected, and information about the project and data collection was shared with the participants by email. Interviewees were asked to prepare themselves to discuss one recent research project in which they had used images as research data. The interview guide (Late and Kumpulainen, 2024) included background questions and questions related to three themes: the characteristics of image research data, information practices related to image research data and openness in the context of image data. The questions related to the second theme were based on the TBII model (Järvelin et al., 2015). In this way, we ensured that the full interaction process was covered during the interview. Subsequently, a modified version of the critical incident technique (Flanagan, 1954) was employed, asking interviewees to describe their use of image data in a recent research task. However, the interviews did not strictly follow the order of the guide, and the interviewees were free to talk about their research. Instead, the guide served as a checklist, ensuring that the interviews covered all relevant topics.

The interviews were audio-recorded and transcribed into text for the analysis. Each interview spanned approximately 66 min, resulting in 22 h and 52 min of audio recordings. Interviewees were asked to display their research materials, publications and provide demonstrations of their data practices. This aided in gaining a more comprehensive understanding of the interviewees' research work practices.

Data analysis

Qualitative content analyses were executed using Atlas.ti software and Microsoft Excel. Initial coding was done by one scholar but later discussed with the research team to find consensus and to avoid biases in the analyses. The analysis started by reading the interview transcripts, followed by open and selective coding (Strauss and Corbin, 1997). During the first coding phase, research tasks (i.e. critical incidents) were identified and categorized into three task types (contemporary, historical and computational research) according to the use of data sources, applied research methods and purpose of image data use.

Next, all instances related to image data interaction activities were identified from the data. We first started theory-driven by focusing on four activities (searching, selecting, working with items and reporting and synthesizing) presented in the original TBII model. Data-driven coding took place when identified activities did not fit the original model. New labels were given to activities to better describe them. After this stage we identified five main level activities: Data gathering (n = 127), Forming dataset (n = 48), Working with data (n = 223), Synthesizing and reporting (n = 74) and Concluding (n = 33).

Coded items (N = 505) were extracted from Atlas.ti and transferred to Excel spreadsheets for further data-driven coding. At this stage we identified and coded the sub-activities for each activity and verified coding between the activities. Some instances were moved between the activities at this stage. Finally, the task types were coded for each participant to study the variation between different types of tasks. Quotations were selected from the interviews to illustrate the activities. If needed, quotations were translated from Finnish to English.
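The tallies reported above can be cross-checked with a few lines of Python. This is a minimal sketch for illustration only, not part of the study's tooling (the analysis itself was carried out in Atlas.ti and Excel). The variable names are our own; the figures are the counts reported in the text.

```python
# Coded instances per main activity, as reported in the text.
activity_counts = {
    "Data gathering": 127,
    "Forming dataset": 48,
    "Working with data": 223,
    "Synthesizing and reporting": 74,
    "Concluding": 33,
}

# Total number of coded items across the five activities.
total = sum(activity_counts.values())

# Share of all coded instances falling under each activity.
shares = {name: count / total for name, count in activity_counts.items()}

for name, count in activity_counts.items():
    print(f"{name}: n = {count} ({shares[name]:.1%})")
print(f"Total coded items: N = {total}")
```

The per-activity counts sum to the reported total of coded items, and Working with data accounts for the largest share of coded instances.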

Findings

Research task types

Research tasks were categorised into three groups according to the image data sources, type of data, research method and purpose of image data use (Table 2).

Almost half of the tasks fell into contemporary research. In these tasks, SSH scholars used mainly born-digital data from various image data sources. The most common source was social media platforms. In these tasks, images were analysed qualitatively using small or medium-sized datasets (1–1,000 images). In each task, one purpose of use was research data. Other purposes of use included, for example, using images as part of data collection (e.g. in interviewing participants) and using images for illustration. In most cases (n = 7), images were analysed together with other data types such as interview data, social media data (e.g. posts or comments), media texts and ethnographic field diaries.

Five tasks were categorised as historical research. In these tasks, historical images were collected from both print and digital archives, museums and social media platforms. Historians often use the term “primary sources” for their data, but for clarity we use the term dataset in each task type. This group used small or medium-sized datasets (1–10,000 images) containing historical photographs, but it should be clarified that their datasets are a fraction of the images viewed in the physical archives. Cultural and historical methods were used to cover both the aesthetic attributes and the historical contexts of production or circulation of the images. For close reading, all scholars used additional textual or other data such as literature, archival records and statements related to the images, or examined the media where the images had been published (e.g. newspapers). Images were always used as research data and for illustration.

Six tasks form the category of computational research, where analysis focused on big image datasets (1,000,000 or more). Image data for these tasks were born digital and collected from social media platforms, corporate archives (e.g. satellite images) and national web archives. The focus was mainly on image data, but other data types were also included in the analyses, such as register data, ethnographic field diaries, social media data and archival audio data. Quantitative methods (e.g. computer vision, machine learning) were mostly applied, but one project used a mixed-methods approach integrating qualitative methods with machine learning. Images were mostly used as research data but in several cases also as training material. In one task, images were used only for data enrichment. Publications were illustrated with images from the data.

Activities in image data interaction

Our analysis resulted in five image data interaction activities, each with sub-activities (Figure 2). Although the activities are presented in Figure 2 as sequential components, they sometimes overlap and may appear in a different order in the real-life research process. Scholars may also jump back and forth between the activities. It should also be noted that not all research tasks necessarily involve all sub-activities, and activities may vary between research task types.

Data gathering

During data gathering, image data is collected. Three sub-activities were identified: identifying data sources, identifying access points and collecting data (Figure 3).

Identifying data sources, such as social media platforms, archives, museums, websites and APIs, entailed searching for information, reviewing earlier research, consulting colleagues or drawing on scholars’ previous knowledge. Scholars needed to find out where the image data was located, how they could access it and what the data would cost. Various data sources were identified in historical research, whereas in other task types scholars relied more often on a limited number of sources.

I knew through life experience that this existed. Then I just had to find out where the material is […] and probably somehow just by googling I found the information that they have handed over part of the material to the National Archives. (P18, historical research)

Identifying access points to data entailed identifying social media accounts, participants, hashtags, keywords, time frames, data types and creators etc. These were required to access the data from the selected source. An important part of contemporary research was networking with the participants. Some scholars even created research accounts in social media platforms for collecting research data and to be open about their researcher position and provide information about their ongoing research.

I created a researcher account to Instagram just to be transparent […] you need to get in and think about how to connect with this network (P12, contemporary research)

In historical research, identifying access points was rather demanding. Especially in the digital archives, it was experienced as “random” because of the missing metadata. The use of various data sources often required visits to the archives. To identify access points, scholars needed to familiarize themselves with the collections, institutions and their cataloguing systems. Particularly in computational tasks, where data was collected through an API or requested from organisations, this sub-activity was important. The scholars needed to specify their data request for their dataset. This required both an understanding of the data structure and programming skills.

It's a web service where you can send a request in a very specific way. […] They have parameters and then you can loop over parameters. […] I think this is the hard access part that you actually need to be able to program to access the data […]. So you have to build your own dataset. (P9, computational research)
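The kind of parameter loop the interviewee describes can be sketched as follows. This is a hedged illustration only: the endpoint `https://example.org/api/images`, the parameter names (`hashtag`, `year`) and the helper `build_requests` are hypothetical and do not correspond to any service used in the study. The sketch only builds the request URLs and does not send them.

```python
from itertools import product
from urllib.parse import urlencode

# Hypothetical endpoint; a placeholder, not a real service from the study.
BASE_URL = "https://example.org/api/images"


def build_requests(param_grid):
    """Return one request URL per combination of parameter values."""
    names = list(param_grid)
    urls = []
    # Loop over every combination of the parameter values.
    for values in product(*(param_grid[name] for name in names)):
        query = urlencode(dict(zip(names, values)))
        urls.append(f"{BASE_URL}?{query}")
    return urls


# Assumed example parameters: two hashtags over three years.
grid = {"hashtag": ["protest", "election"], "year": [2019, 2020, 2021]}
urls = build_requests(grid)
for url in urls:
    print(url)
```

Each URL corresponds to one narrowly specified request, so the dataset is built up piece by piece from the responses. This is why the scholars needed both programming skills and an understanding of the data structure.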

Scholars used various ways for collecting data. In contemporary research keyword searching and browsing were typically used. For example, scholars collecting social media images searched for specific accounts or hashtags and browsed the contents. Getting data handouts (according to requests) from colleagues and study participants was also common. Reverse snowballing was a more experimental form of collecting data by handouts. This happened when scholars asked their networks to share images about a specific topic forming a co-creative data collecting method.

In Facebook, I just posted something and asked people to share images with me and I've gotten pretty good ones […] I've described it as a reverse snowball method, where it's at same time rolling outward from me like a snowball […] it's a continuous process. (P2, contemporary research)

Ethnographic methods were also used in collecting data. Images were collected by shadowing the investigated phenomenon. In many cases, data collection was continuous, and scholars constantly followed the development of their topic on social media platforms. They also utilised platforms’ recommendation systems to follow and collect data. Scholars reacted to political or other events by collecting data in real time. This type of behaviour was not usually pre-planned, but scholars “seized the day” to capture any interesting phenomenon.

Well, this is exactly my ethnography, that part of it is just that I hang out there all the time and I follow these accounts. (P12, contemporary research)

Keyword searching and browsing were also evident in historical research. Browsing archival contents (both print and digital) was a large part of the work and usually very time-consuming, requiring scholars to visit the physical collections located in different places, even abroad. Even when using only digital contents, collecting data was laborious, as materials from various sources all had different organisation systems and access points. Additionally, data was requested from colleagues or personnel in museums, archives or other organisations.

They had them in boxes and in some sort of a messy file with a kind of a label on them […]. But they were sort of in a mess. So you know I found all kinds of photos (P19, historical research)

Data scraping through APIs or other services was done only in computational research tasks. Sometimes scholars needed to make agreements with the data providers and pay for the data. In some cases, scraping resulted only in links to images that needed to be resolved to access the images. Additionally, data was acquired from organisations such as national web archives, which provided the data according to the request.

Forming dataset

Forming dataset activity entails sub-activities related to selecting, saving and filtering (Figure 4).

These sub-activities were present mainly in contemporary or historical research, whereas in computational tasks datasets were already formed during the data request or data scraping, when the whole dataset was downloaded or received at once. Selecting images was related to searching by keywords or browsing, in which case the scholars assessed the relevance of each found image one by one. This entailed specifying their selection criteria. Descriptions of the selecting activity were rather rare. However, some scholars brought it up by describing the laborious work.

There was a lot to go through, since those hashtags have an awful lot of visuals […] so it took a lot of time to sort through it. (P3, contemporary research)

After images were found and selected, they were saved by taking screenshots, downloading images, scanning print images, taking photographs and saving to a platform gallery. Modern smartphones were important devices for many scholars. When working with social media, taking screenshots was the most typical way to save data. Additionally, images were saved by taking photographs. This was an essential part of the work in historical research.

iPhone is my infrastructure these days because in iPhones and the latest mobile phones altogether, the cameras are so good. But I also used a scanner if I think that most likely this might be a good image for my article or for the coming book. (P19, historical research)

An interesting way to work around the data protection issues was to create a “gallery” of images on the social media platform and not to export any of the contents. When doing this, scholars needed to accept the fact that their dataset may change during the project as individual images or accounts were removed from the platform.

I haven't saved them anywhere, so my collection can change, and sometimes people drop out, for example, they close their accounts, but my collection is only on my Instagram account. I can find that collection there, but I don't have it saved anywhere. […] But it's more a question of whether I save the personal register. (P5, contemporary research)

Filtering refers to selecting a subset of images from an existing dataset. Filtering was very frequent, as scholars often collected large data collections loosely around topics but only used filtered parts of them for their given tasks. In contemporary or historical research, filtering was done manually by browsing through the images. Sometimes images were discarded because of low quality or lack of metadata. Filtering also took place in computational research when scholars needed to evaluate the data to be sure of the relevance of the images to their task. They also needed to filter metadata related to the images for analysis.

Because it's a hashtag hell. Welcome to hell. So there was of course also unrelated things. So we looked at … For example, if the #[…] was connected with another hashtag. (P11, computational research)

Working with data

The working with data activity is central to the interpretation of images and entails three sub-activities: data preparation, method and tool development and data analysis (Figure 5).

Data preparation included various activities to organise and transform the data into a suitable format for later analyses. This included organising data into folders, making backup copies of the data, transforming images into text or printing images for analysis and cleaning, validating and enriching data. Image data were organised into files according to date, source, type or topic of the image. In some contemporary tasks, organising data was described as “chaotic” or “un-systematic”. This caused problems later when trying to re-find the same images. However, during the research process the organising practices developed to better serve scholars’ needs. In historical research, scholars sometimes reused the same organisation system as in the archive where the data was discovered.

I have found classifying more and more important because the more data or more images you have the more important it is. It has to be quite specific and under a certain big label that you can find them. So, it's still a bit not coherent, but, I'm working towards that. (P19, historical research)
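One way to make such an organising scheme systematic is to derive each file's location from its source and date. The following is a minimal sketch under our own assumptions, not a practice reported by the interviewees: the folder layout, the helper `target_path` and the example records are all hypothetical. It only computes destination paths and does not move any files.

```python
from datetime import date
from pathlib import PurePosixPath


def target_path(root, source, taken, filename):
    """Build a path of the form root/source/YYYY/YYYY-MM-DD_filename."""
    year_folder = f"{taken.year:04d}"
    dated_name = f"{taken.isoformat()}_{filename}"
    return PurePosixPath(root) / source / year_folder / dated_name


# Hypothetical records: (source, date of the image, original file name).
records = [
    ("national_archives", date(1952, 6, 14), "scan_001.jpg"),
    ("instagram", date(2023, 3, 2), "screenshot_17.png"),
]

paths = [target_path("dataset", source, taken, name) for source, taken, name in records]
for path in paths:
    print(path)
```

Because the path is derived from the image's own attributes, the same image can be re-found later without relying on memory of where it was saved.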

Data preparation included transforming images into text or printing them on paper for the analyses. Transforming images into text merely meant writing out descriptions of the image, for example, in spreadsheets. Thus, it already required interpretation of the data.

I had 100 images in a … A lot of them are like kind of just variations of the same image, so I organise them into 38 sets, and then I wrote 38 vignettes which described each image and so … I tried to write what was in each image. (P1, contemporary research)

Data cleaning and anonymisation were an important part of the data preparation activity. In computational research, scholars needed to clean the data manually to be able to process it automatically. In other task types, cleaning was not evident, but in contemporary research screenshots from social media platforms that included account or person names and profile pictures required anonymisation. In such cases, photo editing software could be used for removing any personal information.

As a part of the data preparation activity, image data was enriched and validated by different means in all three task types. This included investigating the publishing years and the provenance of the images or adding geographical information or information (e.g. names, gender, age) about the persons in the images to the dataset. Data enrichment happened mostly manually, but in quantitative research automatic means were also applied. This required understanding of the data and its limitations. In computational research, data was also integrated with other data types using coordinates or other metadata. This enabled combining the analyses of different data types and datasets and looking for connections.
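Integration through coordinates can be illustrated with a small sketch. It is not the method used in any of the studied projects. The records, the second dataset, the field names and the helper `grid_cell` are hypothetical, and matching on a coarse grid of tenths of a degree is a simplifying assumption chosen for brevity.

```python
# Hypothetical image metadata with coordinates in decimal degrees.
images = [
    {"image_id": "img_001", "lat": 61.4978, "lon": 23.7610},
    {"image_id": "img_002", "lat": 60.1699, "lon": 24.9384},
    {"image_id": "img_003", "lat": 65.0121, "lon": 25.4651},
]

# Hypothetical second dataset, keyed by grid cell in tenths of a degree.
register = {
    (615, 238): {"area": "A"},
    (602, 249): {"area": "B"},
}


def grid_cell(lat, lon):
    """Map coordinates to an integer grid cell (tenths of a degree)."""
    return (round(lat * 10), round(lon * 10))


# Enrich each image record with the matching entry, if there is one.
enriched = []
for image in images:
    extra = register.get(grid_cell(image["lat"], image["lon"]), {})
    enriched.append({**image, **extra})

for row in enriched:
    print(row)
```

Records without a match, such as the third image here, are kept unchanged. This makes the limits of the enrichment visible rather than silently dropping data.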

Method and tool development refers to developing tools and methods for the analysis of image data. Tool development was relevant in computational research, where scholars designed and applied novel automatic ways to analyse images. These included training models and neural networks. Tool development involved trial and error and developing models to fit visually varied contexts. Scholars also applied, tested and selected off-the-shelf computer vision tools for analysing their images.

We did a lot of trial and error. First, we tried with off the shelf solutions and tried Microsoft Azure and Google vision and all the whatever is out there. But that didn't work very well. […] So in the end, we found that we have to train the system ourselves. (P12, computational research)

Part of the tool development was the creation of training materials for machine learning. In some cases, creating the training materials was outsourced and in others done by the research team. Evaluating tools was another part of the tool development, where tools were tested and further developed to meet the research needs of the task. This required manual work to see if the tool worked as it should.

In contemporary research, scholars experimented with and developed research methods for image analysis. Many argued that there was a lack of qualitative visual research methods for SSH research as image analysis has its roots in art history research. Scholars also experimented with different text analysis software.

So especially with that set of images, we tried three different ways and that was nice to try what each way worked or not, but in terms of software it was in Vivo and Excel. (P6, contemporary research)

Data analysis happened through various means depending on the task type. In contemporary and historical image research, images were analysed manually by data- and theory-driven coding and categorizing, close reading and interpreting, comparing and quantifying. Image data analysis did not necessarily differ from textual data analysis. Images were often analysed in relation to other, usually textual, data, and considering the context of the image was important. For historical research, the archive or the collection where the image was found was important in defining the context. However, in some cases scholars deliberately focused on the image only and discarded the text, such as social media comments or captions around the image. For the analysis, scholars used software that was originally designed for textual data, such as Atlas.ti, NVivo, Word and Excel. These tools were mostly perceived as functional, but some scholars had difficulties in using them. For example, if analysis was coded into Excel spreadsheets, the original images were not visible there and scholars needed to jump back and forth between devices.

When I start to see how they are grouped I may be relying on my own memory to some extent. The Excel doesn’t include the images. I might then stare at my phone where I can see them from Instagram. It shows several, but never the whole material at once. (P5, contemporary research)

In computational research, the analyses usually took place in two phases. First, image data was automatically analysed using computer vision techniques to detect objects and machine learning to predict and identify patterns, relationships and categories. Second, data derived from the images was analysed with statistical methods and network analysis, which also required interpretation. In most cases, the original images were not important for the scholars at the analysis stage, where the focus was on derived data.

I let the machine to look at them and find the pattern for me, so I don't have to look for the pattern. […] Then it's my job to figure out what did it find out. It doesn't tell me that. It's very good at predicting stuff. It's just not very good at telling you why it predicted it that way. (P9, computational research)
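The second phase described above, in which derived data rather than the images themselves is analysed, can be illustrated with a small sketch. It assumes that the first phase has already produced a set of labels per image. The function `label_cooccurrence` and the input format are hypothetical, and counting label co-occurrence is only one simple way to prepare derived data for network analysis.

```python
from collections import Counter
from itertools import combinations


def label_cooccurrence(labels_per_image):
    """Count how often two labels appear in the same image.

    `labels_per_image` maps an image identifier to the labels detected in
    it (assumed output of an earlier, automatic analysis step). The result
    maps alphabetically ordered label pairs to counts and can serve as a
    weighted edge list for network analysis.
    """
    edges = Counter()
    for labels in labels_per_image.values():
        # sorted(set(...)) removes duplicates and fixes the pair order.
        for a, b in combinations(sorted(set(labels)), 2):
            edges[(a, b)] += 1
    return edges
```

As the quotation indicates, such counts only show which patterns occur. Explaining why they occur remains the scholar's interpretive work.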

Synthesizing and reporting

Synthesizing and reporting refers to activity leading to presenting and publishing the research results. Two sub-activities were identified: presenting findings and illustrating findings (Figure 6).

In presenting findings, results from the tasks were published as journal and conference articles, and informally presented in seminar presentations or lectures. In computational research, the methods were sometimes published as separate articles targeting journals in different fields depending on the subject. When selecting the publishing forums scholars needed to think about the readership and the publication practices of the journals. In computational research, one scholar found that the core journals of his field did not yet support quantitative research. In contemporary research, one scholar perceived that using images as data was something new in her field and required more justification compared to textual materials.

The journals that would normally be interested in that just don't get the method, they consider it wrong or weird or weak. It's almost like they only now start to accept the Internet as a thing. And then there is all these statistics and then now there's machine learning to consider also, and it's even harder. So, I have to publish in non-core sociological journals. (P9, computational research)

When presenting the results, scholars wrote about the findings, the methods and the analysis. They produced descriptions of the data sources and contexts of the images. Images were used as evidence of the data or analysis in the same way as excerpts from interview data. Most scholars thought that publishing images from the data was important but not always possible. If publishing an image was not possible, a typical solution was to provide links to the images or to describe the image in such detail that readers could find it themselves.

Someone asked if you can write about pictures without publishing them? You probably could, but I don’t think that the analysis will work in many situations. Practices of scientific publishing have been created for analysing texts. Even if you describe it as exhaustively as you can, nine readers see nine different images. (P17, historical research)

Illustrating findings occurred by different means. If the scholar had permission to publish the images, they selected representative examples to illustrate the findings. When selecting the illustrations, scholars also considered what kind of images can be published to ensure, for example, the privacy of the people in the images.

We didn't choose pictures of people who are in stigmatizing professions or arrest pictures. The purpose of those pictures is always to show people as part of some community. […] The idea was not to show them in an unfavourable light. (P14, contemporary research)

However, the images could not always be published in their original form and anonymisation was required if there were people in the images. This might mean manually blurring the images to mask the persons. In some cases, the data provider had already anonymised the data before releasing it. Making collages by selecting multiple images and combining them into one was an interesting way to illustrate findings in contemporary research. Some scholars in contemporary and computational tasks created new versions of the images to provide illustrations. In these cases, scholars asked artists to draw illustrations based on the original images to help readers understand the phenomenon. One scholar even took their own photographs to illustrate the topic and to gain insights into the subject. In some cases, results derived from the data were presented as graphs to summarise multiple images. Illustrative images were also acquired from external sources, for example by buying stock images or using public materials.

Sometimes we have used a similar image from a public source, for example. But there are all kinds of options for that. (P13, computational research)

Concluding

The final activity Concluding refers to the stage in which scholars need to decide how to manage their data at the end of the task. Three sub-activities were identified: preserving, destroying and sharing (Figure 7).

Preserving the data for later use or destroying it are the usual outcomes for image data after a project. In some contemporary tasks, data preservation was pre-planned, but only part of the data was preserved for later use. Data was preserved on scholars’ or organisations’ own devices or servers, or in institutional data archives. In computational tasks, preserving and sharing code and derived data was more relevant, and one scholar argued that the code formed the dataset.

These scraped materials, we don’t keep them, they are only used for training and analyses. But we don't keep them, we're not allowed to keep them. You need to accept the fact that here the actual data is produced by the code. (P13, computational research)

A need for infrastructure for image data preservation was evident since not all data repositories archive image data or social media data. In particular, preserving large sets of image data called for external infrastructure because they could not be stored on individual computers. Sometimes scholars also wanted to update dataset versions, an activity that is not adequately supported by data archives. Therefore, some used GitHub for data preservation.

In historical research, the developed, digitised or enriched data would be preferably preserved in the original archive. However, in many cases, the archives were not ready to receive the data, so it was left on the scholars’ devices. Some archives also required specific data formats scholars could not produce – or producing them would require extra work for which they had no resources.

I asked the archive if they're very interested to get this kind of corrected information but they had no resources to integrate and correct this database so I just kept it for myself and let them know that if they need this, they can contact me. (P21, historical research)

However, in most cases scholars lacked a clear plan for their data after the project. The data was stored on scholars’ own computers or other devices (e.g. phones) without planning whether it should be re-used or preserved. Some scholars described their data organisation as “messy” or un-curated, thus preventing preservation in archives. For data preservation, thorough documentation and anonymisation was needed. In many cases, scholars did not have the resources, such as time or skills, for preserving their data accordingly.

Data was destroyed when consents or agreements with data providers required it, which occurred in contemporary and computational research. For those working with historical tasks this was less of a concern as the original research material was already archived. Historians tended to accumulate personal collections.

Sharing image data was rare in our interview data, and preserved data was not always shared. Many scholars perceived that data sharing was not an established practice in their field and that, for example, journals did not expect data to be opened alongside the articles. Yet, some research funders require data sharing. Often, archiving data openly was impossible due to licenses, agreements and ethical issues; or scholars had not considered data sharing at all. However, in some cases image data was informally shared by email or by storing images in open cloud services, where anyone with access could download it. Data might also be shared informally if a need in the research community was recognized.

Discussion

Examining image data interaction in SSH research tasks revealed five activities: Data gathering, Forming dataset, Working with data, Synthesizing and reporting and Concluding. Although the activities are presented in sequential order, they may overlap and appear in a varying order in real-life research processes. Further, some activities, such as forming dataset, are hard to separate from other activities. For similar reasons, Koolen et al. (2020) proposed a palette metaphor instead of pipeline models to capture the digital humanities research process, as it better covers the idea of overlapping activities.

The activities were analysed in three different task types: contemporary, historical and computational research. Activities take different forms in the different task types. Data gathering consists of three sub-activities: identifying data sources, identifying access points and collecting data. All three sub-activities are required for its completion. As Korkeamäki et al. (2022) suggest, successful SSH scholarship necessitates information about all these, but also information about the context of the data.

In all three task types, identification of data sources and access points depended on the desired image data types. In historical research, multiple data sources were usually identified (including both digital and print images), but in contemporary and computational research scholars typically relied on one or two sources. In all tasks, identifying access points to data was a critical step in the data collection and required specific knowledge about the data sources. In historical tasks this was demanding. According to Chassanoff (2018) and Late et al. (2023a, 2024b), scholars desire various access points to historical images that often lack appropriate metadata. Earlier studies have called for complementary documentation covering the material, mental and social aspects of the documents (Buckland, 2016; Skare, 2024). This would help historians access the materials.

The methods of collecting data were varied. While in contemporary and historical tasks data was often collected manually image by image, in computational tasks data was collected through APIs or similar services that required technical skills. Lately, some platforms (e.g. Instagram) have disabled their APIs, which is severely hindering image use in research (McCrow-Young, 2021). Furthermore, data handouts from external data providers were received in all task types, and in varying formats. Print images were collected only for historical research tasks, while others relied on digital formats. Indeed, digital collections can provide easy access to data, but they do not include everything and cannot always replace physical items, particularly in terms of quality (Chassanoff, 2018; Hoekstra and Koolen, 2019; Late and Kumpulainen, 2022; Sinn and Soares, 2014).

The Forming dataset activity entails selecting, saving and filtering data. These sub-activities were mainly present in contemporary and historical tasks. Although our data included only hints of data selection, Hoekstra and Koolen (2019) emphasize the importance of this activity in data interaction and explain how data interpretation already takes place when scholars are making decisions on what data to select and use to answer their research questions. They elaborate on a variety of factors in digital collections and tools that affect scholars’ selecting activity, including source criticism in designing such tools. Scholars saved the selected images in various ways, stressing the role of good quality cameras on smartphones as data collecting devices. In historical research print images were commonly digitised. The filtering sub-activity refers to filtering relevant items from a larger set of collected images to form a final dataset. This was a common activity also in computational tasks.

Working with data entails three sub-activities: data preparation, method and tool development and data analysis. In this activity the formed dataset was prepared and analysed. Data preparation, which Given and Willson (2018) characterise as a meta-level research practice, took place in all task types. It aimed to organise and manipulate (e.g. clean, anonymise, enrich) the dataset into a form that could be later worked with. In contemporary tasks images were sometimes transformed into text or printed on paper for easier analysis. This is similar to Given and Willson’s (2018) observation about data preparation work to make multimedia data accessible as text. Data preparation should not be overlooked as it is fundamental for data analysis and often laborious, requiring content and technical expertise (Given and Willson, 2018; Hoekstra and Koolen, 2019). Indeed, support and infrastructure are needed throughout the entire research process, not just at the beginning and end (Borgman, 2007; Weller and Monroe-Gulick, 2014). Method and tool development was evident in computational research that used images as training materials for machine learning. However, experimenting with and developing methods was also common in contemporary tasks.

In image analysis, images were analysed with methods similar to those used for textual and numerical data. None of the scholars in contemporary or historical tasks used any automatic methods for analyses although options exist (see, e.g. Berg and Nelimarkka, 2023; Webb Williams et al., 2020). Indeed, studies have shown users having conflicting attitudes and needs for automatic methods (Beaudoin, 2016; Late et al., 2023b). Text analysis tools were often used for image analysis. In general, the tools were found suitable but, in contemporary research, scholars needed to jump back and forth between tools and datasets because they needed to see the original image when analysing. Trace and Karadkar (2017) have also witnessed scholars’ struggles with various tools, and Kumpulainen et al. (2020) describe historians’ work practices as “semidigital” due to lacking tools. Therefore, further tool development is needed to support image data analysis. In the computational tasks, the analysis usually began by analysing the raw image data automatically and then re-analysing the product. In many cases off-the-shelf solutions did not serve the purposes (cf. Berg and Nelimarkka, 2023), leading to training suitable models.

The Synthesizing and reporting activity includes two sub-activities: presenting findings and illustrating findings. In this activity findings from the image data are presented and illustrated for journal and conference articles, presentations and lectures. In computational research methods were usually published separately from findings. Still, it was evident that tool development and quantitative research were not acknowledged in their fields (cf. Kumpulainen and Late, 2022). However, as Given and Willson (2018) argue, tool development is not only an outcome of research but research itself, requiring support from institutions.

When presenting the findings based on the image data, scholars provided textual descriptions and included selected samples from the data as evidence. Certainly, writing remains a central aspect of scholarly practice, even though data may be visual (Given and Willson, 2018). Illustrating findings with the image data was considered as important. However, in many cases publishing original images from the data was not possible or it was constrained. Therefore, scholars used workarounds by manipulating, creating new versions or graphs or acquiring illustrations from other sources. The demanding work to provide visual illustrations is an important part of scientific practice to acquire readership and foster debate (Amann and Knorr-Cetina, 1988; Latour, 1990).

The Concluding activity refers to scholars’ need to decide what happens to their data at the end of the task. Three sub-activities were identified: preserving, destroying and sharing. Our findings corroborate previous research that shows that, typically, image data is not formally managed, and many scholars do not have a plan for what to do with their data after the project (Fernandes et al., 2020; Rodrigues and Lopes, 2023). This led to storing the data on scholars’ own devices. However, the lack of infrastructure and support for image data management also caused this. For example, in historical image research tasks scholars wanted to deposit their enhanced datasets back to the original archive, but usually this was not possible. In some computational and contemporary tasks, it was clear for the scholars that the data needed to be destroyed after the project according to the consent from the participants or agreements with data providers. Image data was shared mostly by informal means, as already analysed in more detail by Late et al. (2024a).

There were some limitations in our study. We analysed 21 interviews with SSH scholars re-using images as research data. However, our data includes only critical incidents discussed during the interviews. For example, image data use in teaching was not covered although it is likely taking place (Kamposiori, 2018). The number of participants and the balance between the research task types limit the generalization of the findings. Ethnography on scholars’ work would yield a more comprehensive picture of scholars’ interactions with image data. Although this paper could not cover planning, monitoring and management activities in image data uses, it was evident that these are important activities throughout the interaction process. For example, collecting consents and making agreements with data providers were covered in the interviews. These affected the use of the image data spanning from collecting to publishing. Scholars invented workarounds for making interactions possible in terms of user rights and research integrity. Also, ethical issues were very often brought up in the interviews. Future research should focus on these activities because, as our research hinted, they seem very critical to image data use.

Conclusions

Support for image data is typically considered from the viewpoint of the data life cycle, but the researchers’ viewpoint of utilizing images as research data is lacking. The research provided a rich description and analysis of the activities undertaken by SSH scholars during image data interaction and helps to fill the research gap in understanding the use of images as research data. Our analysis built on the TBII model (Järvelin et al., 2015) and provided an enhanced model with five image data interaction activities (Data gathering, Forming dataset, Working with data, Synthesizing and reporting and Concluding) with various sub-activities. This model explains critical points in image data interactions and may prove useful in future research about data interactions. The model may also be applied in designing better research services and in infrastructure development work by identifying support needs throughout the research process.

Figures

Figure 1

Activities in task-based information interaction model

Figure 2

Activities and sub-activities in image data interaction process

Figure 3

Sub-activities in data-gathering in different task types

Figure 4

Sub-activities in forming dataset in different task types

Figure 5

Sub-activities in working with items in different task types

Figure 6

Sub-activities in synthesizing and reporting in different task types

Figure 7

Sub-activities in concluding in different task types

Table 1

Profile of the interviewees. Number of participants in parentheses

Country	Finland (15), Denmark (6)
Work organisation	University A (7), University B (4), University C (4), University D (3), University E (2), University F (1)
Discipline	Cultural and media studies (8), history (5), sociology (3), linguistics (2), political studies (1), psychology (1), information studies (1)
Work title	Professor (5), associate professor (5), post-doctoral researcher (9), doctoral student (2)
Interview format	Face-to-face (12), online (9)

Source(s): Created by authors

Table 2

Characteristics of the research task types employing image data

	Contemporary research (n = 10)	Historical research (n = 5)	Computational research (n = 6)
Image data sources	Social media, Internet, published media, street art	Archives, museums, social media	Social media, corporate collections, web archives
Type of image data	Digital	Print and digital	Digital
Purpose of image use	Research data, part of other data collection, illustration	Research data, illustration	Research data, training material, data enrichment, illustration
Other datatypes used	Interview data, social media data, media texts, ethnographic field diaries	Literature, archival records, historical media texts	Register data, ethnographic field diaries, social media data, archival audio data
Research methods	Qualitative	Historical research methods	Quantitative, mixed-methods

Source(s): Created by authors

References

Amann, K. and Knorr-Cetina, K. (1988), “The fixation of (visual) evidence”, Human Studies, Vol. 11 No. 2/3, pp. 133-169, doi: 10.1007/bf00177302.

Bagnoli, A. (2009), “Beyond the standard interview: the use of graphic elicitation and arts-based methods”, Qualitative Research, Vol. 9 No. 5, pp. 547-570, doi: 10.1177/146879410934362.

Ball, M. and Smith, G. (2017), “Working with visual data: practices of visualization and representation”, International Review of Qualitative Research, Vol. 10 No. 2, pp. 119-127, doi: 10.1525/irqr.2017.10.2.119.

Beaudoin, J.E. (2014), “A framework of image use among archaeologists, architects, art historians and artists”, Journal of Documentation, Vol. 70 No. 1, pp. 119-147, doi: 10.1108/JD-12-2012-0157.

Beaudoin, J.E. (2016), “Content-based image retrieval methods and professional image users”, Journal of the Association for Information Science and Technology, Vol. 67 No. 2, pp. 350-365, doi: 10.1002/asi.23387.

Berg, A. and Nelimarkka, M. (2023), “Do you see what I see? Measuring the semantic differences in image-recognition services’ outputs”, Journal of the Association for Information Science and Technology, Vol. 74 No. 11, pp. 1307-1324, doi: 10.1002/asi.24827.

Borgman, C.L. (2007), Scholarship in the Digital Age: Information, Infrastructure, and the Internet, MIT Press, Cambridge, MA.

Borgman, C.L. (2015), Big Data, Little Data, No Data: Scholarship in the Networked World, MIT Press, Cambridge, MA.

Buckland, M. (2016), “The physical, mental and social dimensions of documents”, Proceedings from the Document Academy, Vol. 3 No. 1, doi: 10.35492/docam/3/1/4.

Carlson, J. (2014), “The use of life cycle models in developing and supporting data services”, in Ray, J.M. (Ed.), Research Data Management: Practical Strategies for Information Professionals, Purdue University Press, West Lafayette, pp. 63-86.

Chassanoff, A.M. (2018), “Historians' experiences using digitized archival photographs as evidence”, American Archivist, Vol. 81 No. 1, pp. 135-164, doi: 10.17723/0360-9081-81.1.135.

Chen, Y., Sherren, K., Smit, M. and Lee, K.Y. (2021), “Using social media images as data in social science research”, New Media and Society, Vol. 25 No. 4, pp. 849-871, doi: 10.1177/14614448211038761.

Cox, A. and Tam, W. (2018), “A critical analysis of lifecycle models of the research process and research data management”, Aslib Journal of Information Management, Vol. 70 No. 2, pp. 142-157, doi: 10.1108/AJIM-11-2017-0251.

Fernandes, M., Rodrigues, J. and Lopes, C. (2020), “Management of research data in image format: an exploratory study on current practices”, in Hall, M., Merčun, T., Risse, T. and Duchateau, F. (Eds), Digital Libraries for Open Knowledge: TPDL 2020. (Lecture Notes in Computer Science Vol 12246), Springer, doi: 10.1007/978-3-030-54956-5_16.

Fidel, R. (2012), Human Information Interaction: an Ecological Approach to Information Behavior, MIT Press, Cambridge, MA.

Flanagan, J.C. (1954), “The critical incident technique”, Psychological Bulletin, Vol. 51 No. 4, pp. 327-358, doi: 10.1037/h0061470.

Given, L.M. and Willson, R. (2018), “Information technology and the humanities scholar: documenting digital research practices”, Journal of the Association for Information Science and Technology, Vol. 69 No. 6, pp. 807-819, doi: 10.1002/asi.24008.

Gregory, K.M., Cousijn, H., Groth, P., Scharnhorst, A. and Wyatt, S. (2020), “Understanding data search as a socio-technical practice”, Journal of Information Science, Vol. 46 No. 4, pp. 459-475, doi: 10.1177/0165551519837182.

Hansson, K. and Dahlgren, A. (2022), “Open research data repositories: practices, norms, and metadata for sharing images”, Journal of the Association for Information Science and Technology, Vol. 73 No. 2, pp. 303-316, doi: 10.1002/asi.24571.

Higgins, S. (2008), “The DCC curation lifecycle model”, International Journal of Digital Curation, Vol. 3 No. 1, pp. 134-140, doi: 10.2218/ijdc.v3i1.48.

Highfield, T. and Leaver, T. (2016), “Instagrammatics and digital methods: studying visual social media, from selfies and GIFs to memes and emoji”, Communication Research and Practice, Vol. 2 No. 1, pp. 47-62, doi: 10.1080/22041451.2016.1155332.

Hoekstra, R. and Koolen, M. (2019), “Data scopes for digital history research”, Historical Methods: A Journal of Quantitative and Interdisciplinary History, Vol. 52 No. 2, pp. 79-94, doi: 10.1080/01615440.2018.1484676.

Järvelin, K., Vakkari, P., Arvola, P., Baskaya, F., Järvelin, A., Kekäläinen, J., Keskustalo, H., Kumpulainen, S., Saastamoinen, M., Savolainen, R. and Sormunen, E. (2015), “Task-based information interaction evaluation: the viewpoint of program theory”, ACM Transactions on Information Systems, Vol. 33 No. 1, pp. 1-30, doi: 10.1145/2699660.

Jeng, W. and He, D. (2022), “Surveying research data-sharing practices in US social sciences: a knowledge infrastructure-inspired conceptual framework”, Online Information Review, Vol. 46 No. 7, pp. 1275-1292, doi: 10.1108/OIR-03-2020-0079.

Jordanova, L.J. (2012), The Look of the Past: Visual and Material Evidence in Historical Practice, Cambridge University Press, Cambridge.

Kamposiori, C. (2018), Personal Research Collections: Examining Research Practices and User Needs in Art Historical Research, University College London, Doctoral thesis (Ph.D).

Knowles, C. and Sweetman, P. (2004), “Introduction”, in Knowles, C. and Sweetman, P. (Eds), Picturing the Social Landscape: Visual Methods and the Sociological Imagination, Routledge, London, pp. 1-17.

Koolen, M., Kumpulainen, S. and Melgar-Estrada, L. (2020), “A workflow analysis perspective to scholarly research tasks”, Proceedings of the 2020 Conference on Human Information Interaction and Retrieval, Vol. 12, pp. 183-192, doi: 10.1145/3343413.3377969.

Korkeamäki, L., Keskustalo, H. and Kumpulainen, S. (2022), “Task information types related to data gathering in media studies”, Journal of Documentation, Vol. 78 No. 7, pp. 528-545, doi: 10.1108/JD-04-2022-0082.

Kumpulainen, S. (2017), “Task-based information searching: research methods”, in Encyclopedia of Library and Information Sciences, CRC Press, Boca Raton, pp. 4526-4536.

Kumpulainen, S. and Late, E. (2022), “Struggling with digitized historical newspapers: contextual barriers to information interaction in history research archives”, Journal of the Association for Information Science and Technology, Vol. 73 No. 7, pp. 1012-1024, doi: 10.1002/asi.24608.

Kumpulainen, S., Keskustalo, H., Zhang, B. and Stefanidis, K. (2020), “Historical reasoning in authentic research tasks: mapping cognitive and document spaces”, Journal of the Association for Information Science and Technology, Vol. 71 No. 2, pp. 230-241, doi: 10.1002/asi.24216.

Late, E. and Kumpulainen, S. (2022), “Interacting with digitised historical newspapers: understanding the use of digital surrogates as primary sources”, Journal of Documentation, Vol. 78 No. 7, pp. 106-124, doi: 10.1108/JD-04-2021-0078.

Late, E. and Kumpulainen, S. (2024), “Interview guide for SSH scholars about their image data use”, Zenodo, doi: 10.5281/zenodo.10807674.

Late, E., Ruotsalainen, H. and Kumpulainen, S. (2023a), “In a perfect world: exploring the desires and realities for digitized historical image archives”, Proceedings of the Association for Information Science and Technology, Vol. 60 No. 1, pp. 244-254, doi: 10.1002/pra2.785.

Late, E., Ruotsalainen, H., Seker, M., Raitoharju, J., Männistö, A. and Kumpulainen, S. (2023b), “From textual to visual image searching: user experience of advanced image search tool”, International Conference on Theory and Practice of Digital Libraries, Springer Nature Switzerland, Cham, pp. 277-283.

Late, E., Skov, M. and Kumpulainen, S. (2024a), “To share or not to share? Image data sharing in the social sciences and humanities”, Information Research: An International Electronic Journal, Vol. 29 No. 2, pp. 386-400, doi: 10.47989/ir292834.

Late, E., Ruotsalainen, H. and Kumpulainen, S. (2024b), “Image searching in an open photograph archive: search tactics and faced barriers in historical research”, International Journal on Digital Libraries, doi: 10.1007/s00799-023-00390-1.

Latour, B. (1990), “Drawing things together”, in Lynch, M. and Woolgar, S. (Eds), Representation in Scientific Practice, MIT Press, Cambridge, Massachusetts, pp. 19-68.

Leonelli, S. (2019), “Data governance is key to interpretation: reconceptualizing data in data science”, Harvard Data Science Review, Vol. 1 No. 1, pp. 10-162, doi: 10.1162/99608f92.17405bb6.

Lilja, E. (2020), “Threat of policy alienation: exploring the implementation of Open Science policy in research practice”, Science and Public Policy, Vol. 47 No. 6, pp. 803-817, doi: 10.1093/scipol/scaa044.

McCrow-Young, A. (2021), “Approaching Instagram data: reflections on accessing, archiving and anonymising visual social media”, Communication Research and Practice, Vol. 7 No. 1, pp. 21-34, doi: 10.1080/22041451.2020.1847820.

Rejeb, A., Rejeb, K., Abdollahi, A. and Treiblmaier, H. (2022), “The big picture on Instagram research: insights from a bibliometric analysis”, Telematics and Informatics, Vol. 73, 101876, doi: 10.1016/j.tele.2022.101876.

Rhee, H.L. (2024), “A new lifecycle model enabling optimal digital curation”, Journal of Librarianship and Information Science, Vol. 56 No. 1, pp. 241-266, doi: 10.1177/09610006221125956.

Rodrigues, J. and Lopes, C. (2023), “Research image management practices reported by scientific literature: an analysis by research domain”, Open Information Science, Vol. 7 No. 1, 20220147, doi: 10.1515/opis-2022-0147.

Rose, G. (2014), “On the relation between ‘visual research methods’ and contemporary visual culture”, The Sociological Review, Vol. 62 No. 1, pp. 24-46, doi: 10.1111/1467-954X.12109.

Rose, G. (2022), Visual Methodologies: an Introduction to Researching with Visual Materials, 5th ed., SAGE Publications, London.

Sendra, A., Late, E. and Kumpulainen, S. (2023), “More than data repositories: perceived information needs for the development of social sciences and humanities research infrastructures”, Information Research, Vol. 28 No. 4, pp. 83-101, doi: 10.47989/ir284598.

Sinn, D. and Soares, N. (2014), “Historians’ use of digital archival collections: the web, historical scholarship, and archival research”, Journal of the Association for Information Science and Technology, Vol. 65 No. 9, pp. 1794-1809, doi: 10.1002/asi.23091.

Skare, R. (2024), “The importance of a complementary approach when working with historical documents”, Journal of Documentation, Vol. 80 No. 3, pp. 618-631, doi: 10.1108/JD-03-2023-0060.

Strauss, A. and Corbin, J.M. (1997), Grounded Theory in Practice, Sage, Thousand Oaks.

Toms, E.G. (2002), “Information interaction: providing a framework for information architecture”, Journal of the American Society for Information Science and Technology, Vol. 53 No. 10, pp. 855-862, doi: 10.1002/asi.10094.

Toms, E.G. (2011), “Task-based information searching and retrieval”, in Ruthven, I. and Kelly, D. (Eds), Interactive Information Seeking, Behaviour and Retrieval, Facet Publishing, London, pp. 43-75.

Trace, C.B. and Karadkar, U.P. (2017), “Information management in the humanities: scholarly processes, tools, and the construction of personal collections”, Journal of the Association for Information Science and Technology, Vol. 68 No. 2, pp. 491-507, doi: 10.1002/asi.23678.

Vakkari, P. (2001), “A theory of the task-based information retrieval process: a summary and generalisation of a longitudinal study”, Journal of Documentation, Vol. 57 No. 1, pp. 44-60, doi: 10.1108/EUM0000000007075.

Waters, D.J. (2022), “The emerging digital infrastructure for research in the humanities”, International Journal on Digital Libraries, Vol. 24 No. 2, pp. 87-102, doi: 10.1007/s00799-022-00332-3.

Webb Williams, N., Casas, A. and Wilkerson, J.D. (2020), Images as Data for Social Science Research: an Introduction to Convolutional Neural Nets for Image Classification, 1st ed., Cambridge University Press, Cambridge.

Weller, T. and Monroe-Gulick, A. (2014), “Understanding methodological and disciplinary differences in the data practices of academic researchers”, Library Hi Tech, Vol. 32 No. 3, pp. 467-482, doi: 10.1108/LHT-02-2014-0021.

Wilkinson, M.D., Dumontier, M., Aalbersberg, I.J., Appleton, G., Axton, M., Baak, A., Blomberg, N., Boiten, J.W., da Silva Santos, L.B., Bourne, P.E., Bouwman, J., Brookes, A.J., Clark, T., Crosas, M., Dillo, I., Dumon, O., Edmunds, S., Evelo, C.T., Finkers, R., Gonzalez-Beltran, A., Gray, A.J., Groth, P., Goble, C., Grethe, J.S., Heringa, J., ’t Hoen, P.A., Hooft, R., Kuhn, T., Kok, R., Kok, J., Lusher, S.J., Martone, M.E., Packer, A.L., Persson, B., Rocca-Serra, P., Roos, M., van Schaik, R., Sansone, S.A., Schultes, E., Sengstag, T., Slater, T., Strawn, G., Swertz, M.A., Thompson, M., van der Lei, J., van Mulligen, E., Velterop, J., Waagmeester, A., Wittenburg, P., Wolstencroft, K., Zhao, J. and Mons, B. (2016), “The FAIR Guiding Principles for scientific data management and stewardship”, Scientific Data, Vol. 3 No. 1, pp. 1-9, doi: 10.1038/sdata.2016.18.

Zenk-Möltgen, W., Akdeniz, E., Katsanidou, A., Naßhoven, V. and Balaban, E. (2018), “Factors influencing the data sharing behavior of researchers in sociology and political science”, Journal of Documentation, Vol. 74 No. 5, pp. 1053-1073, doi: 10.1108/JD-09-2017-0126.

Acknowledgements

Funding: This work was supported by the Research Council of Finland, grant numbers 351247 and 345618.

Corresponding author

Elina Late can be contacted at: elina.late@tuni.fi
