The impact of artificial intelligence on scholars: an interview with Juan D. Machin-Mastromatteo

Juan D. Machin-Mastromatteo, Anna Maria Tammaro

Digital Library Perspectives

ISSN: 2059-5816

Article publication date: 29 October 2024

Issue publication date: 29 October 2024


Citation

Machin-Mastromatteo, J.D. and Tammaro, A.M. (2024), "The impact of artificial intelligence on scholars: an interview with Juan D. Machin-Mastromatteo", Digital Library Perspectives, Vol. 40 No. 4, pp. 700-708. https://doi.org/10.1108/DLP-10-2024-151

Publisher: Emerald Publishing Limited

Copyright © 2024, Emerald Publishing Limited


Artificial intelligence (AI) is changing the way knowledge is created, discovered and disseminated, with a possible impact on the role of digital libraries, including the professional work of the digital librarian, which remains controversial and difficult to predict. In this context, what does seem possible is to anticipate the impact on the areas of scholars’ work, so that digital library services adequate to these new needs can be offered. In particular, generative AI can speed up the writing and research processes, improve cooperation, teaching and learning and provide fresh perspectives for the academic world. However, the introduction of generative AI into academia can also present potential downsides, and this interview aims to explore the positive and negative impacts of AI on scholars as food for thought for digital librarians.

We chose to interview Juan D. Machin-Mastromatteo, who, given his young age, represents a new generation of scholars in Library and Information Science (LIS) who are experimenting with AI.

Juan D. Machin-Mastromatteo is a full-time professor at the Autonomous University of Chihuahua (UACH, Mexico) and member of the National System of Researchers (Level II). He is a Doctor in Information and Communication Science (Tallinn University), Master in Digital Library Learning (Oslo University College; Tallinn University; and Parma University) and Bachelor in Librarianship (Central University of Venezuela). He is a specialist in information literacy, action research, bibliometrics, open access and digital libraries.

He has published over 150 scientific publications; has facilitated more than 40 courses; has participated in more than 120 international events as a speaker, panelist, organizer or moderator; and has facilitated over 20 workshops for training researchers. He is the Associate Editor of Information Development (SAGE) and the Information Studies Journal (UACH), and until 2023 he held this position for Digital Library Perspectives (Emerald). He is also a member of the editorial advisory boards of The Journal of Academic Librarianship (Elsevier) and IE Revista de Investigacion Educativa (REDIECH). He has conducted peer reviews for 17 scientific journals within the fields of information science and education, for which he has evaluated over 300 manuscripts, and has also edited four special issues for various scientific journals. From 2015 to 2020, he published the column Developing Latin America in Information Development. In 2019, he created the Juantífico Project: videos on scientific information, research and publication. Since 2022, he has co-hosted the InfoTecarios podcast, and since 2023 he has published the School of Editors section in the Information Studies Journal.

He has excelled in different roles (editor, reviewer, author, researcher and educator) and is particularly well known for a series of video tutorials that explain scholarly communication to interested researchers.

Q1. Having introduced your many research roles, I wonder what impact generative AI technologies like ChatGPT have on all these different roles?

Nowadays, it seems that a researcher might assume different roles. It might not be enough anymore just to conduct research, write and publish. For different reasons, we must conduct various activities related to research and teaching to improve our visibility and growth opportunities, and in each of these roles, different rules apply regarding the use of generative artificial intelligence (AI).

I teach at the university. I am also a researcher, and thus I am also the author of scientific publications. I also conduct peer reviews for scientific journals, and I am involved as an associate editor in two journals. Finally, I also produce educational videos related to my research and teaching topics.

The use of AI in each of these roles may vary, specifically regarding how acceptable it is to use AI and under which conditions. In teaching, it might be acceptable to produce images through AI to accompany my presentations for lectures, as well as to find reading materials more quickly and to write creative case scenarios for students to solve. Something similar happens when I prepare a conference presentation – nowadays, a large part of the graphic content of my presentations corresponds to AI-generated images.

As an author and researcher, you must be more careful about how you use AI because there are certain guidelines that you should follow. Most major publishers (including Emerald) have already set policies regarding the use of AI in research and publishing, and if you analyze those policies, it turns out that there are very few acceptable uses of AI. The only thing you can use it for is to improve your writing. These policies also point out that AI cannot be used in any case to generate content, especially for data analysis or to draw conclusions.

Why is it acceptable to improve writing? The reasoning behind this seems to be that the content being passed through AI for this purpose has already been human-generated, so there’s no problem with that; the AI is just acting as a more sophisticated spelling and grammar review function than the one available in word processors. One reason why it would not be acceptable to generate content and put it in an article is related to the fact that most people use AI tools through the free versions, which give you very few options to customize the AI’s behavior. An alternative to this would be, for instance, to use OpenAI’s ChatGPT system by paying the premium subscription, which grants you access to customize ChatGPT’s behavior by configuring an assistant. For example, you can disable its ability to connect to the Internet to find responses, and you can also set very strict rules for its operation and prevent it from using sources and behaving in ways that you do not want it to.
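
To make this idea concrete, here is a minimal sketch of a tightly constrained setup using the OpenAI Python SDK rather than the ChatGPT interface the interview refers to; the model name, instruction wording and function name are illustrative assumptions, not a prescribed configuration. The point is simply that strict instructions can confine the model to language editing of human-written text and keep it from generating new content.

```python
from openai import OpenAI

client = OpenAI()  # assumes an OPENAI_API_KEY is set in the environment

# Strict rules that confine the model to language editing only (hypothetical wording).
EDITING_RULES = (
    "You are a language editor. Only correct grammar, spelling and clarity "
    "of the text the user provides. Do not add, remove or invent content, "
    "do not introduce new sources or claims, and do not draw conclusions. "
    "Return the revised text only."
)

def polish_text(draft: str) -> str:
    """Ask the model to act purely as a proofreader for human-written text."""
    response = client.chat.completions.create(
        model="gpt-4o",    # illustrative model choice
        temperature=0,     # discourage creative rewriting
        messages=[
            {"role": "system", "content": EDITING_RULES},
            {"role": "user", "content": draft},
        ],
    )
    return response.choices[0].message.content
```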

There could be other acceptable uses, such as helping in the information-seeking process. There are very useful tools such as Consensus, Scite, Connected Papers and ResearchRabbit, which help identify literature that you may have missed or literature that you must check if you’re starting a new project, and some of these offer interesting features to navigate the literature and explore the relationships among authors and publications. There are also tools that provide summaries and help in reading or even tagging the literature that you are using to have the AI help you identify topics within full texts, such as Scispace. AI tools may help review the literature in a faster way; however, you must not use the content that the AI generates for you as is, without confirming its validity and without substantially transforming it.

My main concern with AI-generated content is that I simply don’t trust how AI works, e.g. it seems to be somewhat lazy and not comprehensive. Also, its responses tend to be very enumerative; its analysis might not be as deep as I would like it to be. In any case, the tool cannot guess what the human wants or how the human will make a proper analysis. These considerations, I feel, are especially valid within the humanities and social sciences, where analyses and meanings are so complex.

For journal editors and reviewers, other sets of rules apply. Under both roles, you must be aware that you are handling original manuscripts, which are, of course, not published, and hence, these are confidential works that must be treated accordingly. If you check them with an AI, you do not know how AI tools might be ingesting these manuscripts that you want to process through it. While the AI might not publish the manuscripts or upload them to public servers, the contents of manuscripts checked through AI could be used to further train the language model and become part of its knowledge base. This raises serious privacy and confidentiality concerns, as the authors might not authorize the use of their intellectual property in this manner.

There is the recent example of the publisher Taylor and Francis, which sold access to publications under their banner to Microsoft AI, and authors are not too happy about it. The current issue being discussed is: do we, as authors, want this to happen? The publisher has made a financial deal with an AI company, and as authors, we are not profiting from that, nor do we know the extent to which our intellectual property is being used. I doubt the publisher asked authors for their permission to use their publications in this manner, and older publishing agreements do not include provisions about this usage of authors’ intellectual property (do current agreements include this?). Of course, there could be alternatives in the near future where we can install local AI solutions (meaning, they are not online, and there are some currently available options) that might be tailored in a way that allows us to screen new manuscripts and perform some automatic evaluations on them while at the same time preventing the use of these contents in any other way.

Regarding the production of videos as part of a scientific dissemination process and to extend my reach as an educator in the topics of scientific information, research and publication, I am a bit more free to experiment and use AI. Many people will have seen some videos that are fully generated with AI by now. I feel that these kinds of videos are, to say the least, dry. They might be unnecessarily long, and they often develop arguments in circles, not even getting to the main topic. The human touch is very important when communicating anything, and this fact is relevant to any of the roles we have discussed.

The main question here might be: if I use AI to produce content, why should I pretend to have a human audience for it? If I were not able to produce content with my human intelligence, why should other humans bother to follow me?

In any case, for the purposes of producing videos, I have found some AI support, particularly to help create video covers or thumbnails with image generation tools, e.g. with DALL-E, which is included in ChatGPT’s premium subscription. However, human input is crucial, so the resulting images originate from sophisticated prompts, and I edit them further. For instance, I can add my own picture or other elements to compose a better thumbnail, such as adding some text to make it more informative and draw users to click on it.

I have also experimented with using AI to act as a marketing officer for my content. Making videos, promoting them, managing the channels and other social media sites is a lot of work – and it is something that I do entirely on my own, with the few resources I can afford and with my own skills. So, I have prepared a ChatGPT assistant that can read through my video scripts and generate a number (whichever I want it to be) of promotional phrases that are short, attractive, and that include hashtags and emojis to help drive engagement with my content.
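
As an illustration of this kind of marketing assistant, the following is a rough sketch of how the same task could be set up through the OpenAI API instead of the ChatGPT assistant interface described above; the instruction text, model choice and function name are hypothetical and only meant to show the shape of such a prompt.

```python
from openai import OpenAI

client = OpenAI()  # assumes an OPENAI_API_KEY is set in the environment

# Hypothetical instructions for a promotional-phrase assistant.
PROMO_INSTRUCTIONS = (
    "You are a social media assistant for an educational science channel. "
    "Read the video script the user provides and write short, attractive "
    "promotional phrases, each with relevant hashtags and emojis. "
    "Stay strictly within the content of the script."
)

def promo_phrases(script: str, how_many: int = 5) -> str:
    """Generate candidate promotional phrases from a human-written video script."""
    response = client.chat.completions.create(
        model="gpt-4o",  # illustrative model choice
        messages=[
            {"role": "system", "content": PROMO_INSTRUCTIONS},
            {"role": "user", "content": f"Write {how_many} phrases for this script:\n\n{script}"},
        ],
    )
    return response.choices[0].message.content
```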

Finally, I have developed another and much more sophisticated assistant with a very large prompt to analyze reading materials and produce draft scripts with the characteristics of how I write and the general narrative format of my videos, including its particular type of humor and references to popular culture. However, I know it is vital that I review these scripts, improve them, omit unnecessary parts and incorporate other elements that the AI might overlook. By the way, I will be posting some of these videos that started as AI-generated scripts but were corrected and enhanced by me. So far, these will be all AI-related videos (the irony!), and you can check them out on my YouTube channel[]. These videos are in Spanish, but English subtitles can be enabled. Of course, such usage of AI helps save a lot of time, even if I have to review and improve every script, which is important considering that with so many roles to fulfill, producing videos is an activity I do in my spare time – which, by the way, is very limited.

Q2. Are authors required to disclose the use of AI tools when drafting or revising a paper? At which stages of writing a paper is the use of AI considered acceptable (e.g. language editing, data analysis)?

As I previously stated, editorial policies regarding the use of AI for scientific publications are very strict and limit the usage of AI to a minimum extent. There are good reasons for this because, since these tools became massively available, it became apparent that many people may think, “Oh, I can develop an article with a few clicks.” But that is not acceptable. If you use AI to enhance your research activities, you must do it very carefully and ethically, of course, but you must also know what you are doing. You must understand research methods, at the very least, as well as the characteristics of the sources you are using to provide the background for your research and how scientific discourse is managed, developed and communicated.

You also need to know how to analyze data and understand the nature of the different statements that are included in a research publication. For instance, there are scientific facts that have been proven, as well as theories, hypotheses and so on, and each of these has its own limitations. There is also the issue that we do not know how exhaustive an AI-conducted analysis really is. I would say it is not exhaustive at all. However, as I previously highlighted, acceptable uses include retrieving literature through specialized AI technologies. The important thing here is that you must use specialized AI tools for these purposes because these are the ones that will respond with publications that actually exist. This is the main issue with using alternatives like ChatGPT and similar multi-purpose chatbots, as it has been well documented that they tend to invent sources that do not exist and have a hard time providing proper citations.

There are many associated issues with this. If, for example, someone produces a preprint with made-up citations and uploads the document to a preprint server, Google Scholar and other systems could even index these fake citations, and this would result in bibliometric garbage!

You can use AI for proofreading and improving writing and grammar, which seems to be almost the only universally acceptable use of AI in scientific publishing. AI-related policies also require you to disclose the usage of AI. Usually, you must report which model you used, its version, the date and time when you asked the system to provide the generated content, and a list of the sources that the system used to generate it (this is impossible with the default and free options of the available chatbots). You will also need to explain how and why you used generated content.

I think that the most plausible case where AI-generated content may be acceptable is in a document where the AI tool was the object of study. Otherwise, I find it very difficult to justify its inclusion, mainly because of the problem that we do not know how exhaustive AI tools really are, how similar their analyses are to those of a human scientist, and, if we do not set strict limits on its behavior, it could be using any source on the Internet to generate content. This turns out to be very unscientific because our content must be authoritative, and we need to trace ideas back to their originators to check their validity, replicate them if applicable, and, simply put, because we need to trust that these ideas were produced by someone qualified to do so and under certain circumstances and appropriate methods.

Q3. In your opinion, how has the use of AI changed the process of research and academic writing?

I don’t know if we can actually say that things have changed – it might be too early to tell. However, as I mentioned previously, there might be people producing articles that include AI-generated content to various extents. Again, the concerns will focus on people who do not have the necessary knowledge to present a complete and valid piece of research and who want to enter the scientific ecosystem for financial reasons while thinking this is a piece of cake; I call these people imposters. Some may even be using these tools to cheat their way into teaching or research careers. You may have noticed that some AI tools are already being promoted that promise to generate or modify content so that it will avoid being detected as AI-generated. I think they pose a serious risk because what they promise trivializes our research work and the training and sacrifices we had to undertake to develop our scientific careers.

This also raises the issue that journals might start becoming overwhelmed with AI-generated articles, which would be very bad for editors, reviewers and the entire publication ecosystem, which is already facing a crisis in its capacity to review and publish the many articles produced nowadays, even without counting those generated by AI. We would be flooded with articles that may not be worth evaluating or even paying attention to. Additionally, detecting AI-generated content with absolute precision seems to be a very difficult (if not impossible) task, even for other AI systems. It appears that the most precise way to detect potentially AI-generated content is through human intelligence, but only if you are already familiar with how the AI writes, and this might change very soon, making such writing completely indistinguishable from human writing.

Some characteristics of AI-generated writing include being overly enumerative, using too many bullet points with very little text, lacking analytical depth, and using excessive adjectives, which is uncommon in scientific communication.

In any case, sooner or later, these guidelines will continue to evolve and perhaps set alternative AI uses and the circumstances under which they may be acceptable for the scientific community. However, we also need these tools to provide more advanced features and functions so that we can better use them for scientific purposes. As I mentioned earlier, configuring assistants is a good step forward, but not everyone will be capable of configuring them appropriately, nor will they be bothered to spend the time needed to do so, as there is much testing and trial and error involved.

We can say that AI is democratizing programming because, when using these tools (especially if you do it in sophisticated ways), you are essentially programming them. Programming is no longer restricted to those who can handle computer languages like C or Python; the programming language for AI is natural language (no-code programming), and this is perhaps one of the main innovations these technologies have introduced. However, not everyone will be able to do this properly, as there are many factors to consider when programming AI to work optimally and in a way suited to your specific needs.

Once the guidelines for working with AI allow for other uses beyond correcting grammar and spelling, it could enhance our capabilities for processing very large sets of information and data. This could indeed have a significant impact, but guidelines must always be in place, and they should clearly define where the human contribution is essential, why it is important to have it, and what constitutes acceptable AI contributions.

Q4. Can the use of AI tools undermine academic integrity? If so, in what ways? How do you suggest distinguishing between legitimate assistance from AI and a form of plagiarism or academic misconduct?

So far, we can see that AI tools have the potential to undermine academic integrity. Most of the current discussions about the use of AI at any stage of scientific research and writing revolve around ethical considerations and concerns about research integrity. As I said previously, it is acceptable to use AI tools to help you locate the sources you need, get the literature you need to review and help you select it, although this might not result in an exhaustive list of pertinent results, which you can still obtain by conducting Boolean searches in scientific information systems. In any case, you will still have to read the material yourself, although there are AI alternatives, such as Scispace, that identify pertinent fragments in the literature faster. Another acceptable use of AI would be to proofread your writing.

Regarding plagiarism, the issue arises when using the default functions of these systems because we don’t know where they are getting the ideas or content they are generating. If we do not know which sources are being used, the risk of plagiarism is very high. Since these systems mostly rephrase older ideas, it might be difficult, if not impossible, to determine the original source and add proper citations. This is why the use of AI relates to plagiarism.

A significant restriction in editorial policies related to AI usage is generating analyses and conclusions. The problem here is that the AI might have a certain interpretation that differs from that of a human researcher. A human considers methodological issues, as well as other research, theories, narratives and working concepts, which help guide their analysis in a very selective, scientifically accepted manner – something that AI might not be able to follow accurately or exhaustively.

Q5. How does the concept of “authorship” change if AI significantly contributes to the creation of the content of a paper?

Regarding the concept of authorship, there have not been any changes to it, and perhaps there should not be. When the first preprints and articles that used AI tools were submitted to journals, some authors included tools like ChatGPT as co-authors. However, we must refer to scientifically accepted authorship guidelines, such as the Vancouver protocol, which dates to the 1970s. One of the most important principles that the Vancouver protocol states authors must fulfill is that they must accept responsibility for the content they are presenting; this includes being accountable in various ways (scientifically, morally, ethically, legally and financially) for the content they have published. Another principle states that authors must provide their consent to submit their articles to a journal and to publish the article. Consent is something only human beings can provide.

Obviously, these guidelines were established long before the existence of these generative AI tools, but until we decide that a machine can provide consent, this remains a role reserved for humans. These are the main reasons why ChatGPT and other AI tools cannot be included as authors. They cannot provide consent, nor can they assume the responsibilities that an author must bear when presenting a piece of work.

Q6. What training measures do you suggest to educate faculty and students on the ethical use of AI tools in publication?

Any kind of training on the usage of AI tools for academic and research purposes must incorporate the ethical dimension. Many people are offering training on AI, perhaps just for the sake of clicks and views, as well as for increased online visibility or notoriety. What is worrisome is that some take these matters too lightly and do not stop to address ethical considerations. This creates the impression that you can use these technologies without any forethought, which will negatively impact the value of people’s work and may foster the emergence of imposters.

Therefore, ethical considerations should be at the forefront of any training involving AI, particularly for intellectual tasks and especially when contemplating using these tools for education and research. People must understand how these systems work because our ethical concerns arise from their modes of operation. They must also consider established guidelines, such as the Vancouver protocol I just mentioned, the available editorial policies, and why it is difficult to accept a text written by AI as a valid piece of research work.

Q7. How could the use of AI be monitored or verified during peer review or the evaluation of submitted papers? What is your experience on it?

This is a very difficult question. Ideally, any journal editor should be able to access a tool to detect the usage of AI in manuscripts and be trained in how to use it, which is a recommendation made by the World Association of Medical Editors. However, I have not heard of, nor do I have experience with, publishers offering access to such tools. In any case, it would be very similar to how we handle plagiarism. When we suspect plagiarism, we usually contact our publisher’s staff to ask them to run a plagiarism check to assess the severity of the case, and then we make decisions based on the software report. I believe that access to such tools might become more common soon.

If we suspect inappropriate use of AI in an article, we could request our staff to use these tools to verify whether AI has been used. However, there are some issues. Plagiarism software works by comparing a given document with large databases and online documents. When we want to evaluate whether a text contains plagiarism, the software seeks matches between the contents of the document we are evaluating and those available elsewhere. If there are matches, we then assess whether proper citations are used, and to what extent these coincidences are just common phrases or whether the author has copied text indiscriminately from sources without adding citations (only the latter constitutes plagiarism). Plagiarism software also provides a percentage based on the amount of text that coincides with other sources, and we should know that it is a mistake to look only at the percentage without carefully assessing (through human evaluation) the nature of each coincidence to determine whether there is plagiarism or not.

AI detection works a bit differently. The percentage offered by detection tools is based on the probability that the text was written by an AI instead of a human. The bad news is that detecting the use of AI to generate content is very difficult and imprecise, if not impossible. Human intelligence might be a bit more reliable in detecting AI-generated content, provided that you have already seen many AI-generated texts and have a good idea of how the AI writes. You can identify certain characteristics that, in the long run, as AI tools improve, may disappear, but for now they are noticeable.
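
To illustrate why the two percentages mean different things, here is a toy sketch (not how commercial tools actually work): a plagiarism-style score counts the share of a document’s word sequences that literally match known sources, whereas an AI detector outputs an estimated probability of machine authorship. The n-gram size and the hypothetical classifier named in the comment are assumptions for illustration only.

```python
def overlap_percentage(document: str, sources: list[str], n: int = 5) -> float:
    """Toy plagiarism-style metric: share of the document's word n-grams
    that also appear verbatim in at least one known source."""
    def ngrams(text: str) -> set:
        words = text.lower().split()
        return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

    doc_grams = ngrams(document)
    if not doc_grams:
        return 0.0
    source_grams = set()
    for source in sources:
        source_grams |= ngrams(source)
    return 100 * len(doc_grams & source_grams) / len(doc_grams)

# An AI detector, by contrast, returns something like
#   p_ai = classifier.predict_proba(document)   # hypothetical classifier
# i.e. an estimated probability that the text is machine-written,
# not a share of text literally matched against known sources.
```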

Currently, AI-generated text tends to be very enumerative, often written with bullet points and very light explanations. Its analysis seems shallow, it is very redundant, it uses many adjectives in sentences, and it struggles to properly cite sources. All these aspects become evident when evaluating texts for their scientific value; for instance, in scientific work we tend to avoid using adjectives as much as possible.

However, let me return to an issue I have already mentioned: manuscripts are confidential documents. I am uncomfortable sharing these confidential documents with an AI, as I do not know what it might do with them beyond what I instruct the tool to do. Will it use the document for training purposes? Will it store the document in a database? Will it be added to its knowledge base and used for future responses? These are very serious concerns that should prevent us from using AI in editorial processes prior to publication, such as the initial screening of manuscripts, desk rejections, peer review and even production.

Q8. Are there tools currently in place or under development that you know to detect whether a paper has been authored using AI? How effective are they, and how should they be implemented in academic publishing in your opinion?

The main anti-plagiarism software vendors have also introduced solutions to detect AI-generated text, and they often publish performance comparisons among the available tools. However, the issue remains that it might be very difficult to accurately assess whether AI was used. In a way, this could become an arms race – AI detection tools will become more sophisticated, but the writing tools will also evolve and become more advanced than the detection tools, and so on. What is most alarming is that, on social media, you can already see advertisements for solutions that claim to generate undetectable AI text. I think this presents a significant challenge, which we should be very concerned about.

Q9. How do you foresee AI impacting the academic publishing landscape in the coming years, in terms of both quality and integrity?

If we do not universally address the ethical considerations, we will face many problems in the coming years. I am talking about journals and peer reviewers being overwhelmed with AI-generated content that might be just garbage. If you are using a machine to write for you, what right do you have to claim that your content is made for humans? We might see a lot of imposters trying to enter the research ecosystem.

The peer review system is already under a lot of stress due to high demand (i.e. too many authors submit too many articles), and there are not enough capable reviewers. If many people start adopting AI without paying much attention to ethical considerations and submit manuscripts that rely too heavily on AI to journals, we will face serious problems. This is why editorial policies dealing with AI have already emerged and are so strict – they attempt to prevent these situations. How successful are these types of policies for such a purpose? That is an important question.

If AI is not used appropriately, scientific quality and integrity will surely suffer. We have already discussed integrity at length, but regarding quality, you can already see how it might be affected. AI is not exhaustive enough, and its parameters are too loose. I have mentioned that configuring AI assistants might be a way forward, but many people will not take the time to learn how to do it properly or will not bother with the configuration possibilities of these technologies. Many will not know how to do it, and some do not even understand research methods, which is why I believe we will see many imposters entering the system.

I can imagine that, at some point in the future, when this bubble of enthusiasm finally bursts, human-generated content will gain a renewed appreciation – after everyone becomes tired of AI content or when its overabundance creates a strong negative reaction to it. But I do not know when that will happen – perhaps in 5, 50 or 500 years? I really do not know. I am also concerned about trust: we will continue to base our scientific endeavors on trust, as we must. Although this might be a bit naive, we should trust that many of our peers will still do the right thing. Even if they use AI and do not report it, I hope they will use it in a way that does not compromise quality or integrity.

Q10. Which of your roles in scientific communication has had the greatest impact from generative AI?

I think that my role in scientific dissemination (i.e. making educational videos) has benefited more from AI because it has less strict rules and because I am using it for activities that would otherwise require me to hire a staff of specialists, which I cannot do as this is a zero-budget project. I use AI to generate images that are good for video thumbnails and also useful for conference or lecture slides. As I already mentioned, it has also been helpful in generating short phrases to promote my content (both my videos and new publications) online through social media.

I have been doing a lot of testing with AI to help write scripts. The first tries were awful, but after developing a very detailed prompt, the results improved a great deal, also because of the large differences between GPT-3 and GPT-4o. Still, everything must be human-checked, curated and improved. In any case, AI tools cannot guess exactly what you want; they will give you something that approximates a proper answer to the instructions you provide. Hence, the quality of the output depends on how well those instructions are developed. They should be very clear, explicit and systematic, just like the kind of instructions you need to teach a machine to do something. You can, of course, improve how you formulate these instructions over time.
