Priyadarshini R., Latha Tamilselvan and Rajendran N.
Abstract
Purpose
The purpose of this paper is to propose a fourfold semantic similarity that yields higher accuracy than the approaches in the existing literature. Change detection in the URL and recommendation of the source documents are facilitated by a framework in which the fourfold semantic similarity is applied. The latest trends in technology emerge with the continuous growth of resources on the collaborative web. This interactive and collaborative web poses big challenges for recent technologies like cloud and big data.
Design/methodology/approach
The enormously growing resources must be accessed in a more efficient manner, and this requires clustering and classification techniques. The resources on the web are described in a more meaningful manner.
Findings
The resources can be described in the form of metadata constituted by the resource description framework (RDF). A fourfold similarity is proposed, compared with the three-fold similarity proposed in the existing literature. The fourfold similarity includes semantic annotation based on named entity recognition in the user interface, domain-based concept matching with an improvised score-based classification based on ontology, a sequence-based word sensing algorithm and RDF-based updating of triples. These similarity measures are aggregated across the components of the framework: a semantic user interface, semantic clustering, sequence-based classification and a semantic recommendation system with RDF updating for change detection.
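One plausible way to combine such component scores is a weighted aggregation. The sketch below is an illustrative stand-in, not the paper's implementation; the four component names and the equal weights are hypothetical placeholders.

```python
# Illustrative sketch: aggregating four component similarities into one
# fourfold score. Components and weights are hypothetical placeholders.

def fourfold_similarity(annotation_sim, concept_sim, word_sense_sim, rdf_sim,
                        weights=(0.25, 0.25, 0.25, 0.25)):
    """Weighted aggregation of four component similarity scores,
    each assumed to lie in [0, 1]."""
    components = (annotation_sim, concept_sim, word_sense_sim, rdf_sim)
    if not all(0.0 <= c <= 1.0 for c in components):
        raise ValueError("component similarities must lie in [0, 1]")
    return sum(w * c for w, c in zip(weights, components))

# Example: a document pair scoring high on concept matching but low on
# RDF-triple overlap.
score = fourfold_similarity(0.8, 0.9, 0.7, 0.2)
```

A weighted sum keeps each component's contribution interpretable; the weights could in principle be tuned against labeled relevance judgments.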
Research limitations/implications
The existing work suggests that linking resources semantically improves retrieval and search ability. Previous literature shows that keywords can be used to retrieve linked information from the article to determine the similarity between documents using semantic analysis.
Practical implications
These traditional systems also suffer from scalability and efficiency issues. The proposed study designs a model that pulls and prioritizes knowledge-based content from the Hadoop distributed framework. The study also proposes a Hadoop-based pruning system and recommendation system.
Social implications
The pruning system gives an alert about the dynamic changes in the article (virtual document). The changes in the document are automatically updated in the RDF document. This helps in semantic matching and retrieval of the most relevant source with the virtual document.
Originality/value
The recommendation and detection of changes in blogs are performed semantically using n-triples and automated data structures. The user-focussed and choice-based crawling proposed in this system also assists collaborative filtering. In turn, collaborative filtering recommends user-focussed source documents. The entire clustering and retrieval system is deployed on multi-node Hadoop in the Amazon AWS environment, and graphs are plotted and analyzed.
Bilal Hawashin, Shadi Alzubi, Tarek Kanan and Ayman Mansour
Abstract
Purpose
This paper aims to propose a new efficient semantic recommender method for Arabic content.
Design/methodology/approach
Three semantic similarities were proposed to be integrated with the recommender system to improve its ability to recommend based on the semantic aspect. The proposed similarities are CHI-based semantic similarity, singular value decomposition (SVD)-based semantic similarity and Arabic WordNet-based semantic similarity. These similarities were compared with the existing similarities used by recommender systems from the literature.
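As an illustration of the CHI-based direction, the sketch below weights terms by a chi-square-style term/class association and compares documents by weighted cosine similarity. This is a simplified reading, not the paper's exact formulation.

```python
import math
from collections import Counter

# Illustrative sketch of a CHI-weighted similarity (a simplified reading
# of "CHI-based semantic similarity"; the paper's formulation may differ).

def chi_square(n11, n10, n01, n00):
    """Chi-square statistic from a 2x2 term/class contingency table:
    n11 = docs in class containing the term, n10 = in class without it,
    n01 = out of class with it, n00 = out of class without it."""
    n = n11 + n10 + n01 + n00
    num = n * (n11 * n00 - n10 * n01) ** 2
    den = (n11 + n01) * (n10 + n00) * (n11 + n10) * (n01 + n00)
    return num / den if den else 0.0

def weighted_cosine(doc_a, doc_b, weights):
    """Cosine similarity over term-frequency vectors, with each term
    scaled by its (e.g. chi-square-derived) weight, default 1.0."""
    va, vb = Counter(doc_a), Counter(doc_b)
    dot = sum(weights.get(t, 1.0) ** 2 * va[t] * vb[t]
              for t in set(va) | set(vb))
    na = math.sqrt(sum((weights.get(t, 1.0) * va[t]) ** 2 for t in va))
    nb = math.sqrt(sum((weights.get(t, 1.0) * vb[t]) ** 2 for t in vb))
    return dot / (na * nb) if na and nb else 0.0
```

In a full pipeline, `chi_square` scores computed over a labeled corpus would populate the `weights` dictionary before documents are compared.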
Findings
Experiments show that the proposed semantic method using CHI-based similarity and using SVD-based similarity is more efficient than the existing methods on Arabic text in terms of accuracy and execution time.
Originality/value
Although many previous works proposed recommender system methods for English text, very few works concentrated on Arabic text. The field of Arabic recommender systems is largely understudied in the literature. Aside from this, there is a vital need to consider the semantic relationships behind user preferences to improve the accuracy of the recommendations. The contributions of this work are the following. First, as many recommender methods were proposed for English text and have never been tested on Arabic text, this work compares the performance of these widely used methods on Arabic text. Second, it proposes a novel semantic recommender method for Arabic text. As this method uses semantic similarity, three novel base semantic similarities were proposed and evaluated. Third, this work directs attention to further studies on this understudied topic.
Hui Shi, Drew Hwang, Dazhi Chong and Gongjun Yan
Abstract
Purpose
Today’s in-demand skills may not be needed tomorrow. As companies are adopting a new group of technologies, they are in huge need of information technology (IT) professionals who can fill various IT positions with a mixture of technical and problem-solving skills. This study aims to adopt a semantic analysis approach to explore how US Information Systems (IS) programs meet the challenges of emerging IT topics.
Design/methodology/approach
This study considers the application of a hybrid semantic analysis approach to the analysis of IS higher education programs in the USA. It proposes a semantic analysis framework and a semantic analysis algorithm to analyze and evaluate the context of the IS programs. More specifically, the study uses digital transformation as a case study to examine the readiness of the IS programs in the USA to meet the challenges of digital transformation. First, this study developed a knowledge pool of 15 principles and 98 keywords from an extensive literature review on digital transformation. Second, it collected 4,093 IS courses from 315 IS programs in the USA and 493,216 scientific publication records from the Web of Science Core Collection.
Findings
Using the knowledge pool and the two collected data sets, the semantic analysis algorithm was implemented to compute a semantic similarity score (DxScore) between an IS course’s context and digital transformation. To establish the credibility of the results, the state ranking based on the similarity scores was compared with the state employment ranking. The research results can be used by IS educators when updating IS curricula. For IT professionals in the industry, the results can provide insights into the training of their current and future employees.
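A course-versus-keyword-pool score of this kind can be sketched as weighted keyword coverage. The scoring rule, keyword weights and example pool below are hypothetical illustrations, not the paper's actual DxScore definition.

```python
# Illustrative sketch (hypothetical scoring, not the paper's DxScore):
# a course description is scored against a digital-transformation
# keyword pool by normalized weighted keyword coverage.

def dx_score(course_text, keyword_weights):
    """Fraction of total keyword weight matched in the course text."""
    tokens = set(course_text.lower().split())
    total = sum(keyword_weights.values())
    if not total:
        return 0.0
    matched = sum(w for k, w in keyword_weights.items() if k in tokens)
    return matched / total

# Made-up keyword pool with assumed importance weights.
pool = {"cloud": 2.0, "analytics": 1.5, "blockchain": 1.0}
score = dx_score("Introduction to cloud computing and analytics", pool)
```

Normalizing by the total pool weight keeps the score in [0, 1], so courses of different lengths remain comparable.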
Originality/value
This study explores the status of the IS programs in the USA by proposing a semantic analysis framework, using digital transformation as a case study to illustrate the application of the proposed semantic analysis framework, and developing a knowledge pool, a corpus and a course information collection.
Nina Preschitschek, Helen Niemann, Jens Leker and Martin G. Moehrle
Abstract
Purpose
The convergence of industries exposes the involved firms to various challenges. In such a setting, a firm's response time becomes key to its future success. Hence, different approaches to anticipating convergence have been developed in the recent past. So far, especially IPC co-classification patent analyses have been successfully applied in different industry settings to anticipate convergence on a broader industry/technology level. Here, the aim is to develop a concept to anticipate convergence even in small samples, simultaneously providing more detailed information on its origin and direction.
Design/methodology/approach
The authors assigned 326 US-patents on phytosterols to four different technological fields and measured the semantic similarity of the patents from the different technological fields. Finally, they compared these results to those of an IPC co-classification analysis of the same patent sample.
Findings
An increasing semantic similarity of food and pharmaceutical patents and personal care and pharmaceutical patents over time could be regarded as an indicator of convergence. The IPC co-classification analyses proved to be unsuitable for finding evidence for convergence here.
Originality/value
Semantic analyses provide the opportunity to analyze convergence processes in greater detail, even if only limited data are available. However, IPC co-classification analyses are still relevant in analyzing large amounts of data. The appropriateness of the semantic similarity approach requires verification, e.g. by applying it to other convergence settings.
Rajat Kumar Mudgal, Rajdeep Niyogi, Alfredo Milani and Valentina Franzoni
Abstract
Purpose
The purpose of this paper is to propose and experiment a framework for analysing the tweets to find the basis of popularity of a person and extract the reasons supporting the popularity. Although the problem of analysing tweets to detect popular events and trends has recently attracted extensive research efforts, not much emphasis has been given to find out the reasons behind the popularity of a person based on tweets.
Design/methodology/approach
In this paper, the authors introduce a framework to find out the reasons behind the popularity of a person based on the analysis of events and the evaluation of a Web-based semantic set similarity measure applied to tweets. The methodology uses the semantic similarity measure to group similar tweets into events. Although tweets may not contain identical hashtags, they can refer to a unique topic with equivalent or related terminology. A special data structure maintains event information, related keywords and statistics to extract the reasons supporting popularity.
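An asymmetric set similarity with a pluggable word-level measure can be sketched as follows. The exact-match word similarity, the greedy grouping strategy and the 0.5 threshold are stand-ins for illustration; the paper's measure is Web-based and its grouping procedure may differ.

```python
# Illustrative sketch of an asymmetric set similarity for grouping
# tweets into events. word_sim stands in for a Web-based word
# similarity (here a trivial exact-match stub); the greedy grouping
# threshold is a hypothetical parameter.

def set_similarity(words_a, words_b, word_sim):
    """Asymmetric: how well tweet A's terms are covered by tweet B's."""
    if not words_a:
        return 0.0
    return sum(max((word_sim(a, b) for b in words_b), default=0.0)
               for a in words_a) / len(words_a)

def group_into_events(tweets, word_sim, threshold=0.5):
    """Greedy clustering: attach each tweet to the first event whose
    seed tweet it matches above the threshold, else start a new event."""
    events = []  # each event is a list of token lists
    for tweet in tweets:
        for event in events:
            if set_similarity(tweet, event[0], word_sim) >= threshold:
                event.append(tweet)
                break
        else:
            events.append([tweet])
    return events

exact = lambda a, b: 1.0 if a == b else 0.0
tweets = [["election", "results"], ["election", "night"], ["football", "final"]]
events = group_into_events(tweets, exact)
```

Because the measure is asymmetric, coverage of a short tweet by a longer one can differ from the reverse direction, which matches the paper's description of an asymmetric variant.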
Findings
An implementation of the algorithms has been tested on a data set of 218,490 tweets from five different countries for popularity detection and reason extraction. The experimental results are quite encouraging and consistent in determining the reasons behind popularity. The Web-based semantic similarity measure is based on statistics extracted from search engines, allowing the similarity values to adapt dynamically to variations in word correlation driven by current social trends.
Originality/value
To the best of the authors’ knowledge, the proposed method for finding the reason of popularity in short messages is original. The semantic set similarity presented in the paper is an original asymmetric variant of a similarity scheme developed in the context of semantic image recognition.
Abstract
Purpose
The purpose of this paper is to merge ontologies, removing redundancy and improving storage efficiency. The number of ontologies developed in recent years is noticeably high. With the availability of these ontologies, the needed information can be easily obtained, but the presence of comparably varied ontologies raises the problem of rework and merging of data. The assessment of the existing ontologies exposes the existence of superfluous information; hence, ontology merging is the only solution. The existing ontology merging methods focus only on highly relevant classes and instances, whereas somewhat relevant classes and instances are simply dropped, even though they may also be useful or relevant to the given domain. In this paper, we propose a new method called hybrid semantic similarity measure (HSSM)-based ontology merging using formal concept analysis (FCA) and a semantic similarity measure.
Design/methodology/approach
The HSSM categorizes relevancy into three classes, namely highly relevant, moderately relevant and least relevant classes and instances. To achieve high efficiency in merging, HSSM performs both the FCA part and the semantic similarity part.
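The three-band categorization can be sketched as simple thresholding over a combined similarity score. The thresholds below are assumptions for illustration only; the paper does not state them here.

```python
# Illustrative sketch (assumed thresholds, not from the paper): HSSM-style
# banding of class/instance pairs by a combined similarity score in [0, 1].

def categorize(score, high=0.75, low=0.4):
    """Map a combined FCA + semantic similarity score to a relevancy band."""
    if score >= high:
        return "highly relevant"
    if score >= low:
        return "moderately relevant"
    return "least relevant"

def partition_pairs(scored_pairs, high=0.75, low=0.4):
    """Partition candidate (pair, score) tuples by relevancy band."""
    bands = {"highly relevant": [], "moderately relevant": [],
             "least relevant": []}
    for pair, score in scored_pairs:
        bands[categorize(score, high, low)].append(pair)
    return bands
```

Keeping the moderately relevant band, rather than dropping everything below the top threshold, is the behavior that distinguishes HSSM from the prior merging methods described above.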
Findings
The experimental results proved that the HSSM produced better results compared with existing algorithms in terms of similarity distance and time. An inconsistency check can also be done for the dissimilar classes and instances within an ontology. The output ontology will have a set of highly relevant and moderately relevant classes and instances, as well as a few least relevant ones, eventually leading to an exhaustive ontology for the particular domain.
Practical implications
In this paper, an HSSM method is proposed and used to merge academic social network ontologies; it is observed to be an extremely powerful methodology compared with former studies. The HSSM approach can be applied to various domain ontologies and may offer a novel vision to researchers.
Originality/value
To the best of the authors’ knowledge, the HSSM has not been applied to merging ontologies in any former study.
Bachriah Fatwa Dhini, Abba Suganda Girsang, Unggul Utan Sufandi and Heny Kurniawati
Abstract
Purpose
The authors constructed an automatic essay scoring (AES) model in a discussion forum whose results were compared with scores given by human evaluators. This research proposes essay scoring conducted through two parameters, semantic and keyword similarities, using a SentenceTransformers pre-trained model that can construct the highest vector embedding. These models are combined to optimize the model and increase accuracy.
Design/methodology/approach
The development of the model in the study is divided into seven stages: (1) data collection, (2) pre-processing data, (3) selected pre-trained SentenceTransformers model, (4) semantic similarity (sentence pair), (5) keyword similarity, (6) calculate final score and (7) evaluating model.
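Stages (5) and (6) above can be sketched as follows. The keyword measure, the stubbed semantic score and the equal 0.5/0.5 weights are illustrative assumptions; in the actual pipeline the semantic score would come from SentenceTransformers sentence-pair similarity.

```python
# Illustrative sketch of stages (5)-(6): keyword similarity and the
# final weighted combination. The semantic score is stubbed; weights
# are hypothetical.

def keyword_similarity(answer_tokens, rubric_keywords):
    """Fraction of rubric keywords present in the student's answer."""
    if not rubric_keywords:
        return 0.0
    hits = sum(1 for k in rubric_keywords if k in answer_tokens)
    return hits / len(rubric_keywords)

def final_score(semantic_sim, keyword_sim, w_sem=0.5, w_kw=0.5):
    """Stage (6): weighted combination of the two parameters."""
    return w_sem * semantic_sim + w_kw * keyword_sim

answer = {"photosynthesis", "chlorophyll", "light"}
rubric = ["photosynthesis", "chlorophyll", "glucose"]
kw = keyword_similarity(answer, rubric)   # 2 of 3 rubric keywords found
score = final_score(0.9, kw)              # 0.9 stands in for the semantic score
```

Keeping the two parameters separate until the final combination makes it easy to re-weight them against human grades during stage (7), model evaluation.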
Findings
The multilingual paraphrase-multilingual-MiniLM-L12-v2 and distilbert-base-multilingual-cased-v1 models got the highest scores in comparisons of 11 pre-trained multilingual SentenceTransformers models on Indonesian data (Dhini and Girsang, 2023). Both multilingual models were adopted in this study. The combination of the two parameters is obtained by comparing the keyword extraction responses with the rubric keywords. Based on the experimental results, the proposed combination can increase the evaluation results by 0.2.
Originality/value
This study uses discussion forum data from the general biology course in online learning at the open university for the 2020.2 and 2021.2 semesters. Forum discussion grading is still manual. In this study, the authors created a model that automatically scores discussion forums, which are essays, based on the lecturer's answers and rubrics.
Jorge Martinez‐Gil and José F. Aldana‐Montes
Abstract
Purpose
Semantic similarity measures are very important in many computer‐related fields. Previous works on applications such as data integration, query expansion, tag refactoring or text clustering have used some semantic similarity measures in the past. Despite the usefulness of semantic similarity measures in these applications, the problem of measuring the similarity between two text expressions remains a key challenge. This paper aims to address this issue.
Design/methodology/approach
In this article, the authors propose an optimization environment to improve existing techniques that use the notion of co‐occurrence and the information available on the web to measure similarity between terms.
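A co-occurrence measure of the kind the authors build on can be sketched as follows. The `hits` function stands in for a search-engine page-count query, the counts are made-up examples, and the paper's smart combination of several engines is not reproduced here; only the classic WebJaccard form is shown.

```python
# Illustrative sketch of a Web co-occurrence similarity in the spirit of
# the techniques the paper optimizes. hits() stands in for a search-engine
# page-count query; the counts are fabricated for illustration.

FAKE_COUNTS = {
    ("car",): 1000,
    ("automobile",): 800,
    ("automobile", "car"): 600,  # pages containing both terms
}

def hits(*terms):
    """Stub for a page-count query; keys are sorted term tuples."""
    return FAKE_COUNTS.get(tuple(sorted(terms)), 0)

def web_jaccard(p, q):
    """Co-occurrence similarity: H(p AND q) / (H(p) + H(q) - H(p AND q))."""
    inter = hits(p, q)
    denom = hits(p) + hits(q) - inter
    return inter / denom if denom else 0.0
```

In the paper's setting, the optimization would act on how such raw counts from multiple engines are combined before a score like this is computed.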
Findings
The experimental results using the Miller and Charles and Gracia and Mena benchmark datasets show that the proposed approach is able to outperform classic probabilistic web‐based algorithms by a wide margin.
Originality/value
This paper presents two main contributions. The authors propose a novel technique that beats classic probabilistic techniques for measuring semantic similarity between terms. This new technique consists of using not only a search engine for computing web page counts, but a smart combination of several popular web search engines. The approach is evaluated on the Miller and Charles and Gracia and Mena benchmark datasets and compared with existing probabilistic web extraction techniques.
Lena L. Kronemeyer, Herbert Kotzab and Martin G. Moehrle
Abstract
Purpose
The purpose of this paper is the development of a patent-based supplier portfolio that can be used to evaluate and select suppliers on account of their technological competencies.
Design/methodology/approach
In addition to traditional approaches, the authors develop a supplier portfolio that characterizes suppliers according to the similarity between supplier's and OEM's technological competencies as well as their technological broadness. These variables are measured on the basis of patents, which constitute a valuable source of information in technology-driven industries. Contrary to existing binary measurement approaches, the authors’ portfolio uses semantic analyses to make use of the specific information provided in the patents' texts. The authors test this method in the field of gearings, which is a key driver for the automotive industry.
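The two portfolio variables can be sketched as coordinates for positioning a supplier. The quadrant mapping and thresholds below are assumptions for illustration; the paper itself derives six generic positions rather than four quadrants.

```python
# Illustrative sketch (assumed thresholds): placing a supplier from two
# patent-based variables, the semantic similarity of its patents to the
# OEM's and its technological broadness, both normalized to [0, 1].

def portfolio_position(similarity, broadness, sim_cut=0.5, broad_cut=0.5):
    """Map (similarity, broadness) to a coarse portfolio quadrant."""
    sim_band = "high-sim" if similarity >= sim_cut else "low-sim"
    broad_band = "broad" if broadness >= broad_cut else "narrow"
    return f"{sim_band}/{broad_band}"

# A supplier with patents very similar to the OEM's but a narrow
# technological base would land in the high-sim/narrow quadrant.
position = portfolio_position(0.8, 0.2)
```

Each band carries a different dependency risk: high similarity with narrow broadness, for instance, suggests a supplier whose competencies overlap the OEM's in one field only.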
Findings
The authors identify six generic positions, characterizing specific risks for an OEM to become either technologically dependent or dependent on suppliers' production capacities. For each position the authors develop specific management strategies in face of the aforementioned risks. The approach helps OEMs navigate in the competitive landscape based on the most recent and publicly available information medium.
Originality/value
This work explicitly applies the construct of technological competencies to supplier evaluation and selection on the basis of portfolio approaches. Furthermore, the authors improve the use of patents for supplier evaluation in two respects: First, the authors analyze OEMs and upstream suppliers on an organizational level. Second, the authors utilize advanced semantic analysis to generate variables for the measurement of the criteria mentioned above.
Abstract
Purpose
The purpose of this paper is to propose a solution for automating the task of matching business process models and search for correspondences with regard to the model semantics, thus improving the efficiency of such works.
Design/methodology/approach
A method is proposed based on combining several semantic technologies. The research follows a design-science-oriented approach in that a method together with its supporting artifacts has been engineered. Its application allows for reusing legacy models and automatically determining semantic similarity.
Findings
The method has been applied and the first findings suggest the effectiveness of the approach. The results of applying the method show its feasibility and significance. The suggested heuristic computing of semantic correspondences between semantically heterogeneous business process models is flexible and can support domain users.
Research limitations/implications
Even though a directly usable solution can be offered, the full complexity of natural language as given in model element labels cannot yet be completely resolved. Further research could contribute to the optimization and refinement of the automatic matching and linguistic procedures. Nevertheless, an open research question has been addressed.
Practical implications
The method presented is aimed at adding to the methods in the field of business process management and could extend the possibilities of automating support for business analysis.
Originality/value
The suggested combination of semantic technologies is innovative and addresses the aspect of semantic heterogeneity in a holistic manner, which is novel to the field.