Abba Suganda Girsang and Bima Krisna Noveta
Abstract
Purpose
The purpose of this study is to locate natural disasters and plot them on maps by extracting Twitter data. The tweet text is processed with named entity recognition (NER) using a six-class location hierarchy for Indonesia. The tweets are then classified into eight classes of natural disasters using a support vector machine (SVM). Overall, the system is able to classify tweets and map the locations mentioned in their content.
Design/methodology/approach
This research builds a model that maps the geolocation of tweet data using NER. It uses six NER classes based on the regional structure of Indonesia. The data are then classified into eight classes of natural disasters using the SVM.
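A minimal sketch of such a pipeline is shown below, assuming a custom StanfordNER model trained on the six Indonesian location classes and a TF-IDF + SVM disaster classifier; the model file name, training data and exact feature set are illustrative assumptions, not details from the paper.

```python
# Hypothetical sketch: Stanford NER with an assumed custom six-class
# Indonesian location model, plus a TF-IDF + SVM disaster classifier.
from nltk.tag.stanford import StanfordNERTagger
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC

DISASTER_CLASSES = ["flood", "earthquake", "landslide", "tsunami",
                    "hurricane", "forest_fire", "drought", "volcanic_eruption"]

# Assumed custom CRF model covering the six location classes
# (province, district/city, sub-district, village, road, place name).
ner = StanfordNERTagger("id-location-6class.ser.gz",  # hypothetical model file
                        path_to_jar="stanford-ner.jar")

def extract_locations(tweet_tokens):
    """Keep tokens tagged with any of the six location classes."""
    return [(tok, tag) for tok, tag in ner.tag(tweet_tokens) if tag != "O"]

# SVM disaster-type classifier over TF-IDF features of the tweet text.
clf = Pipeline([("tfidf", TfidfVectorizer()),
                ("svm", LinearSVC())])
# clf.fit(train_tweets, train_labels)  # labels drawn from DISASTER_CLASSES
```

The extracted location entities would then be passed to a geocoder (the paper uses ArcGIS tools) to obtain map coordinates.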
Findings
Experiment results demonstrate that the proposed NER with six classes based on Indonesia's regional levels is able to map disaster locations from Twitter data. The results also show good geocoding performance in terms of match rate, match score and match type. Moreover, with the SVM, the study classifies the collected tweets from the Indonesian region into eight types of natural disasters.
Research limitations/implications
This study is limited to the Indonesian region.
Originality/value
(a) NER with six classes is used to create a location classification model with the StanfordNER and ArcGIS tools. The use of six location classes reflects the size of Indonesia's territory and the many levels of its regional hierarchy: province, district/city, sub-district, village, road and place name. (b) The SVM is used to classify natural disasters into eight types: floods, earthquakes, landslides, tsunamis, hurricanes, forest fires, droughts and volcanic eruptions.
Bachriah Fatwa Dhini, Abba Suganda Girsang, Unggul Utan Sufandi and Heny Kurniawati
Abstract
Purpose
The authors constructed an automatic essay scoring (AES) model for a discussion forum and compared its results with scores given by human evaluators. This research proposes essay scoring based on two parameters, semantic similarity and keyword similarity, using the pre-trained SentenceTransformers models that produce the best-performing vector embeddings. The two parameters are combined to optimize the model and increase its accuracy.
Design/methodology/approach
The development of the model in the study is divided into seven stages: (1) data collection, (2) data pre-processing, (3) selection of a pre-trained SentenceTransformers model, (4) semantic similarity (sentence pairs), (5) keyword similarity, (6) final score calculation and (7) model evaluation.
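A minimal sketch of stages (3) through (6) follows, assuming cosine similarity for the semantic parameter, rubric-keyword coverage for the keyword parameter, and an equal-weight average for the final score; the paper's actual keyword-extraction step and weighting may differ.

```python
# Sketch of semantic + keyword similarity scoring with SentenceTransformers.
# The 0.5/0.5 weighting and the coverage metric are assumptions.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("paraphrase-multilingual-MiniLM-L12-v2")

def semantic_similarity(student_answer: str, reference_answer: str) -> float:
    # Cosine similarity between the two sentence embeddings.
    emb = model.encode([student_answer, reference_answer])
    return float(util.cos_sim(emb[0], emb[1]))

def keyword_similarity(answer_keywords: list, rubric_keywords: list) -> float:
    # Fraction of rubric keywords covered by the student's answer.
    hits = sum(kw in answer_keywords for kw in rubric_keywords)
    return hits / len(rubric_keywords)

def final_score(student_answer, reference_answer,
                answer_keywords, rubric_keywords) -> float:
    return 0.5 * semantic_similarity(student_answer, reference_answer) \
         + 0.5 * keyword_similarity(answer_keywords, rubric_keywords)
```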
Findings
The paraphrase-multilingual-MiniLM-L12-v2 and distilbert-base-multilingual-cased-v1 models achieved the highest scores in a comparison of 11 pre-trained multilingual SentenceTransformers models on Indonesian data (Dhini and Girsang, 2023), and both were adopted in this study. The combination of the two parameters is obtained by comparing the keywords extracted from the response with the rubric keywords. Based on the experimental results, the proposed combination can increase the evaluation results by 0.2.
Originality/value
This study uses discussion forum data from the general biology course in online learning at the open university for the 2020.2 and 2021.2 semesters, where forum discussions are still graded manually. The authors created a model that automatically scores the discussion forum essays against the lecturer's answers and rubrics.
William Harly and Abba Suganda Girsang
Abstract
Purpose
With the rise of online discussion and argument mining, methods that can analyze arguments are becoming increasingly important. A recent study proposed using agreement between arguments to represent both stance polarity and intensity, two important aspects of argument analysis. However, that study primarily focused on finetuning a bidirectional encoder representations from transformers (BERT) model. The purpose of this paper is to propose a convolutional neural network (CNN)-BERT architecture that improves on the previous method.
Design/methodology/approach
The CNN-BERT architecture used in this paper operates directly on the hidden representations generated by BERT. This makes better use of the pretrained BERT model and makes finetuning it optional. The authors then compared the CNN-BERT architecture with the methods proposed in the previous study (BERT and Siamese-BERT).
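The sketch below illustrates the general idea under stated assumptions: a 1-D convolution with max-pooling over frozen BERT hidden states, feeding a linear classifier. The checkpoint name, kernel size, filter count and number of output classes are illustrative, not taken from the paper.

```python
# Sketch of a CNN head over frozen BERT hidden states (hyperparameters assumed).
import torch
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer

class CNNBert(nn.Module):
    def __init__(self, bert_name="bert-base-uncased", n_filters=128, n_classes=2):
        super().__init__()
        self.bert = AutoModel.from_pretrained(bert_name)
        for p in self.bert.parameters():   # freeze BERT: finetuning is optional
            p.requires_grad = False
        self.conv = nn.Conv1d(self.bert.config.hidden_size, n_filters, kernel_size=3)
        self.out = nn.Linear(n_filters, n_classes)

    def forward(self, input_ids, attention_mask):
        # (batch, seq, hidden) -> (batch, hidden, seq) for the convolution.
        hidden = self.bert(input_ids, attention_mask=attention_mask).last_hidden_state
        x = torch.relu(self.conv(hidden.transpose(1, 2)))
        x = x.max(dim=2).values            # global max-pool over the sequence
        return self.out(x)

# Usage: encode an argument pair as a single sequence, then classify.
tok = AutoTokenizer.from_pretrained("bert-base-uncased")
enc = tok("argument A", "argument B", return_tensors="pt", truncation=True)
logits = CNNBert()(enc["input_ids"], enc["attention_mask"])
```

Because BERT is frozen, only the small CNN head needs training, which is consistent with the paper's point that the same pretrained BERT model can be shared across tasks.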
Findings
Experiment results demonstrate that the proposed CNN-BERT achieves 71.87% accuracy in measuring agreement between arguments. Compared with the previous study's accuracy of 68.58%, the CNN-BERT architecture increases accuracy by 3.29 percentage points. The CNN-BERT architecture also achieves a similar result even without further pretraining the BERT model.
Originality/value
The principal originality of this paper is the proposal to use CNN-BERT to make better use of the pretrained BERT model for measuring agreement between arguments. The proposed method improves performance and achieves a similar result without further training the BERT model. This allows the BERT model to be separated from the CNN classifier, which significantly reduces the model size and allows the same pretrained BERT model to be reused for other problems that likewise do not require a finetuned BERT.