Rongen Yan, Depeng Dang, Hu Gao, Yan Wu and Wenhui Yu
Abstract
Purpose
Question answering (QA) answers questions that people pose in natural language. Because users phrase their queries subjectively, the same question can be expressed in many different ways, which makes text retrieval harder. The purpose of this paper is therefore to explore a new query rewriting method for QA that integrates multiple related questions (RQs) to form an optimal question. Moreover, it is important to generate a new dataset that pairs each original query (OQ) with multiple RQs.
Design/methodology/approach
This study collects a new dataset, SQuAD_extend, by crawling a QA community, and uses a word-graph to model the collected OQs. Beam search then finds the best path through the graph, which yields the best question. To represent the features of the question in depth, the pretrained model BERT is used to model sentences.
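The word-graph and beam-search steps can be illustrated with a minimal sketch. It is not the authors' implementation: the whitespace tokenization, the count-based transition probabilities, the cycle check and the names `build_word_graph` and `beam_search` are all assumptions made for illustration, and the BERT-based sentence modeling is left out entirely.

```python
import math
from collections import defaultdict

START, END = "<s>", "</s>"  # hypothetical sentence-boundary markers


def build_word_graph(questions):
    """Count directed word-to-word transitions across all questions."""
    edges = defaultdict(lambda: defaultdict(int))
    for question in questions:
        tokens = [START] + question.lower().split() + [END]
        for left, right in zip(tokens, tokens[1:]):
            edges[left][right] += 1
    return edges


def beam_search(edges, beam_width=3, max_len=20):
    """Return the START-to-END path with the highest summed log transition probability."""
    beams = [(0.0, [START])]
    finished = []
    for _ in range(max_len):
        candidates = []
        for score, path in beams:
            successors = edges.get(path[-1], {})
            total = sum(successors.values())
            for word, count in successors.items():
                if word in path:  # assumption: skip repeated words to avoid cycles
                    continue
                extended = (score + math.log(count / total), path + [word])
                if word == END:
                    finished.append(extended)
                else:
                    candidates.append(extended)
        if not candidates:
            break
        # Keep only the beam_width best partial paths.
        beams = sorted(candidates, key=lambda c: c[0], reverse=True)[:beam_width]
    if not finished:
        return []
    best_path = max(finished, key=lambda c: c[0])[1]
    return best_path[1:-1]  # strip the boundary markers


related_questions = [
    "how do i install python",
    "how do i install python on windows",
    "how can i install python",
]
rewritten = beam_search(build_word_graph(related_questions))
print(" ".join(rewritten))
```

Summed log probabilities favor shorter paths, so a practical scorer would likely add length normalization or a learned ranking signal. The sketch leaves that choice open.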
Findings
The experimental results show three main findings. (1) The quality of the answers improves after the RQs of the OQs are added. (2) Using the word-graph to model the question and choose the optimal path helps find the best question. (3) BERT can deeply characterize the semantics of the exact question.
Originality/value
The proposed method uses a word-graph to combine multiple questions and selects the optimal path for rewriting the question, and the quality of its answers is better than the baseline. In practice, the research results can help guide users to clarify their query intentions and ultimately obtain the best answer.
Abstract
Purpose
Previous knowledge base question answering (KBQA) models only consider the monolingual scenario and cannot be directly extended to the cross-lingual scenario, in which the language of the questions differs from that of the knowledge base (KB). Although a machine translation (MT) model can bridge the gap by translating questions into the language of the KB, the noise in translated questions can accumulate and sharply impair the final performance. Therefore, the authors propose a method to improve the robustness of KBQA models in the cross-lingual scenario.
Design/methodology/approach
The authors propose a knowledge distillation-based robustness enhancement (KDRE) method. First, a monolingual model (the teacher) is trained on ground truth (GT) data. Then, to imitate the noise seen in practice, a noise-generating model is designed to inject two types of noise into questions: general noise and translation-aware noise. Finally, the noisy questions are input into the student model. The student model is jointly trained on GT data and distilled data, the latter derived from the teacher when it is fed GT questions.
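The joint training objective can be illustrated with a minimal sketch, assuming a standard distillation loss: cross-entropy against the GT label plus a temperature-softened KL term against the teacher's output. The abstract does not give the loss form, the weighting or the noise model, so `alpha`, `temperature` and the token-dropping noise in `add_general_noise` are illustrative assumptions rather than the KDRE design.

```python
import math
import random


def softmax(logits, temperature=1.0):
    """Numerically stable softmax with an optional temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def add_general_noise(tokens, drop_prob=0.1, rng=None):
    """Hypothetical 'general noise': randomly drop tokens, keeping at least one."""
    rng = rng or random.Random(0)
    kept = [t for t in tokens if rng.random() >= drop_prob]
    return kept or tokens[:1]


def distillation_loss(student_logits, teacher_logits, gold_index,
                      alpha=0.5, temperature=2.0):
    """alpha * CE(student, gold) + (1 - alpha) * KL(teacher || student).

    The student logits are assumed to come from the noisy question and the
    teacher logits from the clean GT question.
    """
    hard = -math.log(softmax(student_logits)[gold_index])
    teacher_probs = softmax(teacher_logits, temperature)
    student_probs = softmax(student_logits, temperature)
    soft = sum(p * math.log(p / q)
               for p, q in zip(teacher_probs, student_probs) if p > 0)
    return alpha * hard + (1 - alpha) * soft


noisy_question = add_general_noise("who wrote the novel dune".split(), drop_prob=0.3)
loss = distillation_loss(student_logits=[1.2, 0.3, -0.5],
                         teacher_logits=[2.0, 0.1, -1.0],
                         gold_index=0)
print(noisy_question, round(loss, 4))
```

When the student reproduces the teacher's logits exactly, the KL term is zero and only the GT cross-entropy remains, so the distilled signal only pulls on the student where noise has changed its prediction.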
Findings
The experimental results demonstrate that KDRE can improve the performance of models in the cross-lingual scenario. KDRE improves the performance of each module in the KBQA model. The knowledge distillation (KD) and the noise-generating model complement each other in boosting the robustness of the models.
Originality/value
The authors are the first to extend KBQA models from the monolingual to the cross-lingual scenario. The authors are also the first to apply KD to KBQA to develop robust cross-lingual models.