
Cross-lingual speaker transfer for Cambodian based on feature disentangler and time-frequency attention adaptive normalization

Yuanzhang Yang (Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming, China and Yunnan Key Laboratory of Artificial Intelligence, Kunming University of Science and Technology, Kunming, China)
Linqin Wang (Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming, China and Yunnan Key Laboratory of Artificial Intelligence, Kunming University of Science and Technology, Kunming, China)
Shengxiang Gao (Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming, China; Yunnan Key Laboratory of Artificial Intelligence, Kunming University of Science and Technology, Kunming, China and Yunnan Key Laboratory of Media Convergence, Kunming, China)
Zhengtao Yu (Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming, China; Yunnan Key Laboratory of Artificial Intelligence, Kunming University of Science and Technology, Kunming, China and Yunnan Key Laboratory of Media Convergence, Kunming, China)
Ling Dong (Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming, China; Yunnan Key Laboratory of Artificial Intelligence, Kunming University of Science and Technology, Kunming, China and Yunnan Key Laboratory of Media Convergence, Kunming, China)

International Journal of Web Information Systems

ISSN: 1744-0084

Article publication date: 26 January 2024

Issue publication date: 23 February 2024


Abstract

Purpose

This paper aims to disentangle linguistic features from speaker timbre features using resource-rich Chinese and English data, and thereby achieve cross-lingual speaker transfer for Cambodian.

Design/methodology/approach

This study introduces a novel approach: a cross-lingual feature disentangler combined with time-frequency attention adaptive normalization, which converts the timbre of Cambodian speech to that of Chinese or English speakers without altering the underlying Cambodian speech content.

Findings

Because multi-speaker corpora for Cambodian are scarce, conventional methods perform poorly on Cambodian speaker voice transfer.

Originality/value

The originality of this study lies in the effectiveness of the disentanglement process and precise control over speaker timbre feature transfer.


Acknowledgements

Since submission of this article, the following authors have updated their affiliations: Linqin Wang, Shengxiang Gao, Zhengtao Yu and Ling Dong are also affiliated with the Yunnan Key Laboratory of Media Convergence, Kunming, China.

Funding: National Natural Science Foundation of China (Grant Nos. 62376111, U23A20388, 61972186, U21B2027); Yunnan Provincial Key Research and Development Plan (Grant Nos. 202303AP140008, 202103AA080015); Talents and Platform Program of Science and Technology of Yunnan (Grant No. 202105AC160018); Yunnan high-tech industry development project (Grant No. 202001AS070014); and Open Project of Yunnan Provincial Key Laboratory of Integrated Media (Grant No. 220225702).

Citation

Yang, Y., Wang, L., Gao, S., Yu, Z. and Dong, L. (2024), "Cross-lingual speaker transfer for Cambodian based on feature disentangler and time-frequency attention adaptive normalization", International Journal of Web Information Systems, Vol. 20 No. 2, pp. 113-128. https://doi.org/10.1108/IJWIS-09-2023-0162

Publisher

Emerald Publishing Limited

Copyright © 2023, Emerald Publishing Limited
