Speech recognition and intelligent translation under multimodal human–computer interaction system

Abstract: Traditional translation robots are limited to single-modality translation of text in images and videos, which results in low translation accuracy. To address this, a method for speech recognition and intelligent translation in a multimodal human–computer interaction (HCI) system is proposed. First, the network structure of the speech recognition model in the multi-channel HCI system is established, and a multi-head self-attention mechanism is constructed. Then, an artificial intelligence voice wake-up function is designed, and a multimodal machine translation model is constructed. On this basis, selective attention is added to obtain visual recognition of the perceived text, and the decoder applies multimodal gating fusion to output the translation results. Experimental results show that the method achieves a high BLEU score and high translation accuracy.

Location
Deutsche Nationalbibliothek Frankfurt am Main
Extent
Online resource
Language
English

Published in
Speech recognition and intelligent translation under multimodal human–computer interaction system ; volume:33 ; number:1 ; year:2024 ; extent:14
Journal of intelligent systems ; 33, issue 1 (2024) (14 in total)

Creator
Huang, Danhua
Xiang, Shuaiqiu

DOI
10.1515/jisys-2023-0192
URN
urn:nbn:de:101:1-2409071652440.194184105618
Rights information
Open Access; access to the object is unrestricted.
Last updated
15.08.2025, 07:27 CEST

Data partner

This object is provided by:
Deutsche Nationalbibliothek. If you have questions about the object, please contact the data partner.

Contributors

  • Huang, Danhua
  • Xiang, Shuaiqiu
