Towards Machine Speech-to-speech Translation

Authors

  • Satoshi Nakamura, Graduate School of Science and Technology, Nara Institute of Science and Technology, Japan. https://orcid.org/0000-0001-6956-3803
  • Katsuhito Sudoh, Graduate School of Science and Technology, Nara Institute of Science and Technology, Japan. https://orcid.org/0000-0002-2122-9846
  • Sakriani Sakti, Graduate School of Science and Technology, Nara Institute of Science and Technology, Japan

Abstract

There has been a good deal of research on machine speech-to-speech translation (S2ST) in Japan, and this article surveys that work together with our own recent research on automatic simultaneous speech translation. An S2ST system is basically composed of three modules: large-vocabulary continuous automatic speech recognition (ASR), machine text-to-text translation (MT), and text-to-speech synthesis (TTS). All of these modules need to be multilingual in nature and thus require multilingual speech and text corpora for training models. S2ST performance has been drastically improved by deep learning and large training corpora, but many issues still remain, such as simultaneity, paralinguistics, context and situation dependency, intention, and cultural dependency. This article presents ongoing research and discusses these issues with a view to next-generation speech-to-speech translation.
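The three-module cascade described above can be sketched as a simple function composition. This is a minimal illustration only: the function names (`asr`, `mt`, `tts`) and the toy lexicon are hypothetical stand-ins, not any real system's API, and a real pipeline would plug in trained neural models at each stage.

```python
# Hypothetical sketch of a cascade S2ST pipeline: ASR -> MT -> TTS.
# Each stage is a stub standing in for a trained model.

def asr(audio):
    # Large-vocabulary continuous speech recognition: audio -> source text.
    # Stub: assume the transcript is attached to the input for this demo.
    return audio["transcript"]

def mt(source_text):
    # Machine text-to-text translation: source text -> target text.
    # Toy word-substitution lexicon, for illustration only.
    toy_lexicon = {"konnichiwa": "hello"}
    return " ".join(toy_lexicon.get(word, word) for word in source_text.split())

def tts(target_text):
    # Text-to-speech synthesis: target text -> waveform.
    # Stub: return a label instead of an actual audio signal.
    return {"waveform_for": target_text}

def speech_to_speech(audio):
    # The cascade: each module's output feeds the next module's input.
    return tts(mt(asr(audio)))

result = speech_to_speech({"transcript": "konnichiwa"})
print(result["waveform_for"])  # hello
```

The composition makes one limitation of the cascade visible: each module must finish before the next starts, which is exactly the simultaneity issue the abstract raises for incremental ASR, MT, and TTS.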

Keywords

Speech-to-speech translation, automatic speech recognition, machine text-to-text translation, text-to-speech synthesis

References

Chousa, K.; Sudoh, K.; Nakamura, S. (2019). Simultaneous Neural Machine Translation using Connectionist Temporal Classification. ArXiv Preprint, 1911.11933. Retrieved from http://arxiv.org/abs/1911.11933

Do, Q. T.; Sakti, S.; Nakamura, S. (2018). Sequence-to-Sequence Models for Emphasis Speech Translation. IEEE/ACM Transactions on Audio, Speech, and Language Processing, v. 26, n. 10, pp. 1873–1883. https://doi.org/10.1109/TASLP.2018.2846402

Kano, T.; Sakti, S.; Nakamura, S. (2017). Structured-Based Curriculum Learning for End-to-End English-Japanese Speech Translation, in: Proceedings of Interspeech 2017, pp. 2630–2634. https://doi.org/10.21437/Interspeech.2017-944

Mizuno, A. (2016). Simultaneous Interpreting and Cognitive Constraints. Journal of College of Literature, Aoyama Gakuin University, n. 58, pp. 1–28. https://www.agulin.aoyama.ac.jp/repo/repository/1000/19723/

Novitasari, S.; Tjandra, A.; Sakti, S.; Nakamura, S. (2019). Sequence-to-Sequence Learning via Attention Transfer for Incremental Speech Recognition, in: Proceedings of Interspeech 2019, pp. 3835–3839. https://doi.org/10.21437/Interspeech.2019-2985

Yanagita, T.; Sakti, S.; Nakamura, S. (2019). Neural iTTS: Toward Synthesizing Speech in Real-time with End-to-end Neural Text-to-Speech Framework, in: Proceedings of the 10th ISCA Speech Synthesis Workshop, pp. 183–188. https://doi.org/10.21437/SSW.2019-33

Published

2023-03-07
