Interpreter identification in the Polish Interpreting Corpus

Authors

Abstract

This paper describes the automated identification of interpreter voices in the Polish Interpreting Corpus (PINC). After a set of voice samples was collected from the interpreters, a deep neural network model was used to match every utterance in the corpus with a specific individual. The final result is very accurate and offers considerable gains in both time and accuracy over human judgment.
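The matching step can be illustrated with a minimal sketch, offered only as an illustration and not as the pipeline used for PINC. It assumes that some neural model has already turned every voice sample and every corpus utterance into a fixed-length speaker embedding; the random vectors, the names `interpreter_A` and `interpreter_B`, and the helper functions `enroll` and `identify` are all hypothetical. Each utterance is assigned to the enrolled interpreter whose averaged embedding has the highest cosine similarity.

```python
import numpy as np


def l2_normalize(vectors: np.ndarray) -> np.ndarray:
    """Scale vectors to unit length so dot products equal cosine similarity."""
    norms = np.linalg.norm(vectors, axis=-1, keepdims=True)
    return vectors / np.clip(norms, 1e-12, None)


def enroll(samples: dict[str, np.ndarray]) -> tuple[list[str], np.ndarray]:
    """Average each interpreter's sample embeddings into one reference vector."""
    names = sorted(samples)
    centroids = np.stack([l2_normalize(samples[n]).mean(axis=0) for n in names])
    return names, l2_normalize(centroids)


def identify(utterances: np.ndarray, names: list[str], centroids: np.ndarray):
    """Return the best-matching enrolled name and its cosine score per utterance."""
    scores = l2_normalize(utterances) @ centroids.T
    best = scores.argmax(axis=1)
    return [names[i] for i in best], scores[np.arange(len(best)), best]


# Assumption: synthetic 64-dimensional vectors stand in for real embeddings.
rng = np.random.default_rng(0)
true_voices = {
    "interpreter_A": rng.normal(size=64),
    "interpreter_B": rng.normal(size=64),
}
# Five noisy "voice samples" per interpreter, used for enrollment.
samples = {n: v + 0.1 * rng.normal(size=(5, 64)) for n, v in true_voices.items()}
names, centroids = enroll(samples)

# Two unlabeled "corpus utterances" to be matched to an interpreter.
utterances = np.stack([
    true_voices["interpreter_B"] + 0.1 * rng.normal(size=64),
    true_voices["interpreter_A"] + 0.1 * rng.normal(size=64),
])
labels, scores = identify(utterances, names, centroids)
print(labels)
```

In practice a minimum score threshold would probably be needed as well, so that utterances by speakers without an enrolled sample are left unassigned instead of being forced onto the nearest interpreter.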

Keywords

automatic speaker identification, speech corpora, speaker annotation, interpreting corpora, European Parliament

References

Bergl, Vladimir; et al. (2001). Apparatus and methods for user identification to deny access or service to unauthorized users. U.S. Patent No. 6246751. 12 Jun. 2001. <https://patents.justia.com/patent/6246751>. [Accessed: 20211116].

Bernardini, S.; Ferraresi, A.; Russo, M.; Collard, C.; Defrancq, B. (2018). Building interpreting and intermodal corpora: A how-to for a formidable task. In: Russo, M.; Bendazzoli, C.; Defrancq, B. (eds.). Making way in corpus-based interpreting studies. Singapore: Springer Singapore, pp. 21-42. <https://doi.org/10.1007/978-981-10-6199-8_2>. [Accessed: 20211116].

Chmiel, A. (2012). Pamięć operacyjna tłumaczy konferencyjnych mierzona metodą RSPAN. In: Piotrowska, M. (ed.). Kompetencje tłumacza. Kraków: Tertium, pp. 137-154.

Chmiel, A. (2016). Directionality and context effects in word translation tasks performed by conference interpreters. Poznan Studies in Contemporary Linguistics, v. 52, n. 2, pp. 269–295. <https://doi.org/10.1515/psicl-2016-0010>. [Accessed: 20211116].

Chmiel, A. (2018). Meaning and words in the conference interpreter’s mind: Effects of interpreter training and experience in a semantic priming study. Translation, Cognition & Behavior, v. 1, n. 1, pp. 21–41. <https://doi.org/10.1075/tcb.00002.chm>. [Accessed: 20211116].

Chmiel, A.; Kajzer-Wietrzny, M.; Koržinek, D.; Janikowski, P.; Jakubowski, D.; Polakowska, D. (2019). Fluency parameters in the Polish Interpreting Corpus (PINC). In: Kajzer-Wietrzny, M.; Bernardini, S.; Ferraresi, A.; Ivaska, I. (eds.). Empirical investigations into the forms of mediated discourse at the European Parliament: A thematic session at the 49th Poznań Linguistic Meeting (PLM2019). <http://wa.amu.edu.pl/~wjarek/PLM2019/PLM2019_Thematic_session_Mediated_discourse_European_Parliament.pdf>. [Accessed: 20211116].

Collard, C.; Defrancq, B. (2020). Disfluencies in simultaneous interpreting: A corpus-based study with special reference to sex. In: Defrancq, B.; Vandevoorde, L.; Daems, J. (eds.). New empirical perspectives on translation and interpreting. London: Routledge, pp. 264-299. <https://doi.org/10.4324/9780429030376-12>. [Accessed: 20211116].

Dal Fovo, E. (2018). European Union Politics Interpreted on Screen: A corpus-based investigation on the interpretation of the third 2014 EU presidential debate. In: Russo, M.; Bendazzoli, C.; Defrancq, B. (eds.). Making way in corpus-based interpreting studies. Singapore: Springer Singapore, pp. 157-184. <https://doi.org/10.1007/978-981-10-6199-8_9>. [Accessed: 20211116].

Defrancq, B.; Plevoets, K.; Magnifico, C. (2015). Connective Items in Interpreting and Translation: Where Do They Come From?. In: Romero-Trillo, J. (ed.). Yearbook of Corpus Linguistics and Pragmatics 2015. Cham: Springer, pp. 195–222. <https://doi.org/10.1007/978-3-319-17948-3_9>. [Accessed: 20211116].

Dehak, N.; et al. (2010). Front-end factor analysis for speaker verification. IEEE Transactions on Audio, Speech, and Language Processing, v. 19, n. 4, pp. 788-798. <https://doi.org/10.1109/TASL.2010.2064307>. [Accessed: 20211116].

Ferraresi, A.; Bernardini, S. (2019). Building EPTIC. In: Doval, I.; Sánchez Nieto, M.T. (eds.). Parallel Corpora for Contrastive and Translation Studies: New resources and applications. Amsterdam: John Benjamins. (Studies in Corpus Linguistics; 90), pp. 123-139. <https://doi.org/10.1075/scl.90.08fer>. [Accessed: 20211116].

Garcia-Romero, D.; Espy-Wilson, C. Y. (2011). Analysis of i-vector length normalization in speaker recognition systems. Conference in: INTERSPEECH 2011, 12th Annual Conference of the International Speech Communication Association, Florence, Italy, August 27-31. In: DBLP Computer Science Bibliography. <https://dblp.uni-trier.de/db/conf/interspeech/interspeech2011.html#Garcia-RomeroE11>. [Accessed: 20211116].

Kajzer-Wietrzny, M. (2012). Interpreting universals and interpreting style [PhD thesis]. Uniwersytet im. Adama Mickiewicza w Poznaniu, Poznań. Unpublished.

Kuhn, R.; et al. (1998). Eigenfaces and eigenvoices: Dimensionality reduction for specialized pattern recognition. In: 1998 IEEE Second Workshop on Multimedia Signal Processing (Cat. No. 98EX175). <https://doi.org/10.1109/MMSP.1998.738915>. [Accessed: 20211116].

Magnifico, C.; Defrancq, B. (2016). Impoliteness in interpreting: A question of gender? Translation and Interpreting, v. 8, n. 2, pp. 26-45. <http://www.trans-int.org/index.php/transint/issue/view/40>. [Accessed: 20211116].

Nagrani, A.; Chung, J.S.; Zisserman, A. (2017). VoxCeleb: A Large-Scale Speaker Identification Dataset. In: Proc. Interspeech 2017, pp. 2616-2620. <https://doi.org/10.21437/Interspeech.2017-950>. [Accessed: 20211117].

Neubig, G.; Shimizu, H.; Sakti, S.; Nakamura, S.; Toda, T. (2018). The NAIST Simultaneous Translation Corpus. In: Russo, M.; Bendazzoli, C.; Defrancq, B. (eds.). Making Way in Corpus-based Interpreting Studies. Singapore: Springer Singapore, pp. 205-215. <https://doi.org/10.1007/978-981-10-6199-8_11>. [Accessed: 20211117].

Pariente, M.; Cornell, S.; Deleforge, A.; Vincent, E. (2020). Filterbank design for end-to-end speech separation. In: ICASSP 2020-2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE. <https://doi.org/10.1109/ICASSP40776.2020.9053038>. [Accessed: 20211117].

Povey, D.; Ghoshal, A.; Boulianne, G.; Burget, L.; Glembek, O.; Goel, N.; Hannemann, M.; Motlicek, P.; Qian, Y.; Schwarz, P.; Silovsky, J.; Stemmer, G.; Vesely, K. (2011). The Kaldi speech recognition toolkit. In: IEEE 2011 workshop on automatic speech recognition and understanding (No. CONF). IEEE. <https://www.danielpovey.com/files/2011_asru_kaldi.pdf>. [Accessed: 20211117].

Russo, M. (2016). Orality and Gender: A corpus-based study on lexical patterns in simultaneous interpreting. MonTI, Monografías de Traducción e Interpretación, Special Issue 3, pp. 307-322. <https://doi.org/10.6035/MonTI.2016.ne3.11>. [Accessed: 20211117].

Russo, M. (2018). Speaking Patterns and Gender in the European Parliament Interpreting Corpus: A Quantitative Study as a Premise for Qualitative Investigations. In: Russo, M.; Bendazzoli, C.; Defrancq, B. (eds.). Making Way in Corpus-based Interpreting Studies. Singapore: Springer Singapore. (New Frontiers in Translation Studies), pp. 115-131. <https://link.springer.com/book/10.1007/978-981-10-6199-8>. [Accessed: 20211117].

Sadjadi, S.O.; Greenberg, C.; Singer, E.; Reynolds, D.; Mason, L.; Hernandez-Cordero, J. (2019). The 2018 NIST Speaker Recognition Evaluation. In: Proc. Interspeech 2019, pp. 1483-1487. <https://doi.org/10.21437/Interspeech.2019-1351>. [Accessed: 20211117].

Sell, G.; Snyder, D.; McCree, A.; Garcia-Romero, D.; Villalba, J.; Maciejewski, M.; Manohar, V.; Dehak, N.; Povey, D.; Watanabe, S.; Khudanpur, S. (2018). Diarization is Hard: Some Experiences and Lessons Learned for the JHU Team in the Inaugural DIHARD Challenge. In: Proc. Interspeech 2018, pp. 2808-2812. <https://doi.org/10.21437/Interspeech.2018-1893>. [Accessed: 20211117].

Snyder, D.; Garcia-Romero, D.; Sell, G.; Povey, D.; Khudanpur, S. (2018). X-vectors: Robust dnn embeddings for speaker recognition. In: 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, pp. 5329-5333. <https://doi.org/10.1109/ICASSP.2018.8461375>. [Accessed: 20211117].

Torfi, A.; Dawson, J.; Nasrabadi, N. M. (2018). Text-independent speaker verification using 3d convolutional neural networks. In: 2018 IEEE International Conference on Multimedia and Expo (ICME). IEEE, pp. 1-6. <https://doi.org/10.1109/ICME.2018.8486441>. [Accessed: 20211117].

Turk, M. A.; Pentland, A. P. (1991). Face recognition using eigenfaces. In: Proceedings. 1991 IEEE computer society conference on computer vision and pattern recognition IEEE, pp. 586-587. <https://doi.org/10.1109/CVPR.1991.139758>. [Accessed: 20211117].

Van der Maaten, L.; Hinton, G. (2008). Visualizing data using t-SNE. Journal of Machine Learning Research, v. 9, pp. 2579-2605. <https://www.jmlr.org/papers/volume9/vandermaaten08a/vandermaaten08a.pdf>. [Accessed: 20211117].

Variani, E.; Lei, X.; McDermott, E.; Moreno, I. L.; Gonzalez-Dominguez, J. (2014). Deep neural networks for small footprint text-dependent speaker verification. In: 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, pp. 4052-4056. <https://doi.org/10.1109/ICASSP.2014.6854363>. [Accessed: 20211117].

Wang, B. (2012). A descriptive study of norms in interpreting: Based on the Chinese-English consecutive interpreting corpus of Chinese premier press conferences. Meta: journal des traducteurs = Meta: Translators’ Journal, v. 57, n. 1, pp. 198-212. <https://doi.org/10.7202/1012749ar>. [Accessed: 20211117].

Zhang, Y.; Yu, M.; Li, N.; Yu, C.; Cui, J.; Yu, D. (2019). Seq2seq attentional siamese neural networks for text-dependent speaker verification. In: ICASSP 2019-2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, pp. 6131-6135. <https://doi.org/10.1109/ICASSP.2019.8682676>. [Accessed: 20211117].

Published

2021-12-31
