Interpreter identification in the Polish Interpreting Corpus
Abstract
This article describes the automated identification of interpreter voices in the Polish Interpreting Corpus (PINC). After a set of voice samples was collected from a number of interpreters, a deep neural network model was used to match the speech samples in the corpus to each individual. The final result is highly accurate and offers considerable gains in both time and accuracy over identification by human judgement.
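As a rough illustration of the matching step described above, the sketch below assigns an utterance to the enrolled interpreter whose averaged voice profile is most similar to it. Everything in it is an assumption made for illustration: the function names, the toy three-dimensional vectors, the cosine scoring, and the 0.5 rejection threshold. In a real system the vectors would be embeddings produced by a deep neural network, and the abstract does not state which scoring method was used.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def mean_vector(vectors):
    """Average several embeddings of one speaker into a single profile."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def identify(utterance, profiles, threshold=0.5):
    """Return (name, score) for the closest profile, or (None, score)
    when even the best match falls below the (hypothetical) threshold."""
    best_name, best_score = None, -1.0
    for name, profile in profiles.items():
        score = cosine(utterance, profile)
        if score > best_score:
            best_name, best_score = name, score
    if best_score < threshold:
        return None, best_score
    return best_name, best_score


# Toy enrollment data: stand-ins for DNN embeddings of known voice samples.
enrollment = {
    "interpreter_A": [[0.9, 0.1, 0.0], [0.8, 0.2, 0.1]],
    "interpreter_B": [[0.1, 0.9, 0.2], [0.0, 0.8, 0.3]],
}
profiles = {name: mean_vector(vecs) for name, vecs in enrollment.items()}

name, score = identify([0.7, 0.3, 0.1], profiles)
print(name, round(score, 3))
```

The threshold lets the sketch leave an utterance unassigned when no enrolled voice is a convincing match, which matters when a corpus may contain speakers for whom no samples were collected.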
Keywords
automatic voice identification, speech corpus, voice transcription, interpreting corpus, European Parliament
Copyright (c) 2021 Danijel Koržinek, Agnieszka Chmiel
This work is licensed under a Creative Commons Attribution 4.0 International License.