Context is everything: a context-aware annotation typology for dialogue translation quality assessment
Abstract
Until recently, most machine translation (MT) systems translated sentences in isolation. They ignored key document-level context because discourse-focused training data were scarce and robust evaluation methods were lacking. We present a context-aware annotation framework, validated on a customer-support dataset with substantial inter-annotator agreement (Cohen's κ = 0.73), that could offer a new standard for contextual MT evaluation.
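For readers less familiar with the agreement statistic reported above: Cohen's κ corrects the raw proportion of agreement between two annotators for the agreement expected by chance, κ = (p_o - p_e) / (1 - p_e). Here p_o is the observed agreement and p_e is the chance agreement implied by each annotator's label distribution. A value of 0.73 falls within the 0.61 to 0.80 band that the widely used Landis and Koch scale labels "substantial". The Python sketch below is a minimal illustration of the formula, not the authors' evaluation code. The function name `cohens_kappa` and the error-category labels in the example are hypothetical and were invented for illustration.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labelled the same items (sketch)."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two equal-length, non-empty label sequences")
    n = len(labels_a)
    # Observed agreement: share of items on which both annotators chose the same label.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: for each label, multiply the two annotators' marginal
    # proportions, then sum over labels (labels unused by annotator A contribute 0).
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if p_e == 1.0:
        # Degenerate case: both annotators used one identical label throughout,
        # so kappa is undefined; treat it as perfect agreement by convention.
        return 1.0
    return (p_o - p_e) / (1 - p_e)


# Hypothetical annotations of eight dialogue segments (illustrative only).
annotator_1 = ["ok", "pronoun", "ok", "register", "ok", "ok", "pronoun", "ok"]
annotator_2 = ["ok", "pronoun", "ok", "ok", "ok", "ok", "pronoun", "register"]

print(round(cohens_kappa(annotator_1, annotator_2), 3))  # 0.529
```

With these eight hypothetical segments the annotators agree on six (p_o = 0.75), chance agreement is p_e = 30/64 ≈ 0.469, and κ = 9/17 ≈ 0.529. In practice, a library routine such as scikit-learn's `cohen_kappa_score` computes the same statistic.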
Keywords
machine translation, discourse phenomena, context, translation quality assessment workflows, context-aware annotation framework
Funding data
Universidade de Lisboa (grant number UID/214/2025)
Copyright (c) 2025 Miguel Menezes, Amin Farajian, Helena Moniz, João Graça

This work is licensed under a Creative Commons Attribution 4.0 International License.