Human-computer cooperation protocol for annotating edits in machine translations

Authors

  • Felipe de Almeida Costa Federal University of Minas Gerais
  • Adriana S Pagano Federal University of Minas Gerais
  • Thiago Castro Ferreira Federal University of Minas Gerais
  • Wagner Meira, Jr. Federal University of Minas Gerais

Abstract

We present a study that explores automatic error detection in a post-editing corpus, using a novel method for computing edit types. We examine how these edit types are associated with the quality scores assigned to the machine translation output and to the post-edited texts. Finally, we discuss the shortcomings of our method and point out the edit types that are worth exploiting.

Keywords

machine translation, human post-editing, automatic error analysis, human-computer cooperation

Published

2021-12-31
