Creating domain-specific translation memories for machine translation fine-tuning: the TRENCARD bilingual cardiology corpus
Abstract
This article investigates how translators and other language experts can create translation memories (TMs) in order to compile domain-specific parallel corpora, which can then be used in various scenarios, such as machine translation training and fine-tuning, TM leveraging, and/or the fine-tuning of large language models. The article presents a semi-automatic methodology for TM preparation that relies mainly on translation tools already used by translators, which favours data quality and keeps the data under the translators' control. This semi-automatic methodology is used to build a Turkish → English corpus in the cardiology domain from the bilingual abstracts of Turkish cardiology journals. The resulting corpus, called the TRENCARD Corpus, contains approximately 800,000 source words and 50,000 sentences. With this methodology, translators can build their own TMs in a reasonable time and use them in tasks that require bilingual data.
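As a rough illustration of the last step of such a workflow, the sketch below serialises already-aligned Turkish and English segment pairs into a TMX 1.4 translation memory using only the Python standard library. It is not the article's own procedure, which relies on translators' tools. The function name `build_tmx`, the `creationtool` value, and the sample sentence pair are assumptions made for the example, and the alignment itself is assumed to have been done beforehand.

```python
import xml.etree.ElementTree as ET

# Qualified name that ElementTree serialises as the xml:lang attribute.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def build_tmx(pairs, src_lang="tr", tgt_lang="en"):
    """Return a TMX 1.4 ElementTree built from (source, target) segment pairs.

    Hypothetical helper: empty segments and exact duplicate pairs are skipped.
    """
    tmx = ET.Element("tmx", version="1.4")
    ET.SubElement(tmx, "header", {
        "creationtool": "tm-sketch",      # assumed tool name, illustration only
        "creationtoolversion": "0.1",
        "segtype": "sentence",
        "o-tmf": "plain-text",
        "adminlang": "en",
        "srclang": src_lang,
        "datatype": "plaintext",
    })
    body = ET.SubElement(tmx, "body")

    seen = set()
    for src, tgt in pairs:
        src, tgt = src.strip(), tgt.strip()
        if not src or not tgt or (src, tgt) in seen:
            continue  # drop empty or repeated translation units
        seen.add((src, tgt))
        tu = ET.SubElement(body, "tu")
        for lang, text in ((src_lang, src), (tgt_lang, tgt)):
            tuv = ET.SubElement(tu, "tuv", {XML_LANG: lang})
            ET.SubElement(tuv, "seg").text = text
    return ET.ElementTree(tmx)


# Made-up sample pair, not taken from the TRENCARD corpus.
sample_pairs = [
    ("Kalp yetersizliği", "Heart failure"),
    ("Kalp yetersizliği", "Heart failure"),  # duplicate, skipped
    ("", "Orphan target segment"),           # empty source, skipped
]
tree = build_tmx(sample_pairs)
```

The resulting tree can be saved with `tree.write(path, encoding="utf-8", xml_declaration=True)`, where `path` is any output file name, and the file can then be imported into a CAT tool or passed on to a fine-tuning pipeline. Removing duplicates at this stage is a design choice of the sketch rather than something the article prescribes.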
Keywords
Bilingual corpus preparation, translation memory, machine translation, TRENCARD corpus
Copyright (c) 2024 Gokhan Dogru

This work is licensed under a Creative Commons Attribution 4.0 International License.