Analysis of translation errors and evaluation of pre-editing rules for the translation of English news texts into Spanish with Lucy LT

In this paper we study the effect of pre-editing rules on the quality of the translations produced by the MT system Lucy LT when translating English news texts into Spanish. We carried out an error annotation of the first 200 segments of the News Crawl: articles from 2014 corpus and devised a set of 8 pre-editing rules. The application of these rules to a different set of segments from the same corpus results in a reduction of the word error rate of about 11%.


Introduction
There are two main activities in the automation of the translation process: post-editing and pre-editing. On the one hand, post-editing consists of correcting the translation produced by a machine translation (MT) system, which usually contains errors, to make it adequate for the intended purpose. This activity increases, in most cases, the productivity of the translation process and its profits (Plitt & Masselot, 2010). Hence, the number of companies that include, or are considering including, post-editing in their translation processes is growing every day. According to Sánchez-Martínez (2012), this rise is due to four main factors: the improvement of MT techniques, the increased availability of resources such as MT software and data, a change in users' expectations about MT, and a better integration of MT systems in computer-assisted translation tools (e.g. DéjàVu, SDL Trados or OmegaT).
On the other hand, pre-editing consists of preparing the text before translation to avoid words and constructions that are prone to cause MT errors, such as unusual grammatical constructions or words placed between the verb and the preposition of a phrasal verb. It follows that, by improving the (automatic) translatability of source-language texts, we may improve the productivity of post-editors. This may be achieved either by using a controlled language or by pre-editing the texts before they are translated by an MT system.
Writers and pre-editors may use The Global English Style Guide that Kohl (2008) devised in order to improve the quality of the texts in two ways: to make them more translatable and to improve their comprehension by non-native speakers. Some of the rules that Kohl includes in his guide are: to keep the verb and the preposition of phrasal verbs together, to reduce as much as possible the use of the passive voice, to avoid subordinate clauses and to try to use short sentences.
In this paper we study the effect of pre-editing rules on the quality of the translations produced by the rule-based MT system Lucy LT when translating English news texts into Spanish. The pre-editing rules to be applied are devised after a careful analysis of the types of errors produced by Lucy LT. To this end, we annotated the errors [5] found in the first 200 segments of the News Crawl: articles from 2014 corpus using the open-source software translate5 and the Multidimensional Quality Metrics (MQM; Lommel et al., 2014) framework, which we adapted to our needs. MQM provides a framework for defining task-specific translation metrics, and an open and expandable system for describing translation quality metrics using a shared vocabulary of issue types.
To measure the impact of the pre-editing rules we compared the translation produced by Lucy LT when the source language text is not pre-edited to the translation it produces when translating a pre-edited text. The results show that the word error rate is around 11% lower when the source language text is pre-edited following the pre-editing rules we have devised.
[5] «An error represents any issue you may find with the translated text that either does not correspond to the source or is considered incorrect in the target language» (Burchardt & Lommel, 2014: 12).
[…] productivity, that is, the time a post-editor needs to correct an MT output to make it adequate for the intended purpose. She tried with and without a customised MT system (enriched with specific terminology) as well as with and without pre-edited texts. In all cases the use of a pre-edited text results in a productivity increase. Results, however, may vary because pre-editing and post-editing levels may change depending on the nature of the texts, the language pair and the purpose of the translation. [9] Thicke's study differs from ours in the language pair, the nature of the texts to be translated and the measure used. However, both studies show a reduction in the post-editing effort when Kohl's and other pre-editing rules are applied to the source text.
The rest of the paper is organised as follows. The next section describes two frameworks for the annotation of translation errors: MQM and TAUS DQF. Section 3 then describes the methodology we have applied for the customisation of the framework we have used for error annotation and the design of pre-editing rules. The next two sections are dedicated to the quantitative and qualitative analysis of the errors found, and to the description of the pre-editing rules devised and their evaluation, respectively. The paper ends with some concluding remarks.

Frameworks for the annotation of errors
Before applying pre-editing rules to a source text, it is important to determine the nature of the errors that the MT system produces when translating the type of texts in question. The best way to achieve this is to perform a systematic error annotation on a sample text, followed by a quantitative and qualitative analysis. In this way, we are able to determine the most recurrent errors and their nature, and to devise and select the most suitable pre-editing rules to reduce these errors as much as possible.
In the literature, we can find two different frameworks for the annotation of errors: the Multidimensional Quality Metrics (MQM; Lommel et al., 2014) and the Dynamic Quality Framework (DQF) by TAUS (Translation Automation User Society). [10] Although these two frameworks were studied separately, since June 2015 (after this study was finished) TAUS and DFKI (German Research Center for Artificial Intelligence) have harmonized their error typology evaluations as part of the European-Union-funded QT21 project.

Multidimensional Quality Metrics
Multidimensional Quality Metrics (MQM) is a framework developed for the EU-funded QTLaunchPad project by DFKI and partners. Instead of providing a translation quality metric, MQM provides a framework for defining task-specific translation metrics and an open and expandable system for describing translation quality metrics using a shared vocabulary of issue types (Lommel et al., 2014). As of June 26, 2015, this set of issue types is composed of 10 general categories (accuracy, compatibility, design, fluency, internationalization, locale convention, style, terminology, verity and other) and around 100 subcategories.
As MQM guidelines recommend, annotators should devise their metrics according to the type of texts they are evaluating and the purpose of the evaluation. They must select only the issue types that are useful to address the relevant research questions. Thus, metrics should not contain extraneous or irrelevant categories, must be granular enough to address the research questions posed and small enough to be kept in mind by annotators (Burchardt & Lommel, 2014).
[9] A French news article translated into Spanish generally will require a lower level of post-editing than a Chinese legal document translated into Spanish.
[10] https://www.taus.net/.
Once the metric is devised, annotators should create a decision tree with all the issue types. This tree is useful to maintain the metric in mind and to distinguish between the different issue types. It also has the purpose of maintaining the consistency between annotators' own decisions and, at the same time, between the different annotators that carry out the evaluation. Finally, the annotation is done by using the free/open-source software translate5.
As regards the amount of text to be annotated, there is insufficient evidence to confirm that a certain amount of text is required for a complete analysis of errors, but, according to the QTLaunchPad results (Burchardt & Lommel, 2014), it is possible to detect more patterns and errors with a minimum of 100-150 segments.

TAUS Dynamic Quality Framework
Dynamic Quality Framework (DQF) has been developed by TAUS (Translation Automation User Society) for the human evaluation of translation quality (Görög, 2014). This quality is conceived as dynamic given that it changes depending on the characteristics of the text (i.e. content, intent, target readers, etc.) and it is achieved when the client is satisfied with the results. DQF allows for selecting the most adequate evaluation model according to the specific quality requirements by means of a flexible framework that is based on three parameters: utility, time and sentiment. Utility refers to the importance of the functionality of the translated content; time, as the name suggests, is the time needed for translating the text; and sentiment is the importance of the impact on the brand image, i.e. how damaging it might be to a client if the content is badly translated. DQF tools enable users to carry out an adequacy and fluency evaluation, to compare translations, to measure post-editing productivity and to score translated segments based on an error typology.
Error typology-based evaluations are done on a segment basis and errors are annotated according to five categories (accuracy, country standards, language, style and terminology).Once the error is classified, the evaluator assigns a severity level to each one (critical, major, minor and neutral) and the quality of the translation is assessed according to the quantity and severity of the errors found.
Adequacy/Fluency evaluations are also segment-level evaluations and they are recommended when time and resources are not enough to make an error typology-based evaluation. Adequacy is measured by comparing the target sentence with a reference sentence. However, if this is not available, evaluators should use the source text to determine how much of the meaning in the source text is expressed in the translation and then select an adequacy level according to a four-point scale (everything, most, little, none). Fluency evaluation has more to do with grammar, linguistics and locale aspects of the translation and does not need a reference sentence. It is also measured on a four-point scale (flawless, good, disfluent and incomprehensible).
For the purpose of this paper, we discarded the use of TAUS DQF and used MQM because its set of issue types (with 10 general categories, 7 at the time of conducting this study, and around 100 subcategories) is wider and more flexible than that of TAUS DQF. In this way, we were able to select the most suitable issue types according to the needs of our study and perform a more detailed evaluation. Moreover, the free/open-source software translate5 allows the user to perform a more complete annotation by annotating at the word level, rather than at the segment level, with the possibility of marking the errors on the translated text, assigning them a level of severity, adding comments, and extracting statistics on the annotation when it is finished.

Methodology
Before error annotation it is necessary to define the set of issue types that will be used during the analysis of errors and development of pre-editing rules. We decided to focus on linguistic aspects such as grammar and spelling to devise pre-editing rules, since we are interested in having a machine translation that is correct from both the lexical point of view and the grammatical point of view. Therefore, general categories such as verity (for those statements that contradict the world of the text), internationalisation (for problems related to the internationalisation of the content), and design (for problems related to graphic aspects, as opposed to linguistic aspects, of the content) were eliminated. The general categories used for error annotation are accuracy and fluency.
Regarding accuracy, we eliminated the subcategories related to terminological aspects because we did not have any termbase to take into account. As for fluency, we gave priority to grammatical and spelling categories, so we omitted subcategories such as content (for issues related to the presentation of the information), style guide, character encoding, non-allowed characters, pattern problem, sorting, corpus conformance, broken link/cross-reference or index-TOC, since they are intended to mark issues related to text encoding, order of elements, corpus construction or external links. Once we had determined the issue types relevant to our study, we developed a decision tree, which is a hierarchical categorization of the issue types, to guide the annotator during the error-annotation process. The decision tree we used, which is based on the one described by Burchardt & Lommel (2014: 19), is shown in Appendix A.
For the annotation and subsequent evaluation of the impact of the pre-editing rules on the quality of the translations produced by Lucy LT when translating English news texts into Spanish, we took the first 400 segments from the publicly available News Crawl: articles from 2014 corpus. The first 200 segments were used as the development set for error annotation, analysis and development of pre-editing rules. The second 200 segments were used as the test set to evaluate the impact of the pre-editing rules devised on the quality of the translations performed by the MT system.
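The partition described above can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `split_dev_test` is hypothetical, and it assumes the corpus is available as an iterable that yields one segment per item, which the paper does not specify.

```python
from itertools import islice


def split_dev_test(segments, dev_size=200, test_size=200):
    """Return (development, test) lists taken from the start of `segments`.

    The first `dev_size` segments form the development set and the next
    `test_size` segments form the test set. The sizes default to the
    200/200 partition described in the text.
    """
    head = list(islice(segments, dev_size + test_size))
    if len(head) < dev_size + test_size:
        raise ValueError("not enough segments for the requested partition")
    return head[:dev_size], head[dev_size:]


# Toy usage with placeholder segments (not taken from the actual corpus).
toy_corpus = [f"segment {i}" for i in range(1000)]
development, test = split_dev_test(toy_corpus)
```

Keeping the two sets contiguous and disjoint ensures that the rules derived from the development set are evaluated on segments that played no part in their design.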

Error analysis
The error annotation was carried out on the development set of segments and the error statistics for this set are the following:

• TOTAL: 1103
It is worth noting that during the annotation we always tried to assign the most specific issue type to an error. In those cases in which an error could not be assigned a specific subcategory, it was assigned a general category; in our case, 24 errors were assigned the (general) accuracy issue type, and the same happened with other general categories.
The most recurrent errors of the MT system Lucy LT in these 200 segments regarding accuracy were chiefly mistranslations, followed by untranslated words or phrases and additions. Regarding fluency, the main issues were related to grammar, especially to agreement, tense, mood and aspect of verbs, word order and function words (articles, auxiliary verbs and prepositions). The locale convention category was not very relevant, since only 7 errors were annotated as a locale convention problem.
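A quantitative analysis of this kind amounts to tallying annotations per issue type and per general category. The following minimal sketch illustrates the idea; the function name, the slash-separated label format and the toy annotations are hypothetical, and they do not reproduce the export format of translate5 or the actual counts of this study.

```python
from collections import Counter


def tally_issue_types(annotations):
    """Count annotations per full issue type and per general category.

    Each annotation is a label such as "fluency/grammar/agreement";
    an error that could only be given a general category is simply
    labelled "accuracy" or "fluency".
    """
    annotations = list(annotations)  # allow generators to be passed in
    by_issue_type = Counter(annotations)
    by_category = Counter(label.split("/", 1)[0] for label in annotations)
    return by_issue_type, by_category


# Invented toy data, for illustration only.
toy_annotations = [
    "accuracy/mistranslation",
    "accuracy/mistranslation",
    "accuracy/untranslated",
    "accuracy",
    "fluency/grammar/agreement",
    "fluency/grammar/word order",
    "locale convention",
]
by_issue_type, by_category = tally_issue_types(toy_annotations)
```

Sorting `by_issue_type.most_common()` then gives the ranking of recurrent errors from which patterns can be extracted by hand.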
After this quantitative analysis, we extracted the following 15 patterns of translation errors:
1. Articles are not included in the translation between verbs and nouns in those cases in which it is necessary in Spanish.
2. Past simple is always translated as an imperfect tense and it is incorrect in some cases.
3. Acronyms are not always translated (some exceptions: EU = UE; U.S. = EE.UU.).
4. The second person pronoun (you) is always translated as usted. Here context would be necessary to determine if it is an issue or not.
5. The Spanish preposition a is not included in the translation between verbs and nouns.
6. The verb to be is not always translated correctly; it is sometimes translated as ser when it should be translated as estar, and vice versa.
7. Words composed of prefix + hyphen + noun are not translated correctly.
8. The possessive remains in the translation in cases in which it is not correct in Spanish (e.g. parts of the body).
9. The comma before the conjunction and remains in the translation and it is not included before but, so the punctuation of the source text remains in the target text, whereas in Spanish it is the other way round.
10. Single quotation marks remain in the translation in cases in which angle quotation marks should be used in Spanish.
11. Capital and small letters remain in the translation in cases in which they are incorrect in Spanish (e.g. Duchess = duquesa).
12. There are word order errors in the translation of long noun phrases because there is more than one adjective or modifier before a noun.
13. The Spanish connective que is not included in the translation of completive clauses because it is not present in the source sentence.
14. There are gender and number disagreements in the translation when the referent of the pronouns this, that, these and those cannot be determined.
15. There are disagreements in the target language caused by the lack of elements after connectives (i.e. pronouns, auxiliary verbs) when there is a lexical ambiguity in juxtaposed sentences because the system cannot determine the subject.

Pre-editing rules and their evaluation
We devised a set of pre-editing rules to minimise as much as possible the translation errors, described in the previous section, made by Lucy LT when translating English news texts into Spanish. To devise these rules we consulted the guidelines by Kohl (2008). It is important to highlight that pre-editing rules must not conflict with the grammatical rules of the source language, so it is necessary to find a set of rules that makes the source text easier to translate automatically while keeping it grammatically correct.
The pre-editing rules to reduce the errors made by the MT system Lucy LT on the corpus News Crawl: articles from 2014 are described below. 5 of these rules have been extracted from Kohl (2008) and the other 3 have been specifically devised as a result of this study.
The sentences that need to be pre-edited could be automatically detected, and even the pre-editing rules could be automatically applied, if a specific module similar to those used by transfer-based MT systems for lexical and structural transfer were developed; in particular, a shallow-transfer module like those used by the Apertium free/open-source MT platform (Forcada et al., 2011) would work for most of them. However, for the experiments in this work, the sentences to be pre-edited were manually identified, and the pre-editing rules manually applied to the source text, because the development of such a module falls outside the scope of this work.
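To give an idea of what automatic application of one rule might look like, the sketch below expands acronyms in a source segment by dictionary lookup. It is a deliberately simplified illustration and not the shallow-transfer module mentioned above: the function name `expand_acronyms`, the acronym table and the example sentence are all hypothetical, and in this study the rules were applied manually. Acronyms written with full stops (e.g. U.S.) are not handled by this simple word-boundary pattern.

```python
import re

# Hypothetical lookup table; a real one would be built for the domain.
ACRONYMS = {
    "IMF": "International Monetary Fund",
    "WHO": "World Health Organization",
}


def expand_acronyms(segment, table=ACRONYMS):
    """Replace each known acronym in `segment` with its expansion.

    Matching is case-sensitive and restricted to whole words, so that
    "WHO" is expanded but "who" and "WHOLE" are left untouched.
    """
    if not table:
        return segment
    # Longest keys first, so that a longer acronym wins over its prefix.
    keys = sorted(table, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(re.escape(k) for k in keys) + r")\b")
    return pattern.sub(lambda match: table[match.group(1)], segment)


pre_edited = expand_acronyms("The IMF warned on Monday.")
# pre_edited == "The International Monetary Fund warned on Monday."
```

Rules that depend on syntactic context, such as inserting a missing connective, would need at least part-of-speech information and cannot be reduced to a lookup of this kind.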
The issues that are not included in the previous list must be solved by post-editing the MT output, since it was not possible to devise pre-editing rules to prevent those errors without making grammatical mistakes in the source text.
In addition to the annotation of errors, we post-edited the MT output. These post-edited translations were used as a reference translation to which the MT output is compared by computing the word error rate, i.e. the percentage of words that need to be inserted, removed or replaced to convert the translation being evaluated into the reference translation. Table 2 shows, for the development and test corpora, the word error rate of the translation performed by Lucy LT when the text to be translated has been manually pre-edited by applying the pre-editing rules described above, and when the text to be translated has not been pre-edited in any way. As can be seen, by applying the pre-editing rules we devised to the source text in the development corpus a word error rate reduction of 4.5 percentage points is achieved, i.e. a relative improvement of 12%. The results on the test corpus confirm the positive impact of the pre-editing rules on the translation quality: a word error rate reduction of 4.3 percentage points and a relative improvement of 11% is obtained. The fact that the improvement on the test corpus is almost equal to the improvement on the development corpus indicates that the pre-editing rules we devised generalise common transformations to be made to the input texts, and that they are not merely solving specific problems that only show up in the development corpus.
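The word error rate described above can be computed with a word-level edit distance. The sketch below is one possible implementation under two assumptions that the text does not state explicitly: words are obtained by splitting on whitespace, and the number of edit operations is normalised by the total number of words in the reference. The function names and the example sentences are hypothetical, and the figures used are illustrative rather than those of Table 2.

```python
def word_edit_distance(hypothesis, reference):
    """Minimum number of word insertions, deletions and substitutions
    needed to turn `hypothesis` into `reference` (both lists of words)."""
    previous = list(range(len(reference) + 1))
    for i, hyp_word in enumerate(hypothesis, start=1):
        current = [i]
        for j, ref_word in enumerate(reference, start=1):
            substitution = previous[j - 1] + (hyp_word != ref_word)
            deletion = previous[j] + 1      # drop a hypothesis word
            insertion = current[j - 1] + 1  # add a reference word
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1]


def word_error_rate(hypotheses, references):
    """Corpus-level WER (as a percentage) over paired segments.

    Assumption: edits are normalised by the reference length.
    """
    edits = 0
    reference_words = 0
    for hypothesis, reference in zip(hypotheses, references):
        ref_tokens = reference.split()
        edits += word_edit_distance(hypothesis.split(), ref_tokens)
        reference_words += len(ref_tokens)
    if reference_words == 0:
        raise ValueError("the reference contains no words")
    return 100.0 * edits / reference_words


def relative_reduction(baseline_wer, improved_wer):
    """Relative WER improvement (as a percentage) over the baseline."""
    return 100.0 * (baseline_wer - improved_wer) / baseline_wer


# One substitution in a four-word reference gives a WER of 25%.
wer = word_error_rate(["la casa es rojo"], ["la casa es roja"])
```

The distinction between the two figures reported in the text follows from `relative_reduction`: an absolute drop from, say, 40.0% to 35.0% is 5 percentage points but a relative improvement of 12.5%.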
It is worth noting that, as we used as reference translation a post-edited translation of the MT output being evaluated, the reduction in the word error rate is directly related to an improvement in the productivity of post-editors, since the word error rate calculated in this way approximates the post-editing effort needed to correct the MT output.
As regards the dependence of the pre-editing rules on the target language into which the English texts are translated, most of them would also be useful when the target language of the translation is not Spanish. This is the case of rule #1, which introduces missing articles between verbs and nouns; rule #2, which expands acronyms; rule #5, which avoids the use of compounds; rule #6, which enforces the introduction of the connective that to improve the translation of completive clauses; rule #7, which includes the referent after a pronoun to transform it into a determiner; and rule #8, which repeats the subordinate connective in the coordinate clause to help the MT system determine the subject in juxtaposed sentences.

Concluding remarks
In this paper we have presented a study of the effect of pre-editing rules on the quality of the translations produced by the rule-based MT system Lucy LT when translating English news texts into Spanish. Before devising the pre-editing rules to apply to the source text we conducted a thorough analysis of the errors made by this MT system. To this end we adapted the MQM framework to the needs of our study and annotated the translation errors made by the MT system when translating a development corpus.
We have measured the impact of the pre-editing rules on the quality of the translations produced by translating a test corpus, independent from the one used to devise the set of pre-editing rules, and comparing the translation produced to a reference translation. This reference translation was obtained by post-editing the MT output obtained when no pre-editing rules were applied. The results show a clear reduction in the number of edit operations that need to be performed to turn the MT output into the reference translation.
As future work, the effect of the pre-editing rules devised on the translation of texts from different domains remains to be studied.
The corpora used for the study, as well as the annotated corpus and reference translations, can be downloaded from http://www.dlsi.ua.es/~fsanchez/resources/tradumatica/corpora.zip.
The list below shows all categories used for the annotation:

Table 1 provides additional data about these two sets of segments.

Table 1: Number of segments and total amount of words in the corpora used for error analysis and development of pre-editing rules (development) and for evaluating the impact of the pre-editing rules on the translation quality (test).

Table 2: Word error rate on the development and test corpora both when the source text is pre-edited and when it is not.