The progressively increasing shift from translation to Machine Translation Post-Editing (MTPE) projects has had a great impact on the market. This shift has prompted organizations to improve efficiency and achieve cost-effectiveness by editing MT output to create fluent text, a procedure known as post-editing (PE). The shift toward MTPE projects has also led to the emergence of new standards for translation services that define PE as a professional activity. For example, ISO 17100:2015 and ISO 18587:2017 set the requirements for post-editing of machine translation output and establish the requirements for this new job description (Ginovart & Oliver, 2020).
Due to the diversity of the translation market and the available possibilities for transferring languages, translators must cope with these changes to provide more services. In addition, the demand for high-quality human translation has started to decline; meanwhile, the quality of machine translation output keeps increasing, along with the wide availability of computer-aided translation (CAT) tools. The practical prospect of using CAT tools within the translation workflow and how they are integrated (Pym, 2013), as well as the integration of PE as part and parcel of translation training, has been raised recently (Bernardini et al., 2020). There are also indications that PE may replace translation memory as the primary production process in the language industry (Translation Automation User Society [TAUS], 2010).
However, the needs of the current translation market are increasing and have not yet been met by training (Cid-Leal et al., 2019; Jia et al., 2019). Translation trainees need to learn how to handle new technologies and to make educated decisions regarding translation, for instance, deciding which translation tool to use and when. Steps toward giving PE priority in translation training remain hesitant. Although post-editing is necessary to cope with the changing nature of the translation market, trainees have many challenges to overcome, as they need clear guidance to identify the scope of MT post-editing and what is expected from them. The skills necessary to undertake post-editing have been defined in the literature, yet boosting students' skills needs more attention in research (Pym, 2013; Rozmyslowicz, 2014; Yamada, 2015).
Translation students and trainees also seem to face challenges in performing post-editing tasks. For instance, trainees were careless and did not change the phrasing of texts when it seemed acceptable (Depraetere, 2010). Moreover, translators are used to applying their skills to generate solutions to translation problems from scratch, not to choosing between available solutions (Pym, 2013), which represents an obstacle in deciding which changes to make or which solutions to select. This problem raises an important question: how can translation students overcome the challenges they face in post-editing, improve their skills, and be ready for the demands of the translation market?
The researcher has also observed this confusion and hesitance while training senior-year translation students on post-editing using SDL Trados, both regarding the changes to be made and the acceptance of MT output. Some students could not decide whether to retranslate the text manually, ignoring the MT output, or to post-edit it, and asked how they could take these decisions and on what basis. Other students relied on intuition to identify MT output errors and sometimes made unnecessary changes to acceptable translations.
2.1 Machine Translation
Machine translation involves the automatic production of a target-language text based on a source-language text (Kenny, 2022). It has existed since the 1950s, evolving from rule-based Machine Translation to Statistical Machine Translation and, most recently, Neural Machine Translation. Although Machine Translation has many social implications, its effect on the translation community is greater. At first, many professional translators considered it a threat to the translation industry (Vieira, 2020). This defensive attitude started to decline after the breakthroughs achieved and the progress made in Machine Translation output, and translators started to adapt. In addition, Machine Translation is used broadly and for many purposes, such as understanding content, publishing texts in other languages, and communication, as in the translation of emails and chat room discussions. Moreover, Fully Automatic High Quality Machine Translation (FAHQMT) emerged to offer translation services for websites such as Yahoo and Google, and in the field of speech recognition as well (Koehn, 2012).
Machine translation involves different approaches according to the way of analyzing the source data (input) and then generating the target text (output). One of these approaches is Rule-Based Machine Translation (RBMT), which applies combinations of linguistic rules in three stages: analysis, transfer, and generation. It analyzes the morphological features of the ST and applies a part-of-speech tagger, followed by selecting words, transferring structure, running a morphological generator, and finally generating the TT. These processes require dictionaries, rules for both SL and TL, and rules directing the machine to relate the two structures. Another approach is Corpus-Based Machine Translation (CBMT), which uses large amounts of raw data in the form of parallel corpora. This approach is divided into Statistical Machine Translation (SMT) and Example-Based Machine Translation; the focus of this research is on the former. SMT depends on statistical models derived from the analysis of bilingual corpora; the initial model, based on Bayes' Theorem, was proposed by Brown et al. (1990). The statistical translation model selects the target hypothesis that best corresponds to the source sentence. However, SMT has challenges: the output is unpredictable and can be deceiving, and languages with significantly different word order yield unsatisfactory results, although the benefits are emphasized for European languages (Okpor, 2014).
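As a brief illustration (this is the standard noisy-channel formulation from the SMT literature, not an equation given in the present study), the model of Brown et al. selects the target sentence $\hat{t}$ that is most probable given the source sentence $s$:

$$\hat{t} = \arg\max_{t} P(t \mid s) = \arg\max_{t} \frac{P(s \mid t)\,P(t)}{P(s)} = \arg\max_{t} P(s \mid t)\,P(t)$$

where $P(s \mid t)$ is the translation model estimated from the bilingual corpus, $P(t)$ is the target language model responsible for fluency, and $P(s)$ is constant for a given source sentence and can therefore be dropped.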
Machine Translation is an imperfect technology: it might produce an accurate and contextually acceptable translation, but it might also contain a serious error in meaning, an omission, an addition, or a stylistic problem. Hence, post-editing is necessary to fix any errors in the text. The identification of such errors and their revision, or correction, is known as post-editing (O'Brien, 2002). However, performing post-editing requires first evaluating the Machine Translation output to check its quality.
A question arises here: how can MT output be evaluated? In fact, both manual and automatic evaluation are used; however, the two most commonly used criteria for evaluating MT output are fluency and adequacy under manual evaluation (Daems et al., 2016).
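Automatic evaluation, by contrast, scores MT output against reference translations using metrics such as BLEU. The following is a minimal sketch only, assuming the sacrebleu Python package; neither the toolkit nor the segments are part of the present study:

```python
# Minimal sketch of automatic MT evaluation with BLEU, assuming the
# sacrebleu package is installed (pip install sacrebleu). The segments
# below are hypothetical; manual fluency/adequacy judgments have no
# such automatic shortcut.
import sacrebleu

hypotheses = ["The white house is big.", "He goes to school every day."]
# One reference stream: one reference per hypothesis, in order.
references = [["The White House is big.", "He goes to school each day."]]

bleu = sacrebleu.corpus_bleu(hypotheses, references)
print(f"BLEU = {bleu.score:.1f}")  # corpus-level score on a 0-100 scale
```

Such scores correlate only loosely with human fluency and adequacy judgments, which is why manual evaluation remains the reference criterion here.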
2.2 Post-Editing
Post-editing (PE) is "the correction of raw machine translated output by a human translator according to specific guidelines and quality criteria" (O'Brien, 2011, pp. 197–198). While MT has been around for decades, and the use of the technology has grown significantly in the language industry in recent years, PE is still considered a relatively new task. From a scientific point of view, PE is the task that brings together human translators and machines, as well as the disciplines of MT and Translation Studies (Čulo, 2014). While human revision has been characterized as exploring and discovering errors (Vasconcellos, 1987), PE is an exercise in adjusting predictable and recurring errors and difficulties.
The Translation Automation User Society (TAUS), in partnership with the Centre for Global Intelligent Content (CNGL), developed focused PE guidelines for 'good enough' and 'publishable' quality as the first set of available guidelines, which can be used as a basis on which post-editors in professional environments work. This, eventually, should be reflected in post-editors' skills to identify the PE task requirements.
Generally, the purpose of post-editing varies according to its type and to the effort and skills needed to perform it. Post-editing is divided into full and light post-editing. Full PE is usually required when the final text is intended for publication: the text must be comprehensible and accurate, and grammar and syntax must be flawless; the style must be acceptable but need not be as good as a human translation. Where lower quality is good enough for the final product, light PE aims to make the MT output understandable. According to the TAUS guidelines, a light post-edited text also needs to convey the same meaning as the source text, but style is not important; the main requirement is that it delivers the same information as the source text (Massey et al., 2016).
2.2.1 Post-editing Skills
Identifying the skills required for post-editing has been researched extensively. Researchers have investigated the skills needed to perform the post-editing task, including knowledge of MT technology and systems, perfect command of the source and target languages, expertise in text types or specialized subject knowledge, and practice in reviewing texts translated by humans (Wagner, 1987; Johnson & Whitelock, 1987; O'Brien, 2002; Doherty et al., 2013). The most significant skills investigated and defined in the literature are knowledge of error typology, knowing how to correct or ignore errors, and making corrections directly on screen (Guerberof et al., 2020; Rico et al., 2013a, 2013b).
Ginovart & Oliver (2020) explored the post-editing skills considered essential by language service companies, individual professionals, and trainers, using three online questionnaires. The results showed that practice in post-editing (ISO 18587:2017) covers the three core skills for post-editing: (1) the capacity to decide when to edit or discard (translate from scratch) an MT result, (2) the capacity to identify MT output errors, and (3) the capacity to post-edit according to post-editing guidelines. These three core PE-related skills were chosen based on the perfect correlation between the three surveyed audiences mentioned above.
De Almeida & O'Brien (2010), for example, suggest that a good post-editor has: (1) the ability to identify issues in the raw MT output that need to be addressed and to fix them appropriately; (2) the ability to carry out the post-editing task with reasonable speed, so as to meet the expectations of daily productivity for this type of activity; and (3) the ability to adhere to the guidelines, so as to minimize the number of "preferential" changes, that is, changes that are not necessary and are normally outside the scope of PE.
Popović et al. (2014) investigated five types of operations: correcting word form, correcting word order, adding omissions, deleting additions, and correcting lexical choices. They also studied the relationship of these operations with cognitive and temporal PE effort. Sánchez-Gijón (2016) compared the tasks involved in computer-assisted translation and in MTPE, concluding that the PE task requires competences similar to those of the translation task, except for the instrumental sub-competence.
2.2.2 Post-editing effort
Post-editing effort has gained great attention in research. It has been categorized into temporal, technical, and cognitive effort (Krings, 2001). Cognitive effort, which is investigated during the translation or post-editing process, is considered the most difficult to measure (O'Brien, 2005). Temporal effort, such as the time consumed to post-edit MT output, cannot be the same for different tasks, since task specifications differ and assessing translation quality corresponds with the definition of quality used in the task or project (Colina, 2008). Tezcan et al. (2019) investigated the estimation of Post-Editing Time (PET) using a set of MT error features and suggested that when the errors in MT output are known, PET can be estimated with high accuracy. In other words, the time consumed in post-editing can differ across tasks with different purposes, different quality assessments applied, and different familiarity with the errors occurring in MT output.
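Technical effort, the third category, is often operationalized as the edit distance between the raw MT output and its post-edited version (the idea behind metrics such as HTER). The following is a minimal sketch with hypothetical sentences and a plain word-level Levenshtein distance; it is not the instrument used in this study:

```python
# Minimal sketch: word-level edit distance between raw MT output and its
# post-edited version, a common proxy for technical post-editing effort
# (the idea behind metrics such as HTER). Sentences are hypothetical.

def edit_distance(a: list[str], b: list[str]) -> int:
    """Levenshtein distance over word tokens (insertions, deletions, substitutions)."""
    m, n = len(a), len(b)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dp[i][0] = i
    for j in range(n + 1):
        dp[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,        # deletion
                           dp[i][j - 1] + 1,        # insertion
                           dp[i - 1][j - 1] + cost) # substitution
    return dp[m][n]

raw_mt = "the house white is big".split()
post_edited = "the white house is big".split()

edits = edit_distance(raw_mt, post_edited)
effort = edits / len(post_edited)  # normalized, HTER-style
print(f"{edits} edits, normalized technical effort = {effort:.2f}")
```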
2.3 MT Error Identification
Many attempts have been made to identify and categorize MT errors. Depraetere (2010) analyzed ten post-edited texts produced by translation students not trained in PE and stressed that MT errors should be included in training to avoid full dependence on MT output. In fact, post-editing cannot be considered the same as proofreading, as the errors in human translation differ from those in MT output. For instance, spelling and typing errors hardly ever occur in MT output, whereas syntactic and lexical errors are frequently observed in MT output but not in human translation (Nitzke, 2016a).
Several error taxonomies have been proposed for detailed analysis of translation quality, the best known being that of Vilar et al. (2006) (Figure 1), which involves word- or phrase-based analysis. Another taxonomy, proposed by Daems et al. (2016), divides MT errors into two types: acceptability and adequacy errors (Figure 2).
Figure 1. Translation error taxonomy of Vilar et al. (2006); error types supported by the Prague and Zurich annotations are circled with solid lines, and those of the Aachen annotation are highlighted in gray.
Figure 2. MT error taxonomy of Daems et al. (2016)
The aim of this research, as previously mentioned, is to investigate the impact of studying Machine Translation (MT) errors on enhancing post-editing skills among translation students. The main question is: Does studying MT error identification enhance translation students’ post-editing skills?
The following hypotheses were examined: (1) translation students identify MT errors during the post-editing process, (2) translation students make knowledge-based decisions concerning the MT output, and (3) translation students perform post-editing tasks in less time and with less technical effort.
To test the hypotheses and answer the main question, the current research adopted a quasi-experimental method, as it was not possible to randomize the individuals or groups participating in this research while avoiding extraneous variables. Quasi-experimental methods test causal hypotheses, offer practical options for conducting impact evaluations in real-world settings, and avoid the ethical concerns associated with random selection of participants.
3.1 Participants
Thirty translation students participated in the research; all were in their senior year at the Faculty of Linguistics and Translation, Badr University in Cairo. The selection process considered language level, familiarity with technology, and experience in translation. The participants had nearly the same GPA, to guarantee a comparable language level, and all were fairly accustomed to using technology. Since translation experience represents a significant extraneous variable in this research, in that the more experienced or professional translators are, the better their post-editing is likely to be even without previous knowledge of post-editing (Ericsson, 2003), all selected participants had no working experience in translation. The participants were divided into an experimental group and a control group, with fifteen students in each.
3.2 The data
MT has limited ability to process certain types of text accurately, as various errors occur in the output (Calude, 2003). The selected texts were compared according to readability, potential translation problems, and MT quality. They required both light and full post-editing, so as to include all the specified error types that could occur in MT output. The English and Arabic MT output was taken from My Memory, a statistical machine translation system, in October 2022. The final corpus consisted of eight texts, each containing 50 to 60 sentences. The specifications of the post-editing task were settled as well.
The texts were then given to two professional post-editors to verify their suitability and validity for post-editing. The texts were examined, and the decision was that they needed full post-editing. The researcher post-edited the texts to be tested, identifying and enumerating the MT errors. The professional post-editors were then asked to identify the MT errors in these texts in order to guarantee the reliability of the texts used in the research; the MT error identification was verified using inter-rater reliability.
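Inter-rater reliability of this kind can be quantified, for example, with Cohen's kappa over the raters' per-segment error labels. The following is a minimal sketch assuming scikit-learn is available; the labels are hypothetical, not the study's annotations:

```python
# Minimal sketch: Cohen's kappa as one way to quantify inter-rater
# reliability of MT error identification. The per-segment labels below
# are hypothetical; the study's actual annotations are not shown here.
from sklearn.metrics import cohen_kappa_score

# One label per segment from each professional post-editor,
# e.g. the error category each rater assigned.
rater_1 = ["lexicon", "word_sense", "none", "coherence", "style", "none"]
rater_2 = ["lexicon", "grammar",    "none", "coherence", "style", "none"]

kappa = cohen_kappa_score(rater_1, rater_2)
print(f"Cohen's kappa = {kappa:.2f}")  # 1.0 = perfect agreement, 0 = chance
```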
3.3. Error identification
Error identification was based on the translation error taxonomy of Vilar et al. (2006) and the MT output error classification of Daems et al. (2016). The errors were divided into two main categories: adequacy and acceptability. The aim of using this taxonomy was to identify the MT errors to be tested in the pre-test and post-test; these errors are defined in the literature as the main MT errors encountered during post-editing. Adequacy errors comprise word sense, meaning shifts, agreement, verb form, structure, word order, and grammar. Acceptability errors include coherence, lexicon, word collocation, spelling, and style.
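To make the tallying concrete, the two-level taxonomy can be held in a simple mapping and identification rates computed per error type. This is a hypothetical sketch of the bookkeeping, not the study's instrument:

```python
# Hypothetical sketch of the two-level taxonomy as a tallying structure.
# Category names follow the paper; counts would come from annotated tasks.
from collections import Counter

TAXONOMY = {
    "adequacy": ["word_sense", "meaning_shift", "agreement", "verb_form",
                 "structure", "word_order", "grammar"],
    "acceptability": ["coherence", "lexicon", "collocation", "spelling", "style"],
}

def identification_rate(found: Counter, present: Counter) -> dict[str, float]:
    """Share of the errors of each type that a student identified."""
    return {etype: found[etype] / present[etype]
            for etype in present if present[etype] > 0}

# Hypothetical counts: errors present in a text vs. identified by a student.
present = Counter({"lexicon": 10, "word_sense": 6, "coherence": 4})
found = Counter({"lexicon": 8, "word_sense": 3, "coherence": 1})
print(identification_rate(found, present))
# {'lexicon': 0.8, 'word_sense': 0.5, 'coherence': 0.25}
```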
3.4 Data collection
Data collection was conducted using pre-tests, tasks, and post-tests. Error analysis was the methodology used to analyze the data, since it fits the characteristics of the data and the nature of this research. Corder (1967) explained that error analysis can deal effectively only with learner production, which suits the output of the participants in this research.
3.5 Procedures
First: Ethical approval was obtained, as all participants consented to take part in this research, and the School of Linguistics and Translation gave its approval.
Second: The administration lasted an entire academic term, from September 2022 to December 2022. The researcher gave both the control group and the experimental group a pre-test (a post-editing task) and analyzed the samples according to three criteria: error identification, time consumed, and decision making. Then, both groups received training on post-editing using SDL Trados; however, the experimental group received focused training on MT error identification, supported with examples and applications, using a task-based approach, whereas the control group received various post-editing tasks with no focused training or feedback.
Third: The training introduced to the experimental group was divided into five steps according to the cycle of task-based teaching proposed by Li (2013), adapted to suit translation, as shown in Figure 3.
Figure 3. Cycle of Task-Based Teaching in Translation (Li 2013)
Fourth: A post-test was given to both the control and experimental groups to detect improvement in the post-editing product, and the output samples were analyzed to reach conclusions.
3.6 Data analysis
Error analysis was applied to the data in several steps, based on the procedure stated by Corder as quoted in Ellis (1994). The sample tasks assigned to the translation students for post-editing were analyzed as follows.
In the pre-test, translation students in both the control and experimental groups post-edited a text from English into Arabic. The errors were detected and counted according to the Machine Translation error taxonomy (Figure 2).
The data analysis of the pre-test for both the experimental and control groups showed that participants had not identified all the MT output errors in the text, had made needless decisions, had accepted MT output that needed changing, had not taken correct decisions regarding the changes, and had spent forty to seventy-five minutes editing the text.
Post-tests were then conducted for both groups, and the analysis of the participants' products showed the following with respect to the research variables.
First variable: MT error identification
Table 1
MT errors detected by the control group (pre-test and post-test)
| Group | Grammar | Lexicon | Word sense | Collocation | Idioms | Coherence | Style | Register | Deletion | Addition | Meaning shift |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Control pre-test | 40% | 73% | 33.3% | 53.3% | 0% | 0% | 10% | 0% | 25% | 0% | 0% |
| Control post-test | 40% | 80% | 50% | 53.3% | 0% | 25% | 25% | 0% | 30% | 0% | 0% |
Table 1 shows the percentage of MT errors identified by the control group in the pre-test and post-test. During training, participants in the control group received neither focused training on MT error typology nor training on how to decide between translating from scratch and post-editing.
Table 2
MT errors detected by the experimental group (pre-test and post-test)
| Group | Grammar | Lexicon | Word sense | Collocation | Idioms | Coherence | Style | Register | Deletion | Addition | Meaning shift |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Experimental pre-test | 40% | 80% | 26.6% | 66.6% | 0% | 0% | 10% | 0% | 25% | 0% | 25% |
| Experimental post-test | 75% | 100% | 100% | 80% | 33.3% | 90% | 85% | 40% | 47% | 75% | 50% |
Table 2 shows the percentage of MT errors identified by the experimental group in the pre-test and post-test. Identification of coherence, register, and style errors improved greatly; improvement in identifying grammar, lexicon, and word sense errors was good compared with that for errors related to idioms, deletions, and meaning shifts.
Table 3
Mean of MT Error Identification improvement
| Group | Mean of MT error identification improvement (%) |
|---|---|
| Control group | 27.6 |
| Experimental group | 70.4 |
Table 3 shows the mean percentages of MT error identification improvement among participants in the control group and the experimental group. The progress made by the experimental group was considerable, supporting the first hypothesis: translation students improved their skills in identifying MT errors during the post-editing process.
Second variable: Decision making
Table 4
Means of decisions made in PE in the pre-test
| Group | No decision | Needless decision | Correct decision | Incorrect decision |
|---|---|---|---|---|
| Control | 45 | 12.4 | 13.5 | 29 |
| Experimental | 45.6 | 15.8 | 12.7 | 25.9 |
Table 4 shows that participants in both the control and experimental groups tended to avoid making decisions, and that the correct decisions made were few compared to the needless and incorrect ones.
Table 5
Means of decisions made in PE in the post-test
| Group | No decision | Needless decision | Correct decision | Incorrect decision |
|---|---|---|---|---|
| Control | 41 | 12 | 27 | 20 |
| Experimental | 14 | 9.6 | 67.5 | 8.9 |
Table 5 shows that the number of correct decisions made by participants in both the control and experimental groups exceeded that in the pre-test. Participants in the experimental group made more decisions than they had in the pre-test, and their needless and incorrect decisions were fewer than before compared with the control group.
Third variable: Time consumed
Time consumed in the pre-test in both groups ranged from 40 to 75 minutes. In the post-test, the time consumed had the same range, except for the very few participants who completed their post-test task in less time than they had spent in the pre-test; they represented only 20% of the participants in the experimental group.
Coherence, register, and style errors were identified at a high rate by participants in the experimental group. It is worth mentioning that these errors are linked to the target language, and participants paid more attention to the target segments. Since SMT does not give accurate results between languages with different word order (Okpor, 2014), style and coherence errors occurred in the Arabic target segments. In the pre-test, less attention was paid to these errors, as the tendency to accept any translation that delivers the main idea prevailed among the participants; in the post-test, after receiving the training, this tendency changed to a stronger focus on detecting these errors. As for register, participants checked that the register of the target text was at the same level as that of the source text.
There was also apparent improvement in the identification of grammar, lexicon, and word sense errors among participants in the experimental group compared with those in the control group. Identification of these errors was relatively good compared to other errors in the pre-test, and it became much better in the post-test. Participants focused first on lexicon and word sense errors in the pre-test and during the training, as they thought errors were most likely to be related to vocabulary and terms; these might also be the easiest errors to detect. After receiving the training, their perspective became wider. The least identified errors were idioms, deletions, and meaning shifts. These errors are closely related to the source text and require source text analysis before moving to the target text. Meaning shifts and idioms were generally very few in the source texts, so the difference between the pre-test and post-test was not very high. Deletion errors were harder to detect because doing so required revising both source and target segments, which not all participants did, as they dedicated their efforts primarily to revising the target segments.
The control group also showed improvement in identifying lexicon, word sense, coherence, and style errors. This could be related to the different tasks they were given to post-edit, as the feedback provided helped them enhance their skills, though at a much slower pace.
All in all, the results of the pre-test and post-test supported the first hypothesis: translation students can identify MT errors during the post-editing process, having become familiar with the MT error typology and with how to correct the errors.
Regarding the decisions made by participants in the experimental group, different results are to be expected given the progress made in error identification. The experimental group made more decisions in the post-test than in the pre-test, and incorrect decisions became fewer, as awareness of the errors helped participants take correct decisions. Needless decisions in the pre-test were an indication of unawareness and an inability to detect the errors in the target segments; their number was very low in the post-test among the participants of the experimental group. These results supported the second hypothesis: translation students make knowledge-based decisions concerning the MT output.
Comparing the time consumed in the pre-test and post-test by the participants in the experimental group shows that the time range was nearly the same for the same number of segments. Only twenty percent of them completed their tasks in less time in the post-test; however, the quality of the post-editing was much better. It should be noted that related research reports similar results: Colina (2008) argued that PE time for MT output cannot be the same for different tasks, and Tezcan et al. (2019) showed that PE time depends on the quality assessment applied and on familiarity with the errors occurring in MT output. The researcher believes that although the time consumed in the post-test did not decrease despite the extra effort exerted in MT error identification and decision making, this still constitutes an improvement in performance: a task achieved with extra effort normally needs more time than a task achieved with less effort, so if the harder task took the same time as the easier one, the performer effectively saved time.

This research is significant as it contributes to awareness of translators' changing role in light of emerging developments in the MTPE sector. The findings contribute to the translation pedagogy literature by adding new ideas for training translation students to cope with the latest developments in translation. Future research could be conducted with a larger number of students and with different text types.