Bridging Language Barriers: Addressing Complexity, Accuracy, and Fluency Challenges of Egyptian Researchers in Writing Research for International Publication

Document Type : Original Article

Author

School of Linguistics and Translation, Badr University in Cairo

Abstract

Researchers are encouraged to publish internationally because of the strict quality standards that international journals follow and the greater weight carried by research published in them compared with research published in local journals. There is a need, however, to provide guidelines to non-native speakers of English to help them overcome the linguistic challenges they may face when writing their research. The original manuscripts of 20 Scientific, Technical, and Medical (STM) research articles (RAs), written by Egyptian researchers, were thus analyzed according to the complexity, accuracy, and fluency (CAF) framework. The analysis of the researchers' writing proficiency revealed that the language used was simple both grammatically (1.2 clauses/T-unit) and lexically (7 lexical word types/T-unit). The accuracy score (75% error-free T-units) revealed several error types to address, namely punctuation errors, spelling errors and typos, confusion of the noun and verb forms of a word, run-on sentences, subject-verb agreement errors, and sentence fragments. The fluency score indicated T-units of average length (19 words/error-free T-unit). These findings highlight linguistic areas of difficulty and hence lay the foundation for tailored programs to address them, a step toward improving the research standing of Egyptian researchers and of others whose native language is not English.


Introduction

Background and Problem Statement

Scientific writing is the cornerstone of academic communication, enabling researchers to share discoveries and contribute to global knowledge. Among the most widely recognized forms of academic writing is the research article (RA), which plays a vital role in the dissemination of scientific findings. Swales (1990) defines the RA as a structured text reporting on an investigation while contextualizing its findings within existing research. Hyland (2011) further emphasizes its function in constructing scientific discourse, where language becomes a tool for knowledge creation and validation.

RAs are dynamic, evolving texts that engage in scholarly dialogue, building upon prior research to advance scientific understanding. Over time, they have undergone linguistic and structural changes, shifting toward greater complexity (Swales, 1990). Key developments include an increase in noun and subordinate clauses, a decline in relative clauses, and a preference for abstract subjects over concrete entities. Additionally, the rise of action verbs and a reduced reliance on reporting verbs reflect a shift in focus from individual researchers to their findings. Mastering these conventions is crucial for scholars aiming to publish in reputable journals.

Publishing research is essential for academic advancement, and international journals are particularly valued due to their stringent quality standards. Despite this, Egypt ranks 36th among 231 countries in research output, contributing only 0.6% of global publications and 11% regionally (Scimago, 2016). In 2016, Egyptian researchers published 18,876 articles in international peer-reviewed journals, primarily in the Scientific, Technical, and Medical (STM) fields (Ministry of Higher Education and Scientific Research, 2016). However, the citation impact of Egyptian research was 0.9 in 2015, falling below the global average (Academy of Scientific Research and Technology, 2016).

One of the primary barriers to international publication for Egyptian scholars is proficiency in English, the dominant language of academic publishing. Shehata and Eldakar (2018) identify language difficulties as the second most significant challenge, following a lack of familiarity with international publishing standards. Major publishers frequently cite poor English proficiency as a reason for manuscript rejection (Springer, n.d.; Thrower, 2012; Jalongo & Saracho, 2016; Eassom, 2018). Addressing these linguistic challenges is critical for increasing the global visibility and impact of Egyptian research.

Study Rationale and Significance

While many non-native English-speaking scholars contribute valuable research, language barriers often hinder publication. Despite the importance of this issue, few studies (i.e., El-Seidi, 2006; Awwaad, 2012; Hosni, 2015; Mohamed, 2020; Alaa Eldin, 2021) have examined the linguistic challenges faced by Egyptian researchers in academic writing. Understanding these difficulties is essential for developing targeted strategies to improve research writing and increase the acceptance of Egyptian-authored RAs in international journals.

This study analyzes the English language challenges encountered by Egyptian researchers when writing STM RAs for international publication. Unlike previous research, which primarily examines published articles, this study focuses on original manuscripts before publication. This approach provides a more accurate assessment of linguistic difficulties, offering insights into areas that require improvement. By highlighting common writing challenges, this research seeks to support Egyptian scholars in producing clearer, more effective academic texts, ultimately enhancing their success in international publishing.

Aim of the study

This study aims to identify the linguistic obstacles faced by Egyptian researchers in writing STM RAs in English. By analyzing their writing proficiency, particularly the writing complexity, accuracy, and fluency (CAF), the research seeks to raise awareness of common difficulties and propose strategies for improvement. The ultimate goal is to enhance the clarity, coherence, and overall quality of Egyptian-authored RAs, increasing their acceptance in high-impact international journals.

Research questions

The study seeks to answer the following questions:

  1. What is the Egyptian STM researchers' writing proficiency in their original manuscripts in terms of the complexity, accuracy, and fluency framework?
    1. What is the level of grammatical and lexical complexity of the Egyptian researchers in the selected sample?
    2. What is the level of accuracy of the Egyptian researchers in the selected sample?
    3. What is the level of fluency of the Egyptian researchers in the selected sample?

Literature review

Previous studies

Existing studies have examined the challenges researchers face in academic writing, often focusing on specific linguistic aspects rather than overall writing proficiency. For example, Onwuegbuzie (2017) analyzes formal grammatical errors and APA style violations in 117 research articles published in Research in the Schools by authors from diverse national backgrounds. The study identifies 35 recurring errors and provides guidelines to help researchers avoid manuscript rejection due to APA noncompliance. However, it primarily addresses formatting issues rather than broader linguistic challenges.

Similarly, McDowell (2016) explores nominal group structures in research articles written by 13 Japanese scholars using systemic functional grammar. The study finds frequent errors, particularly the omission of the indefinite article a and incorrect -of prepositional phrases. While insightful, its limited sample size and narrow focus on specific grammatical features restrict its generalizability to broader writing challenges.

Other studies have assessed writing complexity, accuracy, and fluency but differ in scope from the present research. Ahmadi and Meihami (2017) analyze these factors in essay writing by Iranian university students, comparing topical and general essays. Findings indicate higher complexity, accuracy, and fluency scores in topical essays, though the study’s small sample limited its representativeness. Similarly, Lan (2015) examines the grammatical and lexical complexity of writing by 30 Vietnamese EFL learners, revealing a balanced use of dependent and independent clauses and diverse lexical choices. However, the study does not address writing fluency. Cho (2015) investigates the complexity and fluency of argumentative essays written by 110 Korean high school students, reporting an average of 1.3 clauses per T-unit and 8 words per T-unit but omitting accuracy as a variable.

Notably, most of these studies analyze student essays rather than research articles, making their findings less applicable to advanced academic writing. Additionally, they focus on isolated linguistic features rather than the comprehensive writing challenges faced by established researchers. This study aims to address this gap by examining the linguistic difficulties encountered by Egyptian researchers in writing research articles for international publication. 

Benzehaf (2016) conducts a longitudinal study tracing the development of CAF in the writing of 45 Moroccan high school students learning French as an L2 and English as an L3. The study finds consistent improvement in English writing but inconsistent development in French, highlighting the need for more longitudinal research on language learning. Expanding on this work, Benzehaf (2017) examines the correlation between writing proficiency and CAF in 88 high school students, confirming CAF as a reliable measure of proficiency. Similarly, Jebbour (2021) explores the impact of self-disclosure tasks on writing development in 19 Moroccan university students but finds no significant improvement. While these studies validate the CAF framework, they focus on student writing rather than the high-stakes academic writing of established researchers, which is the concern of the present study.

Research on the English proficiency of Egyptian researchers' RAs remains scarce. El-Seidi (2006) examines syntactic challenges in 100 English-language RAs in medical and hard sciences published nationally. The study identifies five major difficulties: subordination, coordination, juxtaposition, sentence fragments, and erroneous syntactic structures, all of which hinder clear academic writing. While this research provides valuable insights, it focused solely on RAs published in national journals, whereas the present study examines manuscripts submitted for international publication.

Hosni (2015) analyzes metadiscourse markers in the Discussion sections of 90 linguistics RAs, comparing those published in English and Arabic. Findings indicate that internationally published RAs contain more metadiscourse markers than nationally published ones, with greater use in English-language articles. However, this study focuses on a single RA section and a single discipline. Awwaad (2012) compares the rhetorical structure of 20 abstracts from international RAs with that of 20 abstracts from Egyptian MA theses and, based on that analysis, highlights areas that Egyptian researchers should consider when writing an abstract. Despite the value Awwaad’s study adds, it is limited to the abstract and does not address Egyptian researchers’ performance in writing the other sections of an RA.

Mohamed (2020) extends previous research by analyzing metadiscourse markers across all sections of 20 RAs, revealing that Egyptian researchers prioritize interactive markers over interactional ones, aligning with academic writing conventions. Alaa Eldin (2021) investigates rhetorical moves in 20 RAs, finding that 90% exhibit structural issues, underscoring a key weakness in Egyptian researchers’ writing. However, none of these studies assess CAF, which provides a more comprehensive measure of writing proficiency.

Overall, this review highlights a gap in applying the CAF framework to assess the writing proficiency of Egyptian researchers. Additionally, the present study is distinct in analyzing original manuscripts before publication, an approach not previously adopted in research on Egyptian-authored RAs.

Data and methodology

Data

A total of 20 RAs were selected from the database of a major international scientific publisher (which the author had access to until 2019) based on the following criteria:

  1. The RA must be recent (published no earlier than 2016) in the STM disciplines.
  2. It must be authored by Egyptian researchers from Egyptian institutions.
  3. No researcher may appear more than once in the sample, so that the sample covers as many authors as possible.
  4. The RA must have been submitted for publication in an international journal.

Any RA failing to meet these criteria was excluded to ensure the study remains focused on the current challenges Egyptian STM researchers face in publishing internationally in English. These selection criteria also enhance the transferability and representativeness of the findings.

The 20 RAs were drawn from a pool of submissions, with prior notification to their authors. This approach ensured access to a sufficient number of RAs to account for any potential sample attrition.    

Research design

This study employs a mixed-methods exploratory design. Primarily qualitative, it analyzes the original manuscripts of STM RAs, which are inherently qualitative data. However, the study incorporates descriptive statistical measures to quantify the data, enhancing the validity and accuracy of the findings. The exploratory approach is adopted due to the limited prior research on the writing proficiency of Egyptian STM researchers, laying the groundwork for future studies. 

Framework of analysis

Skehan’s CAF framework serves as a modern tool for assessing writing proficiency. It has been widely adopted in linguistic research (El-Seidi, 2006; Kuhi et al., 2014; Cho, 2015; Lan, 2015; Ahmadi & Meihami, 2017), though none of these studies specifically examine the writing proficiency of scientific RA authors. The framework provides a comprehensive measure of writing ability, where high proficiency is characterized by sophisticated language use (complexity), minimal errors (accuracy), and smooth, effortless production (fluency) (Skehan, 1998, 2009; Wolfe-Quintero et al., 1998).

The CAF framework emerged in the 1990s as a unified model of language proficiency (Housen & Kuiken, 2009; Housen et al., 2012). Its three components correspond to distinct stages of language development: acquiring new structures (complexity), refining linguistic knowledge by reducing errors (accuracy), and achieving automaticity in language use (fluency) (Kuhi et al., 2014; Housen et al., 2012). Despite their interrelation, these components can sometimes be in tension—greater complexity may come at the expense of accuracy, and fluency may compete with both complexity and accuracy (Skehan, 2009).

Complexity refers to the sophistication of language use. Skehan and Foster (2008) describe it as “new cutting-edge and possibly risky language [that] foreshadows growth in the interlanguage system.” Housen et al. (2012) define it as “the ability to use a wide and varied range of sophisticated structures and vocabulary in the L2.” Complexity can be grammatical—reflected in sentence embedding and subordination (Wolfe-Quintero et al., 1998)—or lexical, measured by vocabulary diversity and word choice variation.

Accuracy is the ability to produce error-free language. It is defined as “the ability to be free from errors while using language to communicate” (Wolfe-Quintero et al., 1998) and as “the ability to produce target-like and error-free language” (Housen et al., 2012). Higher accuracy is associated with fewer grammatical and lexical errors.

Fluency represents the ease and speed of language production. It is “the ease with which the language user can retrieve the language items that he or she needs” (Wolfe-Quintero et al., 1998) and reflects “a person’s general language proficiency” (Craven, 2017). Fluency is often measured by the number of words produced within a given time.

The T-unit, rather than the sentence, is used as the primary unit of analysis due to its consistency in length and structure (Raish, 2017). First introduced by Hunt (1965), the T-unit is “the minimal terminal unit,” comprising an independent clause and any dependent clauses attached to it. For example, in the sentence, “While I was studying, the bell rang, and therefore I got up to see who it was,” there are two T-units: (1) “While I was studying, the bell rang” and (2) “I got up to see who it was.” The T-unit provides a more stable measure of linguistic complexity and proficiency.

Data collection

The 20 RAs were selected based on the predefined criteria outlined in the Data section. Research outside the STM disciplines, published in local journals, authored by non-Egyptian researchers, or already published was excluded from the study.

Corresponding authors were contacted via email to obtain informed consent. If an author declined participation or did not respond, the RA was replaced with another from the pool of eligible submissions.

For analysis, each selected RA was prepared by removing non-essential sections, including front matter (journal and author information, abstract), back matter (e.g., Acknowledgments, Appendix, Disclosure, Conflict of Interest), tables, figures, and the References list.

 

 

Analysis procedure

The analysis of writing proficiency in this study is conducted using the CAF framework. Each RA is evaluated based on these three dimensions, with scores calculated accordingly. According to Wolfe-Quintero et al. (1998), CAF can be measured through three different approaches: frequency count, ratio, and index measures. Frequency count is the most straightforward method, as it involves counting the occurrence of specific linguistic features. However, it is considered the least reliable because of its variability and limited generalizability across different populations. Index measures, on the other hand, involve complex formulas that generate numerical scores. While they can produce valid insights, their application can be challenging and less adaptable to broader contexts. Ratio measures, which calculate the proportion of one linguistic unit in relation to another, offer a more reliable and representative approach. Due to their balance between validity and applicability, ratio measures were chosen as the primary method of analysis in this study.

Complexity is assessed in two ways: grammatical complexity and lexical complexity. Grammatical complexity is measured using the T-unit complexity ratio, calculated by dividing the number of clauses by the number of T-units (C/T). A clause, according to Cambridge Dictionary, is defined as a structure containing both a subject (or a noun phrase) and a verb phrase, which may be independent (able to stand alone) or dependent (relying on an independent clause). The greater the number of embedded dependent clauses within a T-unit, the higher the grammatical complexity of the text. Lexical complexity, in turn, is evaluated by calculating the ratio of lexical word types to T-units (LWT/T). Lexical word types include nouns, adjectives, lexical verbs, and adverbs. To ensure an accurate measure of lexical diversity, repeated occurrences of the same word type are counted only once. A higher ratio of lexical word types per T-unit indicates a greater degree of lexical complexity.
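
The distinction between word tokens and word types that underlies LWT/T can be illustrated with a short Python sketch. It is not the tool used in this study: the tokenizer, the small FUNCTION_WORDS list (a crude stand-in for part-of-speech filtering), and the function names are all assumptions made for illustration.

```python
import re

# Hypothetical, deliberately tiny list of function words. A real analysis
# would use a part-of-speech tagger to keep only nouns, adjectives,
# lexical verbs, and adverbs.
FUNCTION_WORDS = {
    "a", "an", "the", "of", "in", "on", "at", "to", "and", "or", "but",
    "that", "which", "who", "it", "was", "were", "is", "are", "be",
}

def lexical_word_types(text: str) -> set[str]:
    """Return the distinct (lower-cased) non-function words in the text."""
    tokens = re.findall(r"[a-z]+(?:[-'][a-z]+)*", text.lower())
    return {tok for tok in tokens if tok not in FUNCTION_WORDS}

def lexical_complexity(text: str, t_units: int) -> float:
    """LWT/T: distinct lexical word types divided by the number of T-units."""
    if t_units <= 0:
        raise ValueError("t_units must be positive")
    return len(lexical_word_types(text)) / t_units

# Invented sample consisting of two coordinated T-units.
sample = "The cells were cultured, and the cells were counted."
print(sorted(lexical_word_types(sample)))
print(lexical_complexity(sample, t_units=2))
```

In the sample, the repeated word "cells" is counted once, so the two T-units share three lexical word types and the ratio is 1.5.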

Accuracy is measured based on the proportion of error-free T-units in each text (EFT/T). The types of errors considered in this analysis include subject-verb agreement mistakes, sentence fragments, run-on sentences, incorrect noun and verb forms, missing or incorrect prepositions and subordinating conjunctions, errors in parallel structure, punctuation mistakes, and typographical errors. These errors are selected for analysis because they are common in academic writing and are not specific to any particular scientific discipline. By focusing on these universal aspects of language proficiency, the study findings will be broadly applicable to different STM fields.

Fluency is determined by calculating the average length of error-free T-units. This is done by dividing the total number of words in error-free T-units by the number of such units ((W in EFT)/EFT). A higher word count per error-free T-unit indicates greater fluency, as it reflects a more effortless and confident production of language. To maintain consistency in word count, in-text parenthetical citations are excluded from the analysis. Different citation styles (e.g., author-date versus numerical citations) can otherwise introduce variability in the results, making comparisons less reliable.
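
The effect of excluding citations from the word count can be sketched in Python as follows. This is an illustrative approximation rather than the procedure followed in the study, where counting was done with a word processor: the two regular expressions cover only simple author-date and bracketed numeric citations, and all names and sample sentences are hypothetical.

```python
import re

# Assumed citation patterns (illustrative, not exhaustive):
#   author-date, e.g. "(Swales, 1990)" or "(Skehan, 1998, 2009)"
#   numeric, e.g. "[3]" or "[3, 5]" or "[4-6]"
# Note: any parenthesis that ends in a year would also be removed.
AUTHOR_DATE = re.compile(r"\s*\([^()]*\b(?:19|20)\d{2}[a-z]?\)")
NUMERIC = re.compile(r"\s*\[\d+(?:\s*[,-]\s*\d+)*\]")

def strip_citations(t_unit: str) -> str:
    """Remove in-text citations so they do not inflate word counts."""
    return NUMERIC.sub("", AUTHOR_DATE.sub("", t_unit))

def word_count(t_unit: str) -> int:
    """Count whitespace-separated words after removing citations."""
    return len(strip_citations(t_unit).split())

def fluency(error_free_t_units: list[str]) -> float:
    """(W in EFT)/EFT: mean word count of the error-free T-units."""
    if not error_free_t_units:
        raise ValueError("need at least one error-free T-unit")
    total_words = sum(word_count(t) for t in error_free_t_units)
    return total_words / len(error_free_t_units)

# Invented error-free T-units using two different citation styles.
eft = [
    "RAs have shifted toward greater complexity (Swales, 1990).",
    "Similar findings were reported earlier [3, 5].",
]
print([word_count(t) for t in eft])
print(fluency(eft))
```

Without the stripping step, the two sample T-units would be counted as 8 and 7 words instead of 6 and 5, which shows how citation style alone could shift the fluency score.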

To ensure consistency in T-unit segmentation, the study follows Polio’s (1997) guidelines for handling ambiguous cases. For instance, run-on sentences and comma splices are counted as two separate T-units, with the first marked as erroneous. Sentence fragments are treated differently depending on their structure; if a fragment lacks a verb, it is considered an independent T-unit with an error, whereas phrase or clause fragments are merged with adjacent T-units. Parenthetical elements are counted as separate T-units. These guidelines ensure uniformity in the analysis and minimize inconsistencies in data interpretation.
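
Once a text has been segmented and annotated by hand, the four ratio measures follow mechanically from the annotations. The Python sketch below shows one possible way to organize this step. The TUnit record, its field names, and the sample values are hypothetical and are not taken from the study's data.

```python
from dataclasses import dataclass

@dataclass
class TUnit:
    """One manually annotated T-unit (field names are illustrative)."""
    words: int         # word count, with citations already excluded
    clauses: int       # independent clause plus attached dependent clauses
    error_free: bool   # False if any tracked error type occurs

def caf_scores(t_units: list[TUnit], lexical_word_types: int) -> dict[str, float]:
    """Compute the four ratio measures for a single text."""
    if not t_units:
        raise ValueError("no T-units to score")
    n = len(t_units)
    error_free = [t for t in t_units if t.error_free]
    return {
        "C/T": sum(t.clauses for t in t_units) / n,
        "LWT/T": lexical_word_types / n,
        "EFT/T": len(error_free) / n,
        # Fluency is undefined when no T-unit is error-free.
        "(W in EFT)/EFT": (
            sum(t.words for t in error_free) / len(error_free)
            if error_free else float("nan")
        ),
    }

# Hypothetical annotation: the first two entries represent a run-on
# sentence split into two T-units, with the first marked as erroneous.
annotated = [
    TUnit(words=14, clauses=1, error_free=False),
    TUnit(words=8, clauses=1, error_free=True),
    TUnit(words=20, clauses=2, error_free=True),
    TUnit(words=12, clauses=1, error_free=True),
]
scores = caf_scores(annotated, lexical_word_types=30)
for name, value in scores.items():
    print(f"{name}: {value:.2f}")
```

Keeping the error flag on each T-unit, rather than storing only totals, makes it straightforward to restrict the fluency calculation to error-free T-units.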

Although most CAF calculations are conducted manually, word count and lexical diversity are measured using digital tools to enhance accuracy. Microsoft Word’s word counter is used for overall word count, while lexical word types are calculated using an online lexical diversity tool (wordcounter.net). In addition, a peer-checking process is implemented to verify the accuracy of the data analysis. A sample of the data is independently reviewed to ensure consistency in score calculations and prevent errors in interpretation. Finally, the results of the analysis are presented both qualitatively and quantitatively, using descriptive statistics such as means, percentages, and frequency counts.

Results and discussion

Writing proficiency in the 20 RAs under investigation is assessed using the CAF framework. Table 1 presents the overall results across the three CAF dimensions.

Table 1: The overall results of the CAF analysis.

| | T | EFT | W | W in EFT | C | LWT | Grammatical Complexity (C/T) | Lexical Complexity (LWT/T) | Accuracy (EFT/T) | Fluency ((W in EFT)/EFT) |
|---|---|---|---|---|---|---|---|---|---|---|
| Total | 2618 | 1972 | 53625 | 37101 | 3163 | 18646 | 1.2 | 7 | 0.75 | 19 |

In Table 1, T stands for the number of T-units, EFT is the number of error-free T-units, W is the total number of words, W in EFT represents the total number of words in error-free T-units, C denotes the number of clauses, and LWT is the number of lexical word types. As shown in the table, the analyzed sample contains 2618 T-units, of which 1972 are error-free. The number of clauses is 3163, the number of lexical word types is 18646, the total number of words is 53625, and the number of words in error-free T-units is 37101. Overall, grammatical complexity (C/T) is 1.2, meaning that each T-unit contains little more than one clause on average, which denotes low grammatical complexity. As for lexical complexity (LWT/T), each T-unit has an average of 7 lexical word types, which is not a high degree of complexity either. Turning to accuracy (EFT/T), the analyzed RAs score about 75%, meaning that 75 of every 100 T-units are error-free. This is an acceptable overall accuracy score, yet it needs to improve if the authors are to secure a better position in international publication. Finally, regarding fluency ((W in EFT)/EFT), there are 19 words per error-free T-unit on average, which indicates that the authors of the analyzed data are moderately fluent. Each of these measures is discussed in more detail next.
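
For readers who wish to retrace the arithmetic, the following Python sketch recomputes the four scores from the raw totals in Table 1. The variable names are illustrative; only the six totals come from the study.

```python
# Raw totals as reported in Table 1.
totals = {
    "T": 2618,           # T-units
    "EFT": 1972,         # error-free T-units
    "W": 53625,          # words (not needed for the four ratios)
    "W_in_EFT": 37101,   # words in error-free T-units
    "C": 3163,           # clauses
    "LWT": 18646,        # lexical word types
}

scores = {
    "C/T": totals["C"] / totals["T"],
    "LWT/T": totals["LWT"] / totals["T"],
    "EFT/T": totals["EFT"] / totals["T"],
    "(W in EFT)/EFT": totals["W_in_EFT"] / totals["EFT"],
}
# T-units that contain at least one error.
erroneous_t_units = totals["T"] - totals["EFT"]

for name, value in scores.items():
    print(f"{name:>15} = {value:.2f}")
print(f"T-units with errors = {erroneous_t_units}")
```

Rounded to the precision used in Table 1, the four ratios come out as 1.2, 7, 0.75, and 19, and the difference between T and EFT gives 646 T-units that contain at least one error.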

Complexity

Complexity is divided into grammatical complexity and lexical complexity. Fig 1 shows the results of the grammatical complexity analysis in the 20 RAs.

 

Fig 1: The results of the grammatical complexity analysis.

As shown in Fig 1, the RA with the lowest degree of grammatical complexity in the analyzed sample is RA 2, with a score of 1.09. RA 2 has 171 T-units and 187 clauses in total, showing that most of the T-units consist of only one clause, with few occurrences of clausal embedding (only 16 embedded clauses, i.e., about 9% of the total number of clauses in the RA). In this RA, no T-unit has more than two clauses, as in the following example: "In this study, gene expression of the undifferentiated MSCs was relative to GAPDH while that of the differentiated cells was relative to that of the undifferentiated ones" (RA 2, p. 10). This suggests that the authors of the analyzed RAs tend to avoid extensive syntactic embedding and subordination.

The RA with the highest degree of grammatical complexity, by contrast, is RA 10. It has a score of 1.42, with 121 T-units and 173 clauses. This is still not a high degree of grammatical complexity, since more than half of the T-units consist of a single clause. However, this RA has many instances of clausal embedding. For instance, RA 10 is one of the few RAs in the data where one can find more than three clauses in one T-unit, as in the following example: "After revising the previous CS operative details of the women recruited (n=250), categorization had been performed into: Group I (n=89), where both the visceral and parietal peritoneum had been left opened non closed, Group II (n=75) where the only parietal peritoneum had been closed, leaving the visceral peritoneum opened, and Group III (n=86), where both the visceral and parietal peritoneum had been closed" (RA 10, p. 4). This T-unit consists of one independent clause and four dependent clauses, giving a total of five clauses, which marks it as a highly syntactically complex T-unit.

Overall, the level of grammatical complexity in the analyzed data is low, with a score of 1.2 (C/T = 3163/2618 = 1.2). This means that most of the T-units (83%) consist of a single clause, which reflects the simplicity of grammatical construction of the majority of the T-units in the data. This may also reflect that the RA authors find it challenging to write more complex syntactic structures to convey their intended meaning and ideas. This structural simplicity reflects low writing proficiency. The inability to produce syntactically sophisticated structures is an issue that is worth mentioning.

The grammatical complexity score of the current study differs from the scores reported by Lan (2015), Cho (2015), Benzehaf (2016), Benzehaf (2017), and Jebbour (2021). The score in Lan (2015) indicates that, on average, 48% of the clauses used are dependent and 52% are independent, showing more subordination than in the current study. Cho (2015) reports an average grammatical complexity score of 1.3 clauses per T-unit, a score close to but still higher than that of the current study. Benzehaf (2016) and Benzehaf (2017), by contrast, report lower grammatical complexity scores, with the highest being 1 in Benzehaf (2016) and 0.53 in Benzehaf (2017). A much higher score than in this or the other studies is reported in Jebbour (2021), with a post-test mean of 4.06. It is worth noting, though, that these studies analyzed essays, which are much shorter texts than RAs, so it would be unfair to place the scores on the same scale. More studies applying the CAF framework to RAs are needed so that the results of the current study can be compared with theirs. Even so, the researchers appear to need guidance on how to write complex structures to convey their ideas.

The other component of complexity is lexical complexity. Fig 2 presents the lexical complexity results.

 

Fig 2: The results of the lexical complexity analysis.

As can be seen, the RA with the lowest degree of lexical complexity is RA 18, with a score of 4.8. RA 18 has 212 T-units and 1020 LWTs. This means that each T-unit has no more than 5 lexical word types on average, which denotes that the authors of this RA do not have a rich lexicon from which they can formulate their ideas. Although RA 18 is the RA with the highest number of T-units in the data (212 T-units), these T-units are mainly short in length and with low lexical diversity. For instance, a T-unit like "In the operating room under topical anesthesia using methyl cellulose injection in the anterior chamber through a side port, a spatula was inserted either temporally or nasally until the tip of the spatula is seen under the conjunctiva doing synechiolysis with a sweeping to and fro movement attacking the contralateral side of the scleral flap" (RA 18, p. 5) is one of the longest in the RA and the most lexically complex, consisting of 55 words in total. However, only 42 words (76%) are lexical word types, while 13 words are repeated.

In contrast, RA 7 has the highest lexical complexity score. With 158 T-units and 1390 LWTs (which is also the highest number of LWTs in an RA in the data), RA 7 shows a lexical complexity score of 8.7, denoting an average of about 8 to 9 lexical word types in a T-unit, which is also a low score. To illustrate, a relatively lexically complex T-unit from RA 7 is as follows: "The supernatants were used for malondialdehyde (MDA) determination which has been identified as the product of lipid peroxidation that reacts with thiobarbituric acid (TBA) in acidic medium at 95 °C to give pink product measured by a spectrophotometer at wave length equals 534 nm using tetramethoxypropane as a standard" (RA 7, p. 6). When analyzing the lexical complexity of this T-unit, it consists of 49 words, 44 of which are lexical word types (89%), and the five words that are repeated are mainly articles and prepositions. It is worth noticing that the longer the T-unit, the higher the chance for lexical complexity, which means that there can be a direct relation between fluency and lexical complexity.

With 18646 LWTs and 2618 T-units in the data, the overall lexical complexity score is 7 (LWT/T = 18646/2618 = 7). This means that, on average, each T-unit has 7 lexical word types among its constituents. This suggests that the vocabulary of the RA authors may not be strong enough to show lexical richness. Although the analyzed RAs show more lexical complexity than grammatical complexity, the overall complexity scores emphasize the difficulty faced by the authors of the analyzed data in producing sophisticated language, whether in structure or in vocabulary diversity. By comparison, in Lan (2015) almost half of the words are different lexical word types. Although that seems to be a much higher score than that of the current study, the lexical complexity of essays may not be comparable to that of large-scale writing projects such as RAs.

Accuracy

Accuracy refers to the soundness of the text grammatically and orthographically. Fig 3 demonstrates the accuracy of each RA in the analysis.

 

Fig 3: The results of the accuracy analysis.

The RA with the lowest accuracy score is RA 14. This RA has the smallest number of EFTs, that is, 43 EFTs, and with 89 T-units in total, its accuracy score is 48.3%. It is also the only RA in the analyzed data with an accuracy score below 50%. This score denotes that more than half of the T-units in the RA have errors, which puts the RA at risk of being rejected for poor language. Conversely, RA 9 shows the highest accuracy score in the data. With only 6 T-units containing errors, RA 9 has 81 EFTs out of 87 T-units, leading to a score of 93%. It is worth noting that this is the only RA with an accuracy score above 90%. All the other RAs lie in the range of ~60%–85%, which is a wide range of scores. These scores reflect a notable difference in the level of writing proficiency of the authors of the analyzed data in terms of the accuracy of the produced language.

From a total of 2618 T-units in the 20 analyzed RAs, 1972 T-units are found to be error-free. This denotes an accuracy of 75% (EFT/T = 1972/2618 ≈ 0.75), which may be seen as an acceptable score for non-native speakers of English. However, the wide gap between the RA with the lowest accuracy score and the RA with the highest may suggest otherwise. This overall score means that 646 T-units have major grammatical and/or orthographical issues. The main types of errors in the data are highlighted in Table 2.
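The accuracy figures quoted for RA 14, RA 9, and the full data set all follow from the same EFT/T ratio. The sketch below recomputes them from the counts given in the text; the function name `accuracy` and the dictionary layout are assumptions made for illustration only.

```python
def accuracy(error_free_t_units: int, t_units: int) -> float:
    """Return the share of error-free T-units (EFT/T) as a percentage."""
    if t_units <= 0:
        raise ValueError("t_units must be positive")
    if not 0 <= error_free_t_units <= t_units:
        raise ValueError("error_free_t_units must lie between 0 and t_units")
    return 100 * error_free_t_units / t_units


# (EFT, T) counts quoted in the text.
counts = {"RA 14": (43, 89), "RA 9": (81, 87), "All 20 RAs": (1972, 2618)}

for label, (eft, total) in counts.items():
    # total - eft is the number of T-units that contain at least one error.
    print(f"{label}: {accuracy(eft, total):.1f}% accurate, "
          f"{total - eft} T-units with errors")
```

Run as written, the loop gives roughly 48.3%, 93.1%, and 75.3%, with 46, 6, and 646 T-units containing errors, respectively.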

Table 2: The main accuracy errors in the analyzed data.

| Type of Error | Example 1 | Example 2 |
|---|---|---|
| Spelling/orthographical | pigmentarychanges | in ech visit |
| Punctuation | Chen et al., reported | For evaluation of liver function; activities were determined |
| Confusing noun and verb forms | a follow up period of 36 months | at first check up |
| Run-on sentences | Both groups were assigned to LCs according to their duty schedule, thereby no surgeon selection was attempted. | 4.7% of them gave a history of diabetes, 3.2% gave a history of hypertension. |
| Subject-verb agreement | the tongue were dissected | There were no remarkable difference |
| Sentence fragments | The lens power gradually changing to approach the final lens power. | Bimanual irrigation aspiration of any residual cortical material. |
| Incorrect/missing prepositions | their age ranged from to 23 to 38 years | and agreed our results |
| Incorrect subordinating conjunction sentence structure | While, as regard to the changes in Oxygen saturation (Spo2), heart rate (HR), mean arterial pressure (MAP) in both studied groups. | Although high response in our study. |
| Incorrect/missing articles | away from the both tubes | The results of his study coincided partially with a results of this study |

As the table shows, the data contains punctuation errors, spelling errors and typos, confusion of the noun and verb forms of a word, run-on sentences, subject-verb agreement errors, and sentence fragments, among other errors. These results are in line with those of El-Seidi (2006), who also investigates the syntactic errors in 100 Egyptian-authored scientific RAs, in that incorrect subordination and sentence fragments are among the top grammatical errors in the data. Similarly, Onwuegbuzie (2017), who analyzes 117 RAs from one journal, finds subject-verb agreement to be the most frequent grammatical error. Furthermore, it is worth noting that, in the current study, punctuation errors alone account for about 23% of the total accuracy errors in the data, which clearly indicates a major issue that needs to be addressed.

Other errors also appear in the analyzed RAs, such as the following: incorrect parallelism (e.g., 'eating foods from animal source not only reduced stunting but also improving other anthropometric indices'), noun-number agreement (e.g., '12 patient, 4 year'), the incorrect use of adjectives instead of adverbs and vice versa (e.g., 'postoperative' instead of 'postoperatively'), the incorrect use of nouns instead of adjectives and vice versa (e.g., 'bacterial' instead of 'bacteria'), using a singular noun in a structure that requires a plural noun and vice versa (e.g., 'one of the most common complication'), inaccurate collocations (e.g., 'between 35 to 45 years'), incorrect coordination (e.g., 'the duration of post-operative analgesic effect, the first time to analgesic requirement, improved the quality of Bier's block'), incorrect pronouns (e.g., 'Schmidt el al. in his single dose model'), incorrect possessive forms (e.g., 'parents's'), redundancy (e.g., 'we used the paired t-test used'), and double negatives (e.g., 'no any'). All these types of errors need careful consideration while writing an RA because they can weaken the overall accuracy level and thus lead to possible rejection due to poor English.

Previous research reports inconsistent accuracy scores. For instance, Benzehaf (2017) and Jebbour (2021) show relatively low scores, with 48% and 37% as the best score achieved in each, respectively. This contrasts with Benzehaf (2016), which shows relatively high accuracy scores ranging from 72% to 100%. Compared to these studies, the performance of the participants in the current study lies in a favorable position. Nevertheless, high school students (Benzehaf, 2016; Benzehaf, 2017), university students (Jebbour, 2021), and established researchers (the current study) are different participants with different backgrounds, experience, potential, and contexts.

Fluency

The final constituent of the CAF triad is the fluency level. The results of the fluency analysis are shown in Fig 4.

 

Fig 4: The results of the fluency analysis.

According to the fluency analysis results, the lowest average number of words per EFT is found in RA 14, with about 14 words. Note that RA 14 also has the lowest accuracy score. With more than half of the T-units in the RA containing errors, the remaining EFTs are of average to short length, since the longer the T-unit, the greater the chance of an accuracy error. This suggests that there could be a direct relation between fluency and accuracy, which can also be explained by the fact that both measures depend on the same key constituent, the EFT. For instance, the very first EFT in RA 14, which is the second T-unit in the Introduction section, measures 15 words: "As most fibroids are asymptomatic, the true prevalence of fibroids may be greatly higher" (RA 14, p. 2). Even shorter EFTs can be found in the body of the RA, such as "one had previous myomectomy" (RA 14, p. 4), which consists of only 4 words.

The highest fluency score, however, is observed in RA 8, which has about 23 words per EFT. This score could mean that the authors of the analyzed RAs do not find it difficult to produce well-formed units. An example of a long T-unit in RA 8 is as follows: "The source of radiation, the dose and the scarification dates were determined based on a pilot study to generate mucositis model in rats in response to regional radiation using the same method of irradiation described before in the present experiment with different doses" (RA 8, p. 4). This T-unit is one of the longest in the data, with 43 words in total, reflecting high fluency.

Overall, the average fluency score in the analyzed data is about 19 ((W in EFT)/EFT = 37101/1972 ≈ 18.8). This indicates that an error-free T-unit in the data consists of about 19 words on average, which is within the average range according to the literature. Fluency is one of the writing proficiency aspects investigated in Cho (2015). When measuring the number of words per T-unit, Cho (2015) finds that each T-unit consists of 8 words on average. Another study that analyzes the fluency of English language learners is Ahmadi and Meihami (2017). The highest fluency score reached in their study is almost 13 words per T-unit on average. Despite the differences in setting, measurement method, and text type between these studies and the current study, the score reached in the current study shows higher fluency than these two studies, that is, lengthier T-units. That is not the case in Jebbour (2021), however, which finds fluency figures similar to those of the current study, ranging from 19 (pre-test) to 23 (post-test). Despite the simple structure and the limited vocabulary shown by the complexity scores, the authors of the analyzed RAs are still able to construct meaningful language that serves their arguments.
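The fluency measure can be illustrated in the same way. The sketch below counts words by splitting on whitespace, which is an assumption, since the exact word-counting conventions of the study are not stated in this section. The names `count_words` and `fluency` are illustrative.

```python
def count_words(t_unit: str) -> int:
    """Count whitespace-separated words in a T-unit (assumed convention)."""
    return len(t_unit.split())


def fluency(words_in_efts: int, error_free_t_units: int) -> float:
    """Return the mean number of words per error-free T-unit (W/EFT)."""
    if error_free_t_units <= 0:
        raise ValueError("error_free_t_units must be positive")
    return words_in_efts / error_free_t_units


# Short EFT quoted in the text from RA 14, p. 4.
short_eft = "one had previous myomectomy"
print(count_words(short_eft))

# Totals reported in the text for the 20 analyzed RAs.
overall = fluency(37101, 1972)
print(f"W/EFT = {overall:.1f}")  # about 18.8, reported as 19 after rounding
```

Under this counting convention, the short EFT from RA 14 comes out at 4 words, matching the figure given in the text.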

Conclusion

This study aims to provide a comprehensive understanding of the linguistic challenges faced by Egyptian researchers in preparing RAs for publication in international STM journals. By analyzing their writing proficiency through the lens of CAF, the study sheds light on key difficulties encountered in English academic writing. The primary objective is to raise awareness among Egyptian researchers, as well as non-native English-speaking scholars more broadly, regarding these challenges. By identifying areas requiring improvement, the study can contribute to developing strategies that enhance the accessibility of Egyptian STM research to international academic platforms.

The findings reveal that the language used in the analyzed RAs is relatively simple, both grammatically and lexically, as indicated by a mean of 1.2 clauses per T-unit and 7 lexical word types per T-unit. The overall accuracy score of 75% suggests that a substantial portion of the text is error-free, yet there is significant variation in accuracy across different RAs. Fluency levels fall within an average range, with an error-free T-unit containing approximately 19 words. These results highlight specific areas where Egyptian researchers may need additional support. To improve the quality of their writing, researchers should consider increasing the structural complexity of their texts by incorporating more subordinate and embedded clauses where appropriate. Additionally, a richer and more varied vocabulary could enhance the clarity and effectiveness of their academic arguments. However, while striving for more sophisticated language use, researchers must remain mindful of common linguistic errors, particularly those related to punctuation, spelling, noun-verb confusion, run-on sentences, subject-verb agreement, and sentence fragments.

In conclusion, the discussion of the study findings has brought to light several challenges that the Egyptian researchers of the analyzed data encountered in attempting to write complex, accurate, and fluent language. The comparison of the current study's results with those of previous research has also added clarity and context to the findings. All these insights can help researchers improve their research writing skills and competency.

The insights gained from this study have direct implications for academic training and professional development. The findings can inform the creation of specialized training programs and instructional materials designed to address the linguistic difficulties faced by Egyptian researchers as well as other non-native English-speaking academics. Research writing courses tailored to the needs of scholars across scientific, humanities, and social science disciplines could help equip them with the necessary skills to produce well-structured and linguistically accurate RAs. Beyond standard academic writing curricula offered by universities and publishers, emphasis should be placed on the specific error patterns and challenges identified in this study, aligning with existing literature on non-native English academic writing. Practical interventions such as structured writing exercises, targeted grammar instruction, and assessments focusing on complexity, accuracy, and fluency could further enhance researchers’ proficiency. Strengthening academic writing skills not only improves the clarity and coherence of research output but also reduces the time, effort, and financial resources spent on language revision, allowing researchers to focus on their core scientific contributions. Given that a significant proportion of the global research community consists of non-native English speakers, addressing these linguistic barriers is essential for fostering inclusivity and ensuring wider international recognition of research from diverse linguistic backgrounds.

Despite its contributions, this study has certain limitations. A significant portion of the analysis relies on qualitative interpretation, which inherently introduces a degree of subjectivity. While measures were taken to ensure consistency, minor variations in interpretation may still arise among different researchers. Additionally, the study employs a discourse analysis approach with a relatively small sample size, which, while insightful, may limit the generalizability of the findings. However, the current exploratory study offers a sound basis for further investigation of the linguistic challenges that Egyptian and other non-native English-speaking researchers face when writing research. Future research could also conduct a more in-depth investigation into a specific dimension of CAF across a larger dataset to yield more precise and statistically robust results. Another promising avenue for further study would be an exploration of Egyptian researchers' adherence to international style guidelines (e.g., APA, MLA, Chicago, Vancouver) and the extent to which compliance with these standards impacts publication success. Moreover, while this study focuses on experienced Egyptian researchers, future research could expand to examine variations across different academic ranks, disciplines, and linguistic backgrounds. Given that the challenges associated with academic writing in English are shared by scholars worldwide, further investigations into this issue could offer valuable insights for enhancing global research communication. Since the present study serves as an initial exploration of a relatively under-researched topic, continued inquiry is essential to further refine our understanding of the linguistic factors influencing the international publication of STM research.

Acknowledgements

This paper is part of the author’s dissertation titled “Linguistic Challenges Facing Egyptian Researchers in Writing English Research Articles for International Publication” (2020). Two other papers were taken from the same dissertation, that is, Mohamed (2020) and Alaa Eldin (2021). Moreover, the author used ChatGPT to rephrase some parts of this paper to avoid any overlap with the said papers. The rephrased text was reviewed afterwards to make sure it was accurate and appropriate.

Conflict of Interests

The author has no conflict of interests to declare.

Funding

The author received no funding for this work.

Ethical Approval

No ethical approval was required for this work.

References

Academy of Scientific Research and Technology. (2016). Egyptian science and technology indicators. http://www.asrt.sci.eg/index.php/indicators-survey
Ahmadi, A. & Meihami, H. (2017). The development of complexity, accuracy, and fluency in ESP learners’ writing: A dynamic systems theory. XLinguae Journal, 10(3), 57–74. https://doi.org/10.18355/XL.2017.10.03.05
Alaa Eldin, A. (2021). Rhetorical structure challenges facing Egyptian researchers in writing English research articles for international publication. Hermes, 10(2), 9–38. https://doi.org/10.21608/herms.2021.199260
Awwaad, A. (2012). A comparative study on the rhetorical moves of abstracts in published research articles and Egyptian master's theses in applied linguistics (MA thesis). The American University in Cairo, Egypt.
Benzehaf, B. (2016). Development of complexity, accuracy, and fluency in high school students' written foreign language production. Journal of English Education and Linguistics Studies, 3(2), 128–151. https://media.neliti.com/media/publications/90828-EN-development-of-complexity-accuracy-and-f.pdf
Benzehaf, B. (2017). Comparing learners’ general proficiency levels with their writing productive ability: how correlated are they? Eurasian Journal of Applied Linguistics, 3(2), 43–58. https://doi.org/10.32601/ejal.460961
Cho, H. (2015). Effects of task complexity on English argumentative writing. English Teaching, 70(2), 107–131. https://doi.org/10.15858/engtea.70.2.201506.107
Clauses - Grammar - Cambridge Dictionary. (n.d.). https://dictionary.cambridge.org/grammar/british-grammar/clauses
Craven, L. (2017). Measuring language performance: Complexity, accuracy and fluency measures. The 2017 West East Institute International Academic Conference Proceedings. 25–27.
Eassom, H. (2018). 9 common reasons for rejection. Wiley. https://www.wiley.com/network/researchers/submission-and-navigating-peer-review/9-common-reasons-for-rejection
El-Seidi, M. (2006). Syntactic problems in Egyptian-authored English scientific texts. Journal of the Service Center for Research Consulting, Division of Translation, Faculty of Arts, Minufiya University. 2–28.
Hosni, H. R. (2015). Metadiscourse in the discussion section of English and Arabic linguistics research articles: A cross-linguistic study [doctoral dissertation]. Fayoum University, Egypt.
Housen, A. & Kuiken, F. (2009). Complexity, accuracy, and fluency in second language acquisition. Applied Linguistics, 30(4), 461–473. https://doi.org/10.1093/applin/amp048
Housen, A., Kuiken, F. & Vedder, I. (2012). Complexity, accuracy and fluency: Definitions, measurement and research. In Housen, A., Kuiken, F. & Vedder, I. (eds.). Dimensions of L2 performance and proficiency: Complexity, accuracy and fluency. 1–20. https://doi.org/10.1075/lllt.32.01hou
Hunt, K. (1965). Grammatical structures written at three grade levels. NCTE.
Hyland, K. (2011). Academic discourse. In Hyland, K. & Paltridge, B. (eds.). Continuum companion to discourse analysis. 171–184. Continuum.
Jalongo, M. R. & Saracho, O. N. (2016). Writing for publication. Springer International Publishing.
Jebbour, M. (2021). Self-disclosure and Moroccan EFL learners’ writing development: effects on complexity, accuracy, and fluency. Journal of Language and Education, 7(1), 127–140. https://jle.hse.ru/article/view/8620
Kuhi, D., Rasuli, M. & Deylamic, Z. (2014). The effect of type of writing on accuracy, fluency and complexity across proficiency. Social and Behavioral Sciences, 98, 1036–1045. https://doi.org/10.1016/j.sbspro.2014.03.514
Lan, M. (2015). The effect of task type on accuracy and complexity in IELTS academic writing. Đhqghn, 31(1), 45–63.
McDowell, L. (2016). An error analysis of Japanese scientists' research articles [MA thesis]. Macquarie University, Sydney, Australia.
Mohamed, A. M. A. (2020). Meta-discourse use of Egyptian researchers writing English research articles for international publication. Cairo Studies in English, 2020(1), 158–179. https://doi.org/10.21608/cse.2021.147195
Onwuegbuzie, A. J. (2017). Most common formal grammatical errors committed by authors. Journal of Educational Issues, 3(1), 109–140. https://doi.org/10.5296/jei.v3i1.10839
Polio, C. G. (1997). Measures of linguistic accuracy in second language writing research. Language Learning, 47(1), 101–143. https://doi.org/10.1111/0023-8333.31997003
Raish, M. (2017). The measurement of the complexity, accuracy, and fluency of written Arabic [doctoral dissertation]. Georgetown University, Washington, DC, USA.
Salem, A. M. A. M. (2020). Linguistic Challenges Facing Egyptian Researchers in Writing English Research Articles for International Publication [Doctoral dissertation]. Cairo University, Giza, Egypt.
Scimago. (2016). Scimago Journal & Country Rank: Egypt. http://www.scimagojr.com/countrysearch.php?country=eg
Shehata, A. M. K. & Eldakar, M. A. M. (2018). Publishing research in the international context: An analysis of Egyptian social sciences scholars’ academic writing behaviour. The Electronic Library, 36(5), 910–924. https://doi.org/10.1108/EL-01-2017-0005
Skehan, P. (1998). A cognitive approach to language learning. Oxford University Press. https://doi.org/10.1177/003368829802900209
Skehan, P. (2009). Modelling second language performance: Integrating complexity, accuracy, fluency, and lexis. Applied Linguistics, 30(4), 510–532. https://doi.org/10.1093/applin/amp047
Skehan, P. & Foster, P. (2008). Complexity, accuracy, fluency and lexis in task-based performance: A meta-analysis of the Ealing research. In Van Daele, S., Housen, A., Kuiken, F., Pierrard, M. & Vedder, I. (eds.). Complexity, Accuracy, and Fluency in Second Language Use, Learning & Teaching. 207–226.
Springer. (n.d.). Common reasons for rejection. https://www.springer.com/gp/authors-editors/authorandreviewertutorials/submitting-to-a-journal-and-peer-review/what-is-open-access/10285582
Swales, J. M. (1990). Genre analysis: English in academic and research settings. Cambridge University Press.
Thrower, P. (2012). Eight reasons I rejected your article. Elsevier. https://www.elsevier.com/connect/8-reasons-i-rejected-your-article
Wolfe-Quintero, K., Inagaki, S., & Kim, H. (1998). Second Language Development in Writing. Second Language Teaching & Curriculum Center, University of Hawaii at Manoa.