Translation as a Paradigm Shift: A Corpus Study of Academic Writing

In recent decades the increasing reliance on computer technology and the emergence of electronic publishing have precipitated changes in both the production and reception of academic writing. At the same time, the dominance of English as the medium of academic communication has been asserted in all fields of study. While many scholars write their own texts in English, it is not exceptional for others to have their papers translated into English. It is interesting, however, that translation of academic discourse has received relatively little research attention so far. In the study presented here, the question how translated academic texts differ from comparable original English academic texts is addressed. To explore this question, a 700,000-word corpus comprising 104 research articles (Slovene-English translations and comparable English originals) is analyzed in terms of references to the entire text itself. The results show considerable differences between the translated texts and the comparable English-language originals.


Introduction
In recent decades the increasing reliance on computer technology and the emergence of electronic publishing have precipitated changes in both the production and reception of academic writing. At the same time, the dominance of English as the medium of academic communication (see for instance Pérez-Llantada, Plo and Ferguson 2010; Lillis and Curry 2006; Burrough-Boenisch 2006), formerly asserted above all in the natural sciences, has spread to the social sciences and, to a somewhat lesser extent, the humanities (see Tardy 2003; Flowerdew 1999). When it comes to the question of non-native English-speaker scholars producing English-language texts, there seems to be a general assumption that they, as a rule, draft their own texts in English. This, however, is certainly not true of all non-native English-speaker scholars, since their English-language proficiency varies to a great extent depending on factors such as cultural and language background, age, and experience. The scholars with a very low proficiency in English have to resort to translation services to produce acceptable English-language texts, which raises interesting issues concerning this specific type of translation. It must be noted, however, that translation of academic discourse has received relatively little attention in recent years, with a few notable exceptions (Montgomery 2009; Bennett 2007; Williams 2004). (For a more detailed discussion of the relative neglect of translation of academic discourse, see Pisanski Peterlin 2008.)
The relative lack of research interest means that issues related directly to the process of translation arising in translation of academic discourse have not been addressed in a systematic way. In order to gain a better insight into the specifics of translated academic discourse, a paradigm shift in our approach to English-language academic texts by non-native English-speaker scholars is needed: in addition to studying the characteristics of texts written by non-native English speakers in English, characteristics of translated texts should also be examined.
In the study presented here, the question how translated academic discourse differs from comparable original English academic discourse is addressed using a corpus study. To explore this question, a 700,000-word corpus comprising 104 research articles (Slovene-English translations and comparable English originals) is analyzed in terms of references to the entire text itself. The results show considerable differences between the translated texts and the comparable English-language originals.

Metadiscourse in translated texts
In recent years research has shown that intercultural differences in writing conventions may impact translations of academic discourse (cf. Pisanski Peterlin 2008; Williams 2004). Interference or transfer, i.e., the impact of the source language or source text on the target text, has long been recognized as an important feature of translation; understanding interference as an inherent part of the translation process, Toury (1995, 274-97) posits the law of interference. While interference from the source text may be easy to detect and avoid (if desired) at the level of lexico-grammar, it presents a greater challenge at the level of discourse, because the intercultural differences in rhetorical conventions may be less obvious or less familiar to translators.
Studies in contrastive rhetoric focusing on academic writing (e.g., Mur-Dueñas 2011; Pisanski Peterlin 2005; Dahl 2004; Čmejrková and Daneš 1997; Mauranen 1993; Hinds 1987) have highlighted metadiscourse (defined by Hyland (2005, 37) as "the self-reflective expressions used to negotiate interactional meanings in a text, assisting the writer (or speaker) to express a viewpoint and engage with readers as members of a particular community") as a particularly interesting discourse phenomenon because of the considerable cross-cultural differences in terms of its use. Hinds (1987) proposes a typology based on reader versus writer responsibility, suggesting that languages differ in attributing responsibility for successful communication to either the writer or the reader. In writer-responsible languages, the writer is primarily responsible for presenting the content in such a way that it is easy to interpret for the reader. This generally means that metadiscourse is used extensively to explicitly help the reader interpret the text and to provide explicit text organization. In reader-responsible languages, the reader is primarily responsible for piecing together the intended content. As a rule this implies that explicit writing is not necessarily valued by the audience; consequently, metadiscourse is not used as frequently as in a writer-responsible language. Hinds's (1987) study focused on the differences between English and Japanese; he suggested that whereas English is writer-responsible, Japanese is reader-responsible. Previous contrastive research into metadiscourse use in Slovene and English (Pisanski Peterlin 2005) suggests that Slovene is somewhat less writer-responsible than English.
Various models of metadiscourse have been proposed in the past; in recent years Hyland's (2005) model and the so-called narrow or "reflexive" model developed by Mauranen (1993) and Ädel (2006) have been the two principal models used in analysis of metadiscourse. A functional rather than formal approach is generally used in studies of metadiscourse because metadiscourse items are defined by their function rather than their form.
However, when it comes to research focusing on translated texts, the impact of translation on discourse phenomena should also be considered. In fact, in some cases discourse-level interference may be revealed by focusing on the form. If the translator is unaware or only vaguely aware of the discourse function of a selected rhetorical element, such as a metadiscourse item, he or she may opt for a translation solution that entails a target text expression formally closely resembling the source text expression. As a consequence, a formal methodological approach may yield interesting results in a comparison of translated texts and comparable target-language originals, and for this type of analysis a corpus study is particularly suitable.
The present study is restricted to a small subset of metadiscourse items used to structure the text at the macro level, i.e., items that are used to refer to the entire text itself: article, paper and here. Dahl (2004, 1812) highlights the importance of the role of expressions which she labels as "locational metatext" (these include expressions referring to the text itself or part of the text in academic discourse); their function is to help the reader navigate through the text. In a research paper, expressions referring to the text itself help maintain a clear distinction between references to the study or experiment presented in the paper and references to the discourse used to convey that study or experiment. This is illustrated in examples 1-6 below. All the examples cited in this paper are taken from the corpus used in the study presented here. The expressions referring to the text itself or to the study are highlighted in boldface.
(1) The research outlined in this paper will attempt to provide a holistic perspective on one specific waste management behaviour - household recycling.
(2) In this paper we report on research aimed at developing a set of methods designed to assist road departments in rural jurisdictions mitigate hazards along roads under their management.
(3) In this paper we present an applied geographical analysis of this issue, arguing for a reexamination of the availability and quality of the underlying agrometeorological data that will be available for timely input to these DSS.
(4) The analysis of QDs put forth in this paper contests any single operator to induce QDs in the semantics.
(5) This paper presents the results of a survey of 238 speakers of Slovenian.
(6) In light of the argument that the close relationship between stress and high tone in class (iv) words is indicative of their being part of an accentual system, if there were observable stress effects on non-class (iv) words, this would pose a problem for the analysis presented here.
In Hyland's (2005) typology, such expressions would be classified as interactive metadiscourse (either as endophoric or frame markers, due to the functional nature of his classification). Within the framework of Tuomi's (2009) modification of the so-called narrow or "reflexive" model of metadiscourse, they correspond to what Tuomi (2009, 68) classifies as metatext of highly explicit reflexivity.

Corpus and method
3.1 Corpus
The 700,000-word corpus used in the analysis presented here comprises 104 research papers from two disciplines, geography (60 texts) and linguistics (44 texts). All the research papers were published between 2000 and 2007 in peer-reviewed journals indexed in relevant international databases. The corpus is composed of two parts, translated texts (52 research papers originally written in Slovene by native speakers of Slovene and subsequently translated into English by experienced translators, native speakers of either Slovene or English), and comparable originals (52 research papers originally written in English by native speakers of English). To enable a better analysis, the corpus is divided into four subcorpora; the first subcorpus (EngTranG) comprises English translations of geography research papers originally written in Slovene, the second subcorpus (EngTranL) comprises English translations of linguistics research papers originally written in Slovene, the third subcorpus (EngOrigG) comprises comparable geography research papers originally written in English and the fourth subcorpus (EngOrigL) comprises comparable linguistics research papers originally written in English. In Table 1, an overview of the size of the subcorpora is provided in terms of the number of texts each contains and in terms of the approximate number of words.
Table 1: Subcorpora size
The data on the corpus size show that there are considerable differences between the four subcorpora; because of this, the quantitative results are presented in terms of raw figures, number of occurrences per 10,000 words and mean value per text.
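The two normalized measures mentioned above can be illustrated with a minimal Python sketch. The function names and the sample figures are hypothetical and are not taken from Table 1; the sketch only shows the arithmetic that makes subcorpora of different sizes comparable.

```python
def per_10k(occurrences: int, total_words: int) -> float:
    """Occurrences normalized to a rate per 10,000 words."""
    return occurrences / total_words * 10_000


def mean_per_text(occurrences: int, n_texts: int) -> float:
    """Average number of occurrences per text."""
    return occurrences / n_texts


# Hypothetical subcorpus: 26 texts, 150,000 words, 75 retained instances.
rate = per_10k(75, 150_000)
mean = mean_per_text(75, 26)
print(f"{rate:.2f} per 10,000 words; {mean:.2f} per text")
```

Reporting both measures is useful because the rate per 10,000 words is sensitive to text length, whereas the mean per text is not.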

Method
All the texts in the corpus were made electronically accessible and tables, figures, notes, and bibliographical references were excluded from the research. The subcorpora were searched electronically using WordSmith Tools 5.0. The electronic search was followed by a manual examination of the output in which all the non-metadiscoursive instances were removed. The analysis was carried out in three steps. In the first step, the frequencies of the selected items in the subcorpora were compared in terms of the number of occurrences per 10,000 words and the mean value per text. In the second step, the frequency of the individual lexical items was compared in the four subcorpora. In the third step of the analysis, the collocational patterns of the search words identified in the four subcorpora were compared. The collocational patterns were identified using the clusters function in WordSmith Tools Concord with three words left and right of the search word and a minimum frequency of five instances.
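The idea behind the third step can be approximated in a few lines of Python. This is a sketch only, not WordSmith's own algorithm: it assumes that a cluster is a contiguous three-word sequence containing the search word (which necessarily falls within three words left and right of it), and the function name `clusters`, the simple tokenizer, and the toy input are all illustrative assumptions.

```python
from collections import Counter
import re


def clusters(texts, search_word, size=3, min_freq=5):
    """Count contiguous `size`-word sequences that contain `search_word`."""
    counts = Counter()
    target = search_word.lower()
    for text in texts:
        # Simple lowercase word tokenizer (an assumption of this sketch).
        tokens = re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())
        for i in range(len(tokens) - size + 1):
            gram = tuple(tokens[i:i + size])
            if target in gram:
                counts[gram] += 1
    # Keep only sequences that reach the minimum frequency.
    return [(" ".join(g), n) for g, n in counts.most_common() if n >= min_freq]


# Toy input, not taken from the corpus.
sample = ["In this paper we present a model."] * 5 + ["This paper is short."]
for cluster, freq in clusters(sample, "paper"):
    print(cluster, freq)
```

With the threshold of five, sequences that occur only once in the toy input (such as "this paper is") are filtered out, which mirrors the role of the minimum frequency setting described above.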

Results
The results of the corpus analysis are presented in sections 4.1-4.3 below, in terms of the overall frequency of references to the entire text itself, the frequency of the individual lexical items in the four subcorpora, and collocational patterns identified.
In addition to the raw number of items in the first column, the results are also presented as the number of occurrences per 10,000 words in the second column, while the third column presents the results in terms of the mean value per article.
To enable a better comparison, the ratio of lexical items relative to the total number of instances identified is presented below in Figure 1; the ratio is presented separately for each subcorpus.
In Figure 3 below, the clusters identified by the clusters function in WordSmith Tools Concord are presented for the EngTranL subcorpus. In the EngTranL subcorpus, clusters were identified by WordSmith Tools for the search word "paper", but not for the search words "article" and "here".
In Figure 4 below, the clusters identified by the clusters function in WordSmith Tools Concord are presented for the EngOrigG subcorpus. In the EngOrigG subcorpus, clusters were identified by WordSmith Tools for the search word "paper", but not for the search words "article" and "here".
In Figures 5a and 5b below, the clusters identified by the clusters function in WordSmith Tools Concord are presented for the EngOrigL subcorpus. In the EngOrigL subcorpus, clusters were identified by WordSmith Tools for the search words "here" and "paper", but not for the search word "article".

Discussion
The results of the analysis presented in the previous section are examined in more detail below, in terms of the overall frequency of references to the entire text itself, the frequency of the individual lexical items in the four subcorpora, and collocational patterns identified.

Overall frequency of references to the entire text itself
Because of the considerable differences in the size of the four subcorpora (cf. Table 1), the overall frequency should be compared in terms of the number of occurrences per 10,000 words or in terms of the mean value per text. The results presented in Table 2 show substantial differences in the frequency of use of references to the entire text itself between texts which have been translated into English from Slovene (the overall frequency per 10,000 words for both disciplines combined is 5.19) and texts originally written in English (the overall frequency per 10,000 words for both disciplines combined is 10.85). On average, 2.63 instances of references to the entire text itself occur in the translations whereas in the originals, the average number of occurrences is much higher (8.12). It seems that in the translated texts less importance is attached to signalling the distinction between the two levels of reality (the experiment/study vs. the discourse) discussed in the text, and the reader is generally expected to work out which level is referred to on his or her own. Since studies (e.g., Pisanski Peterlin 2005) have shown that Slovene texts tend to attribute more responsibility for effective communication to the reader (compared to English texts, where this responsibility is generally attributed to the writer, cf. Hinds 1987), the features of translated texts could be the result of interference. It should be noted, however, that the translated texts are English-language texts, yet they seem to follow Slovene writing conventions.
A more in-depth look at the overall frequency shows that there are also interesting differences between the two disciplines. In terms of the frequency of occurrences per 10,000 words, the gap between the translated and the original geography research papers is more pronounced (4.87 occurrences per 10,000 words in the EngTranG subcorpus as opposed to 10.85 occurrences per 10,000 words in the EngOrigG subcorpus) than the difference between the translated and the original linguistics research papers (5.58 occurrences per 10,000 words in the EngTranL subcorpus and 8.91 occurrences per 10,000 words in the EngOrigL subcorpus). However, the ratio between translations and originals is practically the same for the two disciplines when it comes to the number of occurrences per text: in both cases, the number of occurrences in the translated texts is just over 30% of the number of occurrences in the originals.
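The per-10,000-word comparison can be checked with a short calculation. The sketch below uses only the four rates quoted in the paragraph above; the per-text values for the individual subcorpora are not quoted in the running text and are therefore not included. The helper name `gap_and_ratio` is an illustrative assumption.

```python
# Frequencies per 10,000 words as quoted in the text above.
rates = {
    "geography": {"translated": 4.87, "original": 10.85},
    "linguistics": {"translated": 5.58, "original": 8.91},
}


def gap_and_ratio(translated, original):
    """Absolute gap and translated/original ratio, rounded to two decimals."""
    return round(original - translated, 2), round(translated / original, 2)


for discipline, r in rates.items():
    gap, ratio = gap_and_ratio(r["translated"], r["original"])
    print(f"{discipline}: gap {gap}, translated/original {ratio}")
```

On these figures the gap is 5.98 for geography and 3.33 for linguistics, which is the sense in which the difference is described as more pronounced in the geography papers.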

Frequency of lexical items
The results in Table 3 and Figure 1 show that the translations also differ from the comparable originals in terms of preference for individual lexical items. In the translated texts, the word "article" is the preferred word for referring to the entire text itself in the geography texts; in the linguistics texts, it is the second most frequent choice (after "paper"), although the difference in the frequency of use of the two is very small. On the other hand, in the comparable originals, the word "article" is very rarely used to refer to the entire text itself in both disciplines. This trend once again points to the possibility of interference: since "članek" is used in Slovene to refer to a research paper (as well as a newspaper article), the translators perhaps inadvertently translated this term using what they may perceive as a "standard" translation equivalent, i.e., "article". Examples 7a and 7b illustrate this type of translation solution. The relevant expressions are highlighted in boldface.
(7b) In this article, we describe the 100-meter and 25-meter digital elevation models of Slovenia relative to differences in surface heights, surface slopes, and surface aspects for all of Slovenia and for four areas with different relief.
Another interesting observation can be made about the word "here". Figure 1 suggests that there is a considerable difference in the use of this lexical item in reference to the entire text itself between the two disciplines in both translated and original texts. In the linguistics papers originally written in English, the word "here" is used in approximately half of the cases to refer to the entire text itself; in the translated linguistics texts the preference for "here" is somewhat less pronounced, but it is nevertheless used in 27% of the cases. In the geography papers, the word "here" is used less frequently: in the original geography papers it is used in 20% of the cases, while in the translated geography texts the figure is 15%. The manual "weeding" of the original output clearly showed that the word "here" can be problematic in geography texts in some contexts because of its locative meaning: in geography research papers, locations are frequently described and "here" can be too ambiguous in some cases to clearly indicate whether it is used in reference to the content to describe a location or in reference to the text. Example (8) illustrates the locative meaning of here and example (9) illustrates the metadiscoursive meaning of here. Both examples are taken from the same text. The relevant expressions are highlighted in boldface.
(8) Whereas the seasonal trend in the upper and mid-canopy was towards a more normal distribution in leaf areas, in the lower canopy, the reverse was the case. Here, leaf area distribution became progressively more skewed and by mid-September the range in leaf area was 1.0-13.4 cm² with a modal class of 4-5 cm².
(9) Although the pre-leaf value is comparable to the 0.51 recorded in the present study, LAI is double the value reported here.
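The need for manual weeding can be illustrated with a small keyword-in-context search. The sketch below is an assumption-laden illustration in Python rather than the WordSmith procedure used in the study; the function name `kwic` and the context width are hypothetical, and the sample text is abridged from examples (8) and (9). An automatic search returns both the locative and the metadiscoursive use of "here", and only inspection of the context can separate them.

```python
import re


def kwic(text, word, width=40):
    """Return (left, match, right) context triples for each whole-word hit."""
    pattern = re.compile(rf"\b{re.escape(word)}\b", re.IGNORECASE)
    hits = []
    for m in pattern.finditer(text):
        left = text[max(0, m.start() - width):m.start()]
        right = text[m.end():m.end() + width]
        hits.append((left, m.group(0), right))
    return hits


# Abridged from examples (8) and (9).
text = (
    "In the lower canopy, the reverse was the case. Here, leaf area "
    "distribution became progressively more skewed. LAI is double the "
    "value reported here."
)
for left, match, right in kwic(text, "here"):
    print(f"{left:>40} [{match}] {right}")
```

Both hits are formally identical, so the decision to retain the second and discard the first has to be made by a human reader.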

Collocational patterns
As Figures 2-5 reveal, the collocational patterns identified by the clusters function of WordSmith Tools Concord are almost entirely restricted to the word "paper". Three-word clusters containing the word "paper" that occur at least five times are identified in all four subcorpora. A comparison reveals that only three rather basic collocation patterns can be identified in the translated texts. In the translated geography texts, the patterns are "in this paper" and "in the paper", and in the translated linguistics texts, the pattern is "of this paper". In the comparable originals, the variety of clusters is far greater. In the originals from both disciplines, the following clusters, relevant as collocational patterns, can be found: "in this paper", "of this paper", "this paper is", "this paper has", "of the paper". In the original geography research papers, the following relevant clusters should also be mentioned: "[in] this paper we", "[the/this] paper examines the", "this paper focuses", "described in this [paper]", "presented in this [paper]" and "this paper examines". In the linguistics research papers, additional relevant clusters include: "the present paper", "[in] this paper I" and "of the present [paper]".
In addition to the word "paper", the word "article" also generates a clusters list for the EngTranG subcorpus. This is not surprising, given the fact that the word "article" constitutes about one half of all the examples of references to the entire text itself in that subcorpus. The list of clusters for the EngTranG subcorpus includes: "[in] this article we", "in this article", "the article is", "the article presents" and "of the article".
Similarly, a list of clusters is also generated for the search word "here" in the EngOrigL subcorpus. (Again, "here" accounts for about one half of all the examples of references to the entire text itself in that subcorpus.) The list is, however, limited to the following two relevant collocational patterns: "here is that (the)" and "[PAST PARTICIPLE] here as a". The findings show that many more collocational patterns emerge in the original texts as compared to the translations, where the list is very limited for the word "paper". Although more diverse collocational patterns can be identified for the word "article" in the translated geography texts, it should be remembered that the use of the word "article" is very restricted in the English-original texts. All in all, the results seem to indicate that not all of the translators were very familiar with the realization of rhetorical functions at the level of lexico-grammar.

Conclusions
The corpus study presented in this paper addressed the question how academic discourse translated from Slovene into English differs from comparable original English academic discourse. The study was restricted to a small subset of metadiscourse items used to structure the text at the macro level, i.e., items that are used to refer to the entire text itself: article, paper and here. The analysis revealed important differences in the frequency of use of the selected metadiscourse items: references to the entire text itself were used far more frequently in the original texts than in the translations. This suggests that the distinction between the references to the study or experiment presented in the paper (content) and references to the discourse used to convey that content was maintained far more consistently in the originals, which might have contributed to greater clarity and coherence.
Furthermore, the analysis revealed considerable differences in the forms of metadiscourse items used in the translations and the comparable originals: it seems very likely that this was a direct consequence of interference. Finally, the analysis also identified more diverse collocational patterns in the originals, suggesting that perhaps not all of the translators were sufficiently familiar with the realization of rhetorical functions at the level of lexico-grammar.
The findings of the present study raise several important questions for further research. Since it seems that interference was the most prominent factor contributing to the differences between translations and comparable originals, it seems possible that translators in general are only vaguely aware or even completely unaware of reader and writer responsibility, and of the differences in this respect between Slovene as a source language and English as a target language. A study focusing on translators' understanding of these issues would shed more light on this matter. The second question that remains open is the question of the translator's options regarding this issue. Even if the translator is fully aware of the differences between the source and the target language, it seems possible that he or she might be reluctant to insert metadiscourse items. A study focusing on the attitudes of the translators of academic discourse and their potential clients (scholars who commission translations of academic texts into English) would provide more information on whether a target-oriented approach to translation, advocated in the context of translation of academic discourse by Williams (2004), should be followed. Finally, the limited scope of the present study, which focused only on references to the entire text itself, raises the question whether similar patterns can also be observed for other types of metadiscourse items used in structuring the text. A corpus study of related metadiscourse items would provide important additional information on this matter.
Studies focusing on English-language academic texts written by non-native English-speaker scholars have identified many features that are not typical of Anglo-American rhetoric. In the case of translated academic texts the situation may be even more complex. A paradigm shift in the approach to academic discourse which incorporates translation as one of the ways of producing academic discourse in English is necessary for a better understanding of the characteristics of texts by non-native English-speaker scholars.
As a final point, some limitations of the present study must also be considered. The corpus used was relatively limited in size due to the small number of translated texts available for analysis. Moreover, the corpus comprises texts from only two disciplines: as research has shown important differences in the use of hedging among various disciplines (cf. Hyland 2005, 144-147), this certainly limits the scope of the present findings.

Figure 1: Percentage of lexical items

4.3 Clusters as potential collocational patterns
In Figures 2a and 2b below, the clusters identified by the clusters function in WordSmith Tools Concord are presented for the EngTranG subcorpus. In the EngTranG subcorpus, clusters were identified by WordSmith Tools for the search words "article" and "paper", but not for the search word "here".

N   Cluster                 Freq.   Length
2   THIS ARTICLE WE         7       3
3   IN THIS ARTICLE         7       3
4   THE ARTICLE IS          6       3
5   THE ARTICLE PRESENTS    5       3
6   OF THE ARTICLE          5       3

Figure 3: EngTranL clusters for the search word "paper"

Figure 4: EngOrigG clusters for the search word "paper"

Figure 5a: EngOrigL clusters for the search word "here"

Figure 5b:
N   Cluster             Freq.   Length
1   IN THIS PAPER       21      3
2   OF THIS PAPER       18      3
3   THE PRESENT PAPER   15      3
4   THIS PAPER IS       12      3
5   OF THE PAPER        8       3
6   THIS PAPER I        7       3
7   THIS PAPER HAS      6       3
8   OF THE PRESENT      6       3

Table 2 presents the overall frequency of references to the entire text itself in the four subcorpora.

Table 3 presents the frequency of the three lexical items used as search words per 10,000 words. The results are presented separately for each of the subcorpora.