Lexical bundles in learner and expert academic writing

Abstract: Lexical bundles (LBs) have been described as the ‘building blocks of discourse’; in addition to being highly frequent in writing and reducing processing time for readers and writers, they also perform important functions in language. LB choice, however, can vary according to genre, discipline, and different sections of the same text, which poses a challenge for novice L2 writers. This paper explores the use of LBs in a learner corpus of bachelor dissertations written in English by Spanish L1 students in linguistics and medicine, and compares it with published research articles in the same disciplines. By focusing on the introduction and conclusion sections, we identify the most frequent 3-, 4- and 5-word bundles in the corpora, to later study their types, structures, and functions. The results show differences in the use of LBs across disciplines, genres and sections, suggesting pedagogical implications for the inclusion of LBs in the L2 writing curriculum.


Introduction
Over the last few decades, numerous corpus analyses have brought to the fore the fact that language is highly patterned (Hunston, 2002; Römer, 2010; Sinclair, 2005). Sequences such as additional information or is one of the main, especially common in particular registers, are 'ready to use' chunks, "stored and retrieved whole[s] from memory at the time of use" (Wray, 2002, p. 9) rather than generated item-by-item. These pre-fabricated units have been shown to facilitate production for authors and also save processing effort for readers and listeners (Nattinger & DeCarrico, 1992).
Lexical bundles (henceforth LBs) were first identified by Biber and colleagues and have been defined as "the most frequently recurring sequence of words" (Biber & Barbieri, 2007, p. 264), as well as "important building blocks of discourse" (p. 270). The identification of LBs in corpus studies has been primarily based on corpus-driven approaches of frequency and range, following the pioneering lexical bundle approach developed by Biber, Conrad, and Reppen (1999). In order to qualify as a lexical bundle, a sequence needs an occurrence of at least 20 or 40 times per million words (Biber & Barbieri, 2007; Chen & Baker, 2010; Cortes, 2004). Range of dispersion (i.e. the number of texts in which the bundle appears) is normally set at 3 or 5 texts or 10% of the texts in the corpus (Hyland, 2008). This criterion is used to guard "against idiosyncratic uses by individual speakers or authors" (Biber & Barbieri, 2007, p. 268).
The present study aims to further the understanding of phraseology in learner writing by exploring the use of LBs in the introduction and conclusion sections of bachelor dissertations (BDs) written in English by Spanish L1 university students in linguistics and medicine. In order to compare the frequency of form, structure, and function of these bundles, an expert corpus of research articles (RAs) in the same disciplines is used as the reference corpus. The comparisons will be made from both a quantitative point of view (applying a corpus-driven approach to identify bundles in the learner and the expert corpus) and a qualitative one (classifying the bundles structurally and functionally in both corpora).
This study hopes to contribute to the body of research that studies phraseology in academic writing, and to serve as a useful pedagogical resource for L2 learners of English who are trying to accommodate to the conventions of these specific disciplines.
One recurrent finding in the literature is that English L2 writers' use of LBs does not always approximate the use by expert or native writers in terms of frequency, form, and function. For example, the master's and PhD candidates' writings explored in Hyland (2008) seemed to contain more impersonal clusters (i.e. avoiding stance), and more clusters in general, compared to RA writers. The author suggests that less proficient writers rely on word combinations more often than expert writers. This finding contrasts with Durrant and Mathews-Aydınlı's (2011) study, in which student essays showed a lower production of formulas compared to RAs; differences regarding functional moves were also found. The authors suggest that the lack of attention paid to different genres and disciplines in academic writing education may account for these differences.
Another interesting finding in the literature in relation to our study is English L1 students' greater and more varied use of LBs, especially in structures such as unattended this, existential there, hedging and negations, as compared to that of L2 university students. The L2 texts contained features characteristic of learner writing, such as anticipatory it, which, coupled with some informal lexical choices (e.g. it is easy to), pointed to register difficulties (see Ädel & Erman, 2012). In terms of functionality, L1 writers used stance more frequently than L2 writers. Interestingly, stance is one of the functions that differed the most among RA writers of the different languages (Spanish L1, English L2, and English L1) and disciplines studied in Pérez-Llantada (2014) and in Sheldon (2018): English L2 writers were found to transfer some of their L1 (Spanish) rhetorical practices into their L2 writing, which made their texts less interactional.
In order to investigate the use of LBs by Spanish L1 undergraduate learners writing in English in two different disciplines (i.e. linguistics and medicine) and sections (i.e. introduction and conclusion) in comparison with their expert-writer counterparts, three research questions were established in this study:
1. What are the most common lexical bundles in the introduction and conclusion sections of L2 learners' BDs in linguistics and medicine?
2. How are these lexical bundles used in terms of structure and function?
3. To what extent does the use of lexical bundles approximate or differ from published RAs in the same discipline?

Data collection
In order to carry out a quantitative and qualitative analysis of LBs in academic writing, two corpora were compiled: (1) a learner corpus of BDs in linguistics and medicine written in English by Spanish L1 undergraduates in their last year of studies, and (2) an expert corpus of RAs in the same disciplines published in English-medium and peer-reviewed academic journals. The introduction and the conclusion sections of each text were extracted and saved as raw .txt files for their separate analysis. Table 1 describes the number of texts, tokens, types, and paragraphs per genre, discipline and section.

Extraction, filtering, and classification of lexical bundles
In the present study, a corpus-driven approach was adopted in order to retrieve LBs from the corpora (i.e. no previous assumptions were made with respect to the LBs' form or function, and no pre-defined list of bundles was used). The function 'cluster n-gram' in AntConc (Anthony, 2018) was used to extract LBs from the introduction and conclusion sections of the corpora. In terms of length, even though the 4-word scope is the most researched length in LB studies (Ädel & Erman, 2012), other studies suggest that many recurrent word combinations come in as 3-word bundles (Simpson-Vlach & Ellis, 2010); as a result, we decided to adopt a more inclusive approach and explore 3-, 4- and 5-word bundles in the texts. As for frequency, given the relatively small size of the corpora, the frequency cut-off was set at a minimum of 20 times per million words. In addition, a dispersion range of three texts, which represent three different writers, was set; the selection of these cut-off criteria was based on previous corpus studies (Ädel & Erman, 2012; Biber & Barbieri, 2007; Chen & Baker, 2010). It is important to note that when a bundle appears only on one of the lists, it does not mean that this specific bundle was not used at all by writers in the other subcorpora; as Ädel and Erman aptly put it, "it simply means that the frequency and dispersion criteria were not met in the other group's material" (2012, p. 85).
With regard to the grammatical structure of LBs, we initially followed Biber et al.'s (pp. 1014-1024) classification, which distinguishes 12 structural categories for LBs in academic prose. After revising this and the taxonomy they provide for conversation, we present a taxonomy of 15 categories with four broad structural groups: 'noun phrase-based', 'prepositional phrase-based', 'verbal phrase-based', and 'other' bundles, following Chen and Baker (2010, p. 34), which can best integrate the LBs found in our data.
The NP-based bundles include noun phrases, with or without post-modifier fragments (e.g. the risk of, the most prevalent). PP-based bundles refer to those starting with a preposition plus a noun-phrase fragment (e.g. of this paper, in addition to) [...]
For the functional classification, on the other hand, we followed previous taxonomies (Biber, Conrad, & Cortes, 2004; Cortes, 2004; Hyland, 2008) and classified all bundles into three main categories and their subcategories:
1) Research-oriented (also called referential in other models): LBs in this category help writers to situate, contextualize and describe their research. There are four main subcategories: 1) location (e.g. at the beginning, at the university), 2) procedure (e.g. the use of the, the purpose of), 3) quantification (e.g. a part of, one of the most), and 4) description (e.g. the size of the, the nature of the).
2) Text-oriented (also called discourse organizers): these LBs are concerned with the structure of the text and the interrelations established between the ideas presented. There are four main subcategories: 1) transitions (e.g. on the other hand, in contrast to the), 2) resultative (e.g. as a result, due to the fact that), 3) structuring (e.g. in the next section, in this study), and 4) framing (e.g. with respect to, in the case of).
3) Participant-oriented: LBs in this category show writers' attitudes towards the ideational content and address readers directly or indirectly. It comprises two main subcategories: 1) stance (e.g. may be due to, are likely to), and 2) engagement (e.g. as can be seen, it should be noted).
This functional classification was complex not only because the categorization involves subjectivity, but also because some LBs can perform more than one function (Liu, 2012). A concordance analysis was performed in order to see the extended context of certain bundles that seemed multifunctional. For example, the basis of is a 3-word bundle that can act as a research-oriented descriptive bundle, as in (1):
(1) Findings from such a study can form the basis of learner-relevant form-focused instruction. (LIN_RA01_I)
But when this sequence is part of the 4-word bundle on the basis of, it can mark a text-oriented resultative relationship, as in (2):
(2) Other linguistic accounts differentiate the two forms on the basis of information status, particularly in terms of topic. (LIN_RA15_I)
For those cases in which the authors could not agree on the categorization, even after analyzing their extended context, previous literature that included examples on LBs and their functional categories was consulted (Cortes, 2004; Hyland, 2008; Pérez-Llantada, 2014).
These structural and functional classifications allowed us to better understand the use of LBs in the corpora studied.
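The extraction and filtering criteria described in this section (3- to 5-word sequences, at least 20 occurrences per million words, at least three different texts) can be illustrated with a minimal Python sketch. This is not AntConc's implementation: the tokenizer, the function names and the toy texts are assumptions made for illustration only, and n-grams are allowed to run across sentence boundaries for simplicity.

```python
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase a text and split it into word tokens (simplified, assumed rule)."""
    return re.findall(r"[a-z]+(?:['-][a-z]+)*", text.lower())


def extract_bundles(texts, n_values=(3, 4, 5), min_pmw=20.0, min_texts=3):
    """Return {bundle: (raw_count, n_texts)} for n-grams meeting both cut-offs.

    texts: dict mapping a text id to its raw string.
    min_pmw: frequency cut-off, normalized per million words.
    min_texts: dispersion cut-off (number of different texts).
    """
    tokenized = {text_id: tokenize(raw) for text_id, raw in texts.items()}
    total_tokens = sum(len(tokens) for tokens in tokenized.values())
    if total_tokens == 0:
        return {}

    counts = Counter()            # raw frequency of each n-gram
    text_ids = defaultdict(set)   # texts in which each n-gram occurs

    for text_id, tokens in tokenized.items():
        for n in n_values:
            for i in range(len(tokens) - n + 1):
                gram = " ".join(tokens[i:i + n])
                counts[gram] += 1
                text_ids[gram].add(text_id)

    bundles = {}
    for gram, raw in counts.items():
        pmw = raw / total_tokens * 1_000_000
        if pmw >= min_pmw and len(text_ids[gram]) >= min_texts:
            bundles[gram] = (raw, len(text_ids[gram]))
    return bundles


# Hypothetical toy texts, not taken from the study's corpora.
toy_texts = {
    "t1": "the use of corpora in order to study bundles",
    "t2": "in order to explain the use of bundles",
    "t3": "the use of tools in order to compare texts",
    "t4": "we discuss the results of the study",
}
bundles = extract_bundles(toy_texts)
```

In a toy corpus this small, every sequence trivially exceeds the normalized frequency cut-off, so the dispersion criterion does all the filtering; only the use of and in order to survive because they occur in three different texts.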

Results and discussion
The results of the analysis of LBs are reported as follows. First, the most frequent LBs in the introduction and conclusion sections of BDs and RAs in medicine and linguistics are explored. Convergent bundles (i.e. those bundles that appear on more than one list) are then presented. Finally, a second and more qualitative analysis of the structures and functions of bundles is presented, exploring the similarities and differences found in the corpora.
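The notion of convergent bundle just defined amounts to a set comparison across bundle lists. The following Python sketch shows one way to compute it, under stated assumptions: the function names are hypothetical, and the assignment of bundles to lists is illustrative rather than the study's actual inventory.

```python
def convergent_bundles(bundle_lists, min_lists=2):
    """Map each bundle found on at least `min_lists` lists to the lists it is on."""
    membership = {}
    for list_name, bundles in bundle_lists.items():
        for bundle in set(bundles):
            membership.setdefault(bundle, set()).add(list_name)
    return {b: names for b, names in membership.items() if len(names) >= min_lists}


def core_bundles(bundle_lists):
    """Return the bundles that appear on every list."""
    if not bundle_lists:
        return set()
    return set.intersection(*(set(b) for b in bundle_lists.values()))


# Illustrative lists only (assumed contents, not the study's data).
toy_lists = {
    "BD_LIN": ["the use of", "in order to", "as well as", "this paper will focus on"],
    "BD_MED": ["the use of", "in order to", "as well as", "the risk of"],
    "RA_LIN": ["the use of", "in order to", "as well as", "on the basis of"],
    "RA_MED": ["the use of", "in order to", "as well as", "the risk of", "is associated with"],
}
shared = convergent_bundles(toy_lists)
core = core_bundles(toy_lists)
```

Keeping the list names alongside each shared bundle makes it possible to tell whether a bundle converges across sections, genres or disciplines.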

Frequency and convergence of lexical bundles in the corpus
There are a total of 218 different bundles in the corpus as a whole (for the full list, see Appendix 1) with a total frequency of 1,151 hits, which represents around 4.5% of the tokens in the corpus. The most frequent bundle is the use of, with a raw frequency of 85 counts, which equals more than 1,000 times per million words (pmw) in our corpus. Moreover, the use of appears in all genres and disciplines explored in this study, so it could be regarded as a core or convergent bundle, following Pérez-Llantada's (2014) nomenclature. It is worth noting that the use of appears in the conclusion sections of the corpora 50 out of 85 times, clearly indicating a preference for the last sections of a text. RAs in linguistics (37) and in medicine (21) are the subcorpora that contain the most hits of the use of, very often paired with other nouns (questions, tools, English, other alternatives, somatic stem cells). This bundle seems to help writers to display results, as in (3), or limitations, as in (4).
(3) Trends for the social science fields indicate a reduction in the use of these informal features. (LIN_RA04_C)
(4) Another limitation was the use of asymptomatic microembolic signals as a surrogate marker. (MED_RA02_C)
The second most frequent bundle in the corpus is in order to, with a raw frequency of 62 counts, i.e. about 750 pmw. In contrast to the use of, this bundle appeared more often in the introduction sections of the texts, namely 39 out of 62 times. Taking into account the total number of words in each corpus, BDs in linguistics show a predominant use of this bundle (22 raw hits) followed by RAs in linguistics (24), BDs in medicine (12), and medical RAs (6). Different procedure verbs such as address, determine, provide, show, solve, facilitate, and gain are used after this bundle. In order to can help writers to emphasize the study's main objective or justification, as in (5) and (6) respectively.
(5) This study aims to analyse comprehension and production of false friends in students of English in a C1 level classroom in order to explore the influence of their mother tongue (L1) on a second language (L2). (LIN_BD10_I)
The third most frequent bundle is yet another core bundle present in all subcorpora: as well as (43 hits). As well as appears more frequently in the introduction sections (24 times), and rather than just adding new information, this bundle helps writers to focalize and frame the ideas presented, as in (7) and (8):
(7) FN is a dimeric glycoprotein that is found in plasma as well as in the extracellular matrix (ECM) of various tissues (MED_RA03_I)
(8) Conclusions will be drawn to justify the analyzed usages of discursive strategies as well as the historical and social consequences that can derive from them. (LIN_BD02_I)
The use of, in order to and as well as are also included on the 'formulas worth teaching' list (ranking 29, 4 and 5 respectively), which underlines their pedagogic relevance.
In terms of length, 3-word bundles were the most frequent in the corpus (85.7% of the total bundles), while 4- and, especially, 5-word bundles were scarcely used (10.2% and 3.9% respectively). A similar finding was reported in previous studies such as Biber et al. (1999, p. 994), who found that 3-word bundles were much more frequent in academic prose (over 60,000 times pmw) than 4-word bundles (which occur over 5,000 times pmw).
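The per-million-word figures quoted in this section follow from a simple normalization of raw counts by corpus size. The sketch below shows the arithmetic in Python; the corpus size is an assumed placeholder chosen only because it is compatible with the reported figures (85 hits giving more than 1,000 pmw, 62 hits giving about 750 pmw), not the token count reported in Table 1, and the function names are hypothetical.

```python
import math


def per_million(raw_count, corpus_tokens):
    """Normalize a raw frequency to occurrences per million words (pmw)."""
    if corpus_tokens <= 0:
        raise ValueError("corpus_tokens must be positive")
    return raw_count / corpus_tokens * 1_000_000


def min_raw_count(min_pmw, corpus_tokens):
    """Smallest raw count that reaches a given pmw cut-off in a corpus of this size."""
    return math.ceil(min_pmw * corpus_tokens / 1_000_000)


# ASSUMPTION: placeholder corpus size, not the figure from Table 1.
ASSUMED_CORPUS_TOKENS = 82_000

pmw_the_use_of = per_million(85, ASSUMED_CORPUS_TOKENS)    # the use of, 85 raw hits
pmw_in_order_to = per_million(62, ASSUMED_CORPUS_TOKENS)   # in order to, 62 raw hits
raw_cutoff = min_raw_count(20, ASSUMED_CORPUS_TOKENS)      # 20 pmw expressed as raw hits
```

Under this assumed size, a cut-off of 20 pmw corresponds to only a couple of raw occurrences, which illustrates why a dispersion criterion matters in small corpora.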
If we look at each subcorpus separately, we find some interesting patterns. As can be seen in Table 2, BDs in medicine and linguistics have produced almost the same quantity of LBs in the introduction and conclusion sections (conclusions were a bit shorter in this genre compared to the introduction, which partially explains why they contain half the amount of LBs as introductions); this seems to point to a shared quantitative feature in the use of LBs between texts that belong to two different disciplines but to the same genre. [...] tokens for both introduction and conclusion sections, articles in linguistics contain almost three times more LBs than medical articles.
[Table 2. 3-w (24), 4-w (5), 5-w (4); 3-w (30), 4-w (2), 5-w (1); 3-w (125), 4-w (17), 5-w (5); 3-w (38), 4-w (2), 5-w (0). *All values are raw counts.]
This finding has been supported by previous literature on LBs in academic writing across disciplines (Hyland, 2008; Liu, 2012) and points towards a disciplinary difference: research suggests that soft-knowledge disciplines very often emphasize interpretative language in order to present persuasive arguments, compared to hard-knowledge disciplines, which tend to be more impersonal in their methods and discussions. The linguistic items that allow writers to achieve this objective are, more often than not, part of recurrent word combinations (e.g. it is important to, has the potential to, it can be argued that, are likely to, seems to be, it should be, needs to be), which can explain the prominent LB occurrences in linguistic RAs.
Hyland (2008) reported that less mature writers had used LBs more often. This finding is matched in our results for only one of the two disciplines: BDs in medicine do contain more LBs than RAs in the same discipline (3.3 vs. 1.6 bundles on average per text). Particular characteristics of the BD genre with regard to its audience (for example, that of being an academic final assignment in which students need to show and convince their supervisors, as a superior entity, that they have acquired certain knowledge) can contrast with published RAs, in which authors present information to peers (of more or less the same expertise), and could account for this quantitative difference.
Adopting another perspective, the comparison of all LB lists has yielded an inventory of 35 shared bundles. Some of these bundles are shared in the introduction and conclusion sections of the same subcorpus, but some are also shared between genres (BDs, RAs) and disciplines (linguistics, medicine), and some of them appear in all lists, regardless of their [...] inventory "might indicate that the writers have memorized these language sequences and routinized them in their writing practices". Table 3 shows convergent bundles in the corpora.
If we look at specific bundles, as previously mentioned, the use of (85 hits), in order to (62) and as well as (43) are core bundles shared across all corpora in our study. Hyland (2008, p. 12) found a total of 5 core bundles across four disciplines (on the other hand, as well as the, in the case of, at the same time, and the results of the), which is somewhat similar to our results. In terms of bundles that appear in both the introduction and conclusion sections of BDs and RAs, there are a total of 23 different bundles, 19 of which appear in the introduction and conclusion sections of RAs in linguistics; these items can be a useful resource for L2 writers of academic English. Convergent bundles vary not only in their grammatical structure but also in the discourse functions they perform, as we will see in the next section.

Structures and functions of lexical bundles in the corpus
Table 4 below shows the frequency of LBs per structure across genres and disciplines, taking the four broad groups and the 15 structural categories into consideration, and provides one illustrative example for each category. An important caveat for the discussion of the findings that follows is that the frequencies given refer to the types of bundles used and not to the number of times each bundle type was used (raw frequency). As can be seen, there is a clear prevalence of NP-based bundles over the rest of the structural categories in all corpora. This prevalence is especially evident in the expert corpus, in both linguistics and medicine (both with a total frequency of more than 40%), over the second most common group of structures, the VP-based bundles. The PP-based categories rank in the third position in all four subcorpora. It is worth looking at specific rather than general structural categories to obtain a more realistic and clarifying picture of the findings.
Of all 15 categories, the most common structure overall is the noun phrase with of-phrase, representing in all cases more than 30% of all categories, with the highest frequency in the medicine RAs (35%). In particular, we found a total of 78 bundles with this structure, with a raw frequency of 375; that is, LBs belonging to this category account for 32% of the total frequency of LBs in the corpus as a whole. [...] indicate that as much as 70% of the most common bundles usually consist of a noun phrase with an of-phrase fragment. The prevalence of this structure has also been found in previous studies on LBs (Chen & Baker, 2010; Hyland, 2008; Liu, 2012). As could be expected given its high raw frequency, the use of is the most frequent bundle in this category (62 hits), with a higher presence in medicine RAs (21 hits).
Other common examples are one of the (13 hits), the analysis of (the) (11 hits), and the risk of (11 hits). Examples (9), (10), (11) and (12) [...]
The second most common structure is the other prepositional phrase, that is, bundles introduced by a preposition, excluding those with an embedded of-phrase; common LBs in this category are of this paper, according to, in this study, and of the most. We noticed above that LBs tend to be incomplete structural units; when they can be used as potentially complete units, these tend to act as discourse signaling devices (Biber et al., 1999, p. 999). [...]
We have already mentioned particular examples of bundles which are especially recurrent in our corpus. One instance is in order to, which we consider a to-clause fragment (rather than a prepositional-phrase pattern; cf. Pérez-Llantada, 2014, for instance), and which partly explains the relatively high frequency of the (verb/adjective +) to-clause structural pattern in all subcorpora. In addition, our data show two further common structures of bundles in specific subcorpora. One of them is the passive verb (+ prepositional phrase), with a higher use in the medicine RAs, exemplified by bundles such as is associated with, have been proposed, and can be used to, which interestingly are all found in the conclusion section of these texts. The impersonal nature of the passive construction seems to fit well with the medicine discipline, in which writers allegedly attempt to hide authorial interpretation more than their linguistics counterparts. This finding supports the disciplinary differences in structural categories reported in Hyland (2008, p. 11). The other structural category that shows a higher frequency than in other corresponding subcorpora is the noun phrase + verb phrase in BDs in linguistics. Examples of these bundles are paper aims to, this paper will focus on and this study has.
We may hypothesize that this higher use is due to the emphasis placed on these non-agent text subjects in the teaching of academic discourse to university students.
A general tendency emerging from the figures represented in Table 4 [...] bundles has the potential to, play an important role in. Compared with this wide range of bundles, BDs in linguistics exhibit a less illustrative choice, with seven structural categories not represented, which can be explained by the less proficient writing skills of these authors.
In the medicine corpora overall, however, the choice of bundles is definitely less varied.
Curiously enough, medicine RAs show a much lesser degree of variation and representativeness in the use of LB structures, even though they belong to the same genre as their linguistics counterparts. It is difficult to say why this might be, but disciplinary variation and the topic of linguistic articles itself (language) could account for the discrepancies found.
The analysis of LBs according to discourse function has also revealed interesting insights. Table 5 provides an overview of the LB functions across genres and disciplines. As can be seen, bundles with text-oriented functions are prevalent over the other two types in general. The second most common type is that of bundles performing research-oriented functions. The comparison between these two functional categories, however, provides an interesting disciplinary distinction: whereas in linguistics there is a considerable difference in frequency between the text-oriented and research-oriented functions in both learners and experts, and a particularly high use of text-oriented bundles (over 50%) in BDs, in medicine the figures are closer between these two functions, and in medicine BDs they are exactly the same. This is (partly) in line with Hyland (2008, p. 14), who found a greater use of bundles with a referential function in the hard sciences than in the soft-knowledge fields (i.e. linguistics), which gives the former "a greater real-world, laboratory-focused sense to writing", thus emphasizing the empirical over the interpretative, as seen above. The more evident prevalence of text-oriented bundles in linguistics would also agree with this picture. [...] in other studies that have noted an avoidance of stance bundles in learners in comparison with English L1 authors (see Hyland, 2008, p. 19; Pérez-Llantada, 2014, p. 91; Sheldon, 2018, p. 34). Pérez-Llantada (2014) notes that Spanish-speaking learner writers in English avoid personal markers to a greater extent than the corresponding expert writers of academic discourse. Our results also point to a lack of confidence on the part of the linguistics learners to express their stance and subjectivity.
Turning now to a more detailed analysis, we present Table 6 below with the figures of bundle types for the specific discourse functions included in each of the broad functional categories just mentioned. As with the discussion of the structure of bundles, a first thing to note is the greater and richer variety of functions in the linguistics RAs, with all ten categories represented in the table, in comparison with the other three subcorpora. Concentrating on the most important functional category, that of text-oriented bundles, we see a clear preference for the structuring type in linguistics, and especially in linguistic BDs. Although the expert writers in medicine also exhibit an important use of this category, their learner counterparts, by contrast, make no use at all of these bundles, clearly preferring bundles with a resultative/inferential function instead, as will be discussed below.
Structuring bundles, having an identifying and focusing meaning, allow writers to draw the reader's attention to a particular idea in the text, and to intensify the force of their arguments.
Linguistics experts have used structuring bundles in their conclusions more often, a practice [...] (13) and (14), or VP-based structures (aim of this paper is, this paper will focus on, there is a and that they are), as in (15). The word aim, as noun or verb, is a recurrent one in bundles with this function.
(13) The aim of the present paper is to study the preference for the use of one-word verbs to multi-word verbs (LIN_BD09_I)
As just mentioned, resultative bundles are fairly common (21.2%) in medical BDs, by comparison with the other three subcorpora (with less than half this frequency), and by contrast, no instance of the structuring function was found. Interestingly, these writers have placed almost all their resultative bundles in the conclusion sections, as illustrated in (16) and (17). Other common bundles with this function are the conclusion that, as a result of, and due to the fact that.
(16) (…) call for the involvement of mental health professionals in the Emergency Room in order to offer a more complete evaluation of patients once medically stabilized.
[...] arguments with respect to others may have a genre-specific explanation; academic writing instruction may emphasize this writing strategy over others.
In research-oriented bundles, the second most important functional category, an interesting tendency arises: whereas the medicine data overall favor bundles contributing to the description of research objects, especially in RAs, linguistics favors procedural bundles. This is not entirely surprising, considering the nature and object of study of each of these academic texts. Thus, whereas in medicine the description of the 'real-world' problem (medical conditions, clinical studies, etc.) is of great importance to the studies, in linguistics texts it is important to show the procedures of the research methods and demonstrate a certain ability in explaining how the research has been conducted. Both functions, i.e. description and procedure, are overwhelmingly often expressed by an NP-based bundle and very frequently by the noun phrase with of-phrase. Common bundles of description from the medicine texts are the prevalence of, the presence of, the risk of, and from the VP-based pattern, it is a/the. To express procedure, the most commonly used bundle is, by far, the use of. Other common bundles expressing procedure are (the) analysis of (the), the role of, the ways that, and from the VP-based group of bundles, can be used to.
Description and procedure bundles are exemplified in (18) and (19) [...]
The final category, participant-oriented bundles, mostly covers stance markers expressing opinion rather than facts, and may indicate degree of probability and epistemic meaning, on the one hand, or be part of the so-called 'other stance markers' (see Cortes, 2004, p. 209), on the other, which include LBs with evidential meaning, indicating the source of the information (e.g. recent studies have, have been proposed). The former type, the most common one, tends to be expressed by a recurrent set of structural categories, namely [...] can also be expressed in other ways than 3-, 4- and 5-word bundles, and that our study refers only to stance expressed in these sequences. Interestingly, stance is more common in the conclusion sections of the BD genre, whereas RAs contain more bundles of this type in their introduction sections: persuading readers from the very beginning through evidential and epistemic bundles seems to characterize more confident writing. Finally, engagement is almost non-existent in our corpus with only one bundle, namely our understanding of, used in the conclusion section of RAs in linguistics.

Conclusion
This paper has analyzed the use of LBs in the introduction and conclusion sections of learner and expert academic writing in linguistics and medicine. The quantitative and qualitative analysis performed in order to explore the frequency, structures and functions of LBs has yielded interesting results: LBs are very useful devices for the construction of discourse, but they behave in dissimilar ways in different disciplines and genres.
Regarding frequency, of the 218 bundles retrieved, 3-word bundles were the most frequent in all subcorpora; of these, the use of, in order to, and as well as stand out as the most popular LBs. BDs in linguistics and medicine produced a similar quantity of LBs in both sections, whereas RAs differ widely in how frequently they use LBs, which points towards a disciplinary difference. When comparing the learner and the expert corpus, on average, BDs in medicine contained more LBs than RAs in the same discipline, and the opposite tendency was found for linguistics BDs, which contained fewer LBs than their expert counterparts. In addition, a list of 35 convergent bundles was identified, which can be a pedagogically useful resource for general academic writing. This quantitative analysis was complemented by qualitative analyses of structure and function which, after manual classification and revision of concordance lines, provided a more comprehensive picture of LB usage.
In terms of structure, both learner and expert writers favored NP-based bundles; the structure noun phrase with of-phrase was by far the most frequent one in all corpora. BDs and RAs also agreed on the second most common LB structure: other prepositional phrase, which allowed writers to include frequent discourse signaling devices in their texts. The main difference, however, lies in the greater structural variation of the LBs used by experts in linguistics; LBs in medical RAs, and in the learner genre, were clearly less varied. Finally, with regard to function, LBs performing text-orienting functions were the most prevalent in all subcorpora. The second group, LBs with research-oriented functions, was more popular among medicine expert writers, who seem to emphasize the empirical over the interpretative.
The last function, participant-oriented, was the least represented one; this low frequency is especially marked in BDs in linguistics, which points towards a case of underuse.
Additionally, while learners placed stance markers mostly in the last section of their texts, expert writers showed a preference for the use of stance in their introduction sections.
The placement of LBs in particular sections of a text is yet another important feature that reflects writers' academic literacy. In addition, the lack of structuring bundles in medical BDs and their recurrent use of resultative bundles also call for explicit pedagogical attention.
Disciplinary differences were also found regarding the prevalence of descriptive bundles in medicine, and of procedural bundles in linguistics; disciplinary conventions and the object of study of each of these texts could account for the discrepancies found.
The present study has some limitations worthy of mention. The first one is methodological: in order to extract sequences of words automatically, our retrieval method only included LBs that were fixed in nature; that is, our lists do not include variable bundles or bundles with open slots (e.g. in section (…), up to (…) %, to a (…) extent). This method therefore does not capture LBs in their entirety. Including these variable sequences (e.g. using the ConcGram function in WordSmith Tools) could have helped to show a more comprehensive picture of LBs in academic writing (see O'Donnell et al., 2012). Another methodological limitation is that the learner corpus had not been error-tagged, which may have affected the number of LBs extracted (i.e. if there were typos in particular words that were part of LBs, the software did not retrieve them). All texts included in the learner corpus, however, were successful BDs evaluated by their supervisors and the evaluating committee, so they are unlikely to contain numerous typos.
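The difference between fixed bundles and bundles with open slots can be illustrated with a minimal Python sketch. It is not the retrieval procedure used in this study: the tokenizer, the function names (fixed_bundles, open_slot_frames), the thresholds (min_freq, min_texts) and the two sample sentences are all hypothetical and chosen only for illustration.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens (a deliberately simple tokenizer)."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())


def fixed_bundles(texts, n, min_freq=2, min_texts=2):
    """Count contiguous n-word sequences and keep those that meet both
    a frequency threshold and a dispersion (number of texts) threshold.
    The threshold values are illustrative assumptions."""
    freq = Counter()    # total occurrences of each n-gram
    spread = Counter()  # number of texts each n-gram occurs in
    for text in texts:
        tokens = tokenize(text)
        grams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
        freq.update(grams)
        spread.update(set(grams))
    return {" ".join(g): c for g, c in freq.items()
            if c >= min_freq and spread[g] >= min_texts}


def open_slot_frames(texts, n, min_freq=2):
    """Count n-word frames in which one internal word is replaced by '*',
    so that 'to a large extent' and 'to a lesser extent' share a frame."""
    frames = Counter()
    for text in texts:
        tokens = tokenize(text)
        for i in range(len(tokens) - n + 1):
            gram = tokens[i:i + n]
            for slot in range(1, n - 1):  # internal positions only
                frame = gram[:slot] + ["*"] + gram[slot + 1:]
                frames[" ".join(frame)] += 1
    return {f: c for f, c in frames.items() if c >= min_freq}


# Hypothetical two-text sample, for illustration only.
sample = [
    "The use of bundles to a large extent varies.",
    "The use of corpora to a lesser extent helps.",
]
fixed3 = fixed_bundles(sample, 3)      # {'the use of': 2}
fixed4 = fixed_bundles(sample, 4)      # {} : no fixed 4-word sequence recurs
frames4 = open_slot_frames(sample, 4)  # includes 'to a * extent'
```

In this toy sample, the fixed search retrieves the use of but misses to a (…) extent, which only emerges once an internal slot is allowed to vary. By the same logic, a misspelled word inside a bundle would produce a different sequence and go uncounted.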
Using a larger learner corpus would also have made the findings more representative. In addition, our analysis has looked at the use of LBs in the introduction and conclusion sections of academic texts, as these sections tend to be the most conventional ones in these particular genres. Analyzing LB positions, not only with regard to sections but also with regard to paragraphs or sentences, would be interesting (see Römer, 2010). Finally, when comparing our findings across previous studies that utilized corpora of different lengths and breadths, it was