Register effects and the Spanish adjectival construction sin + INF in historical corpus data

This study analyzes the usage of the Spanish adjectival sin + infinitive verb construction (un libro sin terminar ‘an unfinished book’) and the no + past participle construction (un libro no terminado ‘an unfinished book’) in historical corpus data, with the objective of quantitatively assessing Pountain’s (1993) analysis of the emergence of adjectival sin + INF as motivated in part by register formality. A logistic regression analysis finds that the use of adjectival sin + INF over no + PP is significantly favored in the Prose Fiction register and significantly disfavored in the Legal Texts register. Furthermore, this preference increases over time in Prose Fiction. These findings support Pountain’s claim by showing that register significantly influences the use of the novel construction. Ultimately, this study stresses the importance of measuring register effects in the analysis of language change in corpus data.


Introduction
Negation continues to generate interest in linguistic research, especially with respect to its diachrony and the role it plays in discussions of language change (see Horn 1989, Hansen and Visconti 2014, Van der Auwera 2010). In this paper, we measure the diachronic usage of two analytic negative adjectival constructions in Spanish: sin 'without' + infinitive verb (sin + INF) and no 'not' + past participle (no + PP). Specifically, we use the Corpus diacrónico del español (CORDE) to test the proposal of Pountain (1993), who argued that the emergence of adjectival sin + INF was motivated by a predisposition for no + PP to appear in more formal linguistic registers. Overall, this paper takes a quantitative approach to historical linguistics, testing claims about language change against evidence measured in corpus data (cf. Jenset and McGillivray 2017).

The adjectival sin + INF construction
In Spanish, the preposition sin can be placed before an infinitive verb to create a prepositional phrase denoting the negation of a particular action, as in (1a). Casalicchio (2019) describes this sin + INF construction as a prepositional infinitive that can provide the negation of a gerund. Additionally, the sin + INF construction can be used to describe a noun that is interpreted as the direct object of the infinitive verb, as shown in (1b) and (1c).
b. Una casa sin pintar 'An unpainted house'
c. La casa está sin pintar. 'The house is unpainted.'

As shown in (1b), placing the infinitive pintar 'to paint' after the preposition sin negates it. As seen in (1c), the construction can also appear as the complement of a verb, such as the copula estar 'to be', where it plays the same role. In all three examples, a coreference relation links the sin + INF construction to a particular noun. Importantly, though, (1a) differs from (1b) and (1c): in (1a) we do not interpret the noun Juan as the direct object of the verb comer 'to eat', whereas in (1b) and (1c) we interpret the noun casa 'house' as the direct object of the infinitive. We shall refer to cases such as (1b) and (1c) as the adjectival sin + INF construction, following analyses that treat this construction as a kind of adjective. Indeed, as pointed out in the grammar of the Real Academia Española (RAE 2009), it can alternate with the no + PP construction, as shown in (2). 1

(2) a. un libro sin terminar 'an unfinished book'
b. un libro no terminado 'an unfinished book'

Example (2b) illustrates how the Spanish negator no can be placed before a past participle to negate it and describe a noun. In both (2a) and (2b), the negative construction acts as an adjective indicating that a given action has not taken place. Thus, rather than treating these cases of sin + INF as prepositional phrases, Bosque (1990) analyzes them as a kind of prefixed adjective, in which sin plays the role of a prefix negating a participle and the noun libro 'book' is the internal argument of the verb terminar 'to finish'.
In this sense, Bosque (1990) notes that an important advantage of categorizing the adjectival sin + INF construction as an adjective is that it helps explain why these transitive verbs appear without a direct object in its canonical position. Among Bosque's arguments that sin is best analyzed as a kind of prefix in this construction are that (a) sin already exists as a prefix in lexical items such as sinvergüenza 'scoundrel' or sinrazón 'injustice', (b) no material can intervene between sin and the infinitive that follows it, as in *sin siquiera secar 'not even dried', and (c) the infinitive does not take complements, as in *La botella está aún sin llenar de vino 'The bottle is still unfilled with wine'. Bosque (1990) also argues that the infinitive in the adjectival sin + INF construction must denote a change of state. In other words, the examples of sin + INF in (3a) below contain change-of-state verbs and are therefore well-formed, whereas those in (3b) do not contain such verbs and are thus rejected by Bosque.
(3) a. Un camión sin llenar 'An unfilled truck'
Ropa sin secar 'Undried clothes'
Uvas sin madurar 'Unripe grapes'
Cuerdas sin tensar 'Untaut ropes'
b. *Gente sin respetar 'Unrespected people'
*Acciones bancarias sin tener 'Unheld bank stocks'
*Un cuadro sin mirar 'An unseen portrait'
*Un coche nuevo sin ponderar 'An unpraised new car'
(from Bosque 1990: 197) 2

1 An anonymous reviewer points out that not all sin + INF and no + PP alternants seem to be acceptable: un camión sin llenar vs. ?un camión no llenado, for example. Thus, we refer to no + PP as a possible alternative to adjectival sin + INF, rather than an exact equivalent.
2 As an anonymous reviewer points out, certain well-defined contexts may indeed render these phrases acceptable to some speakers.

Bosque (2014) later identifies telicity as the feature of these infinitives that allows them to be used in the adjectival sin + INF construction. In both studies, Bosque argues that verbs that do not denote a change of state, or that are atelic, can still be forced into a telic meaning and thus be permitted in the adjectival sin + INF construction. Such examples are given in (4) below.
b. Museos sin visitar 'Unvisited museums' (from Bosque 2014: 57)

In (4a) and (4b) above, while proteger 'to protect' and visitar 'to visit' do not typically denote a change of state or a telic event, they still seem to fit the adjectival sin + INF pattern if they are understood as taking on such an aspectual class in a given context. To summarize this descriptive overview of the adjectival sin + INF construction: (a) adjectival sin + INF can appear as an attribute of a noun or as a complement of a verb, (b) it functions as an adjective, and (c) while it is generally used with change-of-state verbs, it can also appear with other verbs that can assume a change-of-state interpretation.

A diachronic approach to adjectival sin + INF
In terms of diachronic analysis, Schulte (2007) reports that sin was the first preposition to combine with infinitives to form prepositional infinitive constructions, with examples of the sin + INF construction appearing in textual data at the beginning of the 13th century and rising considerably in frequency between 1400 and 1600, before increasing more gradually into the 19th century and then declining in the 20th century. As for the first appearances of the adjectival use of sin + INF in text data, the earliest examples that Pountain cites are 16th-century tokens found in Boyd-Bowman's (1971) Léxico hispanoamericano del siglo XVI and an 18th-century token located in the 1771 edition of the grammar of the Real Academia Española. The 16th-century tokens, with year and location metadata copied from Harris-Northall and Nitti's (2007) digital edition of Boyd-Bowman's work 3, are shown below in (5a) and (5b), and the 18th-century token in (5c).
(5) a. pieças labradas de piedra con su corredor sin cubrir 'rooms built of stone with their corridor uncovered' (1551, Puebla)
b. están sin gratificación y la tierra sin conquistar '(they) are without gratification and the land unconquered' (1580, Chile)
c. la obra está sin acabar 'the work is unfinished'

Pountain's (1993) paper is the only study to offer a proposal regarding the origin of the adjectival use of sin + INF. To explain the emergence of this construction, Pountain points to the historical development of the range of possible complements of the verb estar 'to be'. Pountain argues that, chronologically, estar accepted prepositional phrases as complements before it accepted past participles. Thus, once estar accepted prepositional phrases, it would have formed constructions with the preposition sin, including constructions with the active value of sin + INF. Then, as estar began to take past participles as complements, it would have been inclined to produce the adjectival use of sin + INF, since adjectival sin + INF is semantically equivalent to a negated past participle. Once adjectival sin + INF came into use after estar, a natural next step would be the elision of estar to form noun + sin + INF constructions such as (1b). In short, we schematize Pountain's claim regarding the development of adjectival sin + INF with the examples in (6).

An important question that Pountain aims to answer is why adjectival sin + INF emerged when other constructions expressing adjectival negation already existed. To this end, they compare adjectival sin + INF with other negative adjectival constructions, which include the prefixes des- (desfavorable 'unfavorable') and in- (incontaminado 'uncontaminated'), and the analytic constructions poco + adjective (poco seguro 'not very safe') and no + adjective (no terminado 'unfinished').
Pountain claims that the use of sin + INF as an adjectival construction solved several problems associated with the alternatives mentioned above. First, with respect to the synthetic forms, they argue that the in- prefix declined in productivity and rarely attached to past participles, instead favoring adjectives formed with suffixes such as -able or -ible. Furthermore, while the des- prefix is relatively productive, it can convey a privative or reversative meaning rather than a strictly negative one. As for the analytic constructions, Pountain notes that poco does not always convey only negation when combined with past participles, since poco is at base a degree adverb and rather indicates the negation of muy 'very'. Finally, Pountain argues that the no + adjective construction is used mostly in registers of high formality. Pountain writes that "In neutral register, Traiga una ensalada sin aderezar would be preferred to Traiga una ensalada no aderezada 'Bring a salad without dressing'." (Pountain 1993: 112) An additional syntactic constraint is that no + adjective constructions cannot follow the verb estar, as shown in (7).

(7) *El libro está no terminado 'The book is unfinished'

Because of these limitations, Pountain argues that adjectival sin + INF may have developed to meet a need left unmet by the alternative constructions. 4 In summary, the main points of Pountain (1993) regarding the emergence of adjectival sin + INF are the following: (a) adjectival sin + INF is productive both as an attribute of a noun and as a complement of the verb estar, which makes it more syntactically available than no + adjective constructions, which cannot follow estar, and (b) no + adjective is limited to registers of greater formality, which may have contributed to the rise of adjectival sin + INF.
Following this proposal, it would seem that the emergence of adjectival sin + INF was favored by both intra- and extralinguistic factors. 5

Register and historical linguistics
In this section we discuss the notion of register, which, as mentioned above, plays an important role in the diachronic analysis of adjectival sin + INF. One issue in historical linguistic research is that terms such as 'register' and 'genre', among others, are sometimes used interchangeably. Furthermore, Kytö (2019) points out that studies in the Romance tradition generally do not use the term 'register' at all. Here we briefly distinguish register and genre as follows. Biber and Conrad (2019: 5) define register as "a variety associated with a particular situation of use (including particular communicative purposes)" and emphasize that certain registers typically exhibit frequently occurring linguistic features because those features are well suited to the function, or purpose, of that register. Genres, on the other hand, are groupings of texts based on the structural conventions those texts follow. Formal letters, for example, viewed as a genre, are expected to include structural elements such as a date, a greeting, and a signature. Importantly, genre is seen as subordinate to register (Biber and Conrad 2019, Dorgeloh and Wanner 2010, Kytö 2019), in that genre is a relatively more specialized approach while register applies to all text varieties. In the present study, therefore, we conduct our analysis in terms of register, the broader term, to avoid potential terminological confusion. Register plays a critical role in analyzing and explaining linguistic variation in corpora (Biber 2012, Biber et al. 2016, Goulart et al. 2020). The research literature in historical linguistics also supports the importance of register, although individual studies vary in the terminology used and the approaches taken (see Kytö 2019 for a review).
Biber and Gray (2013), for example, showed how differing subregisters mediate the measurement of grammatical change in news reportage and academic research, and asserted that "Historical analyses that disregard these differences would confound the description of linguistic change with patterns that in fact reflect register differences" (Biber and Gray 2013: 130). In this section we review recent corpus-based studies of historical language change, with an eye on their methodological approach to register, since they inform the methodology of the present study.

4 The alternative constructions discussed by Pountain are certainly not an exhaustive list of alternatives to adjectival sin + INF. Indeed, as suggested by an anonymous reviewer, a natural alternative may be one that includes a noun: Traiga una ensalada sin aderezar vs. Traiga una ensalada sin aderezos, for example.
5 An anonymous reviewer suggests that an additional possible motivation for the emergence of adjectival sin + INF is that Latin constructions of the type esse + sine + nominal 'to be without + nominal' expressing an adjectival function are well attested.
Recently, Yáñez-Bouza (2014) examined corpus data to show how 18th-century prescriptivist writings affected the usage of preposition stranding and pied-piping constructions in English. Essentially, Yáñez-Bouza placed the registers of their corpus along a continuum of formality and justified each placement. Their continuum is reproduced in Table 1. As shown in Table 1, the continuum has informal and formal settings at either pole. They then classified the registers in their corpus data by medium, as either speech-related or writing-related, which is shown along the vertical axis. Within the speech-related medium, there is a split between registers that are speech-like and those that are speech-purposed. Plays and sermons, for example, are speech-purposed, in that texts belonging to these registers are meant to be read aloud, often to establish a kind of dialogue between the reader and the audience. Nevertheless, they are placed at opposite ends of the formality continuum. Private letters and diaries, in turn, were placed at the informal end of the continuum, legal texts at the opposite, formal end, and journals in between: journals resemble diaries in their informality, but the topics narrated in them tended to be non-private matters such as journeys or military campaigns. The results of this study indicated that from 1500 to 1900, preposition stranding occurred more frequently in informal than in formal registers. Private letters, diaries, and plays, which are more closely associated with informal spoken registers, were more likely to show preposition stranding. More formal registers, such as sermons, scientific texts, medical texts, legal texts, and official letters, were less likely to show it. Legal texts, in particular, showed the lowest frequency of preposition stranding.
In short, Yáñez-Bouza established a continuum of registers, using the registers available in their corpus, in order to assess the degree of formality of different registers and their effect on the usage of certain linguistic constructions.
In research on Spanish diachrony, the effect of text varieties in corpus analysis has been examined within the paradigm known as "discourse traditions" (Koch 1997, Oesterreicher 1997, Kabatek 2005, Kabatek 2008), which views the production of an utterance as influenced by previously produced utterances within the tradition of a given discourse. Within this line of research, for example, Rosemeyer (2015) studied differences in auxiliary selection (ser vs. haber) among three discourse traditions in 16th-century Spanish texts: historiographical texts, administrative documents, and private letters. They found that in private letters, change-of-location predicates predict the selection of the ser auxiliary significantly more strongly than change-of-state predicates, a result not found in historiographical texts or administrative documents. This finding highlights the notion that private letters are more likely to contain references to change-of-location events, the context in which the declining ser auxiliary had been most robustly maintained.
Recent studies in Spanish diachrony have included register, or some approach to text variety, in multifactorial regression analyses of historical grammatical change. Blas Arroyo and Schulte (2017), for example, examined the history of Spanish modal periphrases using data from a corpus of 16th- and 18th-century personal correspondence, following the work of Koch and Oesterreicher (1985), Oesterreicher (1996), and Oesterreicher (2004), who argued that personal letters are a textual resource of high communicative proximity and are thus more likely to display features of oral language. Their study measured the use of the modal constructions haber de + INF, deber (de) + INF, and tener que + INF, with formality as one of the extralinguistic factors, coded on the basis of the main topic of each letter and the closeness of the relationship between sender and addressee. The study found that between the 16th and 18th centuries, haber de + INF decreased in frequency relative to the other two variants. Interestingly, however, they found that in both time periods, less formal contexts, which would be more representative of oral language, favored the use of the older variant, haber de + INF. The authors argued that this result reflects an entrenchment effect, whereby haber de + INF was a highly frequent construction and therefore more resistant to change. Rosemeyer (2017) analyzed the diachrony of the constructions deber de + INF and deber + INF 'must + INF' in the GRADIA corpus, a multi-genre historical corpus of Spanish. The author divided the data into three time periods, Old Spanish (1200-1499), Renaissance Spanish (1500-1699), and Modern Spanish, and then established a binary variable for formality, coding tokens as originating from genres classified as either formal or informal.
In their regression analysis they found informal texts to be a significant predictor of deber de + INF in the Renaissance Spanish and Modern Spanish periods, and that the epistemic use of deber de + INF (where it expresses a modal of possibility 'might' rather than one of necessity 'must') emerged in informal texts of the Renaissance Spanish period. In the Modern Spanish period, however, they found that formal texts were a significant predictor of the epistemic use of deber de + INF. They explained these findings by arguing that a change from below occurred, whereby deber de + INF first acquired its epistemic modality in informal texts, a use later consolidated by prescriptive pressures in formal Modern Spanish genres.
Lastly, Rosemeyer and Garachana (2019) examined the diachrony of the lograr/conseguir + INF 'to manage to INF' constructions from 1700 to 2005 in the Corpus del nuevo diccionario histórico del español (CDH), a corpus of historical Spanish data, and included a factor for género 'genre' in their regression analyses, grouping the CDH's genre labels into broader categories. One of these analyses compared the usage of conseguir + INF with that of lograr + INF during this period and found that the Historiography genre was significantly less likely to show uses of conseguir + INF. However, possible reasons why this particular genre disfavored conseguir + INF were not discussed, as the analysis focused more heavily on syntactic and discourse-level factors.
In summary, corpus-based research in historical linguistics has underlined the role of text varieties in measuring and explaining language change. However, different studies use and operationalize terms such as register and genre in different ways. In particular, a difficult task for historical linguists has been showing how registers correspond to poles of formality and informality: among the studies reviewed above, some operationalize register formality along a continuum and others in binary terms. The approach to register taken in the present study is explained in the sections that follow.

Research question
In the previous sections we discussed the adjectival sin + INF construction and Pountain's claim that it emerged as an analytic alternative to other negative adjectival constructions. Critically, however, the previous literature on adjectival sin + INF offers little data-based substantiation of such claims. That is, while Pountain (1993) claims that factors involving register may have played a role in the emergence of the adjectival sin + INF construction, little data is provided to support this proposal. Thus, the present study asks the following research question: Is the use of the adjectival sin + INF construction in corpus data modulated by differences in register?
To answer this research question, we will measure the usage of two constructions in a historical Spanish corpus: the adjectival sin + INF construction and the no + PP construction. The no + PP construction makes a good candidate for this study because, as noted by the RAE (2009), it can alternate with the sin + INF construction, and furthermore, it was one of the negative adjectival variants discussed in Pountain's (1993) study. The rest of the paper is organized as follows. In Section 5, we discuss our methodological approach. In Section 6, we present the results of our analysis, and in Section 7 we discuss these results in relation to the research question at hand. Lastly, in Section 8 we conclude the paper and offer additional comments regarding the limitations of this study and suggestions for future research.

CORDE
The present study used the Corpus diacrónico del español (CORDE), a relatively large corpus of mostly written historical Spanish data. According to the CORDE website, CORDE contains a total of 236,709,914 words. The CORDE query site allows users to search for strings of words, with optional filters such as author, date, document title, medium, geographic origin, and topic. 6 We limited our analysis to the Peninsular Spanish subsection of CORDE (applying the España filter to the geographic origin field), since the large majority of the data (196,106,277 words, or about 83% of the total word count) come from documents of Peninsular Spanish origin. No other filters were applied. Overall, CORDE contains data from the 8th to the 20th century. Each document in CORDE is tagged with a date referring to its date of publication, and we included this metadata in our dataset. In some cases, the date given in CORDE is an approximation between two dates (e.g., 1800-1810); in these cases, we coded the first date given (1800, in this example) as the date in our dataset. The following sections provide more information on how CORDE was used.
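The dating rule above can be sketched in Python, the language the authors report using for their filtering scripts. This is a minimal sketch under the assumption that the CORDE date field arrives as a plain string holding either one year or a hyphenated range; the function name `corde_year` is hypothetical, and real CORDE date strings may carry annotations this sketch does not handle.

```python
import re


def corde_year(date_field: str) -> int:
    """Return the year to code for a CORDE date string (hypothetical format).

    A single year such as "1954" is returned as-is; an approximate range
    such as "1800-1810" is coded as its first year, as described above.
    """
    # 3-digit matches allow for early dates (CORDE reaches back to the 8th century).
    years = re.findall(r"\d{3,4}", date_field)
    if not years:
        raise ValueError(f"no year found in {date_field!r}")
    return int(years[0])


print(corde_year("1800-1810"))  # 1800
print(corde_year("1954"))       # 1954
```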

Constructions searched: adjectival sin + INF
Because CORDE is not part-of-speech tagged, we searched for strings on the CORDE query site in order to capture tokens of the adjectival sin + INF construction. Specifically, we queried the word sin followed immediately by a word ending in *ar, *er, and *ir or *ír, which correspond to the possible infinitive endings in Spanish. An asterisk denotes a wildcard, such that *er returns all words ending in the string er. 7 We then used a free web-scraping application known as Data Miner 8 to copy the search results from the CORDE website into .csv files. Search results on the CORDE website take the form of a list of tokens, each containing the queried string, approximately 7 to 8 words of text to its left and right, and metadata about the document in which the token was found.
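To illustrate what these wildcard queries match, the following Python sketch emulates them with a regular expression over plain text. It is not the CORDE query engine; the pattern and the name `sin_inf_candidates` are assumptions for illustration. It also shows why hand-coding is needed later: a noun such as azúcar ends in the same string as an infinitive.

```python
import re
from typing import List

# Emulates the wildcard queries "sin *ar", "sin *er", "sin *ir", and "sin *ír":
# the word "sin" immediately followed by any word ending in an infinitive suffix.
# \w is Unicode-aware in Python 3, so accented letters count as word characters.
SIN_INF_QUERY = re.compile(r"\bsin\s+(\w+(?:ar|er|ir|ír))\b", re.IGNORECASE)


def sin_inf_candidates(text: str) -> List[str]:
    """Return each word after 'sin' that ends like a Spanish infinitive."""
    return SIN_INF_QUERY.findall(text)


print(sin_inf_candidates("una casa sin pintar y café sin azúcar"))  # ['pintar', 'azúcar']
```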
Once these tokens were saved into .csv files, we used a Python script to extract a subset of them: tokens in which the word sin was located within two words to the right of a definite or indefinite article, with no intervening punctuation marks, and in which the word ending in *ar, *er, or *ir/*ír appeared at the end of a phrase. This was done to locate specifically those examples of sin + INF that served as adjectival complements of a noun and that were not followed by additional material such as complements. To illustrate, some hypothetical examples of tokens that would have resulted from this search are given below in (8).

6 The use of CORDE is not without controversy. As clearly set out in Rodríguez Molina and Octavio de Toledo y Huerta (2017), CORDE suffers from a host of issues, including but not limited to dating inaccuracies, cataloging errors, duplicate texts, and unclear criteria for the selection of text samples. We recognize these shortcomings but maintain that CORDE is one of the most conveniently accessible Spanish historical corpora from which large amounts of data can be freely extracted for analysis. An anonymous reviewer also points to the comparability problem inherent in corpus analysis. That is, a historical corpus must be both diverse and uniform at the same time, in order to cover a wide range of time periods and registers that are evenly distributed and thus comparable. A potential solution to this paradox is the use of parallel corpora, as in Rosemeyer and Enrique-Arias (2016).

The examples in (8) contain subscripted numbers to indicate the distance between a definite or indefinite article and the word sin, which was limited to two words. In (8a), for example, the word sin is found one word to the right of the indefinite article una, and in (8b), the word sin is found two words to the right of the definite article las. All examples (8a)-(8c) are found at the end of a phrase.
The definite and indefinite articles searched were el, la, los, las, un, una, unos, unas, al, and del. We define a word at the 'end of a phrase' as one that appears immediately before any of the following punctuation marks: period, comma, question mark, exclamation point, closing parenthesis, quotation mark, colon, or semicolon. In short, the Python script was designed to extract only instances of sin in close proximity to a definite or indefinite article, in order to target sin + INF as an adjectival complement of a noun phrase, and only instances of sin + INF at the end of a phrase, since, following Bosque (1990), the construction does not take complements. Admittedly, this method does not capture all instances of adjectival sin + INF, given that nouns can appear without articles or more than two words away from sin. However, the procedure was designed to find the clearest examples of adjectival sin + INF in a relatively large corpus that is not POS-tagged.
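A minimal Python sketch of the two filters just described follows. It is a hypothetical reconstruction, not the authors' script. It assumes each token is available as a plain-text concordance line, reads "within two words" as one or two intervening words (our interpretation of the (8a)/(8b) description), and assumes which quotation-mark characters count as phrase-final. The `negator` parameter lets the same filter serve a no + PP search.

```python
import re
from typing import Optional

# Articles listed in the text (including the contractions al and del).
ARTICLES = {"el", "la", "los", "las", "un", "una", "unos", "unas", "al", "del"}
# Phrase-final punctuation; which quotation-mark characters count is an assumption.
PHRASE_END = {".", ",", "?", "!", ")", '"', "»", "”", ":", ";"}
# A token is either a run of word characters or a single punctuation character.
TOKEN_RE = re.compile(r"\w+|[^\w\s]")


def keep_token(line: str, negator: str = "sin", max_gap: int = 2) -> Optional[str]:
    """Return the word after `negator` if the line passes both filters, else None.

    Hypothetical reconstruction: an article must precede the negator with
    1..max_gap intervening words and no punctuation, and the word after the
    negator must be immediately followed by phrase-final punctuation.
    """
    toks = [t.lower() for t in TOKEN_RE.findall(line)]
    for i, tok in enumerate(toks):
        if tok != negator or i + 2 >= len(toks):
            continue
        # Filter 1: the candidate word must end a phrase.
        if not toks[i + 1].isalpha() or toks[i + 2] not in PHRASE_END:
            continue
        # Filter 2: look back for an article, stopping at any punctuation.
        for gap in range(1, max_gap + 1):
            j = i - gap - 1
            if j < 0 or not all(w.isalpha() for w in toks[j + 1:i]):
                break
            if toks[j] in ARTICLES:
                return toks[i + 1]
    return None


print(keep_token("Vivía en una casa sin pintar, junto al río."))  # pintar
print(keep_token("Juan salió sin comer nada."))                   # None
```

Returning the candidate word rather than a boolean keeps it available for the later hand-coding step.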
One issue with this query process is that, while it captures strings such as sin pintar, it also returns results such as sin azúcar 'without sugar', a sin + noun construction, because searching for sin followed by the string *ar does not distinguish between verbs and nouns. It also returns tokens of sin + INF with an active value, such as (1a), rather than adjectival sin + INF. For this reason, hand-coding was necessary at this point in the process. After the Python script had limited our dataset to sin within two words of an article and at the end of a phrase, the data were hand-coded to identify the adjectival sin + INF construction under the following conditions: sin + INF was coded as adjectival if the infinitive was transitive and the noun to its left was interpreted as its direct object, as in (9a) and (9b), or if the infinitive was an intransitive unaccusative and the noun to its left was interpreted as its subject, as in (9c).

To briefly summarize the process of locating adjectival sin + INF in CORDE from start to finish: (1) we scraped all tokens of sin + *ar, *er, and *ir/*ír from the CORDE website and saved them to .csv files, (2) we used a Python script to extract only tokens in which sin was at most two words to the right of a definite or indefinite article and the infinitive was at the end of a phrase, and (3) we hand-coded these extracted tokens to locate and verify examples of adjectival sin + INF that took as their internal argument the noun to their immediate left.
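The hand-coding conditions above amount to a simple decision rule over the coder's judgments. The Python sketch below encodes that rule purely for illustration: the `Annotation` fields and their label values are hypothetical, and the judgments themselves (whether the word is an infinitive, its transitivity, and the role of the noun) were made by hand, not computed.

```python
from dataclasses import dataclass


@dataclass
class Annotation:
    """Coder judgments for one filtered token (field labels are hypothetical)."""
    is_infinitive: bool  # False for nouns such as azúcar
    transitivity: str    # "transitive", "unaccusative", or "unergative"
    noun_role: str       # role of the noun to the left: "object", "subject", or "none"


def is_adjectival_sin_inf(a: Annotation) -> bool:
    """Transitive infinitive with an object noun, or unaccusative with a subject noun."""
    if not a.is_infinitive:
        return False
    if a.transitivity == "transitive":
        return a.noun_role == "object"
    if a.transitivity == "unaccusative":
        return a.noun_role == "subject"
    return False


# una casa sin pintar: casa is the direct object of transitive pintar.
print(is_adjectival_sin_inf(Annotation(True, "transitive", "object")))   # True
# Juan ... sin comer: Juan is the subject of comer, an active value.
print(is_adjectival_sin_inf(Annotation(True, "transitive", "subject")))  # False
```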

Constructions searched: no + PP
To locate the no + PP construction, we followed a process similar to the one outlined in the previous section. We began by querying the CORDE website for all tokens of no followed immediately by a word ending in *ado or *ido, the masculine singular endings of regular past participles. Feminine and plural forms, as well as irregular past participles, were also searched. Once again, a Python script was used to limit these results to tokens in which no was located within two words to the right of one of the definite and indefinite articles used in the search for adjectival sin + INF, with no intervening punctuation marks. Tokens of no + PP were also limited to those that appeared at the end of a phrase, following the same definition as above. Because this query also returns words that merely end in these strings without being past participles, the resulting tokens were then hand-coded to verify examples of no + PP. Specifically, a token was coded as no + PP if the word following no was indeed the past participle of some verb and if the construction as a whole modified the noun located to its immediate left. Examples of tokens that resulted from this process are shown in (10). Example (10a) shows no + PP within one word of the indefinite article un, and example (10b) shows no + PP within two words of the definite article las. Both examples appear at the end of a phrase.

To briefly summarize the data collection process for the no + PP construction: (1) we scraped all tokens of no + past participle endings from CORDE and saved them to .csv files, (2) we used a Python script to extract only tokens in which no was at most two words to the right of a definite or indefinite article and the past participle appeared at the end of a phrase, and (3) we hand-coded these extracted tokens to locate and verify tokens of no + PP modifying a noun located to its immediate left.
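The no + PP query can be emulated in the same hypothetical way. The sketch below assumes the regular endings -ado/-ido (plus -ído, as in leído) with their feminine and plural variants, and uses a short illustrative list of irregular participles; the authors' actual list of irregular forms is not given, so `IRREGULAR_MS` is an assumption. As noted above, such a pattern over-generates, which is why every match was verified by hand.

```python
import re
from typing import List

# Regular participles: -ado/-ido (and -ído, as in leído) with feminine/plural forms.
REGULAR = r"\w+(?:ad|id|íd)(?:o|a|os|as)"
# Illustrative irregular participles (masculine singular); NOT the authors' list.
IRREGULAR_MS = ["hecho", "dicho", "escrito", "puesto", "visto",
                "abierto", "roto", "vuelto", "cubierto", "muerto"]
# Build "hech(?:o|a|os|as)|dich(?:o|a|os|as)|..." so each form also matches -a/-os/-as.
IRREGULAR = "|".join(w[:-1] + "(?:o|a|os|as)" for w in IRREGULAR_MS)
# "no" followed immediately by a participle-like word; \w* admits prefixed forms
# such as descubierto (and some false positives, left to hand-coding).
NO_PP_QUERY = re.compile(
    rf"\bno\s+((?:{REGULAR})|(?:\w*(?:{IRREGULAR})))\b", re.IGNORECASE
)


def no_pp_candidates(text: str) -> List[str]:
    """Return participle-like words that immediately follow 'no'."""
    return NO_PP_QUERY.findall(text)


print(no_pp_candidates("un libro no terminado."))  # ['terminado']
print(no_pp_candidates("yo no nado"))              # ['nado'], a verb, not a participle
```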

Additional classifications: Intervening verb
Sections 5.2 and 5.3 above describe the process of locating the tokens of adjectival sin + INF and no + PP compared in this study. These tokens were also coded for three further factors: intervening verb, verb class, and register. In this subsection we describe the coding for intervening verb. The adjectival sin + INF and no + PP tokens were hand-coded as a binary variable according to whether or not they appeared as the complement of a verb whose subject was the noun in the construction. We call such a verb an 'intervening verb', because it appears between the noun and the sin + INF or no + PP construction. The most frequent intervening verb was estar 'to be', but others included quedar 'to remain' and seguir 'to continue'. The tokens in (11) exemplify adjectival sin + INF with intervening verbs in the dataset.
a. La cama estaba sin hacer, todavía con la huella del cuerpo en las sábanas, arrugadas y ligeramente sucias. 'The bed was unmade, still with the imprint of the body on the sheets, wrinkled and slightly dirty.' (Aldecoa, Ignacio 1954; El fulgor y la sangre)
b. ...hasta el punto que algunos años las aceitunas quedaron sin recoger. '...to the point that some years the olives remained unpicked.' (Villalonga, Lorenzo 1956; Bearn, o la sala de las muñecas)
As shown in (11a), the verb estar intervenes between the noun cama 'bed' and the adjectival sin + INF construction, and in (11b), the verb quedar intervenes between the noun aceitunas 'olives' and the construction. In both examples, adjectival sin + INF is a complement of the intervening verb, so these tokens were coded as containing an intervening verb. If adjectival sin + INF (or no + PP) appeared directly to the right of the noun, the token was coded as not containing an intervening verb; the examples in (9) and (10), for instance, would be coded in this way.

Additional classifications: Verb class
In this subsection we describe the coding of tokens for verb class. As noted by Bosque (1990), adjectival sin + INF occurs with verbs that indicate a change of state, so it was necessary to identify the aspectual class of the verbs used in the adjectival sin + INF and no + PP tokens. We therefore classified the infinitive in each adjectival sin + INF token and, for each no + PP token, the verb from which the past participle was derived. To use a somewhat less subjective and more replicable method than hand-coding each verb, we relied on the doctoral thesis of Sánchez Marco (2012: 189-194), which classifies a relatively large number of verbs by aspectual class. Sánchez Marco's categories of "Accomplishments", "Achievements", "Degree Achievements", "Extent predicates", "Location and locatum verbs", and "Other change of state and change of location verbs" were treated as change of state verbs, since these are the classes most likely to indicate a change of state. Sánchez Marco's classes of "States" and "Object experiencer psychological verbs" were treated as not change of state verbs, since these are the classes least likely to indicate a change of state. Verbs in the present dataset that did not appear in either list were coded as "Unclassified". In total, then, verb class had three levels: "Change of State", "Not Change of State", and "Unclassified".
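The three-level coding amounts to a lookup. In the sketch below, the two sets reproduce the grouping of Sánchez Marco's (2012) classes described above; `verb_class` and its `lexicon` argument (a mapping from verb lemma to Sánchez Marco class) are hypothetical names, and the lexicon entries in the usage lines are invented placeholders, not entries from Sánchez Marco's lists.

```python
# Grouping of Sánchez Marco's (2012) aspectual classes, as described above.
CHANGE_OF_STATE = {
    "Accomplishments",
    "Achievements",
    "Degree Achievements",
    "Extent predicates",
    "Location and locatum verbs",
    "Other change of state and change of location verbs",
}
NOT_CHANGE_OF_STATE = {"States", "Object experiencer psychological verbs"}


def verb_class(lemma, lexicon):
    """Map a verb lemma to one of the three factor levels used in the model.

    `lexicon` is a hypothetical dict {lemma: aspectual class}; lemmas missing
    from it, or assigned to any other class, fall into "Unclassified".
    """
    aspect = lexicon.get(lemma)
    if aspect in CHANGE_OF_STATE:
        return "Change of State"
    if aspect in NOT_CHANGE_OF_STATE:
        return "Not Change of State"
    return "Unclassified"


# Placeholder entries for illustration only (not taken from Sánchez Marco's lists).
demo_lexicon = {"verbo_a": "Accomplishments", "verbo_b": "States"}
print(verb_class("verbo_a", demo_lexicon))  # Change of State
print(verb_class("verbo_b", demo_lexicon))  # Not Change of State
print(verb_class("verbo_c", demo_lexicon))  # Unclassified
```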

Additional classifications: Register
In this subsection we explain how CORDE organizes its data into text varieties and how we incorporated those varieties into our analysis. CORDE classifies its documents into thirteen main temas 'topics' and, at times, more specific sub-temas. For example, Miguel Delibes' Cinco horas con Mario is classified under the main tema of Narrativa (Prosa) 'Narrative Prose' and under the sub-tema of Novela y otras formas similares 'Novel and other similar forms'. Following a methodological approach similar to that of Yáñez-Bouza (2014) and Rosemeyer and Garachana (2019), (1) we reclassified the thirteen CORDE temas into larger register groupings and (2) placed these groupings on a continuum of formality. We did so to obtain a more balanced dataset (some temas were not represented well enough in the corpus to warrant their own groupings) and thus to support meaningful statistical conclusions. The approach is described below.
We compiled CORDE's thirteen main temas into seven larger registers, which we call Prose Fiction, Verse Fiction, Non-Fiction (Other), Scientific Texts, Religious Texts, Legal Texts, and Historiography. In Table 2, the first column shows the thirteen main temas of the CORDE database, with their names in Spanish as they appear in CORDE and the author's English translations. The second column shows how these thirteen temas were collapsed into the seven registers used in this study. As Table 2 shows, the groupings were made largely on the basis of topic. For example, the three prose-based fiction temas in CORDE were condensed into 'Prose Fiction', based on their shared status as fiction. Similarly, the verse-based fiction temas were combined under 'Verse Fiction'. The CORDE groupings of 'Science and Technology', 'Religion', 'Law', and 'History and Documents' were each judged distinct enough in topic to merit their own registers. The 'Didactic', 'Society', and 'Press' texts were combined into 'Non-Fiction (Other)'. To illustrate the kinds of texts in these topics: the 'Didactic' topic is a largely academic register containing essays such as Unamuno's En torno al casticismo, transcribed speeches such as Discurso de recepción en la Real Academia de Medicina, and textbooks. The 'Society' topic covers texts on art, music, sociology, and travel, among a variety of general cultural themes. Some of these texts are academic in nature, such as Martín Gaite's Usos amorosos del dieciocho en España, which originated from her doctoral thesis, whereas others are directed at a wider public, such as Ortega's cookbook 1080 Recetas de cocina. The 'Press' topic contains press texts such as articles from the Spanish newspaper ABC.
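The collapse of thirteen temas into seven registers is a many-to-one mapping. In the sketch below, the English labels follow the translations given in this section. Because the section names only one of the three prose-fiction temas (Narrativa (Prosa) 'Narrative Prose') and none of the verse-fiction temas, those entries are HYPOTHETICAL placeholders; the count of three verse-fiction temas is inferred from the totals (13 - 3 - 4 - 3 = 3). The function `register_counts` is an illustrative name.

```python
from collections import Counter

# Tema -> register mapping. Entries marked HYPOTHETICAL are placeholders for
# tema names not given in this section (Table 2 lists the actual names).
TEMA_TO_REGISTER = {
    "Narrative Prose": "Prose Fiction",
    "Prose fiction tema 2 (HYPOTHETICAL)": "Prose Fiction",
    "Prose fiction tema 3 (HYPOTHETICAL)": "Prose Fiction",
    "Verse fiction tema 1 (HYPOTHETICAL)": "Verse Fiction",
    "Verse fiction tema 2 (HYPOTHETICAL)": "Verse Fiction",
    "Verse fiction tema 3 (HYPOTHETICAL)": "Verse Fiction",
    "Science and Technology": "Scientific Texts",
    "Religion": "Religious Texts",
    "Law": "Legal Texts",
    "History and Documents": "Historiography",
    "Didactic": "Non-Fiction (Other)",
    "Society": "Non-Fiction (Other)",
    "Press": "Non-Fiction (Other)",
}


def register_counts(token_temas):
    """Tally tokens by collapsed register, given each token's CORDE tema."""
    return Counter(TEMA_TO_REGISTER[t] for t in token_temas)


# Sanity checks: thirteen temas collapse into seven registers.
assert len(TEMA_TO_REGISTER) == 13
assert len(set(TEMA_TO_REGISTER.values())) == 7
print(register_counts(["Press", "Didactic", "Law", "Narrative Prose"]))
```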
To relate these registers to scales of formality and oral language, we followed an approach similar to that of Yáñez-Bouza (2014) in establishing the continuum of formality used in the present study, shown below in Figure 1.
"Prose poetry", as it is known in English, is prose writing that maintains certain poetic qualities.
The continuum in Figure 1 rests on the following assumptions. First, we follow Short (1996: 91) in assuming that "writers often create special effects by writing in ways which borrow characteristics associated with speech". The register most likely to do so is Prose Fiction, in which characters in novels and plays are given dialogue written to borrow from speech. We therefore placed Prose Fiction towards the informal end of the continuum. 10 We placed the registers of Scientific Texts, Legal Texts, Religious Texts, and Historiography towards the formal end, on the assumption that these texts are, overall, monologic, conservative, and more inclined to use archaic linguistic features. Between these two poles we placed Non-Fiction (Other) and Verse Fiction. These two registers trend towards informality, in that Non-Fiction (Other) contains newspaper texts and writings of interest to a wider readership, and Verse Fiction contains plays and narrative texts more representative of oral speech. At the same time, they also trend towards formality, in that Non-Fiction (Other) includes texts for learned audiences, and Verse Fiction contains poetry whose structure does not reflect informal spoken language.

Descriptive data
We first present a descriptive overview of the dataset that resulted from the data collection process outlined in Section 5. The dataset contains a total of 1,159 tokens: 240 with the adjectival sin + INF construction and 919 with the no + PP construction. Adjectival sin + INF was therefore the less frequent of the two constructions, accounting for 20.7% of the dataset. Below, the dataset is presented by chronological date in Section 6.1.1, by intervening verb status in Section 6.1.2, by verb class in Section 6.1.3, and by register in Section 6.1.4. Section 6.2 then analyzes the dataset with a mixed-effects logistic regression. Table 3 below gives the token counts of the two constructions in the dataset by century. To contextualize the frequency of the target constructions, Table 3 also gives the token frequencies of the single words sin and no in each century, found by searching for sin and no on the CORDE query site, under the column headings "Tokens sin" and "Tokens no". The resulting relative frequencies appear under the column headings "sin + INF per 1,000" and "no + PP per 1,000". The relative frequency of adjectival sin + INF is the token count of adjectival sin + INF per 1,000 tokens of the word sin in CORDE; similarly, the relative frequency of no + PP is the token count of no + PP per 1,000 tokens of the word no in CORDE. 11 Indeed, a novel finding of the present study is that adjectival sin + INF appears as early as the 15th century in these Peninsular Spanish data.
10 It is not our aim to say that all Prose Fiction is informal. We claim only that Prose Fiction is more likely to reflect informal patterns of language than the other registers measured here.
In the present dataset, the earliest two examples of adjectival sin + INF to appear are the following. In other words, adjectival sin + INF seems to have existed in Spanish since at least the 15th century, somewhat earlier than suggested in Pountain (1993). The relative frequency data in Table 3 are plotted in Figure 2 below, with century groupings on the x-axis and the relative frequency of the two constructions on the y-axis. As visualized in Figure 2, adjectival sin + INF appears in CORDE in the 15th century but does not eclipse no + PP in relative frequency until after the 18th century. That is, after the 18th century, adjectival sin + INF is more frequent relative to the lone word sin than no + PP is relative to the lone word no. Even so, the no + PP construction does not fall into disuse as the language enters its contemporary era.
11 It is not possible to retrieve total word counts for chronological subsections of CORDE. We therefore use the relative frequencies shown in Table 3, since the overall token counts of sin and no can be retrieved with CORDE's chronological filter applied to limit the data to particular centuries. Because sin and no are frequent words in Spanish, we consider them reasonably reliable proxies for overall word count.
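The normalization used in Table 3 is a rate per 1,000 tokens of the base word. A minimal sketch follows; `per_thousand` is an illustrative name, and the century counts in the last usage line are made-up numbers, not values from Table 3. The overall share of adjectival sin + INF reported above (240 of 1,159 tokens) is computed the same way, as a percentage.

```python
def per_thousand(construction_tokens, base_word_tokens):
    """Construction tokens per 1,000 tokens of the base word (sin or no)."""
    if base_word_tokens <= 0:
        raise ValueError("base word count must be positive")
    return 1000 * construction_tokens / base_word_tokens


# Overall share of adjectival sin + INF in the dataset (figures from the text).
share = 100 * 240 / (240 + 919)
print(f"{share:.1f}%")  # 20.7%

# Illustrative only: 12 hypothetical tokens against 48,000 hypothetical tokens of sin.
print(per_thousand(12, 48_000))  # 0.25
```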

Organization of data by intervening verb
As discussed in Section 5.4 above, tokens were coded for whether or not the adjectival sin + INF and no + PP constructions appeared as complements of a verb that intervened between the noun and the construction. Table 4 below shows the number of tokens per construction, organized by whether or not they appeared as complements of an intervening verb. As Table 4 shows, intervening verbs were rare, appearing in only 34 of the 1,159 tokens in the dataset. When they did appear, they occurred primarily with the adjectival sin + INF construction. Examples of such tokens can be seen above in (11a) and (11b).

Organization of data by verb class
As discussed in Section 5.5 above, each token in the present dataset was associated with a verb coded as "Change of State", "Not Change of State", or "Unclassified", according to the aspectual classifications compiled by Sánchez Marco (2012). Table 5 below summarizes how the tokens in this dataset were coded for verb class. As Table 5 shows, adjectival sin + INF is more likely to appear with Change of State verbs than with Not Change of State verbs.

Organization of data by register
Table 6 below gives the number of tokens per construction and per register grouping, following the register groupings outlined in Section 5.6. As Table 6 shows, of the seven registers, adjectival sin + INF tokens are most likely to occur in Prose Fiction and least likely to occur in Legal Texts.

Logistic regression analysis
While the descriptive data presented above give an overview of the dataset, further analysis is necessary to weigh these factors jointly and measure which conditions favor the construction under study. We begin by recalling the research question: is the use of adjectival sin + INF over no + PP modulated by differences in register? To answer this question, the dataset was analyzed with a mixed-effects logistic regression, fitted as a generalized linear mixed-effects model (glmer, binomial family) with the lme4 package in R (Bates et al. 2015). Each of the 1,159 tokens in the dataset was treated as one observation. The model was built as follows. The choice of adjectival sin + INF versus no + PP was coded as a binary dependent variable, with adjectival sin + INF coded as 1 and no + PP coded as 0; the model thus estimates the probability that adjectival sin + INF is chosen over no + PP. Fixed effects were the date of the document in which the token appeared (coded as a mean-centered continuous variable), intervening verb status (coded as "True" for containing an intervening verb or "False" otherwise), verb class ("Change of State", "Not Change of State", or "Unclassified"), and register (one of the seven registers outlined in Section 5.6). A random intercept was included for the title of the document in which the token was located. Sum contrasts were applied to verb class (with "Unclassified" as the omitted level) and to register (with "Non-Fiction (Other)" as the omitted level). Since intervening verb status had only two levels, default treatment contrasts were applied, with "False" as the baseline level.
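The predictor coding described above can be made concrete. The sketch below (all function names hypothetical) builds sum-contrast codes in the style of R's contr.sum, except that the omitted level is chosen explicitly rather than defaulting to the last level; it also shows treatment coding for the binary intervening-verb factor and mean-centering for date. Under sum coding, each estimated coefficient is a deviation from the unweighted mean across levels on the log-odds scale, and the omitted level's deviation is minus the sum of the others.

```python
def sum_contrasts(levels, omitted):
    """Sum (deviation) coding: each non-omitted level gets its own column;
    the omitted level is coded -1 in every column."""
    kept = [lv for lv in levels if lv != omitted]
    return {
        lv: [-1 if lv == omitted else int(lv == col) for col in kept]
        for lv in levels
    }


def treatment_code(value, baseline):
    """Treatment (dummy) coding for a two-level factor."""
    return 0 if value == baseline else 1


def mean_center(values):
    """Subtract the sample mean so that 0 corresponds to the mean date."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]


verb_codes = sum_contrasts(
    ["Change of State", "Not Change of State", "Unclassified"], omitted="Unclassified"
)
print(verb_codes)
print(treatment_code("True", baseline="False"))  # 1
print(mean_center([1450, 1600, 1750]))           # illustrative years: [-150.0, 0.0, 150.0]
```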
The regression model was fitted by entering the following formula in R: glmer(sininf ~ date*verbclass + date*intervening + date*register + (1|document), control=glmerControl(optimizer='bobyqa'), family = binomial). In this formula, the dependent variable of adjectival sin + INF (labeled "sininf") is regressed on chronological date ("date"), intervening verb status ("intervening"), verb class ("verbclass"), and register ("register"), with a random intercept for the document in which the token appeared ("document"). Interactions were estimated between date and each of the other fixed effects. In sum, the model weighs the effects of date, intervening verb status, verb class, register, and document together in estimating the probability that a given observation contains adjectival sin + INF rather than no + PP. The output of the regression analysis is summarized in Table 7 below, with statistically significant effects (p < 0.05) in bold. As Table 7 shows, the Change of State verb class was a significant positive predictor of adjectival sin + INF relative to no + PP, and the Not Change of State verb class was a significant negative predictor. The presence of an intervening verb was also a significant positive predictor of adjectival sin + INF. In terms of register, Prose Fiction was a significant positive predictor of adjectival sin + INF, while Legal Texts was a significant negative predictor.
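Because the model is logistic, its coefficients are on the log-odds scale. The sketch below shows how a linear predictor converts to a probability and how a coefficient converts to an odds ratio; the coefficient values are HYPOTHETICAL and are not those reported in Table 7. Since date is mean-centered and interacts with the other predictors, the main-effect coefficients for verb class, intervening verb, and register describe their effects at the mean date of the sample.

```python
import math


def inv_logit(eta):
    """Convert a log-odds value to a probability."""
    return 1 / (1 + math.exp(-eta))


def odds_ratio(beta):
    """Multiplicative change in the odds of sin + INF for a one-unit change in the predictor."""
    return math.exp(beta)


# HYPOTHETICAL coefficients (log-odds), for illustration only.
intercept = -1.5
beta_change_of_state = 1.2

# Log-odds for a Change of State token at the mean date, with the other
# sum-coded factors averaged over and intervening verb = False.
eta = intercept + beta_change_of_state

print(round(inv_logit(intercept), 3))
print(round(inv_logit(eta), 3))
print(round(odds_ratio(beta_change_of_state), 2))
```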
Turning to the interactions, there was a significant interaction of date and intervening verb status, such that the presence of an intervening verb predicts adjectival sin + INF increasingly strongly as chronological date increases. There was also a significant interaction between date and the Prose Fiction register, such that Prose Fiction becomes an increasingly strong predictor of adjectival sin + INF as date increases, and a significant interaction between date and the Legal Texts register, such that Legal Texts becomes increasingly unlikely to predict adjectival sin + INF as date increases. To visualize the interaction between date and register, Figure 3 below plots mean-centered date on the x-axis and the probability of adjectival sin + INF over no + PP on the y-axis, with one colored line per register. Figure 3 illustrates how the lines for Prose Fiction and Legal Texts (purple and green) differ widely in slope as date increases, compared with the lines for the other registers. In other words, as chronological date increases, Prose Fiction becomes increasingly likely to contain adjectival sin + INF relative to no + PP, and Legal Texts becomes increasingly likely to contain no + PP relative to adjectival sin + INF. The results presented in this section and their relevance to the research question are discussed in the following section.
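The date-by-register interaction means that each register has its own slope on the log-odds scale: for a register r, the slope of date is the date coefficient plus the date-by-r interaction coefficient. The sketch below traces predicted probabilities across centered date for two registers under HYPOTHETICAL coefficients (not the Table 7 estimates), chosen only to mimic the qualitative pattern in Figure 3: one rising and one falling curve. The function name `predicted_curve` and the per-century date scale are assumptions for illustration.

```python
import math


def inv_logit(eta):
    """Convert a log-odds value to a probability."""
    return 1 / (1 + math.exp(-eta))


def predicted_curve(intercept, b_date, b_register, b_date_x_register, dates):
    """Predicted P(sin + INF) across mean-centered dates for one register.

    The register shifts the intercept (b_register) and the slope of date
    (b_date + b_date_x_register); other predictors are held at zero coding.
    """
    slope = b_date + b_date_x_register
    return [inv_logit(intercept + b_register + slope * d) for d in dates]


# HYPOTHETICAL values, with centered date measured in centuries.
dates = [-3, -2, -1, 0, 1, 2, 3]
rising = predicted_curve(-1.5, 0.1, 0.8, 0.6, dates)     # Prose-Fiction-like pattern
falling = predicted_curve(-1.5, 0.1, -1.0, -0.7, dates)  # Legal-Texts-like pattern

for d, p_up, p_down in zip(dates, rising, falling):
    print(f"{d:+d}  {p_up:.3f}  {p_down:.3f}")
```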

Discussion
This study set out to determine whether differences in textual formality, operationalized as register differences, modulate the usage of adjectival sin + INF versus no + PP in Spanish historical corpus data. The results show that adjectival sin + INF was indeed preferred to no + PP in Prose Fiction texts, which in Figure 1 we classified as relatively more representative of informal speech patterns. Additionally, Legal Texts, classified in Figure 1 as among the registers most representative of formal language, significantly disfavor adjectival sin + INF in favor of no + PP. Furthermore, the results show that these particular effects became increasingly pronounced over time. This quantitative analysis therefore offers support for Pountain's (1993) proposal that the propensity of the no + PP construction to appear in formal registers may have influenced the rise of the adjectival sin + INF construction. Apart from the influence of register, the results also confirmed certain linguistic assumptions about the adjectival sin + INF construction. In particular, the data showed a very strong preference for adjectival sin + INF over no + PP with Change of State verbs, which reflects Bosque's (1990, 2014) description of adjectival sin + INF as appearing only with such verbs. The data also showed a preference for intervening verbs to appear with adjectival sin + INF, which supports Pountain's analysis of adjectival sin + INF as having originated in an estar sin + INF construction that later gave rise to a noun + sin + INF construction. This result also supports the notion that adjectival sin + INF increased in usage not only because no + PP was more limited to formal registers, but also because it was more limited to certain syntactic contexts.
As noted in Pountain (1993), adjectival sin + INF has the wider distribution of the two constructions, in that it can be used both as an attribute of a noun and as a complement of a predicate. The no + PP construction, on the other hand, appears to strongly prefer the attributive position next to a noun.
A novel finding of these results is that chronological date, as a main effect, did not predict increases in adjectival sin + INF relative to no + PP. This seems to reflect the descriptive data in Section 6.1.1, which show that although adjectival sin + INF increases greatly in relative frequency as Spanish moves towards the 20th century, no + PP does not disappear from usage. Thus, it is not accurate to say that adjectival sin + INF simply increased in frequency relative to no + PP with the passage of time. As the interaction effects show, chronological date does matter, but specifically within the registers of Prose Fiction and Legal Texts. It is therefore more accurate to describe the increase in adjectival sin + INF as modulated by time only within certain registers of the language: adjectival sin + INF increased in usage relative to no + PP over time in the register most representative of informal speech patterns, while the opposite trend held in Legal Texts.
This raises the question of why adjectival sin + INF was preferred in more informal text registers. Is there some property of adjectival sin + INF that makes it more "informal" than no + PP? The narrower syntactic distribution of no + PP noted above may offer some clues. That is, the significant increase in the frequency of adjectival sin + INF in informal text registers, compared with other registers, may reflect the fact that no + PP could appear in fewer sentence structures than adjectival sin + INF. Since adjectival sin + INF could follow both nouns and verbs, whereas no + PP could only follow nouns, adjectival sin + INF may have increasingly become the preferred variant in both syntactic positions over time, encroaching on contexts previously occupied by no + PP. This change would have been most pronounced in informal texts, which are more likely to represent the speech patterns of their time, and least pronounced in formal texts, which conserved more archaic patterns of language. We leave this question open for future research. To close this discussion, we highlight the importance of measuring register in corpus studies of historical linguistics, in order to capture the most accurate picture of diachronic change.

Conclusion
The primary goal of this study was to provide a quantitative test of a proposal regarding the usage of the adjectival sin + INF construction. The results support the proposal of Pountain (1993), who described adjectival sin + INF as a potential response to the unavailability of alternative negative adjectival constructions. In particular, the usage of adjectival sin + INF relative to no + PP in historical corpus data is significantly influenced by register formality. In this sense, the restriction of no + PP to more formal registers may indeed have motivated the increasing frequency of an adjectival use of the already existing sin + INF construction.
This study is not without limitations. One key limitation is that it did not measure the entire range of Spanish negative adjectival constructions, which would include prefixes such as in- and des-. Additionally, this study did not aim to ascertain the genesis of the adjectival use of sin + INF, or how the prepositional infinitive sin + INF adopted a new adjectival function. Rather, it focused only on the increasing usage of this construction in relation to register categories of textual data. We believe that an analysis of the precise origin of adjectival sin + INF would require a discussion of the typological implications of the construction and its counterparts in other Romance languages spoken in the Iberian Peninsula and beyond, as well as of potential geographic variation among the regions of the world where Spanish is spoken. It is interesting to note that equivalent adjectival constructions descended from Latin SINE and an infinitive are not available in certain other Romance languages such as French. A further piece of this puzzle, as an anonymous reviewer points out, is the fact that Spanish past participles and infinitives originate etymologically in Latin perfect passive participles and present active infinitives, which differ in voice (passive versus active). This central voice distinction was not explored in the present study and should be addressed in future studies of these constructions. In addition, in classifying the semantic nature of verbs we relied on the classifications already made by Sánchez Marco (2012). Although we made this choice so that other researchers could carry out this research in a similar way, it left many verbs in the "Unclassified" grouping. Future studies may wish to weigh the advantages and disadvantages of hand-coding each verb for aspectual class.
Finally, the data analyzed in this study were limited to written texts. Future corpus research on oral data may well uncover novel findings on this topic.
In all, we conclude by noting that the results of the present study are in line with Biber and Gray (2013), who argued for the importance of considering register in historical linguistic analyses. Indeed, without measuring the effects of register, the diachronic usage of adjectival sin + INF would have been only partially explained. We thus reassert that future investigations of linguistic phenomena in corpus data must make efforts to account for register differences in their analyses.