VeLeCa

A verbal lexicon of Catalan with PCFP analysis

Authors

B. Herce, B. Pricop

Abstract

This paper presents VeLeCa, a new resource on Catalan verbal inflection. It contains the phonological forms of 174,200 word forms from 3,484 lexemes, together with their lexical and morphosyntactic values and their frequencies. We describe the challenges encountered and the procedure followed in compiling and phonemizing the resource, and we conduct a computational analysis of the Paradigm Cell Filling Problem (PCFP, i.e. morphological predictive complexity) in the system in order to compare it with that of related Romance languages.

Keywords

Catalan, verb, paradigm, PCFP, morphology

Published

2024-12-20

How to Cite

Herce, B., & Pricop, B. (2024). VeLeCa: A verbal lexicon of Catalan with PCFP analysis. Isogloss. Open Journal of Romance Linguistics, 10(1), 1–17. https://doi.org/10.5565/rev/isogloss.457
