Encoding Multiword Lexical Units with TEI Lex-0: A Case Study

Authors

  • Toma Tasovac, Belgrade Center for Digital Humanities, Serbia
  • Ana Salgado, NOVA University Lisbon, CLUNL, Portugal
  • Rute Costa, NOVA University Lisbon, CLUNL, Portugal

DOI:

https://doi.org/10.4312/slo2.0.2020.2.28-57

Keywords:

TEI, lexicography, language resources, multiword lexical units, interoperability

Abstract

Modelling and encoding multiword lexical units, i.e. recurrent sequences of lexemes that are treated as independent lexical units, is a topic that the Guidelines of the Text Encoding Initiative (TEI) do not cover adequately or in sufficient depth, even though TEI is the de facto standard in the research community for working with electronic texts. Using the Dictionary of the Portuguese Academy of Sciences as a case study, this paper presents some solutions for encoding multiword lexical units in TEI Lex-0, an initiative that aims to simplify and streamline the encoding of lexical data with TEI and thereby improve interoperability. We introduce the notion of macro- and microstructural relevance to distinguish between multiword lexical units that are independent dictionary headwords and those that appear inside the entries of single-word headwords. We also introduce the notion of lexicographic transparency to distinguish between units that have no definition and those that do; the former are encoded within a <form> element, the latter within an <entry> element, where they may carry further constraints (sense numbers, domain labels, grammatical labels, etc.). For the <gram> element, we introduce the use of attributes for encoding different types of labels for multiword lexical units (implicit, explicit and normalised). We conclude that the interoperability of lexical resources would improve considerably if the authors of dictionary schemas had access to a rich but relatively simple typology of multiword lexical units.
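The distinction between units encoded within <form> and units encoded within <entry> can be illustrated with a minimal Python sketch. It is not the authors' schema: the function name encode_unit, the attribute values "mwe" and "mweLabel", the Portuguese sample units and the English gloss are all hypothetical placeholders chosen for illustration, and only the element names <entry>, <form> and <gram> come from the abstract (<orth>, <sense>, <def> and <gramGrp> are standard TEI dictionary elements).

```python
import xml.etree.ElementTree as ET


def encode_unit(parent, text, definition=None, label=None):
    """Attach a multiword unit to `parent`.

    Units without a definition become a <form>; units with a definition
    become a nested <entry>. Attribute values are hypothetical placeholders.
    """
    if definition is None:
        # No definition: the unit is recorded as a form only.
        unit = ET.SubElement(parent, "form", {"type": "mwe"})  # hypothetical value
        ET.SubElement(unit, "orth").text = text
    else:
        # Definition present: the unit gets its own entry with a sense.
        unit = ET.SubElement(parent, "entry", {"type": "mwe"})  # hypothetical value
        form = ET.SubElement(unit, "form", {"type": "lemma"})
        ET.SubElement(form, "orth").text = text
        sense = ET.SubElement(unit, "sense")
        ET.SubElement(sense, "def").text = definition
    if label is not None:
        # Optional label for the unit, carried by <gram> inside <gramGrp>.
        gram_grp = ET.SubElement(unit, "gramGrp")
        gram = ET.SubElement(gram_grp, "gram", {"type": "mweLabel"})  # hypothetical
        gram.text = label
    return unit


# Illustrative single-word entry with two multiword units nested inside it.
entry = ET.Element("entry")
lemma = ET.SubElement(entry, "form", {"type": "lemma"})
ET.SubElement(lemma, "orth").text = "casa"

defined = encode_unit(entry, "casa de banho", definition="bathroom",
                      label="noun phrase")
undefined = encode_unit(entry, "casa cheia")

print(ET.tostring(entry, encoding="unicode"))
```

The single branch on the presence of a definition mirrors the rule stated in the abstract; a real TEI Lex-0 encoding would also need the TEI namespace and the attribute vocabulary defined in the paper itself.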

Published

10 August 2020

How to Cite

Tasovac, T., Salgado, A., & Costa, R. (2020). Kodiranje večbesednih leksikalnih enot s TEI Lex-0: študija primera [Encoding multiword lexical units with TEI Lex-0: A case study]. Slovenščina 2.0: Empirične, Aplikativne in Interdisciplinarne Raziskave, 8(2), 28–57. https://doi.org/10.4312/slo2.0.2020.2.28-57