Building a Word Sketch Grammar for Academic Portuguese

Authors

  • Tanara Zingano Kuhn
  • Iztok Kosem

DOI:

https://doi.org/10.4312/slo2.0.2016.1.124-161

Keywords:

sketch grammar, Portuguese, corpus, dictionary, evaluation

Abstract

This paper presents a new sketch grammar built specifically for CoPEP, a 40-million-word corpus of texts from scientific journals. The corpus was tagged with Freeling v3, the default tagger for Portuguese corpora in Sketch Engine. We first briefly introduce the CoPEP corpus and the reasons for building it, supported by a review of existing corpora of Portuguese. This is followed by a review and evaluation of the existing sketch grammars for Portuguese, whose main conclusion is that they are seriously deficient and in need of numerous corrections and additions. Although some grammatical relation queries, or parts of them, are useful, it made more sense, also from the perspective of our research, to build an entirely new sketch grammar. The central part of the paper is therefore devoted to a detailed description of how the new sketch grammar was built and to some of the problems we encountered. The greatest difficulties in writing the sketch grammar queries stemmed from certain shortcomings of the tagger, which we tried to overcome by adding numerous conditions to the queries. For some problems, such as the tagging of the personal pronoun "se", we sought a solution outside the sketch grammar and, with the help of the Sketch Engine developers, improved the preparation procedure for all Portuguese corpora tagged with Freeling v3. We conclude by summarizing the main findings, highlighting the significance of the results for further research, and offering suggestions for further improving the sketch grammar. An important finding is that word sketches produced with the new sketch grammar give more accurate and considerably richer results than those obtained with the sketch grammar currently used by default for Portuguese in Sketch Engine. Because of the richness of the information it provides, the new sketch grammar can also be used for advanced lexicographic purposes, such as the automatic extraction of lexical data from the CoPEP corpus; we will use this method to compile the proposed dictionary of Portuguese for students. Although it was built primarily for academic Portuguese, the sketch grammar is a valuable new resource for lexicographic and corpus research on Portuguese, since it can be applied to any corpus tagged with Freeling v3. We have therefore made the sketch grammar available to all Sketch Engine users.

Published

5 February 2017


Section

Articles

How to Cite

Zingano Kuhn, T., & Kosem, I. (2017). Izdelava slovnice besednih skic za akademsko portugalščino. Slovenščina 2.0: Empirične, Aplikativne in Interdisciplinarne Raziskave, 4(1), 124-161. https://doi.org/10.4312/slo2.0.2016.1.124-161
