Crowdsourced ratings of single lexical items
A core vocabulary perspective
DOI: https://doi.org/10.4312/slo2.0.2022.2.5-61
Keywords: core vocabulary and language learning, crowdsourcing with non-experts, single lexical items, CEFR levels, comparative judgment
Abstract
This study investigates theoretical and practical questions about distinguishing core from peripheral vocabulary at different levels of language proficiency, using statistical approaches in combination with crowdsourcing. We also examine whether rankings produced by second language learners can be used to assign levels to unseen vocabulary. The study is carried out on single-word items in Swedish.
We test four hypotheses: (1) a core vocabulary exists for each proficiency level, but only up to CEFR level B2 (upper intermediate); (2) core vocabulary is used more systematically, whereas peripheral items behave more idiosyncratically; (3) given key items for each level (so-called anchor items), any new unseen word can be compared against these anchors in a series of comparative judgment tasks, which yields a "target" level for the previously unseen word; and (4) non-experts will perform on par with experts in comparative judgment. The hypotheses were largely confirmed. For (1) and (2), our results show a degree of systematicity in core vocabulary at the beginner and intermediate levels (A1–B1), whereas we observed less systematicity at the higher levels (B2–C1). For (3), we propose crowdsourcing word ratings through comparative judgment against known anchor words as a method for assigning a "target" level to unseen words. For (4), we confirm earlier findings that non-experts, in our case language learners, can be used effectively for linguistic annotation tasks in a comparative judgment setting.
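The idea in (3), placing an unseen word relative to anchor words of known level, can be illustrated with a small sketch. This is not the procedure used in the study: the consistency-count scoring rule, the tie-breaking towards the lower level, the function name `assign_target_level`, and the anchor names are all assumptions made for illustration only.

```python
# Illustrative sketch only: a simple consistency count, not the authors' method.
LEVELS = ["A1", "A2", "B1", "B2", "C1"]


def assign_target_level(judgments, anchor_levels):
    """Pick the level that agrees with the most pairwise judgments.

    judgments: list of (anchor_word, new_word_is_harder) pairs, one per
        crowdsourced comparison between the unseen word and an anchor.
    anchor_levels: dict mapping each anchor word to its known CEFR level.
    Ties are resolved in favour of the lower level (an assumption).
    """
    best_level, best_score = None, -1
    for i, level in enumerate(LEVELS):
        score = 0
        for anchor, new_is_harder in judgments:
            j = LEVELS.index(anchor_levels[anchor])
            # A judgment supports `level` if the unseen word was rated harder
            # than a lower-level anchor or easier than a higher-level anchor.
            if (j < i and new_is_harder) or (j > i and not new_is_harder):
                score += 1
        if score > best_score:
            best_level, best_score = level, score
    return best_level


# Hypothetical anchors, one per level; real anchors would be attested words.
anchors = {"anchor_a1": "A1", "anchor_a2": "A2",
           "anchor_b2": "B2", "anchor_c1": "C1"}
votes = [("anchor_a1", True), ("anchor_a2", True),
         ("anchor_b2", False), ("anchor_c1", False)]
print(assign_target_level(votes, anchors))  # B1
```

With several annotators per comparison, each vote is simply one more entry in `judgments`, so occasional disagreements are outvoted by the majority.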
License
Copyright (c) 2022 Elena Volodina, David Alfter, Therese Lindström Tiedemann

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.