Japanese Word Sketches: Advances and Problems
DOI:
https://doi.org/10.4312/ala.1.2.63-82
Keywords:
word sketches, Japanese collocations, evaluation, corpus, language technologies
Abstract
In this paper, we present the results of an evaluation of Japanese word sketches and discuss in detail the issues observed by the evaluators. A word sketch presents a list of salient collocates of a word, organized by the grammatical relations holding between the word and each collocate. The word sketch functionality is incorporated into the Sketch Engine corpus query system and has so far been created for more than twenty languages, including Japanese. The issues discovered in the evaluation of the Japanese word sketches need to be addressed to further improve the word sketch functionality. Other tools and resources that are used in combination with word sketches and influence their performance should also be reviewed. We divide the issues into three groups: 1) the lemmatizer and tagger in use, 2) the sketch grammar written specifically for Japanese, and 3) the corpus and statistical methods.
License
Copyright (c) 2011 Irena SRDANOVIĆ, Naomi IDA, Chikako SHIGEMORI BUČAR, Adam KILGARRIFF, Vojtěch KOVÁŘ
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.