Japanese Word Sketches: Advances and Problems


  • Irena SRDANOVIĆ University of Ljubljana
  • Naomi IDA Meiji University
  • Chikako SHIGEMORI BUČAR University of Ljubljana
  • Adam KILGARRIFF Lexical Computing Ltd.
  • Vojtěch KOVÁŘ Masaryk University




Keywords: word sketches, Japanese collocations, evaluation, corpus, language technologies


In this paper, we present the results of an evaluation of Japanese word sketches and examine in detail the issues observed by the evaluators. A word sketch is a list of a word's salient collocates, organized by the grammatical relations that hold between the word and each collocate. The word sketch functionality is built into the Sketch Engine corpus query system and has so far been created for more than twenty languages, including Japanese. The issues discovered in the evaluation of Japanese word sketches need to be addressed to improve the functionality further. The other tools and resources that are used together with the word sketches and affect their performance should also be reviewed. We group the issues into three categories: 1) the lemmatizer and tagger in use, 2) the sketch grammar written specifically for Japanese, and 3) the corpus and the statistical methods.
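To make the notion of a word sketch concrete, the following is a minimal sketch of the underlying data structure. It assumes that a sketch grammar has already turned a tagged corpus into (headword, relation, collocate) triples, and it ranks collocates with the logDice salience score, the association measure Sketch Engine uses. The relation names (`を_verb`, `の_noun`), the helper `word_sketch`, and the toy counts are hypothetical and serve only as illustration. For f_x and f_y the code simply uses overall lemma frequencies, which simplifies how the real system counts them.

```python
import math
from collections import Counter, defaultdict


def log_dice(f_xy: int, f_x: int, f_y: int) -> float:
    """logDice salience score: 14 + log2(2 * f_xy / (f_x + f_y))."""
    return 14 + math.log2(2 * f_xy / (f_x + f_y))


def word_sketch(triples, freq, headword, min_freq=1):
    """Group the collocates of `headword` by grammatical relation, ranked by logDice.

    triples:  iterable of (head, relation, collocate) tuples, one per corpus match
              (assumed output of a sketch grammar run over a lemmatized, tagged corpus).
    freq:     Counter of overall lemma frequencies (a simplification of f_x and f_y).
    min_freq: hypothetical cut-off; pairs seen fewer times are dropped.
    """
    # Count each (head, relation, collocate) combination for this headword only.
    pair_counts = Counter(t for t in triples if t[0] == headword)
    sketch = defaultdict(list)
    for (head, rel, coll), f_xy in pair_counts.items():
        if f_xy < min_freq:
            continue
        score = log_dice(f_xy, freq[head], freq[coll])
        sketch[rel].append((coll, f_xy, round(score, 2)))
    # Within each relation, list the most salient collocates first.
    for rel in sketch:
        sketch[rel].sort(key=lambda item: item[2], reverse=True)
    return dict(sketch)


if __name__ == "__main__":
    # Invented toy counts for 本 (hon, "book"); not data from the evaluation.
    triples = ([("本", "を_verb", "読む")] * 5 + [("本", "を_verb", "書く")] * 2
               + [("本", "の_noun", "表紙")] * 3 + [("雑誌", "を_verb", "読む")] * 4)
    freq = Counter({"本": 20, "読む": 30, "書く": 10, "表紙": 5, "雑誌": 15})
    for rel, colls in word_sketch(triples, freq, "本").items():
        print(rel, colls)
```

Run on the toy data, the sketch for 本 lists 読む ahead of 書く under the hypothetical `を_verb` relation and 表紙 under `の_noun`. The triples for 雑誌 are ignored because they belong to another headword. This also shows why the three issue categories interact. Errors in lemmatization or tagging corrupt the triples, gaps in the sketch grammar leave relations empty, and the choice of corpus and statistic determines the counts and the ranking.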







20. 10. 2011



Research articles

How to Cite

SRDANOVIĆ, I., IDA, N., SHIGEMORI BUČAR, C., KILGARRIFF, A., & KOVÁŘ, V. (2011). Japanese Word Sketches: Advances and Problems. Acta Linguistica Asiatica, 1(2), 63-82. https://doi.org/10.4312/ala.1.2.63-82