THE LEARNER AS LEXICOGRAPHER: USING MONOLINGUAL AND BILINGUAL CORPORA TO DEEPEN VOCABULARY KNOWLEDGE

Learning vocabulary is one of the most challenging tasks faced by learners with a non-kanji background when learning Japanese as a foreign language. However, learners are often not aware of the range of different aspects of word knowledge they need in order to successfully use Japanese. This includes not only the spoken and written form of a word and its meaning, but also morphological, grammatical, collocational, connotative and pragmatic knowledge as well as knowledge of social constraints to be observed. In this article, we present some background data on the use of dictionaries among students of Japanese at the University of Ljubljana, a selection of resources and a series of exercises developed with the following aims: a) to foster greater awareness of the different aspects of Japanese vocabulary, both from a monolingual and a contrastive perspective, b) to learn about tools and methods that can be applied in different contexts of language learning and language use, and c) to develop strategies for learning new vocabulary, reinforcing knowledge about known vocabulary, and effectively using this knowledge in receptive and productive language tasks.


Introduction
Learning vocabulary is one of the most challenging tasks faced by learners with a non-kanji background when learning Japanese as a foreign language. Most learners are well aware of the quantitative dimension of this task from the beginning and soon develop or try to develop strategies to memorise large quantities of words and kanji characters. However, they are often not equally aware of the qualitative dimension of vocabulary and of the range of different aspects of word knowledge they need in order to successfully use Japanese. This includes not only the spoken and written form of a word and its meaning, but also morphological, grammatical, collocational, connotative and pragmatic knowledge as well as knowledge of social constraints to be observed.
In the following sections, we present some background data on the use of dictionaries among students of Japanese at the University of Ljubljana, gleaned from a small-scale questionnaire survey and from informal observations in class.
We then present a selection of resources and a series of exercises developed with the following aims: a) to foster greater awareness of the different aspects of Japanese vocabulary, both from a monolingual and a contrastive perspective, b) to learn about tools and methods that can be applied in different contexts of language learning and language use, and c) to develop strategies for learning new vocabulary, reinforcing knowledge about known vocabulary, and effectively using this knowledge in receptive and productive language tasks.

Background: dictionary use among Japanese language students
In order to investigate how students approach vocabulary learning, in the spring of 2013 we conducted a small-scale questionnaire survey on the use of reference resources (dictionaries, internet sites, mobile applications etc.) among students of Japanese in their second and third year of study at the University of Ljubljana. We found that students very often do use dictionaries and other resources during their study, but are not particularly selective when choosing which dictionary to use, and mostly look up only the most basic information. While all of the 17 students surveyed reported that they use on-line dictionaries, only one reported having and using a dictionary in book form (Hadamitsky & Spahn 1982, a reference book of very small size) to look up the readings and stroke order of unknown characters. Six students reported having and using portable electronic dictionaries (Sharp and Casio, produced for the Japanese market) to look up Japanese-English or English-Japanese translations and unknown kanji characters.
All students reported that they use free on-line dictionaries. The dictionaries and other on-line resources mentioned by students are shown in Table 1. Most of the dictionaries they mentioned are based on the Japanese-English database produced by the Electronic Dictionary Research and Development Group at Monash University (EDRDG 2014), a dictionary with a very large number of lemmas but only basic information for each lemma (part of speech, translation equivalents, some stylistic labels and a crowd-sourced database of usage examples). Twelve out of seventeen students used jisho.org, four used tangorin.com, four used the browser add-on Rikaichan, and one used popjisyo.com; four of the students reported using two of these, although they are only different interfaces to the same Japanese-English database.
Six students reported using the machine-translation site translate.google.com to look up words in several directions (Japanese to English, English to Japanese, Slovene to Japanese), while only two students mentioned other tools. Moreover, 10 out of 17 students reported that they use dictionary applications on their mobile phone; some noted the name of the application (four mentioned Kanji recognizer, three mentioned JED, two mentioned WWWJDIC (based on EDRDG), and each of the following was mentioned by one student: All 国語辞典, imiwa (based on EDRDG), Lexiqon and Aedict), while some just answered "a dictionary on my mobile phone".
While it is clear that most students probably use freely available resources because they cannot afford expensive electronic dictionaries, it is rather surprising that none of them mentioned freely available comprehensive Japanese reference sites such as weblio, yahoo jisho, goo jisho or Sanseido's Web Dictionary. When asked about these sites in follow-up interviews, most students responded that they did know about some of these sites, but were overwhelmed by the Japanese interface and did not feel comfortable using them.
When asked what they use these dictionaries and tools for, all students reported they look up words (translations) and unknown kanji characters, and eight out of seventeen students reported they look up how words are used in context.
Finally, when asked whether they have any difficulty with these dictionaries and tools, five students reported they do not always find the word they are looking for, three mentioned that translate.google.com is not reliable, three reported they cannot read the Japanese words they find and spend time looking up the readings, two mentioned they are not sure whether the word they find is the right one for what they mean, two reported having problems with their internet connection, and only one mentioned the problem that words and senses (translations) in the on-line dictionaries are not ordered according to frequency. These answers indicate that students tend to use dictionaries and other reference tools to look up translations of single words and character readings, and are mostly concerned with finding translation equivalents, while only some search for contexts of word use, and none apparently look up connotations, stylistic or pragmatic information.
The survey had two limitations. Firstly, the questions were open-ended in order not to influence the responses, and it is therefore quite possible that students forgot to mention some of the reference sources and tools they use, or specific information that they look up less frequently. Secondly, the survey did not cover all of our students and is not representative of the whole student population. However, some tendencies were observed that are also reflected in other reports on dictionary use by learners of Japanese (Fukuda & Hiratsuka 2011, Suzuki 2012, Moroz 2013). Overall, students tend to use simple and freely available reference tools, relying on crowd-sourced Japanese-English and English-Japanese bilingual dictionaries with user-friendly interfaces. They are mostly not aware of other, more sophisticated tools and reference material, and even those students who do know about other resources mostly use them only to look up translations and readings.
On the basis of these results and of similar feedback repeatedly obtained during informal observation in class, we concluded that students need more information on other available reference sources and tools, in order to be able to select the most appropriate tool in each learning situation, and that they require coaching on the use of these tools for specific language learning needs.

Introducing students to different reference resources
Considering our students' reference habits, we selected a few resources that could better equip them for their learning needs and developed some exercises to help them master the use of these resources.

Dictionaries
Firstly, since many students are not aware of different freely available resources, we compiled a list of links to on-line dictionaries on the department's e-learning site, including a) the Japanese-Slovene dictionary jaSlo, compiled at our department (Hmeljak Sangawa & Erjavec 2012), surprisingly not known to many of the students; b) different interfaces to WWWJDIC mentioned by students themselves, including WWWJDIC itself and its popular interfaces Denshi Jisho, Tangorin and Popjisyo; c) Japanese reference sites (dictionary aggregators) such as Yahoo! dictionary, goo dictionary, kotobank and Weblio, all of which include dictionaries by major Japanese publishing houses such as Sanseido, Kodansha, Shogakukan and others; d) the crowd-sourced Japanese-English dictionary Eijiro; and e) other sites mentioned by students, including Google Translate. This list of on-line reference sources is also presented alongside other resources (textbooks and reference books) during orientation meetings held for each Japanese language class at the beginning of the academic year, where students are encouraged to explore the resources and familiarise themselves with them.

Corpora and lexical profiling systems
The second part of the list of on-line resources includes corpora and lexical profiling systems that can be used by intermediate and advanced students to obtain more detailed information about collocational and stylistic aspects of the words they are learning.
In the last few years, quite a number of Japanese corpora and query systems have been developed, beginning with BCCWJ, developed at the NINJAL Center for Corpus Development, with the concordancers Shonagon and Chunagon (Maekawa et al. 2014); JpWaC, a web corpus deployed within the lexical profiling system Sketch Engine (Srdanović et al. 2008); its derivative JpWaC-L (Hmeljak Sangawa & Erjavec 2012), a corpus for learners of Japanese containing extracts from JpWaC ranked according to the five levels of the Japanese Language Proficiency Test specifications (JF & AIEJ 2004); the Japanese internet corpus and query system developed at the Centre for Translation Studies of the University of Leeds (Sharoff 2006); the writing support system Natsume developed at Tokyo Institute of Technology, a lexical profiling system covering multiple monolingual Japanese corpora simultaneously (Hodošček & Nishina 2012); and the lexical profiling system NINJAL-LWP developed by the National Institute for Japanese Language and Linguistics and the Lago Institute of Language, applied both to the BCCWJ (Pardeshi & Akasegawa 2011) and to the Tsukuba web corpus (Imai et al. 2013).
Students are also encouraged to use bilingual concordances. In particular, they are introduced to two parallel concordancers. The first is the Japanese-Slovene parallel corpus jaSlo developed at our department, a corpus of literary, academic and other web-harvested Japanese texts with Slovene translations, amounting to 760,000 Japanese tokens in 132 documents and 530,000 tokens in the corresponding Slovene translations (Hmeljak Sangawa & Erjavec 2012). The second corpus tool is Linguee, a freely available dictionary combined with a search engine that retrieves translated examples from internet-harvested bilingual texts (Calvert 2009).
The aim of the tasks described below is to foster awareness of the different aspects of vocabulary knowledge, while encouraging autonomous learning.
We begin with simple exploratory tasks to introduce students to the use of different resources and interfaces. The first task focuses on dictionaries rather than on corpora, since dictionaries are used by all students and are already familiar to them.

Task 1: verifying the source
Since most students rely on dictionary sites or mobile applications based on the crowd-sourced Japanese-English database produced by EDRDG and are sometimes not even aware of the fact that they are looking up data from the same database using different interfaces, we prepared an exercise to raise their awareness about the structure and content of different dictionary sites, asking them to distinguish between the interface and the data source.
Students are briefly introduced to the dictionaries listed in the Dictionaries section above and then asked to find and compare dictionary entries for the same word on all the dictionary sites.
For example, when they search for the word 留守番, they find that WWWJDIC, Denshi Jisho, Tangorin and Popjisyo provide exactly the same English translations, part-of-speech information and compound entry, and that even the examples (given only in WWWJDIC and Tangorin) are exactly the same, while only the number of external links and the layout and colour of the entries differ. Further, they find that the first three of these dictionaries also provide automatically generated links to other dictionaries: WWWJDIC includes links to Google search, Google images, Sanseido dictionary, Eijiro on ALC, example sentences from the Tatoeba project, JapanesePod101.com, Japanese WordNet and Japanese Wikipedia (in the case of 留守番 only six of these links are available); Denshi Jisho offers links to Yahoo! dictionary, goo dictionary and Google search; and Tangorin offers links to Yahoo! dictionary, goo dictionary, Eijiro, Weblio, Linguee, Google Translate, Google.com and Google.co.jp.
On the other hand, they also discover that Eijiro offers the largest number of English translations, examples and compounds, which are different from those in the other dictionaries, while Weblio, Yahoo! dictionary, goo dictionary and kotobank provide similar data from dictionaries that have also been published in book form by traditional dictionary publishing houses.
Finally, we point out the [Edit] and [Amend] buttons in WWWJDIC to the students and check how they work, to make students aware that entries in this dictionary are often user-generated content and sometimes need amending or editing.
With this task, students are encouraged to check the origin of the data they find in on-line reference services, to distinguish between reference data and interfaces, and to choose what works best for them considering both the ease of use and the reliability of the site.

Task 2: matching senses to corpus examples
This task is aimed at familiarising students with Japanese language corpora, their interfaces and search methods, and at helping them discover the difference between dictionary descriptions or translations, which capture the main senses and uses of a word, and examples of word use in corpora, which sometimes deviate from the prototypical uses found in dictionaries. Students are presented with a monolingual dictionary definition of a word with multiple senses and its translations, and instructed to search for the same word in different corpora. Their task is to select five examples from each corpus they consult, as if they were compiling an entry for a learners' dictionary, trying to select examples that are understandable to learners at their own level, and representative of the sense described.
For example, given the following dictionary definitions and translations for the word 運動 (Shogakukan Digital Daijisen 2011, Shogakukan Progressive Japanese-English Dictionary 2012), they matched them with examples such as those given in Table 2. During this task, students get acquainted with different corpus interfaces, practise skimming through large amounts of text, become aware of frequent compounds or collocations in which the words are used, and notice how some senses appear more frequently than others.
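The keyword-in-context (KWIC) listings that students skim through in this task can be illustrated with a minimal Python sketch. It does not reproduce any of the corpus query systems named above: the function name `kwic`, the `width` parameter and the three sample sentences are all invented for illustration, and real concordancers work on tokenised and annotated text rather than on raw character strings.

```python
def kwic(sentences, keyword, width=8):
    """Return (left context, keyword, right context) for every hit."""
    lines = []
    for sentence in sentences:
        start = sentence.find(keyword)
        while start != -1:
            end = start + len(keyword)
            # Keep at most `width` characters on each side of the hit.
            left = sentence[max(0, start - width):start]
            right = sentence[end:end + width]
            lines.append((left, keyword, right))
            start = sentence.find(keyword, end)
    return lines


# Invented toy sentences, NOT taken from any corpus mentioned in this article.
sample = [
    "毎日運動をすると健康にいい。",
    "彼は学生運動に参加した。",
    "物体の運動を観察する。",
]

for left, key, right in kwic(sample, "運動"):
    print(f"{left}  [{key}]  {right}")
```

Even in this toy listing, the three hits correspond to different senses of 運動 (physical exercise, a social movement, physical motion), which is the kind of contrast students are asked to notice when choosing examples for each sense.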

Task 3: comparing dictionary and corpus translations
The next task is carried out using the Japanese-Slovene parallel corpus jaSlo. Students are again given polysemous words and instructed to search all possible translations of these words in bilingual corpus examples, in order to compile a bilingual dictionary entry for the given word. During this task they notice how some words (technical terms etc.) are mostly translated with the same equivalent, how polysemous words may have many different translations (e.g. 世話、余裕、無難 etc.), and how some words (人、また、evidential expressions, onomatopoeia etc.) are sometimes not translated at all. They furthermore explore translations (or omissions) for culturally bound terms (e.g. 就職活動, お酌 etc.), for modal expressions, adverbs (はず、せっかく、やはり、さすが、よほど), and false friends (イメージ *imidž, ドレス *dres, タレント *talent, ユニーク *unikaten, ホステス *hostesa, サービス *servis).
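The tallying step of this task, in which students collect the equivalents found in bilingual examples and note how often each occurs, could be sketched in Python as follows. This is only an illustration under stated assumptions: the annotated hits are invented rather than taken from jaSlo, the function name `tally_equivalents` is hypothetical, and the sketch assumes that a student has already identified by hand which Slovene word (if any) corresponds to the Japanese headword in each example.

```python
from collections import Counter


def tally_equivalents(annotated_hits):
    """Count each target-language equivalent, treating None as an omission."""
    counts = Counter(
        equivalent if equivalent is not None else "(not translated)"
        for _, equivalent in annotated_hits
    )
    # Most frequent equivalents first, as in a draft dictionary entry.
    return counts.most_common()


# Hypothetical hand-annotated hits: (Japanese headword, Slovene equivalent or None).
# These pairs are made up for illustration and do NOT come from the jaSlo corpus.
hits = [
    ("世話", "skrb"),
    ("世話", "pomoč"),
    ("世話", "skrb"),
    ("世話", None),
]

for equivalent, frequency in tally_equivalents(hits):
    print(equivalent, frequency)
```

Counting omissions as a category of their own keeps untranslated occurrences visible in the draft entry instead of silently dropping them.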

Task 4: translating into Japanese
The last task is again carried out using the Japanese-Slovene parallel corpus jaSlo, but in the opposite direction, searching for all possible translations (or omissions) of Slovene polysemous words, as if to compile a Slovene-Japanese dictionary entry. Distinguishing between Japanese synonyms is very challenging, and students are encouraged to first determine which word or multiword expression in the Japanese examples corresponds to the given Slovene word, and then look up these Japanese words in other corpora and dictionaries.

Feedback and conclusion
A portion of these exercises was tested in class, and received a mixed response. Overall, students tended to dislike overly long exercises and having to browse through long concordance lists. They were frustrated and confused when they had to go through too many steps. The tasks presented above therefore need some refinement and more intermediate tasks, to gradually introduce students to different functions and search methods, with examples that are neither too difficult nor too obvious for the students' level of language competence. Students responded positively to tasks involving adding or amending content on collaborative sites. Given the scarcity of human and financial resources for the creation of Japanese-Slovene lexicographic resources, a language combination with a very limited number of users and an even more limited number of potential bilingual

Figure 1: Results for the search string 留守番 in WWWJDIC.

Figure 2: Results for the search string 留守番 in Denshi Jisho.

Figure 3: Results for the search string 留守番 in Tangorin.

Figure 4: Results for the search string 留守番 in Popjisyo.

Table 1: On-line resources used by Japanese language students at the University of Ljubljana.