CROSS-LANGUAGE DOCUMENT RETRIEVAL
DOI: https://doi.org/10.55741/knj.46.1-2.14007
Keywords: information retrieval, natural languages, cross-language retrieval
Abstract
The article argues for the need to develop cross-language information retrieval (CLIR), a relatively new area of information storage and retrieval in multilingual text collections. It defines the goals of CLIR and situates it among the research fields that deal with various aspects of processing text in electronic form. A brief historical overview is followed by a description of the most important methodological approaches in CLIR (document translation and query translation) and of the language resources they rely on. Among these resources, most attention is given to bilingual and multilingual ontologies (thesauri, dictionaries, translation lexicons, and collocation thesauri) and to corpora, their construction, and their use in CLIR experiments. The article aims mainly to illustrate the methodological diversity of the field rather than the workings of specific systems. The state of CLIR in Slovenia and the availability of language resources suitable for including Slovenian texts in cross-language systems are not covered, since that topic requires a separate review.