Enabling access to corpora of Slovene web texts in light of legal restrictions
DOI:
https://doi.org/10.4312/slo2.0.2016.2.189-219
Keywords:
web texts, corpus dissemination, copyright, personal data protection, free and open access
Abstract
Web texts are becoming an increasingly relevant source of information, and corpora of such texts are needed for corpus-linguistic research and for developing language technologies for contemporary Slovene. Although web texts are directly accessible and easier to collect than printed ones, building such corpora remains complex, expensive and time-consuming. It is essential to ensure that similar data are not collected more than once, which is why they must be made as accessible as possible to the widest possible research community and the interested public. There are no technical or storage obstacles to this, but corpus compilation runs into numerous restrictions arising from copyright protection, personal data protection and the terms of use of web service providers. In this paper we present the legal and actual state of affairs in these areas, review related practices abroad and in Slovenia, and, using the Janes corpus of web Slovene as a case study, propose a set of measures that enable the free and open dissemination of corpora of web Slovene to the greatest possible extent.
Published
27 September 2016
Issue
Section
Articles
License
Copyright (c) 2016 Tomaž Erjavec, Jaka Čibej, Darja Fišer

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
How to cite
Erjavec, T., Čibej, J., & Fišer, D. (2016). Omogočanje dostopa do korpusov slovenskih spletnih besedil v luči pravnih omejitev [Enabling access to corpora of Slovene web texts in light of legal restrictions]. Slovenščina 2.0: Empirične, Aplikativne in Interdisciplinarne Raziskave, 4(2), 189-219. https://doi.org/10.4312/slo2.0.2016.2.189-219