Emotion analysis in socially unacceptable discourse
Keywords: emotions, socially unacceptable discourse (SUD), hate speech, social media, corpora
Texts often express the writer’s emotional state, and it has been shown that emotion information has potential for hate speech detection and analysis. In this work, we present a methodology for the quantitative analysis of emotion in text. We define a simple yet effective metric for the overall emotional charge of a text, based on the NRC Emotion Lexicon and Plutchik’s eight basic emotions. Using this methodology, we investigate the emotional charge of socially unacceptable discourse (SUD), a distinct and potentially harmful type of text that is spreading on social media. We apply the proposed method to a corpus of Facebook comments comprising four datasets, which cover two languages (English and Slovene) and two discussion topics (LGBT+ rights and the European migrant crisis). We find that SUD comments are significantly more emotional than non-SUD comments. Moreover, we show that the expression of emotions differs depending on the language, topic, and target of the comments. Finally, to underpin the findings of the quantitative investigation, we perform a qualitative analysis of the corpus, exploring in more detail the most frequent emotional words for each emotion in all four datasets. The qualitative analysis shows that the source of emotions in SUD texts depends heavily on the topic of discussion, with substantial overlaps between languages.
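To make the idea of a lexicon-based emotional charge concrete, the following is a minimal Python sketch. It is not the metric defined in the article, whose exact formula is not given in this abstract. It assumes, purely for illustration, that emotional charge is the share of tokens associated with at least one of Plutchik’s eight basic emotions. The names `TOY_LEXICON`, `emotion_counts`, and `emotional_charge` are hypothetical, and the lexicon entries are invented for this example rather than taken from the NRC Emotion Lexicon.

```python
from collections import Counter

# Plutchik's eight basic emotions, the categories used by the NRC Emotion Lexicon.
PLUTCHIK_EMOTIONS = (
    "anger", "anticipation", "disgust", "fear",
    "joy", "sadness", "surprise", "trust",
)

# Hypothetical toy lexicon for illustration only; these entries are NOT
# copied from the NRC Emotion Lexicon.
TOY_LEXICON = {
    "hate": {"anger", "disgust"},
    "afraid": {"fear"},
    "happy": {"joy", "trust"},
    "threat": {"anger", "fear"},
}


def emotion_counts(tokens, lexicon):
    """Count, per emotion, how many tokens are associated with it."""
    counts = Counter({emotion: 0 for emotion in PLUTCHIK_EMOTIONS})
    for token in tokens:
        for emotion in lexicon.get(token.lower(), ()):
            counts[emotion] += 1
    return counts


def emotional_charge(tokens, lexicon):
    """Assumed metric: share of tokens linked to at least one emotion."""
    if not tokens:
        return 0.0
    emotional = sum(1 for token in tokens if lexicon.get(token.lower()))
    return emotional / len(tokens)


# Naive whitespace tokenisation; a real pipeline would likely lemmatise first.
comment = "I hate this threat but I am happy today".split()
counts = emotion_counts(comment, TOY_LEXICON)
charge = emotional_charge(comment, TOY_LEXICON)
print(dict(counts))
print(round(charge, 3))
```

Under this assumed definition, comparing SUD and non-SUD comments would amount to comparing the distributions of `emotional_charge` values across the two groups. A full lexicon and proper lemmatisation would be needed for meaningful results in either language.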
Copyright (c) 2022 Jasmin Franza, Bojan Evkoski, Darja Fišer
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.