Contrasting a semiotic conceptualization of translation with AI text production

The case of audio captioning


  • Riku Haapaniemi, Tampere University, Finland
  • Annamaria Mesaros, Tampere University, Finland
  • Manu Harju, Tampere University, Finland
  • Irene Martín Morató, Tampere University, Finland
  • Maija Hirvonen, Tampere University, Finland



Keywords: artificial intelligence, audio captioning, intersemiotic translation, natural language processing, semiotics


Using a semiotically informed material approach to the study of translation, this paper analyses an artificial intelligence (AI) system developed for automatic audio captioning (AAC), that is, the automated production of written descriptions of non-lingual environmental sounds. Comparing human and AI text production processes against a semiotic framework suggests that AI uses computational methods to reach textual outcomes that humans arrive at through semiotic means. Our analysis of sound descriptions produced by an AAC system shows that this distinction is useful for articulating the complex relationship between human and AI translation processes. Acknowledging the central role of semiotic meaning-construction in human text production, and its arguable absence from AI computational processes, allows AI processes to be discussed within a translational framework while still recognizing their fundamental differences from comparable human translation processes. Further, audio captioning is a clear example of a translation task in which non-lingual content must be considered on equal terms with lingual text, and our discussion illustrates how this can be achieved in computational and semiotic processes alike. Overall, this paper promotes a nuanced understanding of meaning in text production and suggests several fruitful points of convergence and divergence between translation theory and AI research.
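For readers outside AI research, the contrast between computational and semiotic routes to a caption can be made concrete with a deliberately minimal sketch. This is not the AAC system analysed in the paper: it is a hypothetical retrieval-based captioner that returns the stored caption whose audio feature vector is most similar, by cosine similarity, to the feature vector of an incoming sound. The feature values, captions, and the names `MEMORY`, `cosine`, and `caption` are invented for illustration; real AAC systems learn their representations from large datasets and typically generate captions word by word with neural encoder-decoder models. The sketch only shows that the output text is selected through arithmetic over numbers.

```python
import math

# Hypothetical "training" pairs: invented 4-dimensional audio feature vectors
# (assumed stand-ins for learned acoustic features) paired with human-written captions.
MEMORY = [
    ([0.9, 0.1, 0.0, 0.2], "a dog barks repeatedly"),
    ([0.1, 0.8, 0.7, 0.1], "rain falls on a metal roof"),
    ([0.2, 0.1, 0.9, 0.8], "a car engine idles and then accelerates"),
]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def caption(features):
    """Return the stored caption whose feature vector is most similar to `features`."""
    _, best_text = max(MEMORY, key=lambda pair: cosine(features, pair[0]))
    return best_text


print(caption([0.85, 0.2, 0.05, 0.1]))  # → a dog barks repeatedly
```

Nothing in this procedure refers to dogs, rain, or engines as such: the caption is chosen because its paired numbers lie closest to the input's numbers, which is one simple way of picturing what it means for a system to reach a textual outcome by computational rather than semiotic means.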






28. 06. 2024




How to Cite

Haapaniemi, R., Mesaros, A., Harju, M., Martín Morató, I., & Hirvonen, M. (2024). Contrasting a semiotic conceptualization of translation with AI text production: The case of audio captioning. STRIDON: Journal of Studies in Translation and Interpreting, 4(1), 25-51.