Comparing Standard and Factored Models in Statistical Machine Translation from English to Slovene Using the Moses System
DOI: https://doi.org/10.4312/slo2.0.2017.1.1-26
Keywords: statistical machine translation, factored machine translation, Moses system, BLEU, human evaluation
Abstract
Machine translation is a field of computational linguistics that studies the use of software to translate text from one language to another. Factored statistical machine translation is an extension of statistical machine translation in which linguistic annotation is added at the word level: each word is represented as a vector of factors in an attempt to improve translation quality. We describe the use of the open-source Moses system for factored statistical machine translation from English to Slovene. From a corpus of IT-related texts, we built several factored and non-factored language and translation models. We then translated two different IT-related documents: the first was marketing-oriented with a complex structure, while the second was technical with a simpler structure. We used two methods to compare the generated translations with two independent human translations and with a translation produced by the Google Translate service. The first method was the BLEU metric and the second was evaluation by human reviewers. The latter yields a subjective score, which remains very important in machine translation. Although the results of the two methods cannot be compared directly because they use different metrics, the trends in the scores are well correlated for both texts. The only notable difference appears when factored models are used to translate the second text. In the conclusion, we analyse inter-evaluator consistency and the obtained results. We found that our models are better suited to technical texts and that factored models bring larger improvements in the translation of complex texts.
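In Moses, a factored corpus is plain text in which each token carries its factors joined by a vertical bar. The sketch below illustrates the idea of a word as a vector of factors under stated assumptions: the factor set (surface form, lemma, part-of-speech tag) and the names `FactoredWord`, `to_moses_line` and `from_moses_line` are hypothetical, chosen only for illustration. The abstract does not say which factors the authors used, and real annotations would come from a lemmatiser and tagger rather than being typed by hand.

```python
from typing import NamedTuple


class FactoredWord(NamedTuple):
    """One token as a vector of factors.

    Assumed factor set for illustration only; the study's actual
    factors are not given in the abstract.
    """
    surface: str
    lemma: str
    pos: str


# Moses separates the factors of a single token with '|'.
FACTOR_SEPARATOR = "|"


def to_moses_line(words: list[FactoredWord]) -> str:
    """Serialise one sentence as space-separated factored tokens."""
    for word in words:
        for factor in word:
            # A factor must not contain the factor separator or a space,
            # otherwise the line could not be parsed back unambiguously.
            if FACTOR_SEPARATOR in factor or " " in factor:
                raise ValueError(f"factor {factor!r} contains a reserved character")
    return " ".join(FACTOR_SEPARATOR.join(word) for word in words)


def from_moses_line(line: str) -> list[FactoredWord]:
    """Parse a factored line back into word vectors."""
    return [FactoredWord(*token.split(FACTOR_SEPARATOR)) for token in line.split()]
```

For instance, the English tokens "servers restart" annotated this way serialise to `servers|server|NNS restart|restart|VBP`; the NNS and VBP tags are Penn Treebank-style labels picked purely as an example.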
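BLEU, the automatic method used in the comparison, multiplies a brevity penalty by the geometric mean of modified (clipped) n-gram precisions for n = 1 to 4, and it can score each sentence against several references. The following is a minimal, unsmoothed corpus-level sketch that assumes whitespace-tokenised input. `corpus_bleu` is a hypothetical helper, not the scorer used in the study; the abstract does not state which BLEU implementation was applied.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count all n-grams of order n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    """Unsmoothed corpus-level BLEU (hypothetical helper, for illustration).

    hypotheses: list of token lists, one per sentence.
    references: list of lists of token lists (all references for each sentence).
    """
    clipped = [0] * max_n  # matched n-grams, clipped by reference counts
    totals = [0] * max_n   # all n-grams in the hypotheses
    hyp_len = 0
    ref_len = 0
    for hyp, refs in zip(hypotheses, references):
        hyp_len += len(hyp)
        # Effective reference length: the closest one, ties go to the shorter.
        ref_len += min((abs(len(r) - len(hyp)), len(r)) for r in refs)[1]
        for n in range(1, max_n + 1):
            hyp_counts = ngrams(hyp, n)
            max_ref = Counter()
            for r in refs:
                max_ref |= ngrams(r, n)  # element-wise maximum over references
            clipped[n - 1] += sum(min(c, max_ref[g]) for g, c in hyp_counts.items())
            totals[n - 1] += sum(hyp_counts.values())
    if min(clipped) == 0:
        return 0.0  # without smoothing, one empty n-gram order zeroes the score
    log_precision = sum(math.log(c / t) for c, t in zip(clipped, totals)) / max_n
    brevity_penalty = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return brevity_penalty * math.exp(log_precision)
```

For example, `corpus_bleu([hyp.split()], [[ref1.split(), ref2.split()]])` scores one hypothesis against two references. BLEU is often reported multiplied by 100, and scores from implementations with different tokenisation or smoothing are not directly comparable.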
License
Copyright (c) 2018 Sašo Kuntarič, Simon Krek, Marko Robnik Šikonja

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
All content of Slovenščina 2.0 is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0).
Slovenščina 2.0 applies the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license to all published material. Under this license, authors retain ownership of the copyright for their content, but allow anyone to download, reuse, reprint, modify, distribute, copy, remix, transform and/or build upon the content for any purpose, even commercial, as long as the original authors and source are cited. No permission is required from the authors or the publishers. Appropriate attribution can be provided by simply citing the original article. If you remix, transform, or build upon the material, you must distribute your contributions under the same license as the original. For any reuse or redistribution of a work, users must also make clear the license terms under which the work was published.
No separate publishing agreements are signed between the author and the publisher. Authors retain copyright and the publishing rights of their work without any restrictions.
Authors are permitted and encouraged to post the journal’s published version of the work online (e.g., in institutional repositories, on their own websites), with an acknowledgement of its initial publication in Slovenščina 2.0.