Document-Level Machine Translation as a Re-translation Process

  1. Eva Martínez García
  2. Cristina España-Bonet
  3. Lluís Màrquez Villodre
Journal: Procesamiento del lenguaje natural

ISSN: 1135-5948

Year of publication: 2014

Issue: 53

Pages: 103-110

Type: Article

Abstract

Most current Machine Translation systems are designed to translate a document sentence by sentence, ignoring discourse information and producing incoherences in the final translations. In this paper we present several document-level post-processes to improve the coherence and consistency of translations. Incoherences are detected and new partial translations are proposed. The work focuses on two phenomena: words that are translated inconsistently throughout a text, and gender and number agreement among words. Since we deal with specific phenomena, automatic evaluation does not reflect significant variations in the translations. However, improvements are observed through a manual evaluation.
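To make the first phenomenon concrete, here is a minimal sketch of how inconsistent lexical translations might be flagged across a document. It assumes that each sentence comes with a word alignment given as (source word, target word) pairs. The function names `find_inconsistent_translations` and `propose_retranslations`, and the majority-vote rule for choosing a replacement, are illustrative assumptions rather than the authors' method. The paper proposes new partial translations; this sketch only approximates that step by suggesting the most frequent translation seen in the document.

```python
from collections import Counter, defaultdict


def find_inconsistent_translations(aligned_doc):
    """Return source words that receive more than one translation in the document.

    aligned_doc: list of sentences; each sentence is a list of
    (source_word, target_word) pairs taken from a word alignment (assumed input).
    """
    translations = defaultdict(Counter)
    for sentence in aligned_doc:
        for src, tgt in sentence:
            # Count how often each target word is aligned to each source word.
            translations[src.lower()][tgt.lower()] += 1
    # Keep only source words whose translations differ within the document.
    return {src: counts for src, counts in translations.items() if len(counts) > 1}


def propose_retranslations(aligned_doc):
    """Suggest replacing minority translations with the document-level majority.

    Returns (sentence_index, source_word, current_translation, proposal) tuples.
    The majority-vote criterion is a hypothetical choice for illustration.
    """
    inconsistent = find_inconsistent_translations(aligned_doc)
    proposals = []
    for i, sentence in enumerate(aligned_doc):
        for src, tgt in sentence:
            counts = inconsistent.get(src.lower())
            if counts is None:
                continue  # this word is translated consistently
            best, _ = counts.most_common(1)[0]
            if tgt.lower() != best:
                proposals.append((i, src, tgt, best))
    return proposals


# Toy English-Spanish document with hypothetical alignments.
example_doc = [
    [("the", "el"), ("bank", "banco"), ("closed", "cerró")],
    [("the", "el"), ("bank", "orilla"), ("opened", "abrió")],
    [("a", "un"), ("bank", "banco")],
]

print(propose_retranslations(example_doc))  # → [(1, 'bank', 'orilla', 'banco')]
```

A plain majority vote assumes one intended sense per word in each document, so a homonym that is genuinely used in two senses would be flagged wrongly; a real system would need further evidence before re-translating.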
