Conference paper. Year: 2022

Toward a Test Set of Dislocations in Persian for Neural Machine Translation

Abstract

This paper describes a test set designed to analyse how dislocations are translated from Persian, intended for testing neural machine translation models. We first tested how accurately the two Universal Dependencies treebanks for Persian can be used to detect dislocations automatically. We then parsed the available Persian treebanks with GREW (Bonfante et al., 2018) to build a dedicated test set containing examples of dislocations. Using the aligned data available on OPUS (Tiedemann, 2016), we trained a Persian-to-English translation model with OpenNMT (Klein et al., 2017). We report the results obtained on our test set with several translation systems (Google Translate, mBART-50 (Tang et al., 2020), Microsoft Bing and our in-house translation model) for translation into English. We discuss why dislocations in Persian provide an interesting testbed for neural machine translation.
Main file
ACL_Farsi_Persian__NSURNLP.pdf (359.71 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-03912609, version 1 (24-12-2022)

Identifiers

  • HAL Id: hal-03912609, version 1

Cite

Behnoosh Namdarzadeh, Nicolas Ballier, Guillaume Wisniewski, Lichao Zhu, Jean-Baptiste Yunès. Toward a Test Set of Dislocations in Persian for Neural Machine Translation. The Third International Workshop on NLP Solutions for Under Resourced Languages (NSURL 2022), Dec 2022, Trento, Italy. ⟨hal-03912609⟩
