SUPara-Benchmark: A Benchmark Dataset for English-Bangla Machine Translation

Submitted by: Mohammad Mumin
Last updated: Thu, 08/02/2018 - 14:06
DOI: 10.21227/czes-gs42
Data Format: TXT

Keywords:

Machine translation

English-Bangla machine translation

Abstract

Because no standard validation (development) set or evaluation (test) set exists for the English-Bangla machine translation task, this dataset provides well-chosen, length-balanced, general-purpose data to serve as both.

Instructions:

suparadev2018 is a validation or development dataset.

suparatest2018 is an evaluation or test dataset.
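
As a rough illustration of how a line-aligned parallel set in TXT format might be read, here is a minimal Python sketch. It assumes, without confirmation from this page, that each set ships as two UTF-8 text files (one English, one Bangla) with one sentence per line and matching line order. The function names pair_lines and load_parallel are hypothetical helpers, and the actual file names and layout inside suparadev2018 and suparatest2018 may differ.

```python
from typing import Iterable, List, Tuple


def pair_lines(src_lines: Iterable[str], tgt_lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Pair source and target sentences by position.

    Assumption: line i of the source corresponds to line i of the target.
    """
    # Strip only line terminators so sentence-internal spacing is preserved.
    src = [line.rstrip("\r\n") for line in src_lines]
    tgt = [line.rstrip("\r\n") for line in tgt_lines]
    if len(src) != len(tgt):
        raise ValueError(
            f"line count mismatch: {len(src)} source vs {len(tgt)} target"
        )
    return list(zip(src, tgt))


def load_parallel(src_path: str, tgt_path: str) -> List[Tuple[str, str]]:
    """Read two text files (hypothetical paths) and return aligned sentence pairs."""
    # UTF-8 is assumed here; Bangla text will not decode under ASCII-only defaults.
    with open(src_path, encoding="utf-8") as fs, open(tgt_path, encoding="utf-8") as ft:
        return pair_lines(fs, ft)
```

Checking that both sides have the same number of lines before pairing is a cheap way to catch a truncated or misaligned download. In this workflow the development set would typically be used for tuning and the test set held back for final scoring only.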

Dear Sir, I am Raian Rahman. I am currently enrolled in the Department of Computer Science and Engineering at the Islamic University of Technology, Gazipur, Bangladesh. I am doing research on Natural Language Processing, specializing in Machine Translation. While exploring the available English-to-Bangla datasets, I found this one. I would like to humbly request access to your dataset. I promise to use it only for my research.

Raian Rahman Mon, 04/05/2021 - 09:06

I am Asab Azad, a student at North South University. I am interested in using the "SUPara0.8M: A Balanced English-Bangla Parallel Corpus" for research purposes. Could you let me know how I can access or download the corpus?

Asab Azad Sun, 05/09/2021 - 20:23