SUPara-Benchmark: A Benchmark Dataset for English-Bangla Machine Translation

Citation Author(s):
M. A. A. Mumin, M. H. Seddiqui, M. Z. Iqbal, M. J. Islam
Submitted by:
Mohammad Mumin
Last updated:
Thu, 08/02/2018 - 10:06
Data Format:
Creative Commons Attribution


Because there is no standard validation (development) set or evaluation (test) set for the English-Bangla machine translation task, this dataset provides well-chosen, length-balanced, general-purpose data for both.


suparadev2018 is a validation or development dataset.

suparatest2018 is an evaluation or test dataset.



Dear Sir,
I am Raian Rahman. I am currently enrolled in the Department of Computer Science and Engineering at the Islamic University of Technology, Gazipur, Bangladesh. I am doing research on Natural Language Processing, specializing in Machine Translation. I was exploring the available datasets for English-to-Bangla translation and found this one. I would like to humbly request access to your dataset. I promise to use it only for my research.

Submitted by Raian Rahman on Mon, 04/05/2021 - 05:06

I am Asab Azad, a student at North South University. I am interested in using the "SUPara0.8M: A Balanced English-Bangla Parallel Corpus" for research purposes. Could you let me know how I can access or download the corpus?

Submitted by Asab Azad on Sun, 05/09/2021 - 16:23
