
Open Access

SUPara-Benchmark: A Benchmark Dataset for English-Bangla Machine Translation

Submitted by: Mohammad Mumin
DOI: 10.21227/czes-gs42

Abstract

Because there is no standard validation (development) set or evaluation (test) set for the English-Bangla machine translation task, this dataset provides well-chosen, length-balanced, general-purpose data to serve as both.

Instructions:

suparadev2018 is the validation (development) dataset.

suparatest2018 is the evaluation (test) dataset.
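
As an illustration only, the sketch below shows one way a development or test set could be loaded for scoring a translation system. This page does not state the file format, so the sketch assumes each set is distributed as two line-aligned UTF-8 text files (one English, one Bangla). The function name load_parallel and the file names in the usage comment are hypothetical and would need to be adapted to the actual files.

```python
def load_parallel(src_path, tgt_path, encoding="utf-8"):
    """Read two line-aligned text files into a list of (source, target) pairs.

    Assumption: line i of the source file is the translation counterpart
    of line i of the target file. The dataset page does not confirm this layout.
    """
    with open(src_path, encoding=encoding) as src_file:
        src_lines = [line.rstrip("\n") for line in src_file]
    with open(tgt_path, encoding=encoding) as tgt_file:
        tgt_lines = [line.rstrip("\n") for line in tgt_file]

    # A length mismatch means the files are not aligned sentence by sentence.
    if len(src_lines) != len(tgt_lines):
        raise ValueError(
            f"Line count mismatch: {len(src_lines)} source vs {len(tgt_lines)} target"
        )

    return [(s.strip(), t.strip()) for s, t in zip(src_lines, tgt_lines)]


# Hypothetical usage; these file names are placeholders, not the real ones:
# dev_pairs = load_parallel("suparadev2018.en", "suparadev2018.bn")
# test_pairs = load_parallel("suparatest2018.en", "suparatest2018.bn")
```

Keeping the two sets in separate variables mirrors their intended roles: the development pairs for tuning and model selection, the test pairs for final evaluation only.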

 

Dear Sir, I am Raian Rahman. I am currently enrolled in the Department of Computer Science and Engineering at the Islamic University of Technology, Gazipur, Bangladesh. I am doing research on Natural Language Processing, specializing in Machine Translation. I was exploring the available English-to-Bangla datasets and found this one. I would like to humbly request access to your dataset. I promise to use it only for my research.
Raian Rahman Mon, 04/05/2021 - 09:06
I am Asab Azad, a student of North South University. I am interested in using the "SUPara0.8M: A Balanced English-Bangla Parallel Corpus" for research purposes. Could you let me know how I can access or download the corpus?
Asab Azad Sun, 05/09/2021 - 20:23
