Open Access
SUPara-Benchmark: A Benchmark Dataset for English-Bangla Machine Translation
- Citation Author(s): M. A. A. Mumin, M. H. Seddiqui, M. Z. Iqbal, M. J. Islam
- Submitted by: Mohammad Mumin
- Last updated: Thu, 08/02/2018 - 10:06
- DOI: 10.21227/czes-gs42
- License: Creative Commons Attribution
Abstract
Because no standard validation (development) set or evaluation (test) set exists for the English-Bangla machine translation task, this dataset provides well-chosen, length-balanced, general-purpose data for both.
Instructions:
suparadev2018 is a validation or development dataset.
suparatest2018 is an evaluation or test dataset.
Dataset Files
- SUPara Benchmark Dataset SUPara-benchmark.zip (107.00 kB)
Open Access dataset files are accessible to all logged-in users. A free IEEE account is sufficient; IEEE Membership is not required.
Comments
Dear Sir,
I am Raian Rahman. I am currently enrolled at the Department of Computer Science and Engineering at the Islamic University of Technology, Gazipur, Bangladesh. I am doing research on Natural Language Processing, specializing in Machine Translation. I was exploring the available English-to-Bangla datasets and found this one. I would like to humbly request access to your dataset. I promise to use it only for my research purposes.
I am Asab Azad, a student of North South University. I am interested in using the "SUPara0.8M: A Balanced English-Bangla Parallel Corpus" for research purposes. I was wondering if you could let me know how I can access or download the corpus.