Standard Dataset
AncientLanguageTranslation.heb-eng
- Citation Author(s):
- Submitted by: Haseeb Javed
- Last updated: Mon, 11/04/2024 - 14:34
- DOI: 10.21227/595s-tq23
- Data Format:
- License:
- Categories:
- Keywords:
Abstract
This project builds an AI-based Ancient Hebrew translator that aims to revive Ancient Hebrew by constructing a comprehensive dataset of contemporary and ancient Hebrew samples. The Google Vision API is integrated to provide Optical Character Recognition (OCR) for image processing. Translation begins with English output from the model, which then leads to a multilingual interface. The project incorporates Biblical and Paleo-Hebrew dictionaries for linguistic accuracy and uses LSTM and Transformer (NLP) model architectures. The chosen Transformer model achieves a BLEU score of 14.8 on a complex dataset, ranking second in a translation comparison against different models on NLP test sets. The initiative represents a crucial step in preserving ancient languages in the digital age.
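For readers unfamiliar with the metric, the following is a minimal sketch of how a corpus-level BLEU score on a 0-100 scale can be computed. It is an illustration only: the abstract does not say which BLEU implementation, tokenization, or smoothing produced the reported 14.8, so this sketch assumes whitespace tokenization, a single reference per sentence, and no smoothing. The function names `ngrams` and `corpus_bleu` are hypothetical and do not come from the project's code.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count all n-grams of length n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(references, hypotheses, max_n=4):
    """Unsmoothed corpus-level BLEU (0-100), one reference per hypothesis.

    Assumption: sentences are tokenized by whitespace. Real evaluations
    usually rely on a standard tool with a fixed tokenizer instead.
    """
    matches = [0] * max_n   # clipped n-gram matches, per order
    totals = [0] * max_n    # hypothesis n-gram counts, per order
    ref_len = hyp_len = 0

    for ref, hyp in zip(references, hypotheses):
        ref_tok, hyp_tok = ref.split(), hyp.split()
        ref_len += len(ref_tok)
        hyp_len += len(hyp_tok)
        for n in range(1, max_n + 1):
            ref_counts = ngrams(ref_tok, n)
            hyp_counts = ngrams(hyp_tok, n)
            # Clip each hypothesis n-gram count by its count in the reference.
            matches[n - 1] += sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
            totals[n - 1] += sum(hyp_counts.values())

    # Without smoothing, any order with zero matches gives a score of zero.
    if hyp_len == 0 or min(matches) == 0:
        return 0.0

    # Geometric mean of the modified n-gram precisions.
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Brevity penalty for hypotheses shorter than or equal in length to the references.
    brevity_penalty = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return 100 * brevity_penalty * math.exp(log_precision)
```

A call such as `corpus_bleu(["the cat sat on the mat"], ["the cat sat on a mat"])` scores one hypothesis against one reference. Scores from this sketch are not directly comparable to the reported figure unless the same tokenization and test set are used.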
The dataset is in plain-text form and is used by the accompanying model code, available at: https://github.com/hsb601/AncientLanguageTranslator_Model
It is intended for use with the Python code in that repository, which relies on a pretrained Transformer model and the packages listed in the code.
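As a starting point for working with the text data in Python, the sketch below reads a parallel corpus into (Hebrew, English) pairs. The file layout is an assumption: this page does not describe the format of `heb-en.txt`, so the code assumes one sentence pair per line, UTF-8 encoded, with the Hebrew and English sides separated by a tab. The helper names `parse_pairs` and `load_pairs` are hypothetical, and the separator may need adjusting after inspecting the actual file.

```python
from pathlib import Path


def parse_pairs(lines, sep="\t"):
    """Turn raw lines into (source, target) tuples.

    Assumption: each line holds one pair, source first, split by `sep`.
    Lines without the separator, or with an empty side, are skipped.
    """
    pairs = []
    for line in lines:
        line = line.rstrip("\n")
        if sep not in line:
            continue
        source, target = line.split(sep, 1)
        source, target = source.strip(), target.strip()
        if source and target:
            pairs.append((source, target))
    return pairs


def load_pairs(path, sep="\t"):
    """Read a UTF-8 text file and return its sentence pairs."""
    with Path(path).open(encoding="utf-8") as handle:
        return parse_pairs(handle, sep)
```

For example, `load_pairs("heb-en.txt")` would return a list of tuples if the file matches the assumed layout. Opening the file with an explicit `encoding="utf-8"` avoids platform-dependent defaults, which matters for Hebrew text.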
Dataset Files
- DataSets.zip (2.82 MB)
- He_En_language_translation.ipynb (177.32 kB)
Documentation
Attachment | Size |
---|---|
heb-en.txt | 9.2 MB |