Standard Dataset
LUS: Mizo Monolingual Corpus
- Citation Author(s):
- Submitted by: Candy Lalrempuii
- Last updated: Tue, 04/04/2023 - 03:59
- DOI: 10.21227/5601-9c25
- Data Format:
- License:
- Categories:
- Keywords:
Abstract
Mizo, also known as Lushai, is the official language of Mizoram, a state in north-eastern India. It is an under-resourced, highly tonal language of the Tibeto-Burman family.
The LUS dataset is a monolingual corpus crawled from Mizo news websites such as Zalen (https://zalen.in/) and Times of Mizoram (https://www.timesofmizoram.com/). It contains 101,827 Mizo sentences, provided for research and academic purposes.
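As a quick sanity check after downloading, the sentence count stated above can be compared against the file contents. The following is a minimal sketch only: the record does not state the data format, so the one-sentence-per-line UTF-8 layout is an assumption, and the helper names `iter_sentences` and `count_sentences` are hypothetical.

```python
from pathlib import Path
from typing import Iterable, Iterator

# Total number of sentences reported in the abstract.
EXPECTED_SENTENCES = 101827


def iter_sentences(lines: Iterable[str]) -> Iterator[str]:
    """Yield non-empty lines with surrounding whitespace removed.

    Assumes one sentence per line; this layout is not confirmed by the record.
    """
    for line in lines:
        sentence = line.strip()
        if sentence:
            yield sentence


def count_sentences(path: Path) -> int:
    """Count sentences in a UTF-8 text file (assumed one sentence per line)."""
    with path.open(encoding="utf-8") as handle:
        return sum(1 for _ in iter_sentences(handle))
```

With a downloaded file at a placeholder path such as `lus_mizo.txt`, comparing `count_sentences(Path("lus_mizo.txt"))` against `EXPECTED_SENTENCES` would show whether the assumed layout matches the published total. A mismatch may simply mean the corpus is stored in a different format.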