BanglaMusicStylo: A Stylometric Dataset of Bangla Music Lyrics

Citation Author(s):
Ahmed Al
Daffodil International University
Daffodil International University
Submitted by:
Ahmed Al Marouf
Last updated:
Sun, 03/21/2021 - 03:25
Data Format:


With the rapid growth of the Bangla music industry, a huge volume of Bangla songs is produced every day. An immense number of producers, lyricists, singers, and artists are involved in producing songs across different genres. Among the many genres of Bangla music, classical, folk, baul, modern music, Rabindra Sangeet, Nazrul Geeti, film music, rock music, and fusion music have gained the highest popularity. Lyricists express their feelings and views on a situation or subject through their writing, so each lyricist draws on their own vocabulary of thoughts when composing lyrics. In this paper, we present “BanglaMusicStylo”, the very first stylometric dataset of Bangla music lyrics. We have collected 2824 Bangla song lyrics by 211 lyricists in digital form. All the lyrics are stored in text format for further use. The dataset could be used for stylometric analysis tasks such as authorship attribution, linguistic forensics, gender identification from textual data, Bangla music genre classification, vandalism detection, and emotion classification. Recognizing the significant research opportunities in this area, we have formalized this dataset to support such work.


The dataset contains separate folders, each named after a Bangla songwriter (lyricist). Each folder contains Word files holding the raw text of that lyricist's song lyrics. Download the files and apply natural language processing techniques to develop advanced methods.
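
As a starting point, the folder layout described above can be walked with a short Python script. This is only a minimal sketch under stated assumptions: the root folder name, the file extensions (`.txt` and `.docx`), and the helper names `read_lyrics`, `load_corpus`, and `songs_per_lyricist` are hypothetical and not part of the dataset. It treats each folder name as the author label and reads `.docx` files with the standard library only; legacy `.doc` files are not handled and would need a separate converter.

```python
import zipfile
import xml.etree.ElementTree as ET
from collections import Counter
from pathlib import Path

# XML namespace used by WordprocessingML (.docx) documents.
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def read_lyrics(path):
    """Return the text of one lyrics file (hypothetical helper).

    Assumes .docx files are standard Office Open XML packages and that
    any other accepted file is UTF-8 plain text.
    """
    path = Path(path)
    if path.suffix.lower() == ".docx":
        with zipfile.ZipFile(path) as zf:
            root = ET.fromstring(zf.read("word/document.xml"))
        # One output line per Word paragraph; runs are concatenated.
        paragraphs = [
            "".join(t.text or "" for t in p.iter(W_NS + "t"))
            for p in root.iter(W_NS + "p")
        ]
        return "\n".join(paragraphs)
    return path.read_text(encoding="utf-8")


def load_corpus(root_dir):
    """Yield (lyricist, file_name, text) for each lyrics file.

    Assumes one sub-folder per lyricist directly under root_dir.
    """
    for folder in sorted(Path(root_dir).iterdir()):
        if not folder.is_dir():
            continue
        for f in sorted(folder.iterdir()):
            if f.suffix.lower() in {".txt", ".docx"}:
                yield folder.name, f.name, read_lyrics(f)


def songs_per_lyricist(root_dir):
    """Count how many lyrics files each lyricist folder contributes."""
    return Counter(lyricist for lyricist, _, _ in load_corpus(root_dir))
```

Calling `songs_per_lyricist("BanglaMusicStylo")`, where the path is a placeholder for wherever the archive was extracted, gives a quick view of class balance before training an authorship attribution model.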