Abstract

This study presents an English-Luganda parallel corpus of over 2,000 sentence pairs focused on financial decision-making and financial products. The data are drawn from diverse sources: social media platforms (TikTok comments and Twitter posts from authoritative accounts such as the Bank of Uganda and Capital Markets Uganda) and fintech blogs (Chipper Cash and Xeno). The corpus covers financial topics including bonds, loans, and unit trust funds, and provides a resource for financial language processing in both English and Luganda.

Instructions:

Load the dataset using pandas.
Inspect the data to understand its structure and identify potential issues.
Handle missing values by filling the 'source' column with 'Unknown' and dropping rows with a missing value in either the 'english' or the 'luganda' column.
Normalize the text in both the 'english' and 'luganda' columns by converting it to lowercase, removing extra whitespace, and removing special characters.
Adjust these steps as needed based on your specific dataset characteristics and project requirements.
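The steps above can be sketched in Python with pandas. This is a minimal sketch, not the authors' own preprocessing code: it assumes the data is a CSV file with the 'english', 'luganda', and 'source' columns named in the steps, and the helper names (normalize_text, load_and_inspect, clean_corpus) are hypothetical.

```python
import re

import pandas as pd

# Anything that is not a word character, whitespace, or an apostrophe counts
# as a "special character". In Python 3, \w is Unicode-aware, so letters such
# as "ŋ" are kept.
_SPECIAL_CHARS = re.compile(r"[^\w\s']")
_WHITESPACE = re.compile(r"\s+")


def normalize_text(text: str) -> str:
    """Lowercase, drop special characters, and collapse extra whitespace."""
    text = text.lower()
    # Replace with a space (not an empty string) so adjacent words don't fuse.
    text = _SPECIAL_CHARS.sub(" ", text)
    return _WHITESPACE.sub(" ", text).strip()


def load_and_inspect(path: str) -> pd.DataFrame:
    """Load the corpus and print its structure and missing-value counts."""
    df = pd.read_csv(path)
    df.info()               # column names, dtypes, non-null counts
    print(df.head())        # a few sample sentence pairs
    print(df.isna().sum())  # missing values per column
    return df


def clean_corpus(df: pd.DataFrame) -> pd.DataFrame:
    """Apply the missing-value handling and text normalization steps."""
    df = df.copy()  # leave the caller's DataFrame untouched
    # Keep pairs with unknown provenance, but label them explicitly.
    df["source"] = df["source"].fillna("Unknown")
    # A pair missing either side cannot be used as parallel text.
    df = df.dropna(subset=["english", "luganda"])
    for col in ("english", "luganda"):
        df[col] = df[col].astype(str).map(normalize_text)
    return df.reset_index(drop=True)
```

Usage, with a hypothetical file name: cleaned = clean_corpus(load_and_inspect("eng_lug_fintech.csv")). If the downloaded archive contains a single CSV file, the .zip path can be passed to pd.read_csv directly, because pandas infers ZIP compression from the extension. Otherwise, extract the archive first. The pattern keeps apostrophes because Luganda spelling commonly uses them to mark elided vowels (as in n' for ne). Remove the ' from the character class if you want the stricter behavior.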

Funding Agency:

Makerere University Research and Innovations Fund


Dataset Files

eng-lug-fintech-dataset.zip (213.11 kB)
