Standard Dataset

Dataset for Inclusive Fintech Software Development

Citation Author(s):
Belinda Marion Kobusingye (Makerere University)
Nagwovuma Margaret (Makerere University)
Nansamba Barbara (Makerere University)
Ggaliwango Marvin (Makerere University)
Submitted by:
Belinda Kobusingye
Last updated:
DOI:
10.21227/jp32-an80
Data Format:

Abstract

This study presents an English-Luganda parallel corpus of over 2,000 sentence pairs focused on financial decision-making and financial products. The dataset draws on social media (TikTok comments and Twitter posts from authoritative accounts such as Bank of Uganda and Capital Markets Uganda) and on fintech blogs (Chipper Cash and Xeno). The corpus covers a range of financial topics, including bonds, loans, and unit trust funds, providing a resource for financial language processing in both English and Luganda.

Instructions:

 

  • Load the dataset using pandas.
  • Inspect the data to understand its structure and identify potential issues.
  • Handle missing values by filling the 'source' column with 'Unknown' and dropping rows with missing values in 'english' or 'luganda' columns.
  • Normalize text in both 'english' and 'luganda' columns by converting to lowercase, removing extra whitespace, and removing special characters.
  • Adjust these steps as needed based on your specific dataset characteristics and project requirements.
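
The steps above can be sketched in pandas as follows. This is a minimal illustration, not the authors' own preprocessing script. The column names 'english', 'luganda', and 'source' come from the instructions; the file name in the commented-out line, the helper names `normalize_text` and `clean_corpus`, and the choice to keep apostrophes (so that spellings such as ng' are not broken) are assumptions. The sample rows are made-up placeholders, not rows from the corpus.

```python
import re

import pandas as pd


def normalize_text(text: str) -> str:
    """Lowercase, replace special characters with spaces, collapse whitespace."""
    text = str(text).lower()
    # Keep word characters (Unicode-aware, so letters such as ŋ survive),
    # whitespace, and apostrophes. Keeping apostrophes is an assumption.
    text = re.sub(r"[^\w\s']", " ", text)
    # Collapse whitespace last, so gaps left by removed characters are cleaned up too.
    return re.sub(r"\s+", " ", text).strip()


def clean_corpus(df: pd.DataFrame) -> pd.DataFrame:
    """Apply the missing-value handling and text normalization steps."""
    df = df.copy()
    df["source"] = df["source"].fillna("Unknown")
    df = df.dropna(subset=["english", "luganda"])
    for col in ("english", "luganda"):
        df[col] = df[col].map(normalize_text)
    return df.reset_index(drop=True)


# With the real data, load the file first (hypothetical file name):
# df = pd.read_csv("english_luganda_corpus.csv")

# Placeholder rows standing in for the corpus.
sample = pd.DataFrame({
    "english": ["  What is a  BOND? ", None, "Unit trust funds!"],
    "luganda": ["Placeholder   TEXT one.", "placeholder two", "Placeholder THREE"],
    "source": ["Twitter", "TikTok", None],
})

sample.info()  # inspect structure and missing values before cleaning
cleaned = clean_corpus(sample)
print(cleaned)
```

Stripping punctuation may not suit every task (for example, machine translation models often benefit from keeping it), so the regular expression in `normalize_text` is the first thing to adjust.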
Funding Agency
Makerere University Research and Innovations Fund

I am looking for a dataset for my project. Could you kindly provide me with this dataset?

Shrinivas Madali, Sun, 10/20/2024 - 00:13