Dataset for Inclusive Fintech Software Development

Citation Author(s):
Belinda Marion Kobusingye, Makerere University
Nagwovuma Margaret, Makerere University
Nansamba Barbara, Makerere University
Ggaliwango Marvin, Makerere University
Submitted by:
Belinda Kobusingye
Last updated:
Mon, 09/23/2024 - 00:35
DOI:
10.21227/jp32-an80
Data Format:
License:

Abstract 

This study presents an English-Luganda parallel corpus of more than 2,000 sentence pairs focused on financial decision-making and financial products. The dataset draws on social media (TikTok comments and Twitter posts from authoritative accounts such as Bank of Uganda and Capital Markets Uganda) and on fintech blogs (Chipper Cash and Xeno). The corpus covers financial topics including bonds, loans, and unit trust funds, providing a resource for financial language processing in both English and Luganda.

Instructions: 

 

  • Load the dataset using pandas.
  • Inspect the data to understand its structure and identify potential issues.
  • Handle missing values by filling the 'source' column with 'Unknown' and dropping rows with missing values in 'english' or 'luganda' columns.
  • Normalize text in both 'english' and 'luganda' columns by converting to lowercase, removing extra whitespace, and removing special characters.
  • Adjust these steps as needed based on your specific dataset characteristics and project requirements.
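
The steps above can be sketched in pandas roughly as follows. This is a minimal illustration, not the authors' preprocessing script. Since no files are uploaded for this dataset, the file layout is an assumption: the sketch reads a small in-memory CSV of placeholder rows (not real corpus entries or real translations) with the 'english', 'luganda', and 'source' columns named in the instructions. The helper names normalize_text and clean_corpus are hypothetical. Keeping apostrophes during normalization is also an assumption, made because Luganda spelling uses them for elision.

```python
import io
import re

import pandas as pd

# Placeholder rows standing in for the real file (assumed CSV layout).
# These strings are NOT taken from the dataset and are not translations.
SAMPLE_CSV = """english,luganda,source
  What is a Treasury  BOND?  ,PLACEHOLDER  luganda text (1)!,Twitter
How do unit trust funds work?,placeholder luganda text 2,
Missing translation here,,TikTok
"""


def normalize_text(text: str) -> str:
    """Lowercase, replace special characters, and collapse whitespace."""
    text = text.lower()
    # \w is Unicode-aware in Python 3, so non-ASCII letters are kept.
    # Apostrophes are kept on purpose (assumption, see the note above).
    text = re.sub(r"[^\w\s']", " ", text)
    return re.sub(r"\s+", " ", text).strip()


def clean_corpus(df: pd.DataFrame) -> pd.DataFrame:
    """Apply the missing-value and normalization steps to a copy of df."""
    df = df.copy()
    # Fill a missing source instead of dropping the sentence pair.
    df["source"] = df["source"].fillna("Unknown")
    # A pair is unusable if either side of the translation is missing.
    df = df.dropna(subset=["english", "luganda"])
    for col in ("english", "luganda"):
        df[col] = df[col].astype(str).map(normalize_text)
    return df.reset_index(drop=True)


# Load (swap the StringIO object for the real file path once available).
raw = pd.read_csv(io.StringIO(SAMPLE_CSV))

# Inspect structure and count missing values per column.
raw.info()
print(raw.isna().sum())

cleaned = clean_corpus(raw)
print(cleaned)
```

Removing all punctuation may be too aggressive for some uses, for example when training a translation model that should reproduce punctuation, so the regular expression in normalize_text is a natural place to adjust.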
Funding Agency: 
Makerere University Research and Innovations Fund

Dataset Files

    Files have not been uploaded for this dataset