Māori Business Datasets 2020-2024

- Submitted by: Shae Parsons
- Last updated: Thu, 03/20/2025 - 03:18
- DOI: 10.21227/weph-zm68
Abstract
Māori enterprises are pivotal to the economic and cultural prosperity of Aotearoa, yet predictive analysis of business outcomes tailored to these enterprises remains underexplored. This research examines the application of recurrent neural networks (RNNs) and transformer architectures to forecast key performance indicators (KPIs) for Māori small and medium-sized enterprises (SMEs). Anchored in the principles of indigenous data sovereignty, specifically through the Mana Raraunga framework, this study develops predictive models that uphold and protect the integrity of Māori data. The methodology utilises publicly available datasets, including business, employment, and earnings statistics from Stats NZ, while adhering to ethical protocols aligned with Māori aspirations for self-determination in data management. Comparative experiments assess the performance of RNN and transformer models in predicting financial sustainability and business growth. Preliminary findings indicate that transformers outperform RNNs in long-term sequence prediction tasks, offering scalable and culturally attuned AI solutions to support Māori entrepreneurship. This study bridges technical and indigenous knowledge domains, offering a blueprint for how AI can enhance Māori business success through predictive insights while safeguarding data sovereignty. Future research will focus on community collaboration to further refine and contextualise these models, ensuring their relevance and adaptability.
Overview
This repository contains the Python scripts, datasets, and models for my dissertation project on Māori entrepreneurial outcomes. The research involves building, testing, and refining machine learning models, including Recurrent Neural Networks (RNNs) and Transformers, to analyse and predict entrepreneurial outcomes.
Repository Structure
The project is organised into the following folders and key files:
1. RNNTests/
Contains test scripts for running various experiments with Recurrent Neural Networks.
261024RNNTest.py
281224RNNTest2.py
281224RNNTest3.py
281224RNNTest4.py
2. RNN Models/
Contains pre-built and fine-tuned RNN models used for predictions and analysis.
3. Data Preprocessing/
Contains all data cleaning scripts and pre-processed datasets.
Scripts:
- cleandataset.py: General dataset cleaning functions.
- maoridataset.py: Cleaning and formatting the Māori business dataset.
- sortingdata1.py: Custom data sorting functions for preprocessing.
Datasets:
- Employment_Earnings_from_wages_and_salaries_and_selfemployment.csv
- tatauranga-umanga-maori-statistics-on-maori-businesses-2022-english.csv
- tpk-tematapaeroa2021-datatables.xlsx
4. Transformer Models/
Contains scripts and results related to Transformer-based models.
Scripts:
- GPTQA.py: Script for testing GPT-based question-answer models.
- GPTtest.py: Experimental script for Transformer-based tests.
- TransformerFineTune.py: Script used for fine-tuning Transformer models.
Results:
- training_results.txt: Output and logs from Transformer model training.
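Both the RNN and Transformer experiments work on sequences, so a time series usually has to be cut into fixed-length input windows with a target value to predict. The following is a minimal sketch of that step, not code from the repository: the helper name `make_windows`, the `window` and `horizon` parameters, and the toy figures are all assumptions made for illustration.

```python
import numpy as np


def make_windows(series, window, horizon=1):
    """Split a 1-D series into (input window, future target) pairs.

    Hypothetical helper for illustration; not taken from the repository.
    """
    values = np.asarray(series, dtype=float)
    n_samples = len(values) - window - horizon + 1
    if n_samples <= 0:
        raise ValueError("series is too short for this window and horizon")
    # Each row of X holds `window` consecutive observations.
    X = np.stack([values[i:i + window] for i in range(n_samples)])
    # Each target is the value `horizon` steps after the end of its window.
    start = window + horizon - 1
    y = values[start:start + n_samples]
    # Sequence models commonly expect (samples, timesteps, features).
    return X[..., np.newaxis], y


# Made-up yearly figures, used only to show the resulting shapes.
toy_series = [10.0, 12.0, 13.0, 15.0, 18.0, 21.0]
X, y = make_windows(toy_series, window=3)
print(X.shape, y.shape)
```

The trailing feature axis is added because recurrent and attention layers in TensorFlow and PyTorch generally take three-dimensional input; a real KPI series with several indicators per year would carry more than one feature.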
Key Scripts
- TransformerFineTune.py: A script used for fine-tuning Transformer models on the dataset.
- GPTQA.py and GPTtest.py: Scripts designed for testing GPT-based question-answer models on research-specific queries.
- employmentearnings.py: Analyses earnings data by demographic and business type.
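As an illustration of the kind of breakdown employmentearnings.py describes, the sketch below groups earnings by demographic and business type with pandas. It is not the script itself: the column names (`demographic`, `business_type`, `earnings`), the function `summarise_earnings`, and the rows are invented placeholders, and the real Stats NZ file will use different headers.

```python
import pandas as pd

# Made-up rows standing in for the earnings CSV. The column names are
# assumptions and will differ from the real Stats NZ headers.
records = pd.DataFrame(
    {
        "demographic": ["Group A", "Group A", "Group B", "Group B", "Group A"],
        "business_type": ["Sole trader", "Sole trader", "Company", "Company", "Company"],
        "earnings": [100.0, 120.0, 200.0, 240.0, 300.0],
    }
)


def summarise_earnings(df):
    """Mean earnings and record count per demographic and business type."""
    return (
        df.groupby(["demographic", "business_type"])["earnings"]
        .agg(mean_earnings="mean", n_records="count")
        .reset_index()
    )


summary = summarise_earnings(records)
print(summary)
```

Reporting the record count next to each mean makes it easier to spot groups with too few observations to interpret.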
How to Run the Scripts
- Clone the repository:
    git clone https://github.com/shaxski/Maori-Entreprenurial-Outcomes.git
- Navigate to the repository directory:
    cd Maori-Entreprenurial-Outcomes
- Install the required dependencies:
    pip install -r requirements.txt
- Run the desired script, for example:
    python RNNTests/261024RNNTest.py
Dependencies
The project requires Python 3.8+ and the following libraries:
- TensorFlow / PyTorch (for RNNs and Transformers)
- Pandas (for data manipulation)
- NumPy (for numerical operations)
- Matplotlib (for data visualisation)
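Before running the scripts, it can help to confirm that the interpreter and libraries listed above are available. The check below is a small sketch using only the standard library; the import names and the assumption that either TensorFlow or PyTorch is enough are mine, so adjust them to match requirements.txt.

```python
import sys
from importlib import util

# Import names for the libraries listed above (assumed; check them
# against requirements.txt).
REQUIRED = ["pandas", "numpy", "matplotlib"]
# Assumption: having one of the two deep learning frameworks is enough.
EITHER_OF = ["tensorflow", "torch"]


def missing_modules(names):
    """Return the top-level module names that cannot be located."""
    return [name for name in names if util.find_spec(name) is None]


def check_environment():
    """Collect human-readable problems with the current environment."""
    problems = []
    if sys.version_info < (3, 8):
        problems.append("Python 3.8+ is required")
    problems.extend(missing_modules(REQUIRED))
    if len(missing_modules(EITHER_OF)) == len(EITHER_OF):
        problems.append("tensorflow or torch")
    return problems


print(check_environment() or "all listed dependencies found")
```

`find_spec` only locates a module without importing it, so the check stays fast even when a large framework is installed.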
Future Directions
The next steps involve:
- Expanding the data preprocessing pipeline.
- Further refining the RNN and Transformer models.
- Building a front-end application for interactive data analysis.
Acknowledgements
This project is being conducted as part of my Master's dissertation in collaboration with AUT, under the supervision of Dr. Mahsa Mohaghegh. The research integrates Te Ao Māori perspectives, focusing on indigenous data sovereignty and ethical AI practices.
For any questions, please contact: shae.parsons@hotmail.com.