Abstract 

Maternal, sexual and reproductive health (MSRH) concerns are sensitive, urgent public health issues that require timely, trustworthy and authentic medical responses. Unfortunately, the curative healthcare systems of Low- and Middle-Income Countries (LMICs) are insufficiently responsive to such healthcare needs. These needs vary among social groups, often along lines of social inequality such as income, gender and education. Health information seekers therefore turn to unregulated online healthcare platforms, social media and Large Language Models (LLMs), which provide unverified healthcare information.

This work systematically examined the philosophical foundations of the responsible data and Artificial Intelligence (AI) practices that govern data and AI modelling for intelligent systems, drawing on peer-reviewed articles, book chapters, technical reports and studies published between 1973 and 2022. The review was restricted to the philosophy of AI and Society 5.0, and it informed the derivation of over 29 forms of AI philosophy and their fundamental relationships with Society 5.0. This revealed intrinsic manifestations of algorithmic unfairness arising from inequitable AI and Machine Learning (ML) training datasets, in addition to irresponsible data and AI modelling practices.

We further traced this algorithmic unfairness to unguided and unregulated AI industry practices, propagated by the selection of inappropriate research paradigms to inform the creation of AI and ML training datasets for building intelligent healthcare systems. Such systems include online platforms and chatbots designed to provide authentic, timely responses that inform healthcare decision-making among vulnerable online information seekers, such as teenagers and young women, across various social groups. This pointed to the need for responsible and Inclusive Intersectional AI practices and research approaches to creating ML and AI training datasets for equitable intelligent healthcare systems. We therefore intersectionally crowdsourced maternal healthcare advice from over 500 verified practising healthcare professionals from Lira University Teaching Hospital, Brac University and Brac Uganda’s health program and, for contrast, from their online social acquaintances within their social networks, to create a dataset based on responsible data practices. We also scraped, curated and annotated MSRH data from and about African contexts. These data can be used not only to fine-tune existing intelligent health systems but also to develop responsible software systems that are contextually relevant to LMICs in Africa.

We implemented trustworthy medical sentiment analysis with Local Interpretable Model-agnostic Explanations (LIME), in line with responsible AI principles, to distinguish between authentic and non-authentic maternal healthcare advice. Surprisingly, we obtained a training-set accuracy of 93% and a validation-set accuracy of 56%, a generalization log loss of 0.259, a generalization balanced accuracy of 83% and a generalization Area Under the Curve (AUC) of 90%. This means our models performed well at evaluating context and sentiment but failed to accurately distinguish between authentic and non-authentic medical advice. This reveals computational uncertainty in AI-driven healthcare models. It also means that AI models cannot reliably distinguish between authentic and non-authentic medical advice, hence the need for better conversational AI techniques and online healthcare tools to disseminate authentic medical advice conversationally. While making our responsible medical corpus openly available to researchers, we also began developing conversational AI techniques that help information seekers use conversational AI tools such as ChatGPT through prompt engineering and Retrieval-Augmented Generation (RAG). The prompt engineering techniques have been published and made openly available to the general public to guide health information seekers responsibly; however, there is an urgent need for policy, guidelines and regulation of online healthcare practice.

Keywords: Artificial Intelligence (AI), Conversational AI, Responsible AI, Large Language Models (LLMs), Maternal Health, Health Equity.

Instructions: 

Guide and Instructions for Using the Responsible Medical Corpus (RMC) for Maternal, Sexual, and Reproductive Health

Introduction

The Responsible Medical Corpus (RMC) dataset is an African-contextualized NLP dataset designed to improve the quality of conversations in systems related to maternal, sexual, and reproductive health (MSRH). This comprehensive guide provides detailed instructions for using the RMC dataset to perform prompt engineering, fine-tune Large Language Models (LLMs), implement Retrieval-Augmented Generation (RAG), and develop responsible software systems for MSRH. The aim is to ensure users can effectively leverage this dataset to build systems that are ethical, effective, and tailored to the African context.

1. Retrieving the Dataset

Objective: Access and prepare the RMC dataset for use.

  1. Download the Dataset:

    • Obtain the RMC dataset from IEEE Dataport.
    • Verify the integrity of the downloaded files (e.g., by comparing checksums, where they are provided).
  2. Explore the Dataset:

    • Examine the dataset structure, including text samples, annotations, and metadata.
    • Understand the scope and content of the dataset, focusing on its relevance to MSRH.
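
The retrieval and exploration steps can be sketched in Python as below. This is a minimal sketch under assumptions: it supposes the RMC download is a UTF-8 CSV file with hypothetical `text` and `label` columns, and that a reference SHA-256 checksum is available to compare against. Check the actual file layout, column names and any published checksum on the IEEE Dataport page before relying on it.

```python
import csv
import hashlib
from collections import Counter


def sha256_of(path, chunk_size=1 << 20):
    """Compute the SHA-256 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected_sha256):
    """Return True if the file's digest matches the expected hex string (case-insensitive)."""
    return sha256_of(path) == expected_sha256.strip().lower()


def summarize_csv(path, text_col="text", label_col="label"):
    """Summarize columns, row count, label distribution and mean words per text.

    `text_col` and `label_col` are hypothetical column names; adjust them to
    the columns actually present in the RMC release.
    """
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle)
        rows = list(reader)
        columns = list(reader.fieldnames or [])
    label_counts = Counter(row.get(label_col) or "" for row in rows)
    word_counts = [len((row.get(text_col) or "").split()) for row in rows]
    return {
        "columns": columns,
        "rows": len(rows),
        "label_counts": dict(label_counts),
        "mean_words": sum(word_counts) / len(word_counts) if word_counts else 0.0,
    }
```

For example, `verify_download("rmc.csv", expected)` (the file name is a placeholder) should return `True` before `summarize_csv("rmc.csv")` is used to inspect the column names and the balance between label classes.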

2. Data Cleaning and Preprocessing

Objective: Prepare the data for modeling by cleaning and preprocessing.

  1. Data Cleaning:

    • Remove any irrelevant or duplicate entries.
    • Correct any spelling or grammatical errors in the text.
  2. Normalization:

    • Normalize text by converting it to lowercase.
    • Remove special characters, punctuation, and stop words that do not add value to the analysis.
  3. Tokenization:

    • Tokenize the text into words or subwords, depending on the requirements of the model.
    • Use libraries like NLTK, SpaCy, or Hugging Face's tokenizers for efficient tokenization.
  4. Lemmatization and Stemming:

    • Apply lemmatization or stemming to reduce words to their base or root forms.
  5. Text Augmentation:

    • Use data augmentation techniques to increase dataset variability and robustness.
    • Techniques may include synonym replacement, random insertion, or back-translation.
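
The cleaning and preprocessing steps above can be illustrated with a dependency-free Python sketch. The stop-word list, the synonym table and the `crude_stem` suffix stripper are hypothetical toy stand-ins for what NLTK, spaCy or a curated medical thesaurus would provide; they only show where each step sits in the pipeline.

```python
import random
import re

# Toy stop-word list (hypothetical); NLTK or spaCy ship fuller lists.
# Negations such as "not" and "no" are deliberately excluded, because dropping
# them can flip the meaning of medical advice.
STOP_WORDS = {"a", "an", "the", "is", "are", "of", "to", "in", "and", "or",
              "for", "on", "with", "you", "your"}

# Hypothetical synonym table used only to illustrate augmentation.
SYNONYMS = {
    "doctor": ["clinician", "physician"],
    "baby": ["infant"],
    "pregnant": ["expecting"],
}


def deduplicate(texts):
    """Drop entries that differ only in case or whitespace, keeping the first."""
    seen, unique = set(), []
    for text in texts:
        key = " ".join(text.lower().split())
        if key not in seen:
            seen.add(key)
            unique.append(text)
    return unique


def normalize(text):
    """Lowercase, replace punctuation/special characters with spaces, squeeze spaces."""
    text = re.sub(r"[^\w\s]", " ", text.lower())
    return " ".join(text.split())


def tokenize(text):
    """Whitespace tokenization of normalized text, with stop words removed."""
    return [tok for tok in normalize(text).split() if tok not in STOP_WORDS]


def crude_stem(token):
    """Toy suffix stripping; use NLTK's PorterStemmer or spaCy lemmas in practice."""
    for suffix in ("ing", "ed", "es", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text):
    """Normalize, tokenize and stem one text."""
    return [crude_stem(tok) for tok in tokenize(text)]


def synonym_replace(tokens, p=0.3, seed=0):
    """Augmentation: replace each token that has synonyms with probability p."""
    rng = random.Random(seed)
    return [rng.choice(SYNONYMS[tok]) if tok in SYNONYMS and rng.random() < p else tok
            for tok in tokens]
```

One design choice is worth keeping even with real libraries: negations such as "not" and "no" stay out of the stop-word list, because removing them can invert the meaning of a piece of medical advice.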

3. Prompt Engineering

Objective: Design effective prompts to leverage the RMC dataset for various NLP tasks.

  1. Understanding Prompt Engineering:

    • Learn the basics of prompt engineering and its importance in NLP.
    • Understand how to design prompts that elicit the desired response from language models.
  2. Creating Prompts:

    • Design prompts that are contextually relevant to MSRH.
    • Ensure prompts are clear, concise, and aligned with the intended task (e.g., diagnosis, conversation, information retrieval).
  3. Testing and Refining Prompts:

    • Test prompts with sample data and refine them based on the model's responses.
    • Iterate on the design to improve the quality and relevance of the outputs.
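
A minimal sketch of the prompt design and testing loop is shown below. The template wording and the helper names `build_prompt` and `missing_items` are hypothetical illustrations rather than the published RMC prompt techniques, and the call to the language model itself is omitted because it depends on the provider.

```python
def build_prompt(question, references, country, audience, task):
    """Assemble an MSRH prompt grounded in vetted reference advice.

    All wording here is a hypothetical template to be refined with clinicians.
    """
    numbered = "\n".join(f"[{i}] {ref}" for i, ref in enumerate(references, start=1))
    return (
        f"Task: {task}\n"
        f"Context: maternal, sexual and reproductive health (MSRH) in {country}.\n"
        f"Audience: {audience}; use plain, respectful language.\n"
        "Use only the reference advice below and cite it by number. If it does not "
        "answer the question, say so and advise consulting a qualified health worker.\n\n"
        f"Reference advice:\n{numbered}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )


def missing_items(response, required_phrases):
    """Return the required phrases that a model response fails to mention."""
    lowered = response.lower()
    return [phrase for phrase in required_phrases if phrase.lower() not in lowered]
```

In a refinement loop, each prompt variant is sent to the model, and `missing_items` flags responses that omit required elements (for instance, a citation or a referral to a health worker), which then guides the next revision of the template.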

4. Fine-Tuning Large Language Models (LLMs)

Objective: Fine-tune pre-trained LLMs using the RMC dataset for specific tasks.

  1. Selecting Pre-trained Models:

    • Choose suitable pre-trained models from the Hugging Face Model Hub (e.g., BERT, GPT-2, T5).
    • Ensure the selected model aligns with the task requirements and dataset characteristics.
  2. Preparing Data for Fine-Tuning:

    • Format the dataset to match the input requirements of the chosen model.
    • Create training and validation splits to evaluate model performance.
  3. Fine-Tuning Process:

    • Use frameworks like Hugging Face's Transformers to fine-tune the model.
    • Set appropriate hyperparameters (e.g., learning rate, batch size, number of epochs).
    • Monitor training progress and adjust parameters as necessary.
  4. Evaluation:

    • Evaluate the fine-tuned model on validation data using relevant metrics (e.g., accuracy, F1-score).
    • Conduct qualitative analysis by reviewing the model's outputs for specific prompts.
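
The data-preparation step can be sketched as a stratified train/validation split followed by formatting into input/target pairs. The field names (`text`, `label`, `input`, `target`) and the instruction wording are assumptions for illustration. The resulting JSON Lines files could then be loaded with Hugging Face `datasets` via `load_dataset("json", data_files=...)` and tokenized for fine-tuning with Transformers; that part is not shown.

```python
import json
import random
from collections import defaultdict


def stratified_split(records, label_key="label", val_fraction=0.2, seed=42):
    """Split records into train/validation lists, preserving label proportions."""
    by_label = defaultdict(list)
    for rec in records:
        by_label[rec[label_key]].append(rec)
    rng = random.Random(seed)
    train, val = [], []
    for label in sorted(by_label):
        group = list(by_label[label])
        rng.shuffle(group)
        # Keep at least one validation example per label when the label has 2+ records.
        n_val = max(1, round(len(group) * val_fraction)) if len(group) > 1 else 0
        val.extend(group[:n_val])
        train.extend(group[n_val:])
    return train, val


def to_example(rec, text_key="text", label_key="label"):
    """Format one record as an instruction-style input/target pair (hypothetical wording)."""
    return {
        "input": "Is the following maternal health advice authentic or non-authentic?\n"
                 f"Advice: {rec[text_key]}",
        "target": rec[label_key],
    }


def write_jsonl(path, examples):
    """Write one JSON object per line, keeping non-ASCII text readable."""
    with open(path, "w", encoding="utf-8") as handle:
        for example in examples:
            handle.write(json.dumps(example, ensure_ascii=False) + "\n")
```

Stratifying by label keeps the proportion of authentic and non-authentic advice similar in both splits, which matters when one class is much rarer than the other.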

5. Retrieval-Augmented Generation (RAG)

Objective: Implement RAG to enhance the quality and accuracy of generated responses.

  1. Understanding RAG:

    • Learn about RAG and its benefits in combining retrieval mechanisms with generation capabilities.
  2. Setting Up Retrieval Mechanisms:

    • Use tools like Elasticsearch or Faiss to index and retrieve relevant documents or passages from the dataset.
    • Implement retrieval strategies that ensure high recall and precision.
  3. Integrating Retrieval with Generation:

    • Combine retrieved information with generative models to produce informed and accurate responses.
    • Ensure the retrieval component is optimized to fetch the most relevant data.
  4. Evaluation:

    • Assess the performance of the RAG system using both automated metrics and human evaluation.
    • Focus on relevance, coherence, and informativeness of the generated responses.
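
The retrieval and integration steps can be sketched with a small pure-Python retriever. This swaps in TF-IDF with cosine similarity in place of Elasticsearch or Faiss purely to keep the example self-contained; the class name, the prompt wording and the choice of `k` are illustrative assumptions, and the generated answer would come from whichever model receives the prompt.

```python
import math
import re
from collections import Counter


def tokens(text):
    """Lowercased word tokens."""
    return re.findall(r"\w+", text.lower())


class TfidfRetriever:
    """Minimal TF-IDF + cosine-similarity retriever (stand-in for Elasticsearch/Faiss)."""

    def __init__(self, passages):
        self.passages = list(passages)
        counts = [Counter(tokens(p)) for p in self.passages]
        n = len(self.passages)
        doc_freq = Counter(term for c in counts for term in c)
        # Smoothed inverse document frequency.
        self.idf = {t: math.log((1 + n) / (1 + df)) + 1 for t, df in doc_freq.items()}
        self.vectors = [self._vectorize(c) for c in counts]

    def _vectorize(self, term_counts):
        """L2-normalized TF-IDF vector as a dict; terms unseen in the corpus get weight 0."""
        vec = {t: c * self.idf.get(t, 0.0) for t, c in term_counts.items()}
        norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
        return {t: w / norm for t, w in vec.items()}

    def search(self, query, k=3):
        """Return up to k (passage, score) pairs with positive cosine similarity."""
        q = self._vectorize(Counter(tokens(query)))
        scored = [(sum(w * doc.get(t, 0.0) for t, w in q.items()), i)
                  for i, doc in enumerate(self.vectors)]
        scored.sort(key=lambda pair: (-pair[0], pair[1]))
        return [(self.passages[i], s) for s, i in scored[:k] if s > 0]


def rag_prompt(question, retriever, k=3):
    """Insert the top-k retrieved passages into the prompt for the generator."""
    hits = retriever.search(question, k)
    context = "\n".join(f"- {p}" for p, _ in hits) or "- (no relevant passage found)"
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}\nAnswer:"
```

Filtering out zero-score passages means an off-topic question yields an explicit "no relevant passage" context instead of unrelated advice, which supports the document's emphasis on authentic responses.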

6. Building and Evaluating Models

Objective: Develop models for various NLP tasks and evaluate their performance.

  1. Model Selection:

    • Choose appropriate model architectures for the intended tasks (e.g., classification, question-answering, dialogue systems).
  2. Model Training:

    • Train models using the prepared dataset and fine-tuned LLMs.
    • Implement techniques like transfer learning to leverage pre-trained knowledge.
  3. Evaluation:

    • Evaluate models on test data using relevant metrics (e.g., accuracy, precision, recall, F1-score).
    • Conduct error analysis to identify areas for improvement.
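
The metrics listed above, plus balanced accuracy (reported in the abstract and less sensitive to class imbalance than plain accuracy), can be computed as in this minimal sketch. It assumes a binary task with `authentic` as the positive label; `binary_metrics` and `error_analysis` are hypothetical helper names, and libraries such as scikit-learn offer equivalent, well-tested functions.

```python
from collections import Counter


def binary_metrics(y_true, y_pred, positive="authentic"):
    """Accuracy, precision, recall, F1 and balanced accuracy for a binary task."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = len(pairs) - tp - fp - fn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0  # sensitivity
    specificity = tn / (tn + fp) if tn + fp else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / len(pairs) if pairs else 0.0,
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "balanced_accuracy": (recall + specificity) / 2,
    }


def error_analysis(texts, y_true, y_pred):
    """List misclassified examples and count each (true, predicted) confusion."""
    errors = [(x, t, p) for x, t, p in zip(texts, y_true, y_pred) if t != p]
    confusions = Counter((t, p) for _, t, p in errors)
    return errors, confusions
```

Reviewing the listed errors alongside their confusion counts shows whether a model mostly misses authentic advice or mostly accepts non-authentic advice, which carry different risks in a health setting.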

7. Model Fusion and Ensembling

Objective: Improve model robustness and accuracy through ensembling techniques.

  1. Ensemble Methods:

    • Combine predictions from multiple models using techniques like voting, averaging, or stacking.
    • Experiment with different ensemble strategies to find the best-performing combination.
  2. Evaluation:

    • Evaluate the ensemble model's performance on the test set and compare it with individual models.
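
Voting and averaging can be sketched as below; stacking is omitted because it additionally requires training a meta-model on held-out predictions. The function names, the default labels and the tie-breaking rule (prefer the label from the earliest-listed model) are assumptions made for this sketch.

```python
from collections import Counter


def majority_vote(predictions_per_model):
    """Hard voting. predictions_per_model[m][i] is model m's label for example i.

    Ties go to the tied label predicted by the earliest-listed model.
    """
    ensembled = []
    for labels in zip(*predictions_per_model):
        counts = Counter(labels)
        top = max(counts.values())
        ensembled.append(next(lab for lab in labels if counts[lab] == top))
    return ensembled


def average_probabilities(probas_per_model, weights=None):
    """Soft voting: (weighted) mean of each model's positive-class probability."""
    weights = weights or [1.0] * len(probas_per_model)
    total = sum(weights)
    return [sum(w * p for w, p in zip(weights, column)) / total
            for column in zip(*probas_per_model)]


def threshold(probabilities, cutoff=0.5, positive="authentic", negative="non-authentic"):
    """Turn averaged probabilities into labels."""
    return [positive if prob >= cutoff else negative for prob in probabilities]
```

Comparing `binary_metrics`-style scores for each individual model against the voted and averaged outputs on the same test set is what decides whether the ensemble is worth its extra cost.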

8. Machine Learning Operations (MLOps)

Objective: Streamline the development, deployment, and monitoring of models.

  1. Version Control:

    • Use version control systems (e.g., Git) to track changes in code, data, and models.
  2. Continuous Integration and Continuous Deployment (CI/CD):

    • Set up CI/CD pipelines to automate the training, testing, and deployment of models.
    • Use tools like Jenkins, GitLab CI, or GitHub Actions for pipeline automation.
  3. Model Monitoring:

    • Implement monitoring tools to track model performance in real-time.
    • Set up alert systems to detect and respond to performance degradation or biases.
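
A minimal monitoring sketch follows. It assumes each production prediction is eventually checked (for example, by clinician review) so that a correct/incorrect flag is available, and that each request carries a coarse social-group tag so that performance gaps between groups can be surfaced as possible bias. The class name, thresholds and window size are hypothetical and would need to be set for the deployment at hand.

```python
from collections import defaultdict, deque


class PerformanceMonitor:
    """Rolling accuracy and per-group accuracy gap with simple threshold alerts.

    Thresholds are hypothetical defaults, not recommended values.
    """

    def __init__(self, window=200, min_accuracy=0.8, max_group_gap=0.1, min_group_size=30):
        self.recent = deque(maxlen=window)
        self.by_group = defaultdict(lambda: deque(maxlen=window))
        self.min_accuracy = min_accuracy
        self.max_group_gap = max_group_gap
        self.min_group_size = min_group_size

    def record(self, correct, group):
        """Log whether a reviewed prediction was correct, tagged with its group."""
        self.recent.append(bool(correct))
        self.by_group[group].append(bool(correct))

    def alerts(self):
        """Return human-readable alerts for degradation or group disparities."""
        messages = []
        if self.recent:
            accuracy = sum(self.recent) / len(self.recent)
            if accuracy < self.min_accuracy:
                messages.append(f"rolling accuracy {accuracy:.2f} below {self.min_accuracy:.2f}")
        # Only compare groups with enough reviewed predictions to be meaningful.
        group_acc = {g: sum(h) / len(h) for g, h in self.by_group.items()
                     if len(h) >= self.min_group_size}
        if len(group_acc) >= 2:
            gap = max(group_acc.values()) - min(group_acc.values())
            if gap > self.max_group_gap:
                worst = min(group_acc, key=group_acc.get)
                messages.append(f"accuracy gap {gap:.2f} across groups; lowest: {worst}")
        return messages
```

In a deployment, `alerts()` could be polled on a schedule and its messages forwarded to whatever alerting channel the team already uses.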

9. Deployment of Models

Objective: Deploy models in various responsible software systems across different fields.

  1. Healthcare:

    • Deploy models in telemedicine applications to assist healthcare professionals in diagnosing patients based on symptom text.
    • Use chatbots to provide information and support for maternal, sexual, and reproductive health.
  2. Education:

    • Integrate models into educational platforms to provide accurate information on MSRH topics.
    • Develop conversational screening protocols for educational purposes.
  3. Public Health:

    • Implement models in public health systems to improve the dissemination of health information.
    • Use RAG systems to support health workers in retrieving up-to-date medical information.
  4. Customer Service:

    • Use conversational AI to provide accurate and empathetic responses in customer service applications related to health.

Conclusion

The Responsible Medical Corpus (RMC) dataset provides a unique opportunity to develop responsible software systems for maternal, sexual, and reproductive health. By following these detailed instructions, users can leverage the RMC dataset to create AI systems that are accurate, ethical, and contextually relevant. This guide aims to foster innovation in MSRH, ensuring that AI systems are developed with a focus on responsibility, inclusivity, and effectiveness.