Standard Dataset
Multiple Sclerosis Pubmed Abstracts with SDoH relations Nov 2024
- Submitted by: Andres Frederic
- Last updated: Thu, 02/06/2025 - 23:06
- DOI: 10.21227/bgd6-ar29
Abstract
This dataset is a collection of PubMed abstracts and associated metadata on multiple sclerosis (MS) in relation to social determinants and environmental factors, covering publications from January 1, 2018, to November 15, 2024. The data were gathered with the PubMed E-utilities API using the search query "multiple sclerosis" AND ("social determinants" OR "environmental factors"), in which multiple sclerosis was matched by its MeSH term.
Articles classified as preprints were excluded so that only peer-reviewed research is included.
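The search step can be pictured as a single request to the E-utilities `esearch` endpoint. The sketch below only builds the URL and makes no network call; it is not the authors' script. The `[MeSH Terms]` tag, the `pdat` publication-date filter, and the helper name `build_esearch_url` are assumptions chosen to mirror the description above.

```python
from urllib.parse import urlencode

# NCBI E-utilities esearch endpoint.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

# Assumed reconstruction of the query: the [MeSH Terms] tag is a guess at
# how "multiple sclerosis" was matched by its MeSH term.
QUERY = ('"multiple sclerosis"[MeSH Terms] AND '
         '("social determinants" OR "environmental factors")')


def build_esearch_url(term: str, mindate: str, maxdate: str,
                      retmax: int = 10000, retstart: int = 0) -> str:
    """Return an esearch URL listing matching PMIDs as JSON (hypothetical helper)."""
    params = {
        "db": "pubmed",
        "term": term,
        "datetype": "pdat",    # restrict by publication date
        "mindate": mindate,    # YYYY/MM/DD
        "maxdate": maxdate,
        "retmode": "json",
        "retmax": retmax,      # number of IDs returned per call
        "retstart": retstart,  # offset for paging through results
    }
    return f"{ESEARCH_URL}?{urlencode(params)}"


# Date range taken from the dataset description.
url = build_esearch_url(QUERY, "2018/01/01", "2024/11/15")
print(url)
```

Fetching this URL returns JSON whose `esearchresult.idlist` field holds the matching PMIDs.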
This curated dataset is intended for researchers, clinicians, and policymakers exploring the interplay between multiple sclerosis and socio-environmental factors. It supports literature reviews and trend analyses, as well as the development of interventions that address the social and environmental determinants of health in the context of MS.
Keywords: Multiple Sclerosis, Social Determinants, Environmental Factors, PubMed Abstracts, Dataset, Biomedical Research, Literature Review
Each entry in the dataset includes:
- ID: A unique internal identifier for each article.
- Title: The title of the research article.
- Authors: A list of authors associated with the article.
- Year: The publication year.
- Abstract: The full abstract text.
- Citations: The number of citations the article has received.
- Journal: The publishing journal.
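If each file stores a JSON array of objects whose keys match the field names above, the entries could be loaded into a typed record and counted per year for a simple trend analysis. The key spelling and file layout are not documented here, so treat them as assumptions; `AbstractRecord`, `load_records`, and `abstracts_per_year` are hypothetical helpers.

```python
import json
from collections import Counter
from dataclasses import dataclass, field
from typing import List


@dataclass
class AbstractRecord:
    """One dataset entry; fields follow the list above (hypothetical class)."""
    id: str
    title: str = ""
    authors: List[str] = field(default_factory=list)
    year: int = 0
    abstract: str = ""
    citations: int = 0
    journal: str = ""

    @classmethod
    def from_dict(cls, raw: dict) -> "AbstractRecord":
        # Assumed JSON keys ("ID", "Title", ...); adjust to the real files.
        # Missing or null values fall back to empty defaults.
        return cls(
            id=str(raw["ID"]),
            title=raw.get("Title") or "",
            authors=list(raw.get("Authors") or []),
            year=int(raw.get("Year") or 0),
            abstract=raw.get("Abstract") or "",
            citations=int(raw.get("Citations") or 0),
            journal=raw.get("Journal") or "",
        )


def load_records(text: str) -> List[AbstractRecord]:
    """Parse a JSON array of entries (assumed file layout) into records."""
    return [AbstractRecord.from_dict(item) for item in json.loads(text)]


def abstracts_per_year(records: List[AbstractRecord]) -> Counter:
    """Count abstracts per publication year as a basic trend signal."""
    return Counter(r.year for r in records)
```

Reading a file such as `RAW_MS_SDoH_pubmed_abstracts_trainingdata.json` as UTF-8 text and passing it to `load_records` would then give per-year counts, provided the file matches the assumed layout.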
The dataset was assembled by first retrieving all PubMed IDs (PMIDs) matching the search criteria. Detailed records for these PMIDs were then fetched in batches, which reduces the number of API requests. The extraction focused on key elements such as publication date, article type, title, authorship, and abstract content.
Preprints are excluded, and only articles from the last five years (as of November 2024) are included.
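The batched retrieval and field extraction could look roughly like the sketch below. It assumes the standard PubMed XML that `efetch` returns with `retmode=xml`; the batch size of 200, the helper names, the use of the PMID as the key, and filtering on a `Preprint` publication type are illustrative assumptions rather than the authors' documented procedure. Citation counts are not part of this XML and are not covered here.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Iterator, List
from urllib.parse import urlencode

# NCBI E-utilities efetch endpoint.
EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def batched(pmids: List[str], size: int = 200) -> Iterator[List[str]]:
    """Yield consecutive slices of at most `size` PMIDs (size is an assumption)."""
    for start in range(0, len(pmids), size):
        yield pmids[start:start + size]


def build_efetch_url(batch: List[str]) -> str:
    """One efetch call retrieves the XML records for a whole batch of PMIDs."""
    params = {"db": "pubmed", "id": ",".join(batch),
              "rettype": "abstract", "retmode": "xml"}
    return f"{EFETCH_URL}?{urlencode(params)}"


def _text(elem) -> str:
    """All text inside an element, including inline markup such as <i>."""
    return "".join(elem.itertext()) if elem is not None else ""


def parse_articles(xml_text: str) -> List[Dict]:
    """Extract dataset-like fields from efetch XML, skipping preprints."""
    records = []
    for art in ET.fromstring(xml_text).iter("PubmedArticle"):
        citation = art.find("MedlineCitation")
        article = citation.find("Article")
        pub_types = [pt.text for pt in article.iter("PublicationType")]
        if "Preprint" in pub_types:
            continue  # assumed filter: keep peer-reviewed articles only
        authors = []
        for author in article.iter("Author"):
            name = f"{author.findtext('ForeName', '')} {author.findtext('LastName', '')}"
            if name.strip():
                authors.append(name.strip())
        records.append({
            "PMID": citation.findtext("PMID"),
            "Title": _text(article.find("ArticleTitle")),
            "Authors": authors,
            "Year": article.findtext("Journal/JournalIssue/PubDate/Year"),
            "Abstract": " ".join(_text(t) for t in article.iter("AbstractText")),
            "Journal": article.findtext("Journal/Title"),
        })
    return records
```

Given the PMIDs from the search step, looping over `batched(pmids)` issues one `efetch` request per batch. NCBI limits clients to about three requests per second without an API key, so a short pause between calls is prudent.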
Dataset Files
- Over 5,000 PubMed abstracts on multiple sclerosis risk factors from the last 5 years, excluding preprints (training data): RAW_MS_SDoH_pubmed_abstracts_trainingdata.json (4.37 MB)
- About 500 PubMed abstracts on multiple sclerosis risk factors from the last 5 years, excluding preprints (testing and validation data): RAW_MS_SDoH_pubmed_abstracts_testdata.json (869.55 kB)
- Named entity recognition (NER) processed dataset, ready for relationship extraction: NER_processed_MS_SDoH_pubmed_abstracts.json (3.80 MB)