Human neck-movement data acquired using the MetaWear CPRO device (accelerometer-based kinematic data). The data were fed into the OpenSim simulation software to extract kinematics and kinetics (muscle and joint forces, accelerations, and positions).


The EEG pain dataset was collected from 12 subjects using the cold pressor test (CPT). EEG signals were recorded using the Emotiv EPOC Flex Cap (32 channels), with the 32 electrodes placed over the scalp at predetermined locations following the international 10-20 system, at a sampling frequency of 128 Hz.
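As a minimal sketch of handling recordings at this rate (the data below are synthetic; only the 32-channel count and 128 Hz rate are taken from the description above), a recording can be split into fixed one-second windows for analysis:

```python
import numpy as np

FS = 128         # sampling frequency of the recordings (Hz)
N_CHANNELS = 32  # one row per electrode

def epoch(signal, win_s=1.0, fs=FS):
    """Split a (channels, samples) array into fixed-length windows."""
    win = int(win_s * fs)
    n = signal.shape[1] // win
    return signal[:, :n * win].reshape(signal.shape[0], n, win)

# Synthetic stand-in for a 10-second, 32-channel recording
rec = np.random.randn(N_CHANNELS, 10 * FS)
epochs = epoch(rec)  # shape (32, 10, 128): ten one-second windows per channel
```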





In order to load the data, we provide below an example routine working within the PyTorch framework. We provide two different resolutions: 800 and 7000 µm/px.

Within each resolution, we provide .csv files containing the metadata for all the included files, comprising:

  • image_id;
  • label (6 classes - HP, NORM, TA.HG, TA.LG, TVA.HG, TVA.LG);
  • type (4 classes - HP, NORM, HG, LG);
  • reference WSI;
  • reference region of interest in WSI (roi);
  • resolution (micron per pixels, mpp);
  • coordinates for the patch (x, y, w, h).
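As an illustration, the metadata can be loaded and filtered with pandas. The rows below are an in-memory mock-up sketched from the field list above, so the exact column names and values in the real .csv files may differ:

```python
import pandas as pd
from io import StringIO

# Mock rows mimicking the metadata fields listed above (values are invented)
csv = StringIO(
    "image_id,label,type,wsi,roi,mpp,x,y,w,h\n"
    "a.png,HP,HP,wsi001,roi01,8.0,0,0,100,100\n"
    "b.png,NORM,NORM,wsi002,roi03,8.0,100,50,100,100\n"
    "c.png,TA.HG,HG,wsi002,roi04,8.0,300,80,100,100\n"
)
df = pd.read_csv(csv)
hg = df[df['type'] == 'HG']  # e.g. keep only the high-grade patches
```

The same filtered DataFrame can then be passed as the `df` argument of the dataloader class below.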

Below you can find the dataloader class of UNITOPatho for PyTorch. More examples can be found here.

import torch
import torchvision
import numpy as np
import cv2
import os


class UNITOPatho(torch.utils.data.Dataset):

    def __init__(self, df, T, path, target, subsample=-1, gray=False, mock=False):
        self.path = path
        self.df = df
        self.T = T
        self.target = target
        self.subsample = subsample
        self.mock = mock
        self.gray = gray

        allowed_target = ['type', 'grade', 'top_label']
        if target not in allowed_target:
            raise ValueError(f'Target must be in {allowed_target}, got {target}')

        print(f'Loaded {len(self.df)} images')

    def __len__(self):
        return len(self.df)

    def __getitem__(self, index):
        entry = self.df.iloc[index]
        image_id = entry.image_id
        image_id = os.path.join(self.path, entry.top_label_name, image_id)

        if self.mock:
            # Return a random image instead of reading from disk
            C = 1 if self.gray else 3
            img = np.random.randint(0, 255, (224, 224, C)).astype(np.uint8)
        else:
            img = cv2.imread(image_id)

            if self.subsample != -1:
                # Progressively halve the image before the final resize
                w = img.shape[0]
                while w // 2 > self.subsample:
                    img = cv2.resize(img, (w // 2, w // 2))
                    w = w // 2
                img = cv2.resize(img, (self.subsample, self.subsample))

            if self.gray:
                img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
                img = np.expand_dims(img, axis=2)
            else:
                # OpenCV reads BGR; convert to the RGB ordering PyTorch expects
                img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)

        if self.T is not None:
            img = self.T(img)

        return img, entry[self.target]


The dataset is part of the MIMIC database and specifically utilises the data corresponding to two patients, with IDs 221 and 230.


The purpose of this data collection was for the validation of a cuffless blood pressure estimation model during activities of daily living. Data were collected on five young healthy individuals (four males, age 28 ± 6.6 yrs) of varied fitness levels, ranging from sedentary to regularly active, and free of cardiovascular and peripheral vascular disease. Arterial blood pressure was continuously measured using finger PPG (Portapres; Finapres Medical Systems, the Netherlands).


The PD-BioStampRC21 dataset provides data from a wearable sensor accelerometry study conducted for studying activity, gait, tremor, and other motor symptoms in individuals with Parkinson's disease (PD). In addition to individuals with PD, the dataset also includes data for controls who went through the same study protocol as the PD participants. Data were acquired using lightweight MC10 BioStamp RC sensors (MC10 Inc., Lexington, MA), five of which were attached to each participant for gathering data over a roughly two-day interval.


Users of the dataset should cite the following paper:

Jamie L. Adams, Karthik Dinesh, Christopher W. Snyder, Mulin Xiong, Christopher G. Tarolli, Saloni Sharma, E. Ray Dorsey, Gaurav Sharma, "A real-world study of wearable sensors in Parkinson’s disease". Submitted.

where an overview of the study protocol is also provided. Additional detail specific to the dataset and file naming conventions is provided here.

The dataset comprises two main components: (I) sensor and UPDRS-assessment-task annotation data for each participant and (II) demographic and clinical assessment data for all participants. Each of these is described in turn below:

I) Sensor and UPDRS-assessment-task annotation data:

For each participant, the sensor accelerometry and UPDRS-assessment-task annotation data are provided as a zip file (for instance, for participant ID 018). Unzipping the file generates a folder with a name matching the participant ID (e.g., 018) that contains the data organized as the following files. Times and timestamps are consistently reported in units of milliseconds, starting from the instant of the earliest sensor recording (for the first sensor applied to the participant).

a) Accelerometer sensor data files (CSV) corresponding to the five different sensor placement locations, which are abbreviated as

   1) Trunk (chest)          - abbreviated as "ch"

   2) Left anterior thigh    - abbreviated as "ll"

   3) Right anterior thigh   - abbreviated as "rl"

   4) Left anterior forearm  - abbreviated as "lh"

   5) Right anterior forearm - abbreviated as "rh"

   Example file names for accelerometer sensor data files: ch_ID018Accel.csv, ll_ID018Accel.csv, rl_ID018Accel.csv, lh_ID018Accel.csv, and rh_ID018Accel.csv

   File format for the accelerometer sensor data files: comprises four columns that provide a timestamp for each measurement and the corresponding triaxial accelerometry relative to the sensor coordinate system.

   Column 1: "Timestamp (ms)" - Time in milliseconds

   Column 2: "Accel X (g)"    - Acceleration in the X direction (in units of g = 9.8 m/s^2)

   Column 3: "Accel Y (g)"    - Acceleration in the Y direction (in units of g = 9.8 m/s^2)

   Column 4: "Accel Z (g)"    - Acceleration in the Z direction (in units of g = 9.8 m/s^2)

b) Annotation file (CSV). This file provides tagging annotations for the sensor data that identify, via start and end timestamps, the durations of the various clinical assessments performed in the study.

   Example file name for the annotation file: AnnotID018.csv

   File format for the annotation file: comprises four columns

   Column 1: "Event Type"           - List of in-clinic MDS-UPDRS assessments. Each assessment comprises two queries: medication status and MDS-UPDRS assessment body location

   Column 2: "Start Timestamp (ms)" - Start timestamp for the MDS-UPDRS assessment

   Column 3: "Stop Timestamp (ms)"  - Stop timestamp for the MDS-UPDRS assessment

   Column 4: "Value"                - Responses to the queries in Column 1: medication status (OFF/ON) and MDS-UPDRS assessment body location (e.g. RIGHT HAND, NECK)
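A sketch of pairing the two file types: read an accelerometer file and an annotation file (tiny in-memory mock-ups in the column formats described above) and extract the sensor samples falling inside one assessment interval:

```python
import pandas as pd
from io import StringIO

# Mock accelerometer file: timestamp plus triaxial acceleration (values invented)
sensor = pd.read_csv(StringIO(
    "Timestamp (ms),Accel X (g),Accel Y (g),Accel Z (g)\n"
    "0,0,0,1\n100,0,0,1\n200,0,1,0\n300,0,1,0\n400,1,0,0\n"))

# Mock annotation file: one assessment tagged by start/stop timestamps
annot = pd.read_csv(StringIO(
    "Event Type,Start Timestamp (ms),Stop Timestamp (ms),Value\n"
    "MDS-UPDRS,150,350,RIGHT HAND\n"))

# Keep only the samples recorded during the first annotated assessment
row = annot.iloc[0]
seg = sensor[(sensor['Timestamp (ms)'] >= row['Start Timestamp (ms)']) &
             (sensor['Timestamp (ms)'] <= row['Stop Timestamp (ms)'])]
```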

II) Demographic and clinical assessment data:

For all participants, the demographic and clinical assessment data are provided as a zip file "". Unzipping the file generates a CSV file named Clinic_DataPDBioStampRCStudy.csv.

File format for the demographic and clinical assessment data file: comprises 19 columns

Column 1: "ID"     - Participant ID

Column 2: "Sex"    - Participant sex (Male/Female)

Column 3: "Status" - Participant disease status (PD/Control)

Column 4: "Age"    - Participant age

Column 5: "updrs_3_17a" - Rest tremor amplitude (RUE - Right Upper Extremity)

Column 6: "updrs_3_17b" - Rest tremor amplitude (LUE - Left Upper Extremity)

Column 7: "updrs_3_17c" - Rest tremor amplitude (RLE - Right Lower Extremity)

Column 8: "updrs_3_17d" - Rest tremor amplitude (LLE - Left Lower Extremity)

Column 9: "updrs_3_17e" - Rest tremor amplitude (Lip/Jaw)

Column 10 - Column 14: "updrs_3_17a_off" - "updrs_3_17e_off" - Rest tremor amplitude during the OFF-medication assessment (same ordering as Columns 5 to 9)

Column 15 - Column 19: "updrs_3_17a_on" - "updrs_3_17e_on"   - Rest tremor amplitude during the ON-medication assessment (same ordering as Columns 5 to 9)
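For instance, the clinical CSV can be filtered by disease status with pandas. The rows below are invented placeholders showing only the first few of the 19 columns:

```python
import pandas as pd
from io import StringIO

# Invented rows in the Clinic_DataPDBioStampRCStudy.csv column layout (truncated)
clinic = pd.read_csv(StringIO(
    "ID,Sex,Status,Age,updrs_3_17a\n"
    "018,Male,PD,62,2\n"
    "042,Female,Control,58,0\n"))

pd_only = clinic[clinic['Status'] == 'PD']  # keep only the PD participants
```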

For details about different MDS-UPDRS assessments and scoring schemes, the reader is referred to:

Goetz, C. G. et al. Movement Disorder Society-sponsored revision of the Unified Parkinson's Disease Rating Scale (MDS-UPDRS): scale presentation and clinimetric testing results. Mov Disord 23, 2129-2170, doi:10.1002/mds.22340 (2008)   


This folder contains the onboard sensor measurements from the EksoGT robotic exoskeleton during experiments with three able-bodied individuals and three non-able-bodied individuals with a spinal cord injury. The data are divided into .mat files by trial. All able-bodied subjects completed three repetitions of each commanded intent change (Speed Up, Slow Down, and No Change) for each trial set.
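Trial files in this format can be read with scipy.io.loadmat. Since the variable names inside the dataset's .mat files are not listed here, the sketch below simply round-trips a placeholder variable named "data" through a temporary .mat file:

```python
import os
import tempfile
import numpy as np
from scipy.io import savemat, loadmat

# "data" is a placeholder name; the dataset's actual .mat variable names may differ
tmp = os.path.join(tempfile.mkdtemp(), "trial.mat")
savemat(tmp, {"data": np.arange(6).reshape(2, 3)})

trial = loadmat(tmp)   # dict of variable name -> numpy array
data = trial["data"]   # the 2x3 array round-tripped through the .mat file
```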


Please see ReadMe.pdf for details about the dataset.


Amidst the COVID-19 pandemic, cyberbullying has become an even more serious threat. Our work aims to investigate the viability of an automatic multiclass cyberbullying detection model that is able to classify whether a cyberbully is targeting a victim’s age, ethnicity, gender, religion, or other quality. Previous literature has not yet explored making fine-grained cyberbullying classifications of such magnitude, and existing cyberbullying datasets suffer from quite severe class imbalances.


Please cite the following paper when using this open access dataset:

J. Wang, K. Fu, C.T. Lu, “SOSNet: A Graph Convolutional Network Approach to Fine-Grained Cyberbullying Detection,” Proceedings of the 2020 IEEE International Conference on Big Data (IEEE BigData 2020), pp. 1699-1708, December 10-13, 2020.

This is a "Dynamic Query Expansion"-balanced dataset containing .txt files with 8,000 tweets for each fine-grained class of cyberbullying: age, ethnicity, gender, religion, other, and not cyberbullying.

Total Size: 6.33 MB
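A possible loading routine, assuming one .txt file per class named after the class with one tweet per line (the file names here are assumptions for illustration, not taken from the dataset):

```python
import os
import tempfile

CLASSES = ["age", "ethnicity", "gender", "religion", "other", "not_cyberbullying"]

# Mock the assumed layout: one .txt file per class, one tweet per line
root = tempfile.mkdtemp()
for c in CLASSES:
    with open(os.path.join(root, f"{c}.txt"), "w", encoding="utf-8") as f:
        f.write("example tweet 1\nexample tweet 2\n")

# Collect texts and per-class labels for training a classifier
texts, labels = [], []
for c in CLASSES:
    with open(os.path.join(root, f"{c}.txt"), encoding="utf-8") as f:
        lines = [line.strip() for line in f if line.strip()]
    texts += lines
    labels += [c] * len(lines)
```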


Includes some data from:

S. Agrawal and A. Awekar, “Deep learning for detecting cyberbullying across multiple social media platforms,” in European Conference on Information Retrieval. Springer, 2018, pp. 141–153.

U. Bretschneider, T. Wohner, and R. Peters, “Detecting online harassment in social networks,” in ICIS, 2014.

D. Chatzakou, I. Leontiadis, J. Blackburn, E. D. Cristofaro, G. Stringhini, A. Vakali, and N. Kourtellis, “Detecting cyberbullying and cyberaggression in social media,” ACM Transactions on the Web (TWEB), vol. 13, no. 3, pp. 1–51, 2019.

T. Davidson, D. Warmsley, M. Macy, and I. Weber, “Automated hate speech detection and the problem of offensive language,” arXiv preprint arXiv:1703.04009, 2017.

Z. Waseem and D. Hovy, “Hateful symbols or hateful people? predictive features for hate speech detection on twitter,” in Proceedings of the NAACL student research workshop, 2016, pp. 88–93.

Z. Waseem, “Are you a racist or am i seeing things? annotator influence on hate speech detection on twitter,” in Proceedings of the first workshop on NLP and computational social science, 2016, pp. 138–142.

J.-M. Xu, K.-S. Jun, X. Zhu, and A. Bellmore, “Learning from bullying traces in social media,” in Proceedings of the 2012 conference of the North American chapter of the association for computational linguistics: Human language technologies, 2012, pp. 656–666. 


It contains the four biomarkers we selected for the instrument: the first column holds the heart recordings, the second temperature, the third muscle activity, and the last column oxygen levels.
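A minimal sketch of reading this four-column layout with pandas. The header names are assumptions (the file may well be header-less); only the column order follows the description above:

```python
import pandas as pd
from io import StringIO

# Invented rows in the described order: heart, temperature, muscle activity, oxygen
csv = StringIO(
    "heart,temperature,muscle,oxygen\n"
    "72,36.6,0.12,98\n"
    "75,36.7,0.15,97\n"
)
df = pd.read_csv(csv)
mean_hr = df['heart'].mean()  # summary statistic over the heart column
```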


This heart disease dataset was curated by combining five popular heart disease datasets that were already available independently but had not been combined before. The five datasets are combined over 11 common features, which makes this the largest heart disease dataset available so far for research purposes. The five datasets used for its curation are:


This dataset can be used for building a predictive machine learning model for early-stage heart disease detection.