Amidst the COVID-19 pandemic, cyberbullying has become an even more serious threat. Our work investigates the viability of an automatic multiclass cyberbullying detection model that classifies whether a cyberbully is targeting a victim’s age, ethnicity, gender, religion, or another quality. Previous literature has not yet explored cyberbullying classification at this level of granularity, and existing cyberbullying datasets suffer from severe class imbalances.


Please cite the following paper when using this open access dataset: J. Wang, K. Fu, C.T. Lu, “SOSNet: A Graph Convolutional Network Approach to Fine-Grained Cyberbullying Detection,” Proceedings of the 2020 IEEE International Conference on Big Data (IEEE BigData 2020), December 10-13, 2020.

This is a "Dynamic Query Expansion"-balanced dataset of .txt files containing 8000 tweets for each of six fine-grained classes: age, ethnicity, gender, religion, other, and not cyberbullying.

Total Size: 6.33 MB
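The balance claim (8000 tweets per class) can be checked after loading. The sketch below assumes a hypothetical layout that the description does not specify: one UTF-8 .txt file per class, named after the class (e.g. age.txt), with one tweet per line. The file names in CLASSES are illustrative only.

```python
from collections import Counter
from pathlib import Path

# Hypothetical file stems, one per fine-grained class; adjust to the real file names.
CLASSES = ("age", "ethnicity", "gender", "religion", "other", "not_cyberbullying")


def load_tweets(data_dir):
    """Return (tweet, label) pairs, assuming one tweet per line in <label>.txt."""
    samples = []
    for label in CLASSES:
        path = Path(data_dir) / f"{label}.txt"
        with path.open(encoding="utf-8") as fh:
            for line in fh:
                text = line.strip()
                if text:  # skip blank lines
                    samples.append((text, label))
    return samples


def class_counts(samples):
    """Count samples per label; a balanced dataset yields equal counts."""
    return Counter(label for _, label in samples)
```

If the layout matches, `class_counts(load_tweets("path/to/data"))` should report the same count for every class. Tweets that contain embedded newlines would break the one-tweet-per-line assumption.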


Includes some data from:

S. Agrawal and A. Awekar, “Deep learning for detecting cyberbullying across multiple social media platforms,” in European Conference on Information Retrieval. Springer, 2018, pp. 141–153.

U. Bretschneider, T. Wohner, and R. Peters, “Detecting online harassment in social networks,” in ICIS, 2014.

D. Chatzakou, I. Leontiadis, J. Blackburn, E. D. Cristofaro, G. Stringhini, A. Vakali, and N. Kourtellis, “Detecting cyberbullying and cyberaggression in social media,” ACM Transactions on the Web (TWEB), vol. 13, no. 3, pp. 1–51, 2019.

T. Davidson, D. Warmsley, M. Macy, and I. Weber, “Automated hate speech detection and the problem of offensive language,” arXiv preprint arXiv:1703.04009, 2017.

Z. Waseem and D. Hovy, “Hateful symbols or hateful people? Predictive features for hate speech detection on Twitter,” in Proceedings of the NAACL Student Research Workshop, 2016, pp. 88–93.

Z. Waseem, “Are you a racist or am I seeing things? Annotator influence on hate speech detection on Twitter,” in Proceedings of the First Workshop on NLP and Computational Social Science, 2016, pp. 138–142.

J.-M. Xu, K.-S. Jun, X. Zhu, and A. Bellmore, “Learning from bullying traces in social media,” in Proceedings of the 2012 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2012, pp. 656–666.


The dataset contains the four biomarkers we selected for the instrument: the first column holds heart recordings, the second temperature recordings, the third muscle activity recordings, and the last column oxygen levels.
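A minimal sketch of reading the four columns into named series. The column order follows the description; the file format (comma-separated, no header row) and the series names are assumptions, since the description does not state them.

```python
import csv
import io

# Column order as described: heart, temperature, muscle activity, oxygen.
# The names are hypothetical; the comma-separated, header-less format is assumed.
COLUMNS = ("heart", "temperature", "muscle_activity", "oxygen")


def read_biomarkers(text):
    """Parse comma-separated rows into one list of floats per biomarker."""
    series = {name: [] for name in COLUMNS}
    for row in csv.reader(io.StringIO(text)):
        if not row:
            continue  # ignore empty lines
        if len(row) != len(COLUMNS):
            raise ValueError(f"expected {len(COLUMNS)} columns, got {len(row)}")
        for name, value in zip(COLUMNS, row):
            series[name].append(float(value))
    return series
```

To read a file, pass its contents, e.g. `read_biomarkers(open(path).read())`. If the real file has a header row, skip the first row before parsing.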


This heart disease dataset was curated by combining five popular heart disease datasets that were previously available only independently and had not been combined before. The five datasets are combined over 11 common features, which makes this the largest heart disease dataset available so far for research purposes. The five datasets used for its curation are:


This dataset can be used to build a predictive machine learning model for early-stage heart disease detection.
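The curation step described above, merging independent datasets over the features they share, can be sketched as follows. Each table is modeled as a list of dict rows. The tables and column names are hypothetical, and this is not necessarily how the curators performed the merge (for example, it does no unit harmonization or deduplication).

```python
def common_features(tables):
    """Columns present in every row of every table, in first-seen order."""
    rows = [row for table in tables for row in table]
    if not rows:
        return []
    shared = set(rows[0])
    for row in rows[1:]:
        shared &= set(row)
    return [col for col in rows[0] if col in shared]


def combine_on_common_features(tables):
    """Stack tables row-wise, keeping only the shared columns."""
    cols = common_features(tables)
    combined = [{col: row[col] for col in cols} for table in tables for row in table]
    return cols, combined
```

In the curated dataset, this shared set corresponds to the 11 common features. Columns that appear in only some sources are dropped rather than filled with missing values.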


**Dataset will be uploaded soon: the dataset is complete, but the uploader currently freezes midway through the progress bar.**

This dataset contains inertial data consisting of 1) physiotherapy exercise recordings and 2) unlabeled recordings of other activities, both collected by smartwatches worn by healthy subjects.

This dataset may be used to perform supervised classification analysis of physiotherapy exercises, or to perform out-of-distribution detection analysis with the unlabeled other activity data.


This inertial dataset consists of 20 CSV files, one for each of 20 healthy subjects. Inertial data were captured at 50 Hz.

Each record consists of an N×12 array, where the numbered columns correspond to:

0-2: Accelerometer (X/Y/Z) in g

3-5: Magnetometer (X/Y/Z) in μT

6-8: Gyroscope (X/Y/Z) in rad/s

9: Heart Rate in bpm

10: Shoulder (0 = OOD (unlabeled), 1 = Left Shoulder, 2 = Right Shoulder)

11: Activity Label (0-11 as described above)
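Following the column listing above (indices 0 to 11), a record can be parsed into named fields, and the Shoulder column can separate labeled exercise samples from unlabeled out-of-distribution (OOD) samples. This sketch assumes comma-separated rows with no header and contiguous samples, so timestamps are derived from the row index at the stated 50 Hz. The field names are hypothetical.

```python
import csv
import io

SAMPLE_RATE_HZ = 50  # stated sampling rate of the inertial data
N_COLUMNS = 12       # columns 0-11 as listed


def parse_record(text):
    """Parse one subject's CSV text (assumed header-less) into per-sample dicts."""
    samples = []
    rows = (r for r in csv.reader(io.StringIO(text)) if r)  # skip blank lines
    for i, row in enumerate(rows):
        values = [float(v) for v in row]
        if len(values) != N_COLUMNS:
            raise ValueError(f"expected {N_COLUMNS} columns, got {len(values)}")
        samples.append({
            "t": i / SAMPLE_RATE_HZ,      # seconds since first sample (assumes no gaps)
            "acc": values[0:3],           # accelerometer X/Y/Z in g
            "mag": values[3:6],           # magnetometer X/Y/Z in uT
            "gyro": values[6:9],          # gyroscope X/Y/Z in rad/s
            "heart_rate": values[9],      # bpm
            "shoulder": int(values[10]),  # 0 = OOD, 1 = left, 2 = right
            "activity": int(values[11]),  # activity label 0-11
        })
    return samples


def split_ood(samples):
    """Separate labeled exercise samples from unlabeled (OOD) samples."""
    labeled = [s for s in samples if s["shoulder"] != 0]
    ood = [s for s in samples if s["shoulder"] == 0]
    return labeled, ood
```

The labeled part supports supervised exercise classification, and the OOD part supports out-of-distribution detection experiments.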


An Optical Character Recognition (OCR) system converts document images, either printed or handwritten, into their electronic counterparts. Handwritten text, however, is much more challenging to process than printed text because individual writing styles are erratic. The problem becomes more severe when the input image is a doctor's prescription. Because a doctor's prescription contains both handwritten and printed text, which must be processed separately, the two must be classified before the image is fed to the OCR engine.


The data uploaded here support the paper

Decision Tree Analysis of ...

which was submitted to IEEE Transactions on Medical Imaging on 25 September 2020 by the authors

Julian Mattes, Wolfgang Fenz, Stefan Thumfart, Gerhard Haitchi, Pierre Schmit, Franz A. Fellner

During review, the data will be visible only to the reviewers of this paper. Afterwards, this abstract will be revised and extended, and a dataset image will be uploaded.


Microscopic image-based analysis plays an important role in computer-based histopathological diagnostics. Identifying childhood medulloblastoma and its subtype from a biopsy tissue specimen is integral to prognosis. The dataset consists of childhood medulloblastoma (CMB) biopsy samples. The images were taken at 10x and 100x microscopic magnification and are uploaded in separate folders. They include normal brain tissue samples and CMB samples of different WHO-defined subtypes. An Excel sheet describing the data is also provided.


The dataset contains two folders of images at different magnifications, i.e., 10x and 100x. The type of each image is described in the provided Excel file. Each slide has a unique number, and the number in brackets indicates that the corresponding image belongs to that single slide.


We present a novel, low-cost telerehabilitation system dedicated to bimanual training. The system captures the user’s movements with a Microsoft Kinect sensor and an inertial measurement unit (IMU). Here we deposit the data we collected from a single healthy subject who interacted with our system as described in our manuscript.




The dataset is used in "EXPACT: Explainable complex machine learning prediction of all-cause mortality in the U.S." 


The first bit of light is the gesture of being on the massive screen of the black panorama: a small point of existence, a gesture of being. The universal appeal of gesture reaches far beyond the barriers of languages and planets. Gestures are microtransactions of symbols and patterns that carry traces of the common ancestors of many civilizations. Gesture recognition is important for communication between computer systems and humans, and many studies of gesture recognition systems are currently under way.