Wildfires are one of the deadliest and dangerous natural disasters in the world. Wildfires burn millions of forests and they put many lives of humans and animals in danger. Predicting fire behavior can help firefighters to have better fire management and scheduling for future incidents and also it reduces the life risks for the firefighters. Recent advance in aerial images shows that they can be beneficial in wildfire studies.
The aerial pile burn detection dataset consists of different repositories. The first one is a raw video recorded using the Zenmuse X4S camera. The format of this file is MP4. The duration of the video is 966 seconds with a Frame Per Second (FPS) of 29. The size of this repository is 1.2 GB. The first video was used for the "Fire-vs-NoFire" image classification problem (training/validation dataset). The second one is a raw video recorded using the Zenmuse X4S camera. The duration of the video is 966 seconds with a Frame Per Second (FPS) of 29. The size of this repository is 503 MB. This video shows the behavior of one pile from the start of burning. The resolution of these two videos is 1280x720.
The third video is 89 seconds of heatmap footage of WhiteHot from the thermal camera. The size of this repository is 45 MB. The fourth one is 305 seconds of GreentHot heatmap with a size of 153 MB. The fifth repository is 25 mins of fusion heatmap with a size of 2.83 GB. All these three thermal videos are recorded by the FLIR Vue Pro R thermal camera with an FPS of 30 and a resolution of 640x512. The format of all these videos is MOV.
The sixth video is 17 mins long from the DJI Phantom 3 camera. This footage is used for the purpose of the "Fire-vs-NoFire" image classification problem (test dataset). The FPS is 30, the size is 32 GB, the resolution is 3840x2160, and the format is MOV.
The seventh repository is 39,375 frames that resized to 254x254 for the "Fire-vs-NoFire" image classification problem (Training/Validation dataset). The size of this repository is 1.3 GB and the format is JPEG.
The eighth repository is 8,617 frames that resized to 254x254 for the "Fire-vs-NoFire" image classification problem (Test dataset). The size of this repository is 301 MB and the format is JPEG.
The ninth repository is 2,003 fire frames with a resolution of 3480x2160 for the fire segmentation problem (Train/Val/Test dataset). The size of this repository is 5.3 GB and the format is JPEG.
The last repository is 2,003 ground truth mask frames regarding the fire segmentation problem. The resolution of each mask is 3480x2160. The size of this repository is 23.4 MB.
The preprint article of this dataset is available here:
For more information please find the Table at:
To find other projects and articles in our group:
The LGP dataset (LGPSSD) consists of LGP samples collected from the industrial site through the image acquisition device of LGP defect detection system. In our dataset, NG samples are regarded as positive samples, and OK samples are regarded as negative samples.
The LGP dataset (LGPSSD) consists of LGP samples collected from the industrial site through the image acquisition device of LGP defect detection system. In our dataset, NG samples are regarded as positive samples, and OK samples are regarded as negative samples. Each sample is a grayscale image with a size of 224 * 224 , and has two types of labels: One is the Mask label which is used to supervise the training process of the segmentation subnet, and the other is the classification label (NG corresponds to 1, and OK corresponds to 0), which is employed to supervise the training process of the decision subnet. The dataset totally contains 422 positive samples and 400 negative samples.
Characteristics: The difference in density between the light guide point distribution of LGP images, the different size, shape and brightness of LGP defects.
This dataset has been created from a collection of 56403 multidisciplinary book titles from Springer, available through the Hellenic Academic Libraries Link (https://www.heal-link.gr/en/home-2/) subscription. To obtain this dataset, a parser was created for extracting relevant information, such as the title, subtitle and ToC, from each book. The extracted information was stored in a database for further processing. Each book title in the database includes information regarding the bookid, title, and ToC.
This dataset is a set of .picle files and can be loaded in any python script or jupiter notebook as a dataframe using the following command
new_data_26_cat = pickle.load(open("springer_dataframe_26_categories.p", "rb") )
new_data_5_cat = pickle.load(open("springer_dataframe_5_categories.p", "rb") )
This is a large Chinese taxonomic knowledge base, which is translated from Probase by the neural network.
It has 11,292,493 IsA pairs with an accuracy of 86.6%.
Amidst the COVID-19 pandemic, cyberbullying has become an even more serious threat. Our work aims to investigate the viability of an automatic multiclass cyberbullying detection model that is able to classify whether a cyberbully is targeting a victim’s age, ethnicity, gender, religion, or other quality. Previous literature has not yet explored making fine-grained cyberbullying classifications of such magnitude, and existing cyberbullying datasets suffer from quite severe class imbalances.
Please cite the following paper when using this open access dataset: J. Wang, K. Fu, C.T. Lu, “SOSNet: A Graph Convolutional Network Approach to Fine-Grained Cyberbullying Detection,” Proceedings of the 2020 IEEE International Conference on Big Data (IEEE BigData 2020), December 10-13, 2020.
This is a "Dynamic Query Expansion"-balanced dataset containing .txt files with 8000 tweets for each of a fine-grained class of cyberbullying: age, ethnicity, gender, religion, other, and not cyberbullying.
Total Size: 6.33 MB
Includes some data from:
S. Agrawal and A. Awekar, “Deep learning for detecting cyberbullying across multiple social media platforms,” in European Conference on Information Retrieval. Springer, 2018, pp. 141–153.
U. Bretschneider, T. Wohner, and R. Peters, “Detecting online harassment in social networks,” in ICIS, 2014.
D. Chatzakou, I. Leontiadis, J. Blackburn, E. D. Cristofaro, G. Stringhini, A. Vakali, and N. Kourtellis, “Detecting cyberbullying and cyberaggression in social media,” ACM Transactions on the Web (TWEB), vol. 13, no. 3, pp. 1–51, 2019.
T. Davidson, D. Warmsley, M. Macy, and I. Weber, “Automated hate speech detection and the problem of offensive language,” arXiv preprint arXiv:1703.04009, 2017.
Z. Waseem and D. Hovy, “Hateful symbols or hateful people? predictive features for hate speech detection on twitter,” in Proceedings of the NAACL student research workshop, 2016, pp. 88–93.
Z. Waseem, “Are you a racist or am i seeing things? annotator influence on hate speech detection on twitter,” in Proceedings of the first workshop on NLP and computational social science, 2016, pp. 138–142.
J.-M. Xu, K.-S. Jun, X. Zhu, and A. Bellmore, “Learning from bullying traces in social media,” in Proceedings of the 2012 conference of the North American chapter of the association for computational linguistics: Human language technologies, 2012, pp. 656–666.
This is a large Chinese commonsense knowledge base, which is translated from ConceptNet 5.6, with around 2 million triples and an accuracy of 89.6%.
Data for the study has been retrieved from a publicly available data set of a leading European P2P lending platform, Bondora (https://www.bondora.com/en). The retrieved data is a pool of both defaulted and non-defaulted loans from the time period between 1st March 2009 and 27th January 2020. The data comprises demographic and financial information of borrowers and loan transactions. In P2P lending, loans are typically uncollateralized and lenders seek higher returns as compensation for the financial risk they take.
The dataset also consists of data preprocessing Jupyter notebook that will help in working with the data and to perform basic data pre-processing. The zip file of the dataset consists of pre-processed and raw dataset directly extracted from the Bondora website https://www.bondora.com/en.
In the attached notebook, I have used my intuition and assumption for performing data-preprocessing.
GesHome dataset consists of 18 hand gestures from 20 non-professional subjects with various ages and occupation. The participant performed 50 times for each gesture in 5 days. Thus, GesHome consists of 18000 gesture samples in total. Using embedded accelerometer and gyroscope, we take 3-axial linear acceleration and 3-axial angular velocity with frequency equals to 25Hz. The experiments have been video-recorded to label the data manually using ELan tool.
That's a dataset.