Artificial Intelligence
Forum-java is a log dataset that we collected from an open-source, Java-based web forum system (https://github.com/Qbian61/forum-java). The system is a forum platform developed by a technology company and widely used for social media and for sharing programming techniques. It offers abundant and diverse functions, such as posting articles and creating FAQs, which can satisfy most user requirements.
The risk that online predators pose to children in real-time gaming environments has been an area of growing concern. Most published work in this area has focused on developing near-real-time capabilities. In this paper, we present Protectbot, a comprehensive safety framework used to interact with users in online gaming chat rooms. Protectbot employs a variant of the GPT-2 model known as DialoGPT, a generative pre-trained transformer designed specifically for conversation.
This named-entity dataset was produced by applying the widely used large language model (LLM) BERT to the CORD-19 biomedical literature corpus. By fine-tuning the pre-trained BERT on the CORD-NER dataset, the model gains the ability to comprehend the context and semantics of biomedical named entities. The fine-tuned model is then applied to CORD-19 to extract more contextually relevant and up-to-date named entities. However, fine-tuning LLMs on large datasets poses a challenge. To counter this, two distinct sampling methodologies are utilized.
The dataset contains digital health data: heart rate readings extracted from a Fitbit version 2 smartwatch worn by a healthy 48-year-old Asian male. The data covers one month. We have uploaded a zip file that contains one file per day, with the date in the file name. Each file is in CSV format and has two columns: timestamp and heart rate. The data is a continuous heart rate time series recorded at 5-second intervals. However, some data points may be missing.
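Because the recording interval is nominally 5 seconds and some points may be missing, a first step when using the files is usually to locate the gaps. The following is a minimal sketch, not the authors' tooling. It assumes a header row with columns named `timestamp` and `heart_rate` and ISO 8601 timestamps; the actual column names and timestamp format in the files may differ, and the helper names `read_heart_rate` and `find_gaps` are hypothetical.

```python
import csv
import io
from datetime import datetime, timedelta


def read_heart_rate(fileobj):
    """Parse a two-column CSV into (timestamp, heart_rate) tuples.

    Assumed layout: header 'timestamp,heart_rate', ISO 8601 timestamps.
    """
    reader = csv.DictReader(fileobj)
    return [
        (datetime.fromisoformat(row["timestamp"]), int(row["heart_rate"]))
        for row in reader
    ]


def find_gaps(rows, expected=timedelta(seconds=5)):
    """Return (previous, current) timestamp pairs spaced wider than `expected`."""
    gaps = []
    previous = None
    for timestamp, _heart_rate in rows:
        if previous is not None and timestamp - previous > expected:
            gaps.append((previous, timestamp))
        previous = timestamp
    return gaps


# Illustrative in-memory sample (made-up values, not taken from the dataset).
sample = io.StringIO(
    "timestamp,heart_rate\n"
    "2023-01-01T00:00:00,62\n"
    "2023-01-01T00:00:05,63\n"
    "2023-01-01T00:00:20,65\n"
    "2023-01-01T00:00:25,64\n"
)

rows = read_heart_rate(sample)
for start, end in find_gaps(rows):
    print(f"gap between {start} and {end}")
```

To process a real daily file, pass an opened file object (for example, `open(path, newline="")`) to `read_heart_rate` in place of the in-memory sample.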
The HQA1K dataset was developed for assessing the quality of Computer Generated Holography (CGH) image renderings based on direct human input.
HQA1K comprises 1,000 pairs of natural images matched to simulated CGH renderings of various quality levels. The result is a diverse set of data for evaluating image quality algorithms and models.
This dataset provides data on the traffic transmitted and received at Wikipedia's six data centers (Eqiad, Codfw, Esams, Ulsfo, Eqsin, and Drmrs), as listed in Wikitech.
- Eqiad: data center located in Ashburn, USA
- Codfw: data center located in Carrollton, Texas, USA
- Esams: data center located in Amsterdam, The Netherlands
- Ulsfo: data center located in San Francisco, USA
- Eqsin: data center located in Singapore
- Drmrs: data center located in Marseille, France
LSD4WSD: Learning SAR Dataset for Wet Snow Detection.
The dataset can be found at: https://zenodo.org/record/8111485
This dataset was used for a non-invasive blood group prediction approach based on deep learning. Rapid and accurate prediction of blood type is a major step in a medical emergency before administering a red blood cell, platelet, or plasma transfusion. Any small mistake during a blood transfusion can cause death. In conventional pathological assessment, the blood test is conducted using an automated blood analyser; however, this is a time-consuming process.
This report presents an end-to-end methodology for collecting datasets for recognizing handwritten letters of the English alphabet in the Indian context, utilizing Inertial Measurement Units (IMUs) and leveraging the diversity present in Indian writing styles. The IMUs capture the dynamic movement patterns associated with handwriting, enabling more accurate recognition of letters. The Indian context introduces various challenges due to the heterogeneity in writing styles across different regions and languages.