Artificial Intelligence

Each data instance consists of a paragraph (context), a question, and four candidate answers. The goal of each system is to determine the most plausible answer to the question by reading the paragraph.

## Expected Output to Leaderboard
If you intend to submit to the leaderboard, please follow the data format described on the leaderboard page. The prediction file should contain one label per line.
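As a minimal sketch of the "one label per line" format, the snippet below serializes a list of predicted labels in instance order. The function names are hypothetical, and the assumption that labels are 0-based indices into the four candidate answers is ours; check the leaderboard page for the actual label convention.

```python
from pathlib import Path
from typing import Sequence


def format_predictions(labels: Sequence[int], num_choices: int = 4) -> str:
    """Return the prediction file contents: one label per line, in input order."""
    for label in labels:
        # Assumption: labels are 0-based indices into the candidate answers.
        if not 0 <= label < num_choices:
            raise ValueError(f"label {label} is outside 0..{num_choices - 1}")
    return "".join(f"{label}\n" for label in labels)


def write_predictions(labels: Sequence[int], path: str) -> None:
    """Write the formatted predictions to a UTF-8 text file."""
    Path(path).write_text(format_predictions(labels), encoding="utf-8")
```

Calling `write_predictions([0, 3, 1], "predictions.lst")` would produce a three-line file; the file name here is only a placeholder.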



This is an indoor environment dataset collected in our research team's laboratory with an Intel RealSense D435i camera. There are 12 datasets in total, each stored as a ROS `.bag` file. Each file contains RGB images and IMU data.


The project research team successfully established China's first Inertial Motion Tracking Dataset (IMTD), which can be widely used for artificial intelligence model training in fields such as satellite-free navigation, unmanned driving, and wearable devices. Based on the IMTD dataset, the motion tracking method proposed by Wang Yifeng, Zhao Yi, and others breaks through the limitations of traditional motion tracking and positioning technologies such as inertial, optical, GPS, and carrier-phase methods.


The sign language correctness discrimination (SLCD) dataset was collected for sign language teaching. Unlike general sign language recognition datasets, the SLCD dataset carries two kinds of labels at the same time: a sign language category and a standardization category. The standardization category describes how correctly a student performs a given sign. The SLCD videos in this paper were captured by camera. A total of 76 students were recruited to perform the sign language actions.


This paper investigates how to increase the number of connections among users in disaster relief services (DRS) assisted by hierarchical non-terrestrial networks (HNTNs). We aim to maximize the number of satisfactory connections (NSCs) by optimizing the unmanned aerial vehicle (UAV) radio resources, computing resources, and trajectory at each time slot. In particular, the UAVs are used as aerial base stations (ABSs) that provide links to reduced capability (RedCap) user equipment (UE) based on power-domain non-orthogonal multiple access (PD-NOMA).


This data collection captures user-generated content from the popular social network Reddit during the year 2023. It comprises 29 user-friendly CSV files containing textual data associated with various emotions and related concepts.


This is the full ChatGPT transcript for the IEEE Power Engineering Letter "On the Potential of ChatGPT to Generate Distribution Systems for Load Flow Studies using OpenDSS". The abstract for the letter is as follows:


ABSTRACT As the world becomes increasingly interconnected, the demand for safety and security is ever-increasing, particularly for industrial networks. This has prompted numerous researchers to investigate different methodologies and techniques suitable for intrusion detection system (IDS) requirements. Over the years, many studies have proposed various solutions in this regard, including signature-based and machine-learning (ML) based systems. More recently, researchers are considering deep learning (DL) based anomaly detection approaches. Most proposed works in this research field aimed to achieve either one or a combination of high accuracy, considerably low false alarm rates (FARs), high classification specificity and detection sensitivity, achieving lightweight DL models, or other ML and DL-related performance measurement metrics. In this study, we propose a novel method to convert a raw dataset to an image dataset to magnify patterns. Based on this, we devise an anomaly detection approach for IDS using a lightweight convolutional neural network (CNN) that classifies denial of service and distributed denial of service attacks. The proposed methods were evaluated using a modern dataset, CSE-CIC-IDS2018, and a legacy dataset, NSL-KDD. We have also applied a combined dataset to assess the generalization of the proposed model across various datasets. Our experimental results have demonstrated that the proposed methods achieved high accuracy and considerably low FARs with high specificity and sensitivity. The resulting loss and accuracy curves have also demonstrated the excellent generalization of the proposed lightweight CNN model, effectively avoiding overfitting. This holds for both the modern and legacy datasets, including their mixed version.
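The metrics named in the abstract (accuracy, false alarm rate, specificity, sensitivity) follow from the standard binary confusion-matrix definitions. The sketch below is illustrative only: it is not taken from the paper, the class and function names are hypothetical, and it assumes "attack" is the positive class and that both classes appear in the evaluation set.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class ConfusionCounts:
    tp: int  # attacks correctly flagged as attacks
    fp: int  # benign samples flagged as attacks (false alarms)
    tn: int  # benign samples correctly passed
    fn: int  # attacks missed


def ids_metrics(c: ConfusionCounts) -> Dict[str, float]:
    """Standard binary-classification metrics for an intrusion detector."""
    total = c.tp + c.fp + c.tn + c.fn
    return {
        "accuracy": (c.tp + c.tn) / total,
        "sensitivity": c.tp / (c.tp + c.fn),       # true positive rate
        "specificity": c.tn / (c.tn + c.fp),       # true negative rate
        "false_alarm_rate": c.fp / (c.fp + c.tn),  # equals 1 - specificity
    }
```

Note that the false alarm rate and specificity always sum to 1, so reporting a low FAR and a high specificity describes the same property from two sides.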


FormAI is a novel AI-generated dataset comprising 112,000 compilable and independent C programs. All the programs in the dataset were generated by GPT-3.5-turbo using a dynamic zero-shot prompting technique, and they vary in complexity. Some programs handle complicated tasks such as network management, table games, or encryption, while others deal with simpler tasks like string manipulation. Each program is labelled with the vulnerabilities present in its code, identified by a formal verification method based on the Efficient SMT-based Bounded Model Checker (ESBMC).


Problems related to ventral hernia are very common, and evaluating them using computational methods can assist in selecting the most appropriate treatment. This study uses data on over 3500 patients from different European countries, collected by specialists in hernia surgery over 11 years (2012-2022). The majority of patients underwent standard surgical procedures, with a growing trend towards robotic surgery. This paper focuses on statistically evaluating the treatment methods in relation to patient age, body mass index (BMI), and the type of repair.