Artificial Intelligence

Data associated with the article: "PM2.5 Retrieval with Sentinel-5P Data over Europe Exploiting Deep Learning"

Semiconductor manufacturing is a highly complex process that requires precise control and monitoring to maintain product quality and yield. This research presents a comparative analysis of three machine learning algorithms, namely Random Forest, Support Vector Machine (SVM), and XGBoost, for anomaly detection in semiconductor fabrication. In experiments on a real-world wafer dataset, XGBoost outperforms the other two models, achieving 97.1% accuracy, 96.4% precision, and 95.0% recall.
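For readers comparing the three reported figures, the following is a minimal sketch of how accuracy, precision, and recall are conventionally computed for a binary anomaly label. The function name `classification_metrics` and the choice of `1` as the anomaly label are assumptions for illustration; this is not the authors' evaluation code.

```python
from typing import Sequence, Tuple


def classification_metrics(
    y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1
) -> Tuple[float, float, float]:
    """Return (accuracy, precision, recall) for a binary labelling.

    `positive` is the label treated as the anomaly class (an assumption here).
    """
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and equally long")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)

    accuracy = correct / len(pairs)
    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, precision, recall
```

Precision and recall are reported alongside accuracy because anomalies are typically rare, so accuracy alone can look high even when many anomalies are missed.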

This dataset, constructed around the Jilin Baishan Incident, aims to enhance the emotion prediction capabilities of large language models. Approximately 3.5 million raw comments were collected via the Weibo API, covering key information such as user identifiers, text content, timestamps, and interaction metrics. The data underwent preprocessing steps including normalization, Chinese tokenization, stopword removal, deduplication, and anomalous sample exclusion.
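The preprocessing steps listed above could be sketched roughly as follows. This is an illustrative outline only, not the dataset authors' pipeline: the function name `preprocess_comments` is hypothetical, Unicode NFKC is assumed as the normalization, whitespace splitting stands in for a real Chinese word segmenter (pass one via `tokenize`), and dropping empty results stands in for anomalous-sample exclusion.

```python
import unicodedata
from typing import Callable, Iterable, List, Set


def preprocess_comments(
    comments: Iterable[str],
    stopwords: Set[str],
    tokenize: Callable[[str], List[str]] = str.split,
) -> List[List[str]]:
    """Normalize, deduplicate, tokenize, and filter a stream of comments."""
    seen: Set[str] = set()
    cleaned: List[List[str]] = []
    for raw in comments:
        # Normalization: fold full-width/half-width variants, trim whitespace.
        text = unicodedata.normalize("NFKC", raw).strip()
        # Deduplication: skip empty comments and exact repeats.
        if not text or text in seen:
            continue
        seen.add(text)
        # Tokenization followed by stopword removal.
        tokens = [tok for tok in tokenize(text) if tok not in stopwords]
        # Exclusion: drop comments with nothing left after filtering.
        if tokens:
            cleaned.append(tokens)
    return cleaned
```

Deduplicating after normalization means comments that differ only in character width or surrounding whitespace are treated as the same comment.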
The UQTR dataset consists of 7838 real and synthetic images of the Université du Québec à Trois-Rivières (UQTR) campus road under normal and snow conditions. The image resolution is 1280×720. Lane labels are provided in .txt files, where each row stores the set of points of one lane. The points are stored as x1 y1 x2 y2, following the format used in the tutorial "Train and Test Your Custom Data" by Ruijin Liu, Zejian Yuan, Tie Liu, and Zhiliang Xiong.
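A label file in the format described above could be read with a short parser such as the one below. This is a sketch under the assumption that coordinates are whitespace-separated numbers and that each row holds one lane; the function names `parse_lane_line` and `parse_lane_file` are hypothetical and not part of the dataset.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def parse_lane_line(row: str) -> List[Point]:
    """Parse one row 'x1 y1 x2 y2' into [(x1, y1), (x2, y2)]."""
    values = [float(v) for v in row.split()]
    if len(values) % 2 != 0:
        raise ValueError("expected an even number of coordinates in a lane row")
    # Pair up alternating x and y values.
    return list(zip(values[0::2], values[1::2]))


def parse_lane_file(text: str) -> List[List[Point]]:
    """Parse the full text of one label file; blank rows are skipped."""
    return [parse_lane_line(row) for row in text.splitlines() if row.strip()]
```

Given the stated 1280×720 resolution, a caller may also want to check that each x lies in [0, 1280) and each y in [0, 720) before using the points.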

The data comprise 22,898 comments about driverless and human driving, crawled from China's Weibo and XiaoHongshu platforms between May 1 and August 31, 2024. The files are mainly in xlsx, py, txt, and json formats; the py files are scripts used to process the data. The dataset was used for topic mining, sentiment analysis, and other analyses of Chinese users' comments on driverless and human driving.

The Walnut and Heart CT data accompanying Noisier2Inverse consist of high-resolution computed tomography (CT) scans used to evaluate deep learning-based image reconstruction under severe noise conditions. The dataset includes walnut CT scans from controlled experimental settings and clinical cardiac CT images. The Walnut data come from https://paperswithcode.com/dataset/cbct-walnut; the Heart CT data were preprocessed in Python and are provided in .pt format.

miRNAs influence cellular functions by regulating gene expression and interacting with diverse biomolecules within the cell. Accurate prediction of miRNA-disease associations (MDA) plays a crucial role in disease diagnosis, treatment, and drug development. However, existing computational methods focus on network structure and ignore multi-view information, such as linear and non-linear features, when extracting miRNA and disease representations. In addition, these models are generally "black-box" in nature, which limits the understanding of their prediction mechanisms.

During the course of this experimental study, we meticulously collected and recorded a comprehensive set of data. These data not only reflect the precise outcomes of the experimental procedures but also directly correspond to the contents presented in the tables within the research paper. These results are crucial for validating our research hypotheses, providing a solid quantitative foundation for our understanding and analysis of the experimental phenomena.

Shape completion remains a fundamental challenge in computer vision and image processing, particularly for tasks involving hand-drawn sketches and occluded objects. Traditional deep learning methods such as Generative Adversarial Networks (GANs) and Convolutional Neural Networks (CNNs) often suffer from high computational costs and poor generalization on sparse, abstract structures.

This is a dataset for the cosmetics domain. The International Patent Classification (IPC) is a standardized, hierarchical system used worldwide to categorize the technical content of patents. It is administered by the World Intellectual Property Organization (WIPO). The IPC system breaks technology down into sections, classes, subclasses, and groups, each representing a specific technical domain. By assigning IPC codes to patent documents, patent offices and researchers can systematically organize, search, and analyze patent information across industries and technological fields.