Artificial Intelligence

Artificial Intelligence (AI) increasingly shapes modern society, most recently through significant advances in Large Language Models (LLMs). However, the high computational and storage demands of LLMs still limit their deployment in resource-constrained environments. Knowledge distillation addresses this challenge by training a smaller language model (the student) from a larger one (the teacher). Previous research has introduced several distillation methods, both for generating training data and for training the student model.

Categories:
20 Views

Data associated with the article: "PM2.5 Retrieval with Sentinel-5P Data over Europe Exploiting Deep Learning"

Categories:
23 Views

Semiconductor manufacturing is a highly complex process requiring precise control and monitoring to maintain product quality and yield. This research presents a comparative analysis of three machine learning algorithms for anomaly detection in semiconductor fabrication: Random Forest, Support Vector Machine (SVM), and XGBoost. Through extensive experimentation using a real-world wafer dataset, we demonstrate that XGBoost outperforms the other models, achieving 97.1% accuracy, 96.4% precision, and 95.0% recall.
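
For readers comparing the reported figures, the following is a minimal sketch of how accuracy, precision, and recall are derived from confusion-matrix counts. The function name and the counts are illustrative assumptions; they are not taken from the wafer dataset or the authors' code.

```python
from typing import Dict


def classification_metrics(tp: int, fp: int, fn: int, tn: int) -> Dict[str, float]:
    """Compute accuracy, precision, and recall from confusion-matrix counts.

    tp/fp/fn/tn are true positives, false positives, false negatives,
    and true negatives, with "anomaly" treated as the positive class.
    """
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    # Precision: of the samples flagged as anomalies, how many really were.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall: of the real anomalies, how many were flagged.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall}


# Hypothetical counts, chosen only to illustrate the arithmetic.
metrics = classification_metrics(tp=8, fp=2, fn=2, tn=88)
```

On imbalanced data such as anomaly detection, accuracy alone can look high even when few anomalies are caught, which is why precision and recall are reported alongside it.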

Categories:
36 Views

This dataset, constructed around the Jilin Baishan Incident, aims to enhance the emotion prediction capabilities of large language models. Approximately 3.5 million raw comments were collected via the Weibo API, covering key information such as user identifiers, text content, timestamps, and interaction metrics. The data underwent preprocessing steps including normalization, Chinese tokenization, stopword removal, deduplication, and anomalous sample exclusion.
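
As a rough illustration of three of the preprocessing steps named above (normalization, deduplication, and exclusion of anomalous samples), a minimal sketch might look like the following. The function names, the NFKC normalization choice, and the `min_chars` threshold are assumptions, not the dataset authors' pipeline; Chinese tokenization and stopword removal are omitted because they need a third-party tokenizer.

```python
import unicodedata
from typing import Iterable, List


def normalize(text: str) -> str:
    """Apply Unicode NFKC normalization and collapse runs of whitespace."""
    text = unicodedata.normalize("NFKC", text)
    return " ".join(text.split())


def preprocess(comments: Iterable[str], min_chars: int = 2) -> List[str]:
    """Normalize comments, drop very short ones, and remove exact duplicates.

    min_chars is a hypothetical threshold standing in for the
    unspecified "anomalous sample" rule.
    """
    seen = set()
    cleaned = []
    for raw in comments:
        text = normalize(raw)
        if len(text) < min_chars:
            continue  # treat empty or near-empty comments as anomalous
        if text in seen:
            continue  # exact duplicate after normalization
        seen.add(text)
        cleaned.append(text)
    return cleaned


# Made-up comments used only to exercise the function.
sample = ["  好 消息 ", "好 消息", "", "ＡＢＣ　１２３"]
result = preprocess(sample)
```

Deduplicating after normalization means comments that differ only in spacing or full-width characters are treated as the same comment.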

Categories:
Views

 

The UQTR dataset consists of 7838 real and synthetic images of the Université du Québec à Trois-Rivières (UQTR) campus road under normal and snow conditions. The image resolution is 1280×720. It includes lane labels in .txt files, where each row stores the set of points of one lane. The points are stored as x1 y1 x2 y2, following the format used in the tutorial "Train and Test Your Custom Data" by Ruijin Liu, Zejian Yuan, Tie Liu, and Zhiliang Xiong.
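
A minimal sketch of reading such a label file is shown below, assuming the coordinates in each row are separated by whitespace and alternate between x and y. The function names and the sample coordinates are hypothetical and not taken from the dataset.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def parse_lane_line(row: str) -> List[Point]:
    """Turn one label row 'x1 y1 x2 y2 ...' into [(x1, y1), (x2, y2), ...]."""
    values = [float(v) for v in row.split()]
    if len(values) % 2 != 0:
        raise ValueError("expected an even number of coordinates in a lane row")
    # Even positions are x values, odd positions are y values.
    return list(zip(values[0::2], values[1::2]))


def parse_lane_labels(text: str) -> List[List[Point]]:
    """Parse the text of a label file: one lane per non-empty row."""
    return [parse_lane_line(row) for row in text.splitlines() if row.strip()]


# Made-up label content with two lanes, within a 1280x720 image.
sample = "100 700 150 650 200 600\n900 700 850 650\n"
lanes = parse_lane_labels(sample)
```

In practice, `text` would come from reading one of the .txt label files, and each returned lane is a list of (x, y) pixel coordinates.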

Categories:
217 Views

The data consist of 22,898 comments on driverless and human driving, collected by web crawler from China's Weibo and XiaoHongshu platforms between May 1 and August 31, 2024. The main file formats include xlsx, py, txt, and json. The py files are scripts used to process the data. The dataset was used for topic mining, sentiment analysis, and related analyses of Chinese users' comments on driverless and human driving.

Categories:
33 Views

The Walnut and Heart CT data accompanying Noisier2Inverse consist of high-resolution computed tomography (CT) scans used for evaluating deep learning-based image reconstruction under severe noise conditions. The dataset includes walnut CT scans from controlled experimental settings and clinical cardiac CT images. The walnut data come from https://paperswithcode.com/dataset/cbct-walnut; the heart CT data were preprocessed in Python and are provided in .pt format.

Categories:
25 Views

miRNAs influence cellular functions by regulating gene expression and interacting with diverse biomolecules within the cell. Accurate prediction of miRNA-disease associations (MDA) plays a crucial role in disease diagnosis, treatment, and drug development. However, existing computational methods focus on network structure and ignore multi-view information, such as linear and non-linear information, when extracting miRNA and disease features. In addition, these models are generally "black-box" in nature, which limits the understanding of their prediction mechanisms.

Categories:
29 Views

During the course of this experimental study, we meticulously collected and recorded a comprehensive set of data. These data not only reflect the precise outcomes of the experimental procedures but also directly correspond to the contents presented in the tables within the research paper. These results are crucial for validating our research hypotheses, providing a solid quantitative foundation for our understanding and analysis of the experimental phenomena.

Categories:
23 Views
