Dataset Search

The experimental protocol recorded two IMU channels mounted on a variable-stiffness prosthetic foot (one on the main body of the prosthesis and the other on the ankle) for two minutes at 100 Hz. The recorded signals are 3-axis linear accelerations (ax, ay, and az) and 3-axis Euler angles (θx, θy, and θz). Recordings cover seven activities (walking, fast walking, standing, and ascending and descending stairs and ramps) under different Dx slider configurations (Dx = 42, 45, 48, 51, and 54 mm), i.e. stiffness settings.
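As a rough sanity check on the protocol above, the expected size of one recording can be worked out from the stated numbers. This is only a sketch: it assumes each IMU reports all six signals and that a trial is stored as one table with a row per sample and a column per channel. That layout and the column names are hypothetical, not taken from the dataset files.

```python
# Values stated in the description.
SAMPLE_RATE_HZ = 100          # sampling rate
DURATION_S = 2 * 60           # two-minute recording

# Hypothetical labels for the two IMU sites and the six signals per IMU.
IMU_SITES = ("body", "ankle")
SIGNALS = ("ax", "ay", "az", "theta_x", "theta_y", "theta_z")


def channel_names():
    """Return one hypothetical column name per (IMU site, signal) pair."""
    return [f"{site}_{signal}" for site in IMU_SITES for signal in SIGNALS]


def expected_shape():
    """Return (samples, channels) for one trial under the assumptions above."""
    return (SAMPLE_RATE_HZ * DURATION_S, len(channel_names()))


if __name__ == "__main__":
    print(expected_shape())
```

Comparing a loaded trial against `expected_shape()` is a quick way to spot truncated recordings or missing channels, provided the files actually follow this layout.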

This is a map-matching dataset, likely used for geospatial analysis and route mapping applications. The dataset is organized into multiple numbered segments, each potentially containing different routes, map regions, or tracking data.

Based on the name "map-matching-dataset" and the file structure, this dataset is probably used for developing or testing algorithms that match trajectory traces to road networks. Such datasets typically contain trajectory points, road network data, and ground truth matches for evaluation purposes.

 

We present a dataset of histopathology images from OSCC patients treated at Sun Yat-sen Memorial Hospital (2015–2022). Each case includes two tissue sections (core and boundary), with six images per patient captured at ×200, ×400, and ×1000 magnifications (2592×1944 pixels). Key histopathological features—such as cancer cells, nests, keratin pearls, nuclear atypia, and necrosis—are included. The study was approved by the Ethics Committee with a waiver of informed consent, and patient-level diagnosis and prognosis annotations were obtained from electronic records.

 

Artificial Intelligence (AI) has increasingly influenced modern society, most recently through significant advancements in Large Language Models (LLMs). However, the high computational and storage demands of LLMs still limit their deployment in resource-constrained environments. Knowledge distillation addresses this challenge by training a smaller language model (the student) from a larger one (the teacher). Previous research has introduced several distillation methods, both for generating training data and for training the student model.

Data associated with the article: "PM2.5 Retrieval with Sentinel-5P Data over Europe Exploiting Deep Learning"

This dataset provides pre-processed Sentinel-5P imagery reprojected onto the CAMS grid for PM2.5 estimation. Each sample contains the first 60 principal components extracted from the Sentinel-5P spectral bands, excluding the UV range, after applying Principal Component Analysis (PCA). The final band in each sample represents the PM2.5 concentration values obtained from the CAMS dataset.
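The band layout described above (60 principal components followed by one PM2.5 band) suggests a simple split into model inputs and target. The sketch below assumes each sample is a sequence of 61 bands in that order; the function name and the band-first ordering are illustrative assumptions, not part of the dataset's documentation.

```python
from typing import List, Sequence, Tuple, TypeVar

Band = TypeVar("Band")

N_COMPONENTS = 60  # number of principal components stated in the description


def split_sample(
    bands: Sequence[Band], n_components: int = N_COMPONENTS
) -> Tuple[List[Band], Band]:
    """Split one sample into (PCA feature bands, PM2.5 target band).

    Assumes the sample holds exactly n_components feature bands
    followed by a single final target band.
    """
    if len(bands) != n_components + 1:
        raise ValueError(
            f"expected {n_components + 1} bands, got {len(bands)}"
        )
    features = list(bands[:n_components])
    target = bands[-1]
    return features, target
```

The length check is deliberate: if the files turn out to store bands in a different order or number, the function fails loudly instead of silently treating a spectral component as the PM2.5 target.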

 

Semiconductor manufacturing is a highly complex process requiring precise control and monitoring to maintain product quality and yield. This research presents a comprehensive comparative analysis of three machine learning algorithms, namely Random Forest, Support Vector Machine (SVM), and XGBoost, for anomaly detection in semiconductor fabrication. Through extensive experimentation using a real-world wafer dataset, we demonstrate that XGBoost outperforms other models, achieving 97.1% accuracy, 96.4% precision, and 95.0% recall.

The Influence of the Pusdiklatkar Website Implementation on the Performance of Firefighters in DKI Jakarta. This study aims to analyze the impact of implementing the website of the Pusat Pendidikan dan Pelatihan Kebakaran (Pusdiklatkar, the Fire Education and Training Center) on the performance of firefighters in DKI Jakarta. In the digital era, the use of information technology in training and personnel development has become increasingly crucial to enhancing work effectiveness and efficiency.

This dataset contains anonymized Twitter data related to tourist activities in Bangkok, Thailand. It was collected to analyze travel behavior, activity preferences, and temporal patterns during events like the Songkran festival. The dataset includes timestamped activity classifications, geographic information at a generalized level, and extracted named entities relevant to tourism. The dataset is from 2019-04-05 to 2019-04-24.

This paper introduces an aircraft engine remaining-life prediction model based on an improved Transformer architecture (SBi-Transformer). It addresses two weaknesses of the standard Transformer when processing long sequences: computational inefficiency and a limited ability to capture local temporal dependencies. The SBi-Transformer employs a dual-layer attention mechanism to reduce computational complexity and strengthen the modeling of local temporal dependencies.

Attention Deficit Hyperactivity Disorder (ADHD) is a prevalent neurodevelopmental disorder affecting children and adolescents, characterized by inattention, hyperactivity, and impulsivity. Current diagnostic methods primarily rely on subjective clinical evaluations, which are prone to bias. Advances in neurophysiological assessment, particularly through electroencephalography (EEG), eye tracking, and electrodermal activity (EDA), offer promising avenues for objective diagnosis and monitoring of ADHD.

The study aims to construct a high-quality dataset of continuous-frame laser welding images to support research on automatic welding state detection in intelligent laser welding. The dataset was collected using our self-built welding platform and a high-resolution industrial camera. It not only addresses the shortage of existing laser welding video data, but also provides rich context on the molten pool and the welding process, which facilitates the development and verification of condition detection algorithms.

This dataset, constructed around the Jilin Baishan Incident, aims to enhance the emotion prediction capabilities of large language models. Approximately 3.5 million raw comments were collected via the Weibo API, covering key information such as user identifiers, text content, timestamps, and interaction metrics. The data underwent preprocessing steps including normalization, Chinese tokenization, stopword removal, deduplication, and anomalous sample exclusion.

The UQTR dataset consists of 7838 real and synthetic images of the Université du Québec à Trois-Rivières (UQTR) campus road under normal and snow conditions. The image resolution is 1280×720. It includes lane labels in .txt files, where each row stores the set of points of a lane. The points are stored as x1 y1 x2 y2, as in the tutorial by Ruijin Liu, Zejian Yuan, Tie Liu, Zhiliang Xiong: Train and Test Your Custom Data.
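The label format described above (one lane per row, points stored as `x1 y1 x2 y2 ...`) can be read with a few lines of Python. This is a minimal sketch assuming whitespace-separated numeric values and no extra header or class columns; the function names are hypothetical and not part of the dataset or the cited tutorial.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def parse_lane_row(row: str) -> List[Point]:
    """Parse one label row 'x1 y1 x2 y2 ...' into a list of (x, y) points."""
    values = [float(token) for token in row.split()]
    if len(values) % 2 != 0:
        raise ValueError("row has an odd number of values; expected x/y pairs")
    # Even positions are x coordinates, odd positions are y coordinates.
    return list(zip(values[0::2], values[1::2]))


def parse_lane_text(text: str) -> List[List[Point]]:
    """Parse the full text of a label file, one lane per non-empty row."""
    return [parse_lane_row(line) for line in text.splitlines() if line.strip()]


if __name__ == "__main__":
    example = "10 700 120 600 230 500\n900 700 800 600\n"
    for lane in parse_lane_text(example):
        print(lane)
```

Rejecting rows with an odd number of values catches files that carry extra columns, which would otherwise shift every x/y pairing.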

The data consists of 22,898 comments on driverless and human driving, collected with a web crawler from China's Weibo and XiaoHongshu platforms between May 1 and August 31, 2024. The main file formats are xlsx, py, txt, and json, among others. The py files are scripts used to process the data. The dataset was used for topic mining, sentiment analysis, and related analyses of Chinese users' comments on driverless and human driving.

This dataset includes a database of 15 feature parameters, such as image texture features, pixel statistical features, and geometric features. The features were extracted from scans of fixed and living samples of two cell types, HeLa and SiHa, acquired with an adaptive harmonic atomic force microscopy probe and a commercial atomic force microscope. The database contains 2400 data entries in total.

We present a comprehensive dataset developed as part of a study to compute real-time kinematics using a full-body wearable approach incorporating up to 12 IMUs. This dataset includes optical and inertial measurements from 22 subjects engaged in a diverse set of 9 activities: walking, running, squatting, boxing, yoga, dance, badminton, stair climbing, and seated extremity exercises. The dataset features ground truth kinematics, offline predicted kinematics, online predicted kinematics, and IMU-simulated offline predicted kinematics.

 

The Walnut and Heart CT data corresponding to Noisier2Inverse consist of high-resolution computed tomography (CT) scans used for evaluating deep learning-based image reconstruction under severe noise conditions. The dataset includes walnut CT scans from controlled experimental settings and clinical cardiac CT images. The Walnut data stems from this source: https://paperswithcode.com/dataset/cbct-walnut. The Heart CT data was preprocessed in Python and is provided in .pt format.

miRNAs influence cellular functions by regulating gene expression and interacting with diverse biomolecules within the cell. Accurate prediction of miRNA-disease associations (MDA) plays a crucial role in disease diagnosis, treatment, and drug development. However, existing computational methods focus on network structure and ignore multi-view information, such as linear and non-linear information, when extracting miRNA and disease features. In addition, these models are generally "black-box" in nature, which limits the understanding of their prediction mechanisms.

This dataset contains the data collected and recorded during the experimental study. The data reflect the outcomes of the experimental procedures and correspond directly to the tables in the research paper. They provide the quantitative basis for validating our research hypotheses and for analyzing the experimental phenomena.

Shape completion remains a fundamental challenge in computer vision and image processing, particularly for tasks involving hand-drawn sketches and occluded objects. Traditional deep learning methods such as Generative Adversarial Networks (GANs) and Convolutional Neural Networks (CNNs) often suffer from high computational costs and poor generalization on sparse, abstract structures.

These are the daily closing prices of four stock indices: the Shanghai Securities Composite Index (SSEC) and the Shenzhen Securities Component Index (SZI) from China, the Straits Times Index (STI) from Singapore, and the Standard & Poor's 500 Index (SPX) from the United States. The SSEC data run from December 19, 1990 to May 25, 2023; the SZI data from April 3, 1991 to May 25, 2023; and the STI and SPX data from December 3, 1990 to May 25, 2023.
