*.csv

  • The dataset consists of feature vectors belonging to 12,330 sessions. The dataset was formed so that each session would belong to a different user in a 1-year period to avoid any tendency to a specific campaign, special day, user profile, or period.
  • Of the 12,330 sessions in the dataset, 84.5% (10,422) were negative class samples that did not end with shopping, and the remaining 15.5% (1,908) were positive class samples that ended with shopping.
  • The dataset consists of 10 numerical and 8 categorical attributes. The 'Revenue' attribute can be used as the class label.
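
The class split quoted above can be checked with a few lines of arithmetic. The following is a minimal sketch that uses only the counts stated in the description; the constant and function names are illustrative and are not part of the dataset.

```python
# Session counts quoted in the dataset description.
TOTAL_SESSIONS = 12_330
NEGATIVE_SESSIONS = 10_422  # sessions that did not end with shopping
POSITIVE_SESSIONS = 1_908   # sessions that ended with shopping


def class_shares(negative: int, positive: int) -> tuple[float, float]:
    """Return (negative_share, positive_share) as percentages of all sessions."""
    total = negative + positive
    return 100 * negative / total, 100 * positive / total


# The two classes should account for every session in the dataset.
assert NEGATIVE_SESSIONS + POSITIVE_SESSIONS == TOTAL_SESSIONS

neg_pct, pos_pct = class_shares(NEGATIVE_SESSIONS, POSITIVE_SESSIONS)
print(f"negative: {neg_pct:.1f}%  positive: {pos_pct:.1f}%")
```

Because of this imbalance, a classifier that always predicts the negative class would already reach about 84.5% accuracy, so accuracy alone is likely a weak metric when 'Revenue' is used as the label.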

TamilCOCO is a novel bilingual image captioning dataset specifically designed for Tamil, a low-resource language. This dataset facilitates research in image captioning, cross-lingual natural language processing, and culturally adapted AI applications.


Public datasets based on peripheral physiological signals are currently limited, and there is a lack of emotion recognition (ER) datasets customized for smart classroom scenarios. Therefore, we have collected and constructed the I+ Lab Emotion (ILEmo) dataset, which is specifically designed for monitoring the emotions of students in the classroom. The raw data of the ILEmo dataset was collected by the I+ Lab at Shandong University, using custom multi-modal wristbands and computing suites.


The SDUITC database is a multi-modal resource developed at the Shandong Cooperative Vehicle-Infrastructure Test Base, which uses roadside cameras and LiDAR to monitor road targets and collect point cloud information. Following ground segmentation (target point cloud extraction), target identification and tracking, and feature extraction, the target point cloud information is refined and summarized into the following content: 1. Video snapshot of the captured target; 2. Point cloud clustering information for the target; 3. Feature tables.


This dataset provides turbidity measurements collected during a Moringa oleifera leaf water treatment process for compound extraction. The extraction process was conducted over a 15-minute duration, capturing key changes in turbidity to reflect the dynamics of the process. The raw data has been preprocessed, upsampled, and annotated for time series analysis, enabling detailed investigation of extraction patterns. Additionally, the dataset has been optimized using the ForGAN (Forecasting GAN) algorithm to enhance data granularity and support predictive modeling.


In-vehicle networks are responsible for safety-critical control applications that depend on data communication between electronic control units, and most are based on the CAN protocol. A huge amount of data is necessary for reliability, safety, and cybersecurity analysis in today's automotive solutions, especially to feed machine learning models. Comprehensive datasets covering CAN communication and different driving situations are therefore needed, yet they remain a gap in recent research because most public datasets are very limited.


This dataset contains rent prices for Kuala Lumpur and its neighboring areas, obtained from mudah.my in July 2024. The raw data is unprocessed and contains the original description of the house, the details in JSON format, the rent price, and the period.

This dataset is ideal for making rent price forecasts and exploring in depth what factors influence rent prices.

 


This dataset covers student achievement in secondary education at two Portuguese schools. The attributes include student grades and demographic, social, and school-related features, and the data was collected using school reports and questionnaires. Two datasets are provided regarding performance in two distinct subjects: Mathematics (mat) and Portuguese language (por). In [Cortez and Silva, 2008], the two datasets were modeled under binary/five-level classification and regression tasks. Important note: the target attribute G3 has a strong correlation with attributes G2 and G1.


This dataset integrates financial and macroeconomic indicators to support research on stock price prediction and financial forecasting. It includes daily stock data for Malayan Banking Berhad (MBB) (1155.KL) sourced from Yahoo Finance, alongside macroeconomic indicators such as GDP (constant 2015 MYR), GDP growth (YoY %), inflation rate (%), and the Overnight Policy Rate (OPR). The data spans a 20-year period from July 1, 2004, to August 1, 2024, and has been standardized to a daily frequency.


This collection includes multiple short-text classification datasets for natural language processing tasks. It contains several topic classification datasets, such as AG'News, Snippets, and TMNNews, which cover a wide range of topics and domains for evaluating classification models. The collection also includes a binary sentiment classification dataset, Twitter, for determining whether a text expresses positive or negative sentiment.

