Machine Learning | IEEE DataPort

AI Training Data for OCT-SLO Self Calibration and Automation

The attached image dataset, acquired with a combined OCT-SLO system, is used to train AI models and to identify features that maximize data quality when adjusting the MZI reference arm, the PMT voltage of the liquid lens, and the location of the object. Why these adjustments are needed is explained below:

Solar power dataset

This dataset consists of meteorological and environmental data collected in Riyadh, Saudi Arabia, from 2010 to the present. The variables include solar radiation, temperature (daily maximum and minimum, in both Celsius and Fahrenheit), precipitation, vapor pressure, and snow water equivalent, among others, providing insight into solar radiation patterns, daily temperature fluctuations, and weather-related factors that can affect solar power generation. Specifically, the dataset contains the following columns:

ONE-MS-I: A micro-services based network traffic dataset

The shift towards cloud-native applications has been accelerating in recent years. Modern applications are increasingly distributed, taking advantage of cloud-native features such as scalability, flexibility, and high availability. However, this evolution also introduces various security challenges. From a networking perspective, the large number of interconnected components and their intricate communication patterns make detecting and mitigating traffic anomalies a complex task.

SNMDat2.0

SNMDat2.0 is a comprehensive multimodal dataset for Twitter social bot detection, expanded from the unimodal TwiBot-20. Building on the original 229,580 Twitter users, 227,979 follow relationships, and 33,488,192 tweet texts, we added 274,587 profile and profile background images, 86,498 tweet images, and 49,549 tweet videos.

Bibliometric Scopus data for Learning Analytics

This dataset provides bibliometric information on academic publications related to learning analytics and decision sciences, sourced from Scopus. It includes metadata for a wide range of papers, such as author details, titles, publication years, journal sources, and document types. Key columns include author names and IDs, publication titles, source titles (journals or conferences), document type, publication stage, and open access status.

GeoLife Dataset

The rapid growth of spatiotemporal data makes trajectory modeling critical for extracting patterns from large-scale, dynamic mobility datasets. However, many existing methods struggle with scalability and computational inefficiency. To address these challenges, we propose VecLSTM, a vectorized Long Short-Term Memory (LSTM) framework designed to improve both predictive accuracy and processing performance. VecLSTM introduces a novel dynamic vectorization layer that converts raw GPS trajectories into structured vector embeddings, enabling efficient storage, retrieval, and preprocessing.

Categories:

Machine Learning

Combined rumor and non-rumor dataset

Categories:

Machine Learning

Forbes Billionaire dataset

The Forbes 2022 Billionaires List dataset contains information about the world's wealthiest individuals, including their net worth, industry, country, and key business ventures. The dataset provides structured details such as rankings, company associations, and financial status, making it useful for various NLP tasks like table-to-text generation, entity recognition, and financial analysis.

Bangla Social Media Cyberbullying Dataset

Cyberbullying is a growing problem on social media. To support its detection in Bangla, this dataset collects comments from YouTube, Facebook, Instagram, and TikTok and labels them into two classes: bullying and non-bullying. It contains a variety of abusive and harmful texts alongside normal conversations. The dataset is intended to help researchers and developers train AI models that automatically identify cyberbullying in Bangla text, with the goal of building better tools to keep online spaces safe for Bangla-speaking users.

Categories:

Machine Learning

Course Rating

This dataset comprises a comprehensive collection of educational courses, each characterized by several key attributes: interests, title, description, category, level, past experience, and rating.