Machine Learning
The major language used on social media platforms is primarily dialectal, posing unique challenges for Natural Language Processing. To address this, a large, manually annotated corpus of approximately 30,500 Saudi dialect tweets in the food delivery app domain was introduced. The corpus was annotated with positive, negative, and neutral sentiment categories. Additionally, the existing SauDiSenti lexicon was expanded by 30%, providing an improved resource for sentiment analysis in the Saudi dialect. the corpus and expanded lexicon have been evaluated using machine learning classifiers.
- Categories:
The LuFI-RiverSnap dataset includes close-range river scene images obtained from various devices, such as UAVs, surveillance cameras, smartphones, and handheld cameras, with sizes up to 4624 × 3468 pixels. Several social media images, which are typically volunteered geographic information (VGI), have also been incorporated into the dataset to create more diverse river landscapes from various locations and sources.
Please see the following links:
- Categories:
This dataset comprises comprehensive information on chemical compounds sourced from the PubChem database, including detailed descriptions for each compound. Each entry in the dataset includes unique PubChem Compound Identifiers (CIDs), molecular structures, physicochemical properties, biological activities, and associated descriptive metadata. The dataset is designed to support research in drug discovery, chemical informatics, and other fields requiring extensive chemical compound information.
- Categories:
Due to the difficulty in obtaining real samples and ground truth, the generalization performance and the fine-tuned performance are critical for the feasibility of stereo matching methods in real-world applications. However, the diverse datasets exhibit substantial discrepancies in disparity distribution and density, thus presenting a formidable challenge to the generalization and fine-tuning of the model.
- Categories:
The data generated during this research includes several distinct components aimed at enhancing the understanding and application of call graph visualization and design pattern detection. This data is housed in a publicly accessible repository and comprises the following elements:
- Categories:
Brain-Computer Interface (BCI) is a technology that enables direct communication between the brain and external devices, typically by interpreting neural signals. BCI-based solutions for neurodegenerative disorders need datasets with patients’ native languages. However, research in BCI lacks insufficient language-specific datasets, as seen in Odia, spoken by 35-40 million individuals in India. To address this gap, we developed an Electroencephalograph (EEG) based BCI dataset featuring EEG signal samples of commonly spoken Odia words.
- Categories:
This dataset contains simulation values from thermo-mechanical finite element analysis simulations using ABAQUS. Each simulation is one of 192 unique process parameter settings which includes varying laser power, scan speed, layer height and cooling assumptions. The geometry for each simulation is a hollow rectangluar box with rounded corners such that they form semi-circles. The wall thickness of each simulation is exactly the width of the focused laser.
- Categories:
The dataset consists of around 335K real images equally distributed among 7 classes. The classes represent different levels of rain intensity, namely "Clear", "Slanting Heavy Rain", "Vertical Heavy Rain", "Slanting Medium Rain", "Vertical Medium Rain", "Slanting Low Rain", and "Vertical Low Rain". The dataset has been acquired during laboratory experiments and simulates a low-altitude flight. The system consists of a visual odometry system comprising a processing unit and a depth camera, namely an Intel Real Sense D435i.
- Categories: