Machine Learning

This study presents an English-Luganda parallel corpus of over 2,000 sentence pairs focused on financial decision-making and financial products. The dataset draws on diverse sources, including social media platforms (TikTok comments and Twitter posts from authoritative accounts such as Bank of Uganda and Capital Markets Uganda) and fintech blogs (Chipper Cash and Xeno). The corpus covers a range of financial topics, including bonds, loans, and unit trust funds, providing a resource for financial language processing in both English and Luganda.

The dataset targets 88 stocks and covers their two-year price movements from 01/01/2014 to 01/01/2016. The stocks comprise all 8 stocks in the Conglomerates sector plus the top 10 stocks by capital size in each of the other 8 sectors. The full list of the 88 stocks and their companies across the 9 sectors is available in StockTable, a facsimile of the paper's appendix (appendix_table_of_target_stocks.pdf).
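
The selection rule above can be sketched as follows. This is a minimal illustration on synthetic data, not the authors' code: the function select_targets, the (ticker, capital size) tuple format, and all ticker names are hypothetical. It shows how the rule yields 8 + 8 × 10 = 88 stocks.

```python
from typing import Dict, List, Tuple

def select_targets(
    sectors: Dict[str, List[Tuple[str, float]]],
    keep_all: str = "Conglomerates",
    top_k: int = 10,
) -> List[str]:
    """Hypothetical sketch: keep every stock in `keep_all`, and the `top_k`
    largest stocks (by capital size) in every other sector."""
    selected: List[str] = []
    for sector, stocks in sectors.items():
        if sector == keep_all:
            # Conglomerates: take the whole sector
            selected.extend(ticker for ticker, _ in stocks)
        else:
            # Other sectors: sort by capital size, largest first, keep top_k
            largest = sorted(stocks, key=lambda s: s[1], reverse=True)[:top_k]
            selected.extend(ticker for ticker, _ in largest)
    return selected

# Synthetic example: Conglomerates with 8 stocks, 8 other sectors with 15 each
demo: Dict[str, List[Tuple[str, float]]] = {
    "Conglomerates": [(f"C{i}", float(i)) for i in range(8)]
}
for s in range(8):
    demo[f"Sector{s}"] = [(f"S{s}_{i}", float(i)) for i in range(15)]

print(len(select_targets(demo)))  # prints 88
```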

The softwarization and virtualization of fifth-generation (5G) cellular networks bring increased flexibility and faster deployment of new services. However, these advancements also introduce new vulnerabilities and unprecedented attack surfaces. Because 5G networks are cloud-native, threats and intrusions must be detected and defended against within the underlying cloud systems.

This dataset contains sentence segments from customer reviews of mobile phones, collected from sources such as Amazon, Flipkart, Twitter, and some existing datasets. It contains more than 1,000 records, each tagged with one of five aspect categories: battery, camera, display, price, and processor. Each segment is also tagged as subjective or objective (i.e., whether it expresses sentiment) and assigned a sentiment orientation (positive, negative, or neutral). Whether the aspect is mentioned explicitly or only implied is recorded as well.
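
One record of this annotation scheme could be modeled as below. This is a sketch under assumptions: the class and field names (ReviewSegment, subjective, explicit_aspect) are hypothetical, the example sentence is illustrative rather than taken from the dataset, and the dataset's actual file format is not specified here.

```python
from dataclasses import dataclass
from enum import Enum

class Aspect(Enum):
    # The five aspect categories named in the description
    BATTERY = "battery"
    CAMERA = "camera"
    DISPLAY = "display"
    PRICE = "price"
    PROCESSOR = "processor"

class Polarity(Enum):
    POSITIVE = "positive"
    NEGATIVE = "negative"
    NEUTRAL = "neutral"

@dataclass(frozen=True)
class ReviewSegment:
    """Hypothetical record layout for one annotated sentence segment."""
    text: str
    aspect: Aspect
    subjective: bool       # True = subjective, False = objective
    polarity: Polarity
    explicit_aspect: bool  # True if the aspect is named explicitly in the text

# Illustrative record (not from the dataset)
seg = ReviewSegment(
    text="The battery life is excellent.",
    aspect=Aspect("battery"),
    subjective=True,
    polarity=Polarity("positive"),
    explicit_aspect=True,
)
print(seg.aspect.value, seg.polarity.value)  # prints: battery positive
```

Using enums rather than free strings makes a mislabeled category (e.g. Aspect("screen")) fail immediately with a ValueError.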

This work presents a specialized dataset designed to advance autonomous navigation in hiking trail and off-road natural environments. The dataset comprises over 1,250 images (640x360 pixels) captured using a camera mounted on a tele-operated robot on hiking trails. Images are manually labeled into eight terrain classes: grass, rock, trail, root, structure, tree trunk, vegetation, and rough trail. The dataset is provided in its original form without augmentations or resizing, allowing end-users flexibility in preprocessing.
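
A common first step with such data is checking class balance. The sketch below assumes pixel-level label masks in which each pixel stores a class index in the order the classes are listed above; both the mask format and the index order are assumptions, so check the dataset's own label map before relying on them. The mask used here is synthetic.

```python
from collections import Counter
from typing import Dict, List

# Class names from the dataset description; the index order is an assumption.
CLASSES = ["grass", "rock", "trail", "root", "structure",
           "tree trunk", "vegetation", "rough trail"]

def class_histogram(mask: List[List[int]]) -> Dict[str, int]:
    """Count pixels per class in a mask whose entries are class indices."""
    counts = Counter(idx for row in mask for idx in row)
    # Reject indices outside the eight known classes instead of ignoring them
    unknown = [i for i in counts if not 0 <= i < len(CLASSES)]
    if unknown:
        raise ValueError(f"unexpected class indices: {sorted(unknown)}")
    return {name: counts.get(i, 0) for i, name in enumerate(CLASSES)}

tiny_mask = [[0, 0, 2],
             [2, 2, 6]]  # synthetic 2x3 mask
print(class_histogram(tiny_mask)["trail"])  # prints 3
```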

The dataset provides detailed information for wheat crop monitoring in the Karnal District, India, spanning the period from 2010 to 2022. It is divided into four main components. The first component, Remote Sensing Data, contains Sentinel-2 satellite data (10 m resolution) averaged within each village boundary over a wheat crop mask. This folder contains two Excel files: one for NDVI (Normalized Difference Vegetation Index) and one for NDWI (Normalized Difference Water Index), both providing fortnightly values during the Rabi season across a 10-year period.
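
Both indices are normalized differences of two reflectance bands, and the village-level values are averages over masked pixels. The sketch below uses the standard Sentinel-2 band assignments (B3 green, B4 red, B8 NIR, all at 10 m; B11 SWIR at 20 m). The description does not say which NDWI variant was used, so both McFeeters' green/NIR form and Gao's NIR/SWIR form are shown; the function names are hypothetical and the reflectance values are illustrative.

```python
from typing import Sequence

def normalized_difference(a: float, b: float) -> float:
    """(a - b) / (a + b), the common form of NDVI and NDWI."""
    if a + b == 0:
        raise ValueError("a + b must be non-zero")
    return (a - b) / (a + b)

def ndvi(nir: float, red: float) -> float:
    # Sentinel-2: NIR = B8, Red = B4
    return normalized_difference(nir, red)

def ndwi_mcfeeters(green: float, nir: float) -> float:
    # McFeeters NDWI (open water): Green = B3, NIR = B8
    return normalized_difference(green, nir)

def ndwi_gao(nir: float, swir: float) -> float:
    # Gao NDWI (vegetation water content): NIR = B8, SWIR = B11
    return normalized_difference(nir, swir)

def masked_mean(values: Sequence[float], mask: Sequence[bool]) -> float:
    """Average only the pixels flagged by the crop mask (e.g. wheat pixels
    inside one village boundary)."""
    selected = [v for v, m in zip(values, mask) if m]
    if not selected:
        raise ValueError("mask selects no pixels")
    return sum(selected) / len(selected)

print(round(ndvi(0.40, 0.05), 3))  # prints 0.778
```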

This dataset, titled "Synthetic Sand Boil Dataset for Levee Monitoring: Generated Using DreamBooth Diffusion Models," provides a comprehensive collection of synthetic images designed to facilitate the study and development of semantic segmentation models for sand boil detection in levee systems. Sand boils, a critical factor in levee integrity, pose significant risks during floods, necessitating accurate and efficient monitoring solutions.

We collected and organized two years' worth of complete fault work orders from a wind farm and, using the proposed algorithm, structured them into a fault diagnosis event knowledge graph. The graph includes fault modes, fault impacts, fault symptoms, inspection schemes, root cause identification, and maintenance strategies, covering the fault information and handling methods recorded in these work orders. The dataset stores the graph as (head entity, relation, tail entity) triples in JSON format.
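
Triples in this form can be loaded into an adjacency index for lookups such as "all symptoms and strategies linked to a fault mode". The sketch assumes a JSON array of objects with head, relation, and tail keys; these key names, the relation names, and the placeholder entities are hypothetical, since the exact schema is not given here.

```python
import json
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical file contents with placeholder entities; real keys may differ.
SAMPLE = """[
  {"head": "fault mode A", "relation": "has_symptom", "tail": "symptom A"},
  {"head": "fault mode A", "relation": "has_maintenance_strategy",
   "tail": "maintenance strategy A"}
]"""

Triple = Tuple[str, str, str]

def load_triples(text: str) -> List[Triple]:
    """Parse a JSON array of {head, relation, tail} objects into tuples."""
    return [(t["head"], t["relation"], t["tail"]) for t in json.loads(text)]

def build_index(triples: List[Triple]) -> Dict[str, List[Tuple[str, str]]]:
    """Map each head entity to its outgoing (relation, tail) edges."""
    index: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
    for head, relation, tail in triples:
        index[head].append((relation, tail))
    return index

triples = load_triples(SAMPLE)
index = build_index(triples)
for relation, tail in index["fault mode A"]:
    print(relation, "->", tail)
```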

Surface electromyography (EMG) can be used to interact with and control robotic systems via intent recognition. However, most machine learning algorithms used to decode EMG signals have been trained on relatively small datasets with few subjects, which can limit how well they generalize across different users and activities. Motivated by these limitations, we developed EMGNet, a large-scale dataset to support research and development in EMG neural decoding, with an emphasis on human locomotion.