Machine Learning

ChnSentiCorp

This is a large-scale Chinese hotel review dataset collected by Tan Songbo. The corpus contains 10,000 reviews, which were automatically collected and organized from Trip.com.

Categories: Artificial Intelligence
Machine Learning

A Large-Scale Dataset for Active Fire Detection/Segmentation (Landsat-8)

This dataset was created from all Landsat-8 images of South America acquired in 2018. More than 31,000 images were processed (15 TB of data), and active fire pixels were found in approximately half of them. The Landsat-8 sensor has a spatial resolution of 30 m (15 m for its single panchromatic band), a radiometric resolution of 16 bits, and a temporal resolution (revisit time) of 16 days. The images in the dataset are in GeoTIFF format with 10 bands (the 15 m panchromatic band is excluded).
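Full Landsat-8 scenes are usually too large to feed to a segmentation network in one piece, so they are commonly cut into fixed-size patches. The following is a minimal sketch of that step, assuming the 10 bands have already been read into a NumPy array of shape (bands, height, width), for example with a GeoTIFF reader. The `tile_image` helper and the 256-pixel patch size are hypothetical choices for illustration, not part of the dataset's published tooling.

```python
import numpy as np


def tile_image(image: np.ndarray, patch: int = 256) -> np.ndarray:
    """Split a (bands, height, width) array into non-overlapping square patches.

    Hypothetical helper: edge pixels that do not fill a whole patch are dropped.
    Returns an array of shape (n_patches, bands, patch, patch), ordered row by row.
    """
    bands, height, width = image.shape
    rows, cols = height // patch, width // patch
    # Crop to a whole number of patches, then view the scene as a grid of tiles.
    cropped = image[:, : rows * patch, : cols * patch]
    grid = cropped.reshape(bands, rows, patch, cols, patch)
    # Reorder axes to (rows, cols, bands, patch, patch) and flatten the grid.
    return grid.transpose(1, 3, 0, 2, 4).reshape(rows * cols, bands, patch, patch)


# Usage: a synthetic 10-band, 16-bit scene of 1000 x 1300 pixels.
scene = np.zeros((10, 1000, 1300), dtype=np.uint16)
patches = tile_image(scene)
print(patches.shape)  # (15, 10, 256, 256)
```

Dropping the partial edge tiles keeps the sketch short; padding the scene or using overlapping windows are common alternatives when border pixels matter.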

Categories: Artificial Intelligence
Computer Vision
Image Processing
Machine Learning
Remote Sensing
Geoscience and Remote Sensing
Climate Change/Environmental

network service traffic

The dataset consists of two parts: private traffic and public traffic.

The private traffic is self-captured network traffic from several applications, such as YouTube, Skype, and streaming video, covering 16 categories in total.

The public traffic comes from an open VPN dataset that includes numerous VPN and non-VPN network services, covering 24 categories in total.

Categories: Machine Learning
Communications

An image dataset of various skin conditions and rashes

The dataset contains rash images covering 11 different disease states, as well as images of normal skin.

Categories: Machine Learning
Image Processing
Health

Spoken Indian Language Identification Database

(9 languages, 8 different utterance lengths)

Languages

Assamese
Bengali
Gujarati
Hindi
Kannada
Malayalam
Marathi
Tamil
Telugu

Durations

30 sec
10 sec
5 sec
3 sec
1 sec
0.5 sec
0.2 sec
0.1 sec
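The eight utterance lengths above suggest that the same recordings are available as clips of several fixed durations. The following is a minimal sketch of cutting a mono waveform into non-overlapping clips of each listed duration. The 16 kHz sample rate and the `segment` helper are assumptions for illustration; the database description does not state its sample rate or how its clips were produced.

```python
import numpy as np

# Utterance lengths listed above, in seconds.
DURATIONS_S = [30, 10, 5, 3, 1, 0.5, 0.2, 0.1]


def segment(signal: np.ndarray, sample_rate: int, duration_s: float) -> np.ndarray:
    """Cut a mono waveform into non-overlapping clips of duration_s seconds.

    Hypothetical helper: trailing samples that do not fill a whole clip are discarded.
    Returns an array of shape (n_clips, samples_per_clip).
    """
    samples = int(round(duration_s * sample_rate))
    n_clips = len(signal) // samples
    return signal[: n_clips * samples].reshape(n_clips, samples)


# Usage: 60 s of silence at an assumed 16 kHz sample rate.
audio = np.zeros(16000 * 60, dtype=np.float32)
for d in DURATIONS_S:
    print(d, segment(audio, 16000, d).shape)
```

Rounding `duration_s * sample_rate` guards against floating-point products such as `0.1 * 16000` landing just below a whole number of samples.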

Categories: Artificial Intelligence
Digital Signal Processing
Machine Learning

Urban Semantic 3D Dataset

This dataset extends the Urban Semantic 3D (US3D) dataset developed and first released for the 2019 IEEE GRSS Data Fusion Contest (DFC19). We provide additional geographic tiles to supplement the DFC19 training data, as well as new data for each tile that enables training and validation of models that predict geocentric pose, defined as an object's height above ground and its orientation with respect to gravity. In addition to the DFC19 data from Jacksonville, Florida, and Omaha, Nebraska, we add new geographic tiles from Atlanta, Georgia.

Categories: Machine Learning
Image Processing
Computer Vision
Geoscience and Remote Sensing

Hong Kong Water Quality and Climatological data - combined & interpolated (1997-2016, monthly)

The raw data were collected from the websites of the EPD (Environmental Protection Department, Hong Kong) and the HKO (Hong Kong Observatory). Marine water quality data are provided by the EPD, and climatological data are provided by the HKO. The data were interpolated with SAS “proc expand” and aligned to the beginning of each month.
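Aligning irregularly dated measurements to the first day of each month amounts to estimating each series at month-start dates from its neighbouring observations. The following is a minimal sketch of that idea in plain Python. It swaps in simple linear interpolation between the two bracketing observations; this is not necessarily the method SAS “proc expand” applied to this dataset, and the function names are hypothetical.

```python
from bisect import bisect_left
from datetime import date


def month_starts(first: date, last: date) -> list[date]:
    """All first-of-month dates d with first <= d <= last."""
    year, month = first.year, first.month
    out = []
    while True:
        d = date(year, month, 1)
        if d > last:
            break
        if d >= first:
            out.append(d)
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)
    return out


def interpolate_to_month_start(obs: list[tuple[date, float]]) -> list[tuple[date, float]]:
    """Linearly interpolate (date, value) observations onto month-start dates.

    Hypothetical helper: obs must be sorted by date with no duplicate dates.
    Only month starts inside the observed date range are produced.
    """
    days = [d.toordinal() for d, _ in obs]
    vals = [v for _, v in obs]
    result = []
    for target in month_starts(obs[0][0], obs[-1][0]):
        t = target.toordinal()
        i = bisect_left(days, t)
        if days[i] == t:  # an observation falls exactly on the 1st
            result.append((target, vals[i]))
            continue
        # Bracketing observations satisfy days[i - 1] < t < days[i].
        t0, t1 = days[i - 1], days[i]
        w = (t - t0) / (t1 - t0)
        result.append((target, vals[i - 1] + w * (vals[i] - vals[i - 1])))
    return result


# Usage with made-up sample values (not taken from the dataset).
samples = [(date(2016, 1, 15), 10.0), (date(2016, 2, 14), 13.0), (date(2016, 3, 20), 20.0)]
for d, v in interpolate_to_month_start(samples):
    print(d, round(v, 2))  # 2016-02-01 11.7, then 2016-03-01 16.2
```

Working in day ordinals makes the weights reflect real calendar gaps, including leap-year Februaries.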

The raw data used to produce this dataset are extracted from the following URL.

Categories: Machine Learning
Climate Change/Environmental

Indian Cautionary Traffic Sign (ICTS) data-set

Cautionary traffic signs are of immense significance to traffic safety. In this study, a robust and optimal real-time approach to recognizing Indian Cautionary Traffic Signs (ICTS) is proposed. All ICTS are triangular, with a white background, a red border, and a black pattern. A dataset of 34,000 real-time images was acquired under various environmental conditions and categorized into 40 distinct classes. Preprocessing techniques convert the RGB images to grayscale and enhance their contrast to improve recognition performance.
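The description does not say which grayscale conversion or contrast-enhancement method was used. The following is a minimal sketch of one common combination, assuming 8-bit RGB input: a weighted luma conversion with ITU-R BT.601 coefficients, followed by global histogram equalization. Both are swapped-in standard techniques (adaptive variants such as CLAHE are another frequent choice), and the function names are hypothetical.

```python
import numpy as np


def to_grayscale(rgb: np.ndarray) -> np.ndarray:
    """Convert an (H, W, 3) uint8 RGB image to (H, W) uint8 luma using BT.601 weights."""
    weights = np.array([0.299, 0.587, 0.114])
    gray = rgb.astype(np.float64) @ weights
    return np.clip(np.round(gray), 0, 255).astype(np.uint8)


def equalize_histogram(gray: np.ndarray) -> np.ndarray:
    """Global histogram equalization of a uint8 grayscale image."""
    hist = np.bincount(gray.ravel(), minlength=256)
    cdf = np.cumsum(hist)
    cdf_min = cdf[cdf > 0][0]  # cumulative count at the darkest occupied level
    total = gray.size
    if total == cdf_min:  # constant image: there is no contrast to stretch
        return gray.copy()
    # Map each level so the occupied range is spread over 0..255.
    lut = np.round((cdf - cdf_min) / (total - cdf_min) * 255)
    lut = np.clip(lut, 0, 255).astype(np.uint8)
    return lut[gray]


# Usage: a tiny low-contrast image becomes full-range after equalization.
img = np.array([[50, 100], [150, 200]], dtype=np.uint8)
print(equalize_histogram(img).tolist())  # [[0, 85], [170, 255]]
```

The lookup-table form applies the same mapping to every pixel in one indexing step, which keeps the per-image cost linear in the number of pixels.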

Categories: Machine Learning
Image Processing
Computer Vision

GeoCoV19: A Dataset of Hundreds of Millions of Multilingual COVID-19 Tweets with Location Information

We present GeoCoV19, a large-scale Twitter dataset related to the ongoing COVID-19 pandemic. The dataset was collected over a period of 90 days, from February 1 to May 1, 2020, and consists of more than 524 million multilingual tweets. Because geolocation information is essential for many tasks, such as disease tracking and surveillance, we employed a gazetteer-based approach that extracts toponyms from the user location and the tweet content and derives geolocation information at different granularity levels using Nominatim (OpenStreetMap) data. In terms of geographical coverage, the dataset spans 218 countries and 47K cities worldwide. The tweets in the dataset come from more than 43 million Twitter users, including around 209K verified accounts. These users posted tweets in 62 different languages.
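A gazetteer-based approach works by matching words or word sequences in the text against a list of known place names. The following is a minimal sketch of that matching step, assuming a tiny hardcoded gazetteer. The entries, the `extract_toponyms` helper, and the greedy longest-match strategy are illustrative assumptions; the sketch does not query Nominatim and is not the authors' pipeline.

```python
import re

# Hypothetical toy gazetteer: lowercased place name -> (granularity, country).
GAZETTEER = {
    "new york": ("city", "United States"),
    "lahore": ("city", "Pakistan"),
    "ontario": ("state", "Canada"),
    "italy": ("country", "Italy"),
}
MAX_NGRAM = max(len(name.split()) for name in GAZETTEER)


def extract_toponyms(text: str) -> list[tuple[str, str, str]]:
    """Return (name, granularity, country) for gazetteer entries found in text.

    Scans word tokens left to right and prefers the longest matching n-gram,
    so "new york" wins over a shorter entry starting at the same word.
    """
    tokens = re.findall(r"\w+", text.lower())
    found = []
    i = 0
    while i < len(tokens):
        for n in range(min(MAX_NGRAM, len(tokens) - i), 0, -1):
            candidate = " ".join(tokens[i : i + n])
            if candidate in GAZETTEER:
                found.append((candidate, *GAZETTEER[candidate]))
                i += n  # skip past the matched words
                break
        else:
            i += 1  # no entry starts at this token
    return found


print(extract_toponyms("Stay safe, New York! Greetings from Lahore, Pakistan"))
```

A production gazetteer would also need to handle ambiguous names (for example, several cities sharing one name) and non-Latin spellings, which this sketch ignores.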

Categories: Artificial Intelligence
COVID-19
Machine Learning

A deep learning database and network for focusing guided wave defect detection

Database set information

Categories: Machine Learning
Sensors