Machine Learning

GeoCoV19: A Dataset of Hundreds of Millions of Multilingual COVID-19 Tweets with Location Information

We present GeoCoV19, a large-scale Twitter dataset related to the ongoing COVID-19 pandemic. The dataset was collected over a period of 90 days, from February 1 to May 1, 2020, and consists of more than 524 million multilingual tweets. Because geolocation information is essential for many tasks such as disease tracking and surveillance, we employed a gazetteer-based approach to extract toponyms from user location and tweet content, and derived their geolocation information at different granularity levels using Nominatim (OpenStreetMap) data. In terms of geographical coverage, the dataset spans 218 countries and 47K cities worldwide. The tweets in the dataset come from more than 43 million Twitter users, including around 209K verified accounts. These users posted tweets in 62 different languages.
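
The gazetteer-based idea described above can be illustrated with a minimal sketch. This is not the authors' pipeline: the tiny `GAZETTEER` dictionary, its entries, and the `extract_toponyms` helper are hypothetical stand-ins for a real Nominatim-backed lookup, and the longest-match rule is an assumption made for the example.

```python
import re

# Hypothetical toy gazetteer: lowercase toponym -> (country code, granularity).
# A real system would query Nominatim (OpenStreetMap) data instead.
GAZETTEER = {
    "new york": ("US", "city"),
    "doha": ("QA", "city"),
    "qatar": ("QA", "country"),
    "italy": ("IT", "country"),
}

# Longest toponym (in words) that the lookup has to consider.
MAX_NGRAM = max(len(name.split()) for name in GAZETTEER)


def extract_toponyms(text):
    """Return gazetteer matches found in text, preferring the longest n-gram."""
    tokens = re.findall(r"\w+", text.lower())
    matches = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate first so "new york" wins over "york".
        for n in range(min(MAX_NGRAM, len(tokens) - i), 0, -1):
            candidate = " ".join(tokens[i:i + n])
            if candidate in GAZETTEER:
                matches.append((candidate, *GAZETTEER[candidate]))
                i += n
                break
        else:
            # No toponym starts at this token; move on.
            i += 1
    return matches
```

The same function could be applied to both the user location field and the tweet text, with the granularity label deciding how precise the resulting geolocation is.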

Categories: Artificial Intelligence, COVID-19, Machine Learning

5591 Views

A deep learning database and network for focusing guided wave defect detection

Database set information

Categories: Machine Learning, Sensors

452 Views

English language tweets dataset for COVID-19

This dataset contains English-language tweets related to COVID-19. It includes 226668 unique tweet IDs spanning December 2019 to May 2020. The keywords used to crawl the tweets are 'corona', 'covid', 'sarscov2', 'covid19', and 'coronavirus'. To obtain the other 33 data fields, send an email to "avishekgarain@gmail.com". Twitter does not allow public sharing of other tweet details (text, etc.), so they cannot be uploaded here.
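
As a rough illustration of keyword-based selection, the sketch below checks whether a tweet's text contains any of the listed crawl keywords. The case-insensitive substring rule and the `matches_keywords` helper are assumptions for the example; the entry does not state how the crawler actually matched tweets.

```python
# Crawl keywords as listed in the dataset description.
KEYWORDS = ("corona", "covid", "sarscov2", "covid19", "coronavirus")


def matches_keywords(text, keywords=KEYWORDS):
    """Return True if any keyword occurs in text (case-insensitive substring)."""
    lowered = text.lower()
    return any(keyword in lowered for keyword in keywords)
```

With substring matching, 'covid' already covers 'covid19' and 'corona' covers 'coronavirus'; the longer keywords would matter only under whole-word or hashtag matching.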

Categories: Artificial Intelligence, COVID-19, Machine Learning, Other

3389 Views

EEG data for ADHD / Control children

Participants were 61 children with ADHD and 60 healthy controls (boys and girls, ages 7-12). The children with ADHD were diagnosed by an experienced psychiatrist according to DSM-IV criteria and had taken Ritalin for up to 6 months. None of the children in the control group had a history of psychiatric disorders, epilepsy, or any report of high-risk behaviors.

Categories: Signal Processing, Machine Learning, Biomedical and Health Sciences, Neuroscience, Brain

31720 Views

COVID-19 tweets dataset for Bengali language

This dataset contains Bengali tweets related to COVID-19. It includes 36117 unique tweet IDs spanning December 2019 to May 2020. The keywords used to crawl the tweets are 'corona', 'covid', 'sarscov2', 'covid19', and 'coronavirus'. To obtain the other 33 data fields, send an email to "avishekgarain@gmail.com". A code snippet is given in the Documentation file. Publicly sharing Twitter data other than tweet IDs violates Twitter's policies.

Categories: Artificial Intelligence, COVID-19, Machine Learning, Biomedical and Health Sciences, Other

1489 Views

COVID-19 tweets dataset for Spanish language

This dataset contains Spanish tweets related to COVID-19. It includes 18958 unique tweet IDs spanning December 2019 to May 2020. The keywords used to crawl the tweets are 'corona', 'covid', 'sarscov2', 'covid19', and 'coronavirus'. To obtain the other 33 data fields, send an email to "avishekgarain@gmail.com". A code snippet is given in the Documentation file. Publicly sharing Twitter data other than tweet IDs violates Twitter's policies.

Categories: Artificial Intelligence, COVID-19, Machine Learning, Biomedical and Health Sciences, Other

1189 Views

Speech Dataset in Hindi Language

The dataset covers 100 speakers, each with 5 voice samples for training and 1 voice sample for testing, for a total of 600 voice samples. The samples were collected in different audio formats such as mpeg, mp4, mp3, and ogg, then preprocessed and converted to .wav format. Each voice sample lasts 5-10 seconds; because the lengths differ, parameters should be tuned before use. The whole dataset is 600 MB in size and 1 hour 40 minutes in duration. It can be used for speech synthesis, speaker identification, speaker recognition, speech recognition, etc.
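
One common way to handle clips of differing length is to pad or trim every clip to a fixed duration before feature extraction. The sketch below shows that idea on a plain list of audio samples. The `pad_or_trim` helper, the 10-second default, and zero-padding are assumptions for illustration, not something prescribed by the dataset.

```python
def pad_or_trim(samples, sample_rate, target_seconds=10.0):
    """Return a copy of samples with exactly sample_rate * target_seconds items.

    Longer clips are cut at the end; shorter clips are zero-padded at the end.
    """
    target_len = int(round(sample_rate * target_seconds))
    if len(samples) >= target_len:
        return list(samples[:target_len])
    return list(samples) + [0] * (target_len - len(samples))
```

For real use, the samples would first be read from each .wav file, and `sample_rate` would come from the file header rather than being assumed.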

Categories: Artificial Intelligence, Machine Learning

5567 Views

Speech Dataset in Hindi Language

Categories: Artificial Intelligence, Machine Learning

2378 Views

Intel Open Wi-Fi RTT Dataset

Dataset used for the paper "A Machine Learning Approach for Wi-Fi RTT Ranging" (ION ITM 2019). The dataset includes almost 30,000 Wi-Fi RTT (FTM) raw channel measurements from real-life clients and access points in an office environment. This data can be used for Time of Arrival (ToA), ranging, positioning, navigation, and other types of research in Wi-Fi indoor location. The zip file includes a README file, a CSV file with the dataset, and several MATLAB functions that help the user plot the data and demonstrate how to estimate the range.
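
For orientation, the basic relationship behind RTT ranging is that distance equals the speed of light times half the round-trip time. The sketch below shows only that textbook conversion; it is not the machine learning method from the paper, nor the MATLAB code shipped with the dataset, and the `rtt_to_range_m` helper and its `processing_delay_ns` parameter are hypothetical.

```python
SPEED_OF_LIGHT_M_PER_S = 299_792_458.0


def rtt_to_range_m(rtt_ns, processing_delay_ns=0.0):
    """Convert a round-trip time in nanoseconds to a one-way range in meters.

    processing_delay_ns is an assumed, already-known responder turnaround
    time that is subtracted before halving the round trip.
    """
    time_of_flight_s = (rtt_ns - processing_delay_ns) * 1e-9 / 2.0
    return SPEED_OF_LIGHT_M_PER_S * time_of_flight_s
```

Because light travels about 0.3 m per nanosecond, each nanosecond of round-trip timing error corresponds to about 0.15 m of range error, which is one reason raw channel measurements are of interest for estimating the time of arrival more precisely.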

Categories: Artificial Intelligence, IoT, Machine Learning, Wearable Sensing, Digital signal processing

3205 Views

A novel fusion Python application of data mining techniques to evaluate airborne magnetic datasets

Depth to subsurface anomalies has been the primary interest in all applications of magnetic methods in geophysical prospecting. For any correct structural interpretation of the subsurface, the depths to the geologic features of interest are more valuable than any other property.

Categories: Machine Learning, Image Fusion, Geoscience and Remote Sensing

465 Views
