Machine Learning

BillionCOV is a global billion-scale English-language COVID-19 tweets dataset with more than 1.4 billion tweets originating from 240 countries and territories between October 2019 and April 2022. This dataset has been curated by hydrating the 2 billion tweets present in COV19Tweets.

The CAD-EdgeTune dataset was acquired in a suburban university environment using a Husarion ROSbot 2.0 and a ROSbot 2.0 Pro, with the collection speed set to 5 frames per second. To depict the surroundings under various lighting conditions, the data can be split into subsets for noon, dusk, and dawn. We assembled 17 sequences totaling 8,080 frames, of which 1,619 were manually annotated using an open-source pixel annotation tool. Because neighboring frames are highly similar to one another, we annotated every fifth image.
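
Annotating every fifth image amounts to a fixed-stride subsample of each sequence. The minimal sketch below assumes frames are indexed from 0 within each sequence and that the first frame of a sequence is always annotated; the helper name `frames_to_annotate` is hypothetical and not part of the dataset's tooling.

```python
from typing import List


def frames_to_annotate(num_frames: int, stride: int = 5) -> List[int]:
    """Return the indices of frames selected for manual annotation.

    Hypothetical helper: keeps frame 0 and every `stride`-th frame after it,
    since neighbouring frames captured at 5 fps are highly similar.
    """
    if stride < 1:
        raise ValueError("stride must be a positive integer")
    return list(range(0, num_frames, stride))


# Example: a single 23-frame sequence yields frames 0, 5, 10, 15, 20.
print(frames_to_annotate(23))
```

Under this assumed scheme each sequence of n frames contributes ceil(n / 5) annotated frames, so when selection is done per sequence the total can slightly exceed one fifth of all frames.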

This dataset aims to identify whether tweets are supportive of, opposed to, or neutral towards the current government. It comprises 26,000 tweets, 15,000 in English and 11,000 in Urdu, collected from the accounts of 80 political users to ensure a diverse and comprehensive representation of opinions.

We collected data to train the ML module to determine the location of the user's device from beacon frame characteristics and the RSSI values of Wi-Fi access points (APs). We defined a threshold of 7 feet as the maximum allowable distance between the user's devices and then collected two datasets: "authentic", recorded while the two Raspberry Pis were at most 7 feet apart, and "unauthorized", recorded while they were more than 7 feet apart.
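
The labelling rule above reduces to a threshold on the measured distance between the two Raspberry Pis. As a minimal sketch, the code below also shows one possible input for the ML module, the per-AP absolute RSSI difference between the two devices' scans. This feature, the function names, the BSSID-keyed dictionaries, and the -100 dBm placeholder for an AP seen by only one device are all illustrative assumptions, not the authors' pipeline.

```python
from typing import Dict, List

THRESHOLD_FT = 7.0  # maximum allowable distance between the user's devices


def label_for_distance(distance_ft: float) -> str:
    """Label a capture by the true distance between the two Raspberry Pis."""
    return "authentic" if distance_ft <= THRESHOLD_FT else "unauthorized"


def rssi_difference_features(
    scan_a: Dict[str, int],
    scan_b: Dict[str, int],
    missing_dbm: int = -100,  # assumed placeholder for an AP one device did not see
) -> List[int]:
    """Hypothetical feature: |RSSI_a - RSSI_b| for every AP seen by either device.

    Scans map an AP's BSSID to its RSSI in dBm; BSSIDs are sorted so the
    feature order is stable across captures.
    """
    bssids = sorted(set(scan_a) | set(scan_b))
    return [abs(scan_a.get(b, missing_dbm) - scan_b.get(b, missing_dbm)) for b in bssids]


scan_pi1 = {"aa:bb:cc:00:00:01": -40, "aa:bb:cc:00:00:02": -70}
scan_pi2 = {"aa:bb:cc:00:00:01": -45, "aa:bb:cc:00:00:03": -80}
print(label_for_distance(6.5), rssi_difference_features(scan_pi1, scan_pi2))  # authentic [5, 30, 20]
```

A fixed feature order matters if the vectors are fed to a standard classifier; in practice one would likely fix the AP list in advance rather than recompute it per pair of scans.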

This is the most recent dataset used for clustering in TOM.
It contains time series of the metrics for 1,659 GitHub repositories.
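
The entry does not say which clustering method TOM uses. As one plausible illustration only, the sketch below z-normalizes each repository's metric series, so that repositories are grouped by the shape of their trend rather than their absolute size, and then clusters them with plain k-means. The choice of k-means, the function names, the deterministic initialization from the first k series, and the assumption that all series have equal length are hypothetical.

```python
import math
from typing import List, Sequence


def z_normalize(series: Sequence[float]) -> List[float]:
    """Rescale a series to zero mean and unit variance (constant series -> zeros)."""
    n = len(series)
    mean = sum(series) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in series) / n)
    if std == 0:
        return [0.0] * n
    return [(x - mean) / std for x in series]


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_time_series(series_list: List[Sequence[float]], k: int, max_iter: int = 50) -> List[int]:
    """Cluster equal-length series with k-means on z-normalized values.

    Centroids are initialised from the first k series for reproducibility.
    """
    data = [z_normalize(s) for s in series_list]
    if not 1 <= k <= len(data):
        raise ValueError("k must be between 1 and the number of series")
    centroids = [list(p) for p in data[:k]]
    labels: List[int] = []
    for _ in range(max_iter):
        # Assignment step: nearest centroid by squared Euclidean distance.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c])) for p in data
        ]
        if new_labels == labels:
            break  # converged: assignments did not change
        labels = new_labels
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(data, labels) if lab == c]
            if members:  # keep the previous centroid if a cluster is empty
                centroids[c] = [sum(vals) / len(members) for vals in zip(*members)]
    return labels


# Two rising and two falling series (illustrative numbers, not from the dataset).
repos = [[1, 2, 3, 4], [4, 3, 2, 1], [2, 4, 6, 8], [8, 6, 4, 2]]
print(kmeans_time_series(repos, k=2))  # [0, 1, 0, 1]
```

Because of the z-normalization, `[1, 2, 3, 4]` and `[2, 4, 6, 8]` land in the same cluster even though their magnitudes differ; a distance such as dynamic time warping would be a natural swap if series are misaligned in time.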

Sample data set

Forty-two stimulus pictures, including landscapes, people, social scenes, and composite pictures, are presented one at a time on the screen in the same sequence for all participants. An eye tracker records each participant's gaze on the stimulus pictures, and a fixation map is constructed from the position and duration of each gaze fixation. We apply a 2-D convolution with a Gaussian filter to the fixation maps to obtain visual heatmaps. The participants are patients with schizophrenia and healthy controls.
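
The heatmap step can be sketched with NumPy. The sketch assumes fixations are given as (x, y, duration) tuples in pixel coordinates and accumulates each fixation's duration at its pixel to form the fixation map. It then applies the 2-D Gaussian convolution as two 1-D passes (rows, then columns), which is equivalent because the Gaussian kernel is separable. The function names, the truncation radius of 3 sigma, and the value of sigma are illustrative assumptions; the entry does not state the filter parameters used.

```python
from typing import Iterable, Tuple

import numpy as np


def fixation_map(fixations: Iterable[Tuple[float, float, float]], height: int, width: int) -> np.ndarray:
    """Accumulate fixation durations at their (row, col) pixel positions."""
    fmap = np.zeros((height, width), dtype=float)
    for x, y, duration in fixations:
        col, row = int(round(x)), int(round(y))
        if 0 <= row < height and 0 <= col < width:  # ignore off-screen fixations
            fmap[row, col] += duration
    return fmap


def gaussian_kernel_1d(sigma: float) -> np.ndarray:
    """Normalized 1-D Gaussian kernel truncated at 3 sigma (assumed radius)."""
    radius = max(1, int(3 * sigma))
    xs = np.arange(-radius, radius + 1, dtype=float)
    kernel = np.exp(-(xs ** 2) / (2 * sigma ** 2))
    return kernel / kernel.sum()


def gaussian_heatmap(fmap: np.ndarray, sigma: float) -> np.ndarray:
    """2-D Gaussian convolution of a fixation map, applied separably."""
    kernel = gaussian_kernel_1d(sigma)
    if min(fmap.shape) < len(kernel):
        raise ValueError("image must be larger than the Gaussian kernel")
    smooth = lambda v: np.convolve(v, kernel, mode="same")
    blurred = np.apply_along_axis(smooth, 1, fmap)   # convolve each row
    return np.apply_along_axis(smooth, 0, blurred)   # then each column


fmap = fixation_map([(10, 20, 0.5), (30, 25, 1.2)], height=50, width=50)
heatmap = gaussian_heatmap(fmap, sigma=2.0)
print(np.unravel_index(np.argmax(heatmap), heatmap.shape))
```

Weighting by duration means longer fixations produce brighter regions; away from the image borders the blur preserves the total fixation time.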

This dataset provides high-resolution remote sensing data of various coastline scenes.

Only a limited number of datasets currently exist for detecting errors in the 3D printing process. This scarcity has led most researchers to focus on fault classification from sensor data.

The images are captured and labelled before being fed to the deep learning (DL) model. Each printing process is recorded as a 15-second time-lapse video, from which around 50 images are extracted. In total, 2,297 images across four classes are collected.
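
Extracting around 50 images from each time-lapse amounts to sampling evenly spaced frame indices. The sketch below assumes a frame rate of 30 fps (so a 15-second clip holds 450 frames), which the entry does not state. `extract_frames` uses OpenCV's `VideoCapture`; both function names and the output file pattern are hypothetical.

```python
from typing import List


def evenly_spaced_indices(total_frames: int, n_images: int = 50) -> List[int]:
    """Pick up to n_images distinct frame indices spread uniformly over a video."""
    if total_frames <= 0 or n_images <= 0:
        return []
    n = min(n_images, total_frames)  # cannot take more frames than exist
    return [i * total_frames // n for i in range(n)]


def extract_frames(video_path: str, out_pattern: str, n_images: int = 50) -> int:
    """Save evenly spaced frames as images and return how many were written.

    out_pattern is a hypothetical format string such as "print01_{:03d}.jpg".
    """
    import cv2  # OpenCV, imported lazily so the index helper has no dependency

    cap = cv2.VideoCapture(video_path)
    total = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
    written = 0
    for k, idx in enumerate(evenly_spaced_indices(total, n_images)):
        cap.set(cv2.CAP_PROP_POS_FRAMES, idx)  # seek to the chosen frame
        ok, frame = cap.read()
        if ok:
            cv2.imwrite(out_pattern.format(k), frame)
            written += 1
    cap.release()
    return written


# Assumed 15 s at 30 fps = 450 frames, so every 9th frame is kept.
print(evenly_spaced_indices(15 * 30)[:5])  # [0, 9, 18, 27, 36]
```

Seeking by frame index keeps the sampling independent of the clip's actual frame rate; the frame count reported by some codecs can be approximate, which is why the loop only counts frames that were actually read.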

Most machine learning (ML) proposals in the Internet of Things (IoT) space are designed and evaluated on pre-processed datasets, where the data acquisition and cleaning steps are often treated as a black box. In practice, however, the data acquisition stage requires additional data cleaning and anomaly detection techniques, which translate into additional resource, energy, and storage costs.
