Machine Learning

Indoor intelligent perception systems have gained significant attention in recent years. However, accurately detecting human presence is challenging when nonhuman sources of motion such as pets, robots, and electrical appliances are also present, which limits the practicality of these systems for widespread use.
In this data port, we build the first comprehensive WiFi dataset of motion from various sources in real-world contexts. It includes WiFi data of humans, pets, cleaning robots, and fans. 


This is the dataset used in the paper "Cross-phone calibration for smartphone-based crowdsourced measurement of E-field strength of mobile downlink signals using transfer learning". The dataset is mainly composed of RSRP and E-field strength data collected using smartphones and a spectrum analyzer with an isotropic antenna. The archive contains two subdirectories: one for the raw data after outlier removal and one for the preprocessed feature dataset. See the Readme file in the folder for details.



Mashup and API dataset from ProgrammableWeb.

We have segmented and cleaned the data, retaining the parts needed for subsequent tasks. The data are stored as *.csv files and consist of two parts, Mashup and API, which are mainly used as input to the subsequent BERT model. The dataset also includes the *.pt files needed for Node2Vec.


We downloaded a dataset of Hindi poems from the website; it contains around 2500 poems. The downloaded dataset link is: link. In the initial phase of our data preprocessing pipeline, we extracted text from the 2500 HTML files that make up the corpus. To prepare the data for subsequent processing, we merged all the extracted text into a single consolidated text file.
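The extraction and merging step described above can be sketched roughly as follows. This is a minimal illustration using only the Python standard library, not the authors' actual pipeline: the names `TextExtractor`, `html_to_text`, and `merge_html_files`, the `*.html` file pattern, and the UTF-8 encoding are all assumptions made for the example.

```python
from html.parser import HTMLParser
from pathlib import Path


class TextExtractor(HTMLParser):
    """Collect text nodes, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0  # > 0 while inside <script> or <style>
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self._skip_depth == 0:
            self.chunks.append(text)


def html_to_text(html: str) -> str:
    """Return the text of one HTML document, one text node per line."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.chunks)


def merge_html_files(input_dir: str, output_file: str) -> int:
    """Write the text of every *.html file in input_dir to output_file.

    Assumes the files are UTF-8 encoded (hypothetical; adjust as needed).
    Returns the number of files processed.
    """
    paths = sorted(Path(input_dir).glob("*.html"))
    with open(output_file, "w", encoding="utf-8") as out:
        for path in paths:
            out.write(html_to_text(path.read_text(encoding="utf-8")) + "\n\n")
    return len(paths)
```

Calling `merge_html_files("poems_html", "poems.txt")` (both paths hypothetical) would produce one text file with a blank line between documents. A dedicated HTML library may handle malformed markup more robustly than this sketch.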


SAR-optical remote sensing pairs are widely exploited for their complementarity in land-cover and crop classification, image registration, change detection, and early warning systems. Nevertheless, most of these applications are performed on flat areas and cannot be generalized to mountainous regions. Indeed, steep slopes disturb the range sampling, which causes strong distortions in radar acquisitions, namely foreshortening, shadows, and layovers.


In contemporary digital environments, a high-resolution synthetic Latin character dataset is of great importance to many real-world applications in computer vision and artificial intelligence. Its relevance extends from tasks such as image restoration to the implementation of sophisticated recognition systems.



This dataset accompanies a research article in the field of reservoirs. It consists of several files from which the data are obtained. After reading the related article, readers can use the data to verify its results; the data have been tested for this purpose.


This is the dataset used in the paper "MTS4WaterR: Predicting Gate Operation in Open Canal Control with Multi-Task Sequential Model". It consists of two main parts, used to train the evaluator and the learner neural networks, respectively.

Each part contains several files:


An understanding of local walking context plays an important role in the analysis of human gait and in the high-level control systems of robotic prostheses. Laboratory analysis on its own can limit the ability of researchers to properly assess clinical gait in patients, and the ability of robotic prostheses to function well in many contexts; study in diverse walking environments is therefore warranted. A ground-truth understanding of the walking terrain is traditionally identified from simple visual data.


The Army Cyber Institute (ACI) Internet of Things (IoT) Network Traffic Dataset 2023 (ACI-IoT-2023) is a novel dataset tailored for machine learning (ML) applications in the realm of IoT network security. This effort focuses on delivering a distinctive and realistic dataset designed to train and evaluate ML models for IoT network environments. By addressing a gap in existing resources, this dataset aims to propel advancements in ML-based solutions, ultimately fortifying the security of IoT operations.